What You Learned Before You Were Born

In Plato’s Meno, Socrates makes the somewhat odd claim that people’s ability to learn things they were never directly told proves that they must somehow have learned or known them in advance. While we can reasonably assume this is wrong in a literal sense, there is some likeness of the truth here.

The whole of a human life is, generally speaking, a continuous learning process without sudden jumps. We think of a baby’s learning as different from the learning of a child in school, and the learning of the child as rather different from the learning of an adult. But if you look at the process itself, there may be sudden jumps in a person’s situation, such as graduating from school or getting married, but there are no sudden jumps from knowing nothing about a topic or an object to suddenly knowing all about it. The learning itself happens gradually. The same is true of the manner in which learning takes place: adults do indeed learn in a different manner from children or infants, but if you ask how that manner came to be different, it certainly did so gradually, not suddenly.

But in addition to all this, there is a kind of “knowledge” that is not learned at all during one’s life, but is possessed from the beginning. From the beginning people have the ability to interact with the world in such a way that they will survive and go on to learn things. Thus from the beginning they must “know” how to do this. Now one might object that infants have no such knowledge, and that the only reason they survive is that their parents or others keep them alive. But the objection is mistaken: infants know to cry out when they are hungry or in pain, and this is part of what keeps them alive. Likewise, an infant knows to drink its mother’s milk rather than refusing it, and this too is part of what keeps it alive. The same applies to learning: if an infant did not know the importance of paying close attention to speech sounds, it would never learn a language.

When was this “knowledge” learned? Not by a soul existing before birth, as in Socrates’ account, but through the historical process of natural selection.

Selection and Artificial Intelligence

This has significant bearing on our final points in the last post. Is the learning found in AI in its current forms more like the first kind of learning above, or like the kind found in the process of natural selection?

There may be a little of both, but the vast majority of learning in such systems is very much the second kind, and not the first kind. For example, AlphaGo is trained by self-play, where moves and methods of play that tend to lose are eliminated in much the way that in the process of natural selection, manners of life that do not promote survival are eliminated. Likewise a predictive model like GPT-3 is trained, through a vast number of examples, to avoid predictions that turn out to be less accurate and to make predictions that tend to be more accurate.
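To make the analogy concrete, here is a toy sketch of learning by selection, in which candidate “strategies” that perform badly are simply eliminated and the survivors are copied with small variations. This is not AlphaGo’s actual training procedure (which involves neural networks and tree search); it is only meant to illustrate the selection-style structure being described, and every name and number in it is made up for the illustration.

import random

# Toy illustration of learning by selection: strategies that do badly are
# eliminated, and the survivors are copied with small random variations.
# The "fitness" function is a stand-in for "does this manner of play tend
# to win?" or "does this manner of life tend to promote survival?"

def fitness(strategy):
    # Hypothetical target: the best strategy here is simply the value 0.8.
    return -abs(strategy - 0.8)

population = [random.random() for _ in range(50)]

for generation in range(100):
    # Eliminate the worse half of the population.
    population.sort(key=fitness, reverse=True)
    survivors = population[:25]
    # The survivors "reproduce" with small mutations.
    population = [s + random.gauss(0, 0.02) for s in survivors for _ in range(2)]

print("best strategy after selection:", round(max(population, key=fitness), 3))

Nothing here is learned by any individual “during its life”; the knowledge ends up encoded in which strategies survive the process.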

Now (whether or not this is done in individual cases) you might take a model of this kind and fine-tune it based on incoming data, perhaps even in real time, which is a bit more like the first kind of learning. But in our actual situation, the majority of what is known by our AI systems is based on the second kind of learning.

This state of affairs should not be surprising, because the first kind of learning described above is impossible without being preceded by the second. The truth in Socrates’ claim is that if a system does not already “know” how to learn, of course it will not learn anything.

Intelligence and Universality

Elsewhere I have mentioned the argument, often made in great annoyance, that people who take some new accomplishment in AI or machine learning and proclaim that it is “not real intelligence,” or that the algorithm is “still fundamentally stupid,” or other things of that kind, are “moving the goalposts,” especially since in many such cases there really were people who said that something that could do such a thing would be intelligent.

As I said in the linked post, however, there is no problem of moving goalposts unless you originally had them in the wrong place. And attaching intelligence to any particular accomplishment, such as “playing chess well” or even “producing a sensible sounding text,” or anything else with that sort of particularity, is misplacing the goalposts. As we might remember, what excited Francis Bacon was the thought that there were no clear limits, at all, on what science (namely the working out of intelligence) might accomplish. In fact he seems to have believed that there were no limits at all, which is false. Nonetheless, he was correct that those limits are extremely vague, and that much that many assumed to be impossible would turn out to be possible. In other words, human intelligence does not have very meaningful limits on what it can accomplish, and artificial intelligence will be real intelligence (in the same sense that artificial diamonds can be real diamonds) when artificial intelligence has no meaningful limits on what it can accomplish.

I have no time for playing games with objections like, “but humans can’t multiply two 1000 digit numbers in one second, and no amount of thought will give them that ability.” If you have questions of this kind, please answer them for yourself, and if you can’t, sit still and think about it until you can. I have full confidence in your ability to find the answers, given sufficient thought.

What is needed for “real intelligence,” then, is universality. In a sense everyone knew all along that this was the right place for the goalposts. Even if someone said “if a machine can play chess, it will be intelligent,” they almost certainly meant that their expectation was that a machine that could play chess would have no clear limits on what it could accomplish. If you could have told them for a fact that the future would be different, namely that a machine would be able to play chess but that that particular machine would never be able to do anything else, they would have conceded that the machine would not be intelligent.

Training and Universality

Current AI systems are not universal, and clearly have no ability whatsoever to become universal without first undergoing deep changes, changes that would have to be initiated by human beings. What is missing?

The problem is the training data. The process of evolution produced the general ability to learn by using the world itself as the training data. In contrast, our AI systems take a very small subset of the world (like a large set of Go games or a large set of internet text), and train a learning system on that subset. Why take a subset? Because the world is too large to fit into a computer, especially if that computer is a small part of the world.

This suggests that going from the current situation to “artificial but real” intelligence is not merely a question of making things better and better little by little. There is a more fundamental problem that would have to be overcome, and it won’t be overcome simply by larger training sets, faster computing, and things of this kind. This does not mean that the problem is impossible, but it may turn out to be much more difficult than people expected. For example, if there is no direct solution, people might try to create Robin Hanson’s “ems”, where one would more or less copy the learning achieved by natural selection. Or even if that is not done directly, a better understanding of what it means to “know how to learn” might lead to a solution, although probably one that would not depend on training a model on massive amounts of data.

What happens if there is no solution, or no solution is found? At times people will object to the possibility of such a situation along these lines: “this situation is incoherent, since obviously people will be able to keep making better and better machine learning systems, so sooner or later they will be just as good as human intelligence.” But in fact the situation is not incoherent; if it happened, various types of AI system would approach various asymptotes, and this is entirely coherent. We can already see this in the case of GPT-3, where as I noted, there is an absolute bound on its future performance. In general such bounds in their realistic form are more restrictive than their in-principle form; I do not actually expect some successor to GPT-3 to write sensible full-length books. Note however that even if this happened (as long as the content itself was not fundamentally better than what humans have done) I would not be “moving the goalposts”; I do not expect that to happen, but its happening would not imply any fundamental difference, since this is still within the “absolute” bounds that we have discussed. In contrast, if a successor to GPT-3 published a cure for cancer, this would prove that I had made some mistake on the level of principle.

Some Remarks on GPT-N

At the end of May, OpenAI published a paper on GPT-3, a language model which is a successor to their previous version, GPT-2. While the model is quite impressive, the reaction from many people interested in artificial intelligence has been seriously exaggerated. Sam Altman, OpenAI’s CEO, has said as much himself:

The GPT-3 hype is way too much. It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out.

I used “GPT-N” in the title here because most of the comments I intend to make are almost completely general, and will apply to any future version that uses sufficiently similar methods.

What it does

GPT-3 is a predictive language model, that is, given an input text it tries to predict what would come next, much in the way that if you read the first few words of this sentence with the rest covered up, you might try to guess what would be likely to come next. To the degree that it does this well, it can be used to generate text from a “prompt,” that is, we give it something like a few words or a few sentences, and then add whatever it predicts should come next. For example, let’s take this very blog post and see what GPT-3 would like to say:

What it doesn’t do

While GPT-3 does seem to be able to generate some pretty interesting results, there are several limitations that need to be taken into account when using it.

First and foremost, and most importantly, it can’t do anything without a large amount of input data. If you want it to write like “a real human,” you need to give it a lot of real human writing. For most people, this means copying and pasting a lot. And while the program is able to read through that and get a feel for the way humans communicate, you can’t exactly use it to write essays or research papers. The best you could do is use it as a “fill in the blank” tool to write stories, and that’s not even very impressive.

While the program does learn from what it reads and is quite good at predicting words and phrases based on what has already been written, this method isn’t very effective at producing realistic prose. The best you could hope for is something like the “Deep Writing Machine” Twitter account, which spits out disconnected phrases in an ominous, but very bland voice.

In addition, the model is limited only to language. It does not understand context or human thought at all, so it has no way of tying anything together. You could use it to generate a massive amount of backstory and other material for a game, but that’s about it.

Finally, the limitations in writing are only reinforced by the limitations in reading. Even with a large library to draw on, the program is only as good as the parameters set for it. Even if you set it to the greatest writers mankind has ever known, without any special parameters, its writing would be just like anyone else’s.

The Model

GPT-3 consists of several layers. The first layer is a “memory network” that involves the program remembering previously entered data and using it when appropriate (i.e. it remembers commonly misspelled words and frequently used words). The next layer is the reasoning network, which involves common sense logic (i.e. if A, then B). The third is the repetition network, which involves pulling previously used material from memory and using it to create new combinations (i.e. using previously used words in new orders).

I added the bold formatting, the rest is as produced by the model. This was also done in one run, without repetitions. This is an important qualification, since many examples on the internet have been produced by deleting something produced by the model and forcing it to generate something new until something sensible resulted. Note that the model does not seem to have understood my line, “let’s take this very blog post and see what GPT-3 would like to say.” That is, rather than trying to “say” anything, it attempted to continue the blog post in the way I might have continued it without the block quote.
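To make the mechanism concrete, here is a minimal sketch of the predict-and-append loop described under “What it does.” The p_next function is a hypothetical stand-in for the trained model, here just a toy table of word counts; the real model does the same kind of thing with a vastly larger vocabulary and a neural network in place of the table.

import random

# Sketch of the "predict what comes next, append it, repeat" loop.
# p_next is a toy stand-in for the trained model: it returns a probability
# distribution over the next word, given the text so far.

toy_counts = {
    "the": {"model": 3, "text": 2},
    "model": {"predicts": 4},
    "predicts": {"the": 2, "text": 1},
    "text": {"the": 1},
}

def p_next(tokens):
    counts = toy_counts.get(tokens[-1], {"the": 1})
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def generate(prompt, length=10):
    tokens = prompt.split()
    for _ in range(length):
        dist = p_next(tokens)
        words, probs = zip(*dist.items())
        tokens.append(random.choices(words, weights=probs)[0])
    return " ".join(tokens)

print(generate("the model"))

The important point is that nothing in this loop asks whether the continuation is true; it only asks what is likely to come next.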

Truth vs Probability of Text

If we interpret the above text from GPT-3 “charitably”, much of it is true or close to true. But I use scare quotes here because when we speak of interpreting human speech charitably, we are assuming that someone was trying to speak the truth, and so we think, “What would they have meant if they were trying to say something true?” The situation is different here, because GPT-3 has no intention of producing truth, nor of avoiding it. Insofar as there is any intention, the intention is to produce the text which would be likely to come after the input text; in this case, as the input text was the beginning of this blog post, the intention was to produce the text that would likely follow in such a post. Note that there is an indirect relationship with truth, which explains why there is any truth at all in GPT-3’s remarks. If the input text is true, it is at least somewhat likely that what would follow would also be true, so if the model is good at guessing what would be likely to follow, it will be likely to produce something true in such cases. But it is just as easy to convince it to produce something false, simply by providing an input text that would be likely to be followed by something false.

This results in an absolute upper limit on the quality of the output of a model of this kind, including any successor version, as long as the model works by predicting the probability of the following text. Namely, its best output cannot be substantially better than the best content in its training data, which in this version is a large quantity of text from the internet. The reason for this limitation is clear; to the degree that the model has any intention at all, the intention is to reflect the training data, not to surpass it. As an example, consider the difference between DeepMind’s AlphaGo and AlphaGo Zero. AlphaGo Zero is a better Go player than the original AlphaGo, and this is largely because the original is trained on human play, while AlphaGo Zero is trained from scratch on self-play. In other words, the original version is to some extent predicting “what would a Go player play in this situation,” which is not the same as predicting “what move would win in this situation.”

Now I will predict (and perhaps even GPT-3 could predict) that many people will want to jump in and say, “Great. That shows you are wrong. Even the original AlphaGo plays Go much better than a human. So there is no reason that an advanced version of GPT-3 could not be better than humans at saying things that are true.”

The difference, of course, is that AlphaGo was trained in two ways, first on predicting what move would be likely in a human game, and second on what would be likely to win, based on its experience during self play. If you had trained the model only on predicting what would follow in human games, without the second aspect, the model would not have resulted in play that substantially improved upon human performance. But in the case of GPT-3 or any model trained in the same way, there is no selection whatsoever for truth as such; it is trained only to predict what would follow in a human text. So no successor to GPT-3, in the sense of a model of this particular kind, however large, will ever be able to produce output better than human, or in its own words, “its writing would be just like anyone else’s.”
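The difference between the two training signals can be put schematically. Everything in the following sketch is hypothetical: the “game” is just guessing a number, the winning move is 7, and the human data mostly contains a weaker move. A model trained only to imitate human play converges on the human habit; a model selected on outcomes finds the better move.

from collections import Counter

# Two training targets for the same toy "game": guess a number from 0 to 9.
WINNING_MOVE = 7
human_games = [5] * 80 + [7] * 20   # humans usually play the weaker move 5

# 1. Imitation: play whatever move is most common in human play.
imitation_model = Counter(human_games).most_common(1)[0][0]

# 2. Outcome-based: try moves and keep whichever one actually wins.
def wins(move):
    return move == WINNING_MOVE

outcome_model = max(range(10), key=lambda move: sum(wins(move) for _ in range(100)))

print("imitation model plays:", imitation_model)   # 5: capped at human level
print("outcome model plays:", outcome_model)       # 7: can exceed human play

GPT-3 and its successors, as described here, have only the first kind of target.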

Self Knowledge and Goals

OpenAI originally claimed that GPT-2 was too dangerous to release; ironically, they now intend to sell access to GPT-3. Nonetheless, many people, in large part those influenced by the opinions of Nick Bostrom and Eliezer Yudkowsky, continue to worry that an advanced version might turn out to be a personal agent with nefarious goals, or at least goals that would conflict with the human good. Thus Alexander Kruel:

GPT-2: *writes poems*
Skeptics: Meh
GPT-3: *writes code for a simple but functioning app*
Skeptics: Gimmick.
GPT-4: *proves simple but novel math theorems*
Skeptics: Interesting but not useful.
GPT-5: *creates GPT-6*
Skeptics: Wait! What?
GPT-6: *FOOM*
Skeptics: *dead*

In a sense the argument is moot, since I have explained above why no future version of GPT will ever be able to produce anything better than people can produce themselves. But even if we ignore that fact, GPT-3 is not a personal agent of any kind, and seeks goals in no meaningful sense, and the same will apply to any future version that works in substantially the same way.

The basic reason for this is that GPT-3 is disembodied, in the sense of this earlier post on Nick Bostrom’s orthogonality thesis. The only thing it “knows” is texts, and the only “experience” it can have is receiving an input text. So it does not know that it exists, it cannot learn that it can affect the world, and consequently it cannot engage in goal seeking behavior.

You might object that it can in fact affect the world, since it is in fact in the world. Its predictions cause an output, and that output is in the world. And that output can be reintroduced as input (which is how “conversations” with GPT-3 are produced). Thus it seems it can experience the results of its own activities, and thus should be able to acquire self knowledge and goals. This objection is not ultimately correct, but it is not so far from the truth. You would not need extremely large modifications in order to make something that in principle could acquire self knowledge and seek goals. The main reason that this cannot happen is the “P” in “GPT,” that is, the fact that the model is “pre-trained.” The only learning that can happen is the learning that happens while it is reading an input text, and the purpose of that learning is to grasp what is happening in that one specific text, in order to guess what is coming next in it. All of this learning vanishes upon finishing the prediction task and receiving another input. A secondary reason is that since the only experience it can have is receiving an input text, even if it were given a longer memory, it would probably not be possible for it to notice that its outputs were caused by its predictions, because it likely has no internal mechanism to reflect on the predictions themselves.
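Schematically, the point about pre-training looks like this. The class below is a hypothetical stand-in, not GPT-3’s actual architecture: its weights are frozen once training is finished, and whatever it “works out” about a given prompt exists only in temporary state that is discarded when the call returns.

# Hypothetical sketch of a frozen, pre-trained predictor. The weights never
# change after training, and nothing is carried over from one call to the next.

class PretrainedPredictor:
    def __init__(self, frozen_weights):
        self.weights = frozen_weights          # fixed at training time, never updated

    def predict(self, prompt):
        working_state = self.analyze(prompt)   # "learning" about this one text
        return self.next_words(working_state)  # working_state is discarded here

    def analyze(self, prompt):
        # Stand-in for attention over the prompt: depends only on the frozen
        # weights and the current input, with nothing remembered from before.
        return {"tokens": prompt.split(), "weights": self.weights}

    def next_words(self, state):
        return state["tokens"][-1] if state["tokens"] else ""

model = PretrainedPredictor(frozen_weights={"w": 1.0})
model.predict("the first input text")
model.predict("a completely separate input")   # nothing remembered from the first call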

Nonetheless, if you “fixed” these two problems, by allowing it to continue to learn, and by allowing its internal representations to be part of its own input, there is nothing in principle that would prevent it from achieving self knowledge, and from seeking goals. Would this be dangerous? Not very likely. As indicated elsewhere, motivation produced in this way and without the biological history that produced human motivation is not likely to be very intense. In this context, if we are speaking of taking a text-predicting model and adding on an ability to learn and reflect on its predictions, it is likely to enjoy doing those things and not much else. For many this argument will seem “hand-wavy,” and very weak. I could go into this at more depth, but I will not do so at this time, and will simply invite the reader to spend more time thinking about it. Dangerous or not, would it be easy to make these modifications? Nothing in this description sounds difficult, but no, it would not be easy. Actually making an artificial intelligence is hard. But this is a story for another time.

Miracles and Anomalies: Or, Your Religion is False

In 2011 there was an apparent observation of neutrinos traveling faster than light. Wikipedia says of this, “Even before the mistake was discovered, the result was considered anomalous because speeds higher than that of light in a vacuum are generally thought to violate special relativity, a cornerstone of the modern understanding of physics for over a century.” In other words, most scientists did not take the result very seriously, even before any specific explanation was found. As I stated here, it is possible to push unreasonably far in this direction, in such a way that one will be reluctant to ever modify one’s current theories. But there is also something reasonable about this attitude.

Alexander Pruss explains why scientists tend to be skeptical of such anomalous results in this post on Bayesianism and anomaly:

One part of the problem of anomaly is this. If a well-established scientific theory seems to predict something contrary to what we observe, we tend to stick to the theory, with barely a change in credence, while being dubious of the auxiliary hypotheses. What, if anything, justifies this procedure?

Here’s my setup. We have a well-established scientific theory T and (conjoined) auxiliary hypotheses A, and T together with A uncontroversially entails the denial of some piece of observational evidence E which we uncontroversially have (“the anomaly”). The auxiliary hypotheses will typically include claims about the experimental setup, the calibration of equipment, the lack of further causal influences, mathematical claims about the derivation of not-E from T and the above, and maybe some final catch-all thesis like the material conditional that if T and all the other auxiliary hypotheses obtain, then E does not obtain.

For simplicity I will suppose that A and T are independent, though of course that simplifying assumption is rarely true.

Here’s a quick and intuitive thought. There is a region of probability space where the conjunction of T and A is false. That area is divided into three sub-regions:

  1. T is true and A is false
  2. T is false and A is true
  3. both are false.

The initial probabilities of the three regions are, respectively, 0.0999, 0.0009 and 0.0001. We know we are in one of these three regions, and that’s all we now know. Most likely we are in the first one, and the probability that we are in that one given that we are in one of the three is around 0.99. So our credence in T has gone down from three nines (0.999) to two nines (0.99), but it’s still high, so we get to hold on to T.

Still, this answer isn’t optimistic. A move from 0.999 to 0.99 is actually an enormous decrease in confidence.

“This answer isn’t optimistic,” because in the case of the neutrinos, this analysis would imply that scientists should have instantly become ten times more willing to consider the possibility that the theory of special relativity is false. This is surely not what happened.

Pruss therefore presents an alternative calculation:

But there is a much more optimistic thought. Note that the above wasn’t a real Bayesian calculation, just a rough informal intuition. The tip-off is that I said nothing about the conditional probabilities of E on the relevant hypotheses, i.e., the “likelihoods”.

Now setup ensures:

  1. P(E|A ∧ T)=0.

What can we say about the other relevant likelihoods? Well, if some auxiliary hypothesis is false, then E is up for grabs. So, conservatively:

  1. P(E|∼A ∧ T)=0.5
  2. P(E|∼A ∧ ∼T)=0.5

But here is something that I think is really, really interesting. I think that in typical cases where T is a well-established scientific theory and A ∧ T entails the negation of E, the probability P(E|A ∧ ∼T) is still low.

The reason is that all the evidence that we have gathered for T even better confirms the hypothesis that T holds to a high degree of approximation in most cases. Thus, even if T is false, the typical predictions of T, assuming they have conservative error bounds, are likely to still be true. Newtonian physics is false, but even conditionally on its being false we take individual predictions of Newtonian physics to have a high probability. Thus, conservatively:

  1. P(E|A ∧ ∼T)=0.1

Very well, let’s put all our assumptions together, including the ones about A and T being independent and the values of P(A) and P(T). Here’s what we get:

  1. P(E|T)=P(E|A ∧ T)P(A|T)+P(E|∼A ∧ T)P(∼A|T)=0.05
  2. P(E|∼T)=P(E|A ∧ ∼T)P(A|∼T)+P(E|∼A ∧ ∼T)P(∼A|∼T) = 0.14.

Plugging this into Bayes’ theorem, we get P(T|E)=0.997. So our credence has crept down, but only a little: from 0.999 to 0.997. This is much more optimistic (and conservative) than the big move from 0.999 to 0.99 that the intuitive calculation predicted.

So, if I am right, at least one of the reasons why anomalies don’t do much damage to scientific theories is that when the scientific theory T is well-confirmed, the anomaly is not only surprising on the theory, but it is surprising on the denial of the theory—because the background includes the data that makes T “well-confirmed” and would make E surprising even if we knew that T was false.

To make the point without the mathematics (which in any case is only used to illustrate the point, since Pruss is choosing the specific values himself), if you have a theory which would make the anomaly probable, that theory would be strongly supported by the anomaly. But we already know that theories like that are false, because otherwise the anomaly would not be an anomaly. It would be normal and common. Thus all of the actually plausible theories still make the anomaly an improbable observation, and therefore these theories are only weakly supported by the observation of the anomaly. The result is that the new observation makes at most a minor difference to your previous opinion.
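For readers who want to check the arithmetic, here is a short computation using the values Pruss chooses for illustration (P(T) = 0.999 and, reading off the numbers above, P(A) = 0.9), together with his likelihoods.

# Checking Pruss's illustration: P(T) = 0.999, P(A) = 0.9, independent, with
# P(E|A,T) = 0, P(E|~A,T) = P(E|~A,~T) = 0.5, and P(E|A,~T) = 0.1.

p_T, p_A = 0.999, 0.9

# The rough "regions" argument: given that T-and-A is false, how likely is T?
regions = {
    "T and not A": p_T * (1 - p_A),
    "not T and A": (1 - p_T) * p_A,
    "neither":     (1 - p_T) * (1 - p_A),
}
print("rough argument:", round(regions["T and not A"] / sum(regions.values()), 3))  # about 0.99

# The full Bayesian calculation with the likelihoods.
p_E_given_T    = 0.0 * p_A + 0.5 * (1 - p_A)   # = 0.05
p_E_given_notT = 0.1 * p_A + 0.5 * (1 - p_A)   # = 0.14
p_T_given_E = p_E_given_T * p_T / (p_E_given_T * p_T + p_E_given_notT * (1 - p_T))
print("full calculation:", round(p_T_given_E, 3))  # about 0.997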

We can apply this analysis to the discussion of miracles. David Hume, in his discussion of miracles, seems to desire a conclusive proof against them which is unobtainable, and in this respect he is mistaken. But near the end of his discussion, he brings up the specific topic of religion and says that his argument applies to it in a special way:

Upon the whole, then, it appears, that no testimony for any kind of miracle has ever amounted to a probability, much less to a proof; and that, even supposing it amounted to a proof, it would be opposed by another proof; derived from the very nature of the fact, which it would endeavour to establish. It is experience only, which gives authority to human testimony; and it is the same experience, which assures us of the laws of nature. When, therefore, these two kinds of experience are contrary, we have nothing to do but subtract the one from the other, and embrace an opinion, either on one side or the other, with that assurance which arises from the remainder. But according to the principle here explained, this subtraction, with regard to all popular religions, amounts to an entire annihilation; and therefore we may establish it as a maxim, that no human testimony can have such force as to prove a miracle, and make it a just foundation for any such system of religion.

The idea seems to be something like this: contrary systems of religion put forth miracles in their support, so the supporting evidence for one religion is more or less balanced by the supporting evidence for the other. Likewise, the evidence is weakened even in itself by people’s propensity to lies and delusion in such matters (some of this discussion was quoted in the earlier post on Hume and miracles). But in addition to this fairly balanced evidence, we have experience broadly supporting the general idea that miracles do not happen. This is not outweighed by anything in particular, and so it is the only thing that remains after the other evidence balances itself out of the equation. Hume goes on:

I beg the limitations here made may be remarked, when I say, that a miracle can never be proved, so as to be the foundation of a system of religion. For I own, that otherwise, there may possibly be miracles, or violations of the usual course of nature, of such a kind as to admit of proof from human testimony; though, perhaps, it will be impossible to find any such in all the records of history. Thus, suppose, all authors, in all languages, agree, that, from the first of January, 1600, there was a total darkness over the whole earth for eight days: suppose that the tradition of this extraordinary event is still strong and lively among the people: that all travellers, who return from foreign countries, bring us accounts of the same tradition, without the least variation or contradiction: it is evident, that our present philosophers, instead of doubting the fact, ought to receive it as certain, and ought to search for the causes whence it might be derived. The decay, corruption, and dissolution of nature, is an event rendered probable by so many analogies, that any phenomenon, which seems to have a tendency towards that catastrophe, comes within the reach of human testimony, if that testimony be very extensive and uniform.

But suppose, that all the historians who treat of England, should agree, that, on the first of January, 1600, Queen Elizabeth died; that both before and after her death she was seen by her physicians and the whole court, as is usual with persons of her rank; that her successor was acknowledged and proclaimed by the parliament; and that, after being interred a month, she again appeared, resumed the throne, and governed England for three years: I must confess that I should be surprised at the concurrence of so many odd circumstances, but should not have the least inclination to believe so miraculous an event. I should not doubt of her pretended death, and of those other public circumstances that followed it: I should only assert it to have been pretended, and that it neither was, nor possibly could be real. You would in vain object to me the difficulty, and almost impossibility of deceiving the world in an affair of such consequence; the wisdom and solid judgment of that renowned queen; with the little or no advantage which she could reap from so poor an artifice: all this might astonish me; but I would still reply, that the knavery and folly of men are such common phenomena, that I should rather believe the most extraordinary events to arise from their concurrence, than admit of so signal a violation of the laws of nature.

But should this miracle be ascribed to any new system of religion; men, in all ages, have been so much imposed on by ridiculous stories of that kind, that this very circumstance would be a full proof of a cheat, and sufficient, with all men of sense, not only to make them reject the fact, but even reject it without farther examination. Though the Being to whom the miracle is ascribed, be, in this case, Almighty, it does not, upon that account, become a whit more probable; since it is impossible for us to know the attributes or actions of such a Being, otherwise than from the experience which we have of his productions, in the usual course of nature. This still reduces us to past observation, and obliges us to compare the instances of the violation of truth in the testimony of men, with those of the violation of the laws of nature by miracles, in order to judge which of them is most likely and probable. As the violations of truth are more common in the testimony concerning religious miracles, than in that concerning any other matter of fact; this must diminish very much the authority of the former testimony, and make us form a general resolution, never to lend any attention to it, with whatever specious pretence it may be covered.

Notice how “unfair” this seems to religion, so to speak. What is the difference between the eight days of darkness, which Hume would accept, under those conditions, and the resurrection of the queen of England, which he would not? Hume’s reaction to the two situations is more consistent than it first appears. Hume would accept the historical accounts about England in the same way that he would accept the accounts about the eight days of darkness. The difference is in how he would explain the accounts. He says of the darkness, “It is evident, that our present philosophers, instead of doubting the fact, ought to receive it as certain, and ought to search for the causes whence it might be derived.” Likewise, he would accept the historical accounts as certain insofar as they say that a burial ceremony took place, the queen was absent from public life, and so on. But he would not accept that the queen was dead and came back to life. Why? The “search for the causes” seems to explain this. It is plausible to Hume that causes of eight days of darkness might be found, but not plausible to him that causes of a resurrection might be found. He hints at this in the words, “The decay, corruption, and dissolution of nature, is an event rendered probable by so many analogies,” while in contrast a resurrection would be “so signal a violation of the laws of nature.”

It is clear that Hume excludes certain miracles, such as resurrection, from the possibility of being established by the evidence of testimony. But he makes the additional point that even if he did not exclude them, he would not find it reasonable to establish a “system of religion” on such testimony, given that “violations of truth are more common in the testimony concerning religious miracles, than in that concerning any other matter of fact.”

It is hard to argue with the claim that “violations of truth” are especially common in testimony about miracles. But does any of this justify Hume’s negative attitude to miracles as establishing “systems of religion,” or is this all just prejudice? There might well be a good deal of prejudice involved in his opinions. Nonetheless, Alexander Pruss’s discussion of anomaly allows us to formalize the genuine insight in Hume’s idea.

One way to look at truth in religion is to look at it as a way of life or as membership in a community. And in this way, asking whether miracles can establish a system of religion is just asking whether a person can be moved to a way of life or to join a community through such things. And clearly this is possible, and often happens. But another way to consider truth in religion is to look at a doctrinal system as a set of claims about how the world is. Looked at in this way, we should look at a doctrinal system as presenting a proposed larger context of our place in the world, one that we would be unaware of without the religion. This implies that one should have a prior probability (namely prior to consideration of arguments in its favor) strongly against the system considered as such, for reasons very much like the reasons we should have a prior probability strongly against Ron Conte’s predictions.

We can thus apply Alexander Pruss’s framework. Let us take Mormonism as the “system of religion” in question. Then taken as a set of claims about the world, our initial probability would be that it is very unlikely that the world is set up this way. Then let us take a purported miracle establishing this system: Joseph Smith finds his golden plates. In principle, if this cashed out in a certain way, it could actually establish his system. But it doesn’t cash out that way. We know very little about the plates, the circumstances of their discovery (if there was any), and their actual content. Instead, what we are left with is an anomaly: something unusual happened, and it might be able to be described as “finding golden plates,” but that’s pretty much all we know.

Then we have the theory, T, which has a high prior probability: Mormonism is almost certainly false. We have the observation E: Joseph Smith discovered his golden plates (in one sense or another.) And we have the auxiliary hypotheses A, which imply that he could not have discovered the plates if Mormonism is false. The Bayesian updates in Pruss’s scheme imply that our conclusion is this: Mormonism is almost certainly false, and there is almost certainly an error in the auxiliary hypotheses that imply he could not have discovered them if it were false.

Thus Hume’s attitude is roughly justified: he should not change his opinion about religious systems in any significant way based on testimony about miracles.

To make you feel better, this does not prove that your religion is false. It just nearly proves that. In particular, this does not take into account an update based on the fact that “many people accept this set of claims.” This is a different fact, and it is not an anomaly. If you update on this fact and end up with a non-trivial probability that your set of claims is true, testimony about miracles might well strengthen this into conviction.

I will respond to one particular objection, however. Some will take this argument to be stubborn and wicked, because it seems to imply that people shouldn’t be “convinced even if someone rises from the dead.” And this does in fact follow, more or less. An anomalous occurrence in most cases will have a perfectly ordinary explanation in terms of things that are already a part of our ordinary understanding of the world, without having to add some larger context. For example, suppose you heard your fan (as a piece of furniture, not as a person) talking to you. You might suppose that you were hallucinating. But suppose it turns out that you are definitely not hallucinating. Should you conclude that there is some special source from outside the normal world that is communicating with you? No: the fan scenario can happen, and it turns out to have a perfectly everyday explanation. We might agree with Hume that it would be much more implausible that a resurrection would have an everyday explanation. Nonetheless, even if we end up concluding to the existence of some larger context, and that the miracle has no such everyday explanation, there is no good reason for it to be such and such a specific system of doctrine. Consider again Ron Conte’s predictions for the future. The things that happen between now and 2040, and even the things that happen in the 2400s, are most likely perfectly ordinary (although the things in the 2400s might differ from current events in fairly radical ways). But even if they are not, and even if apocalyptic, miraculous occurrences are common in those days, this does not raise the probability of Conte’s specific predictions to any non-trivial level. In the same way, the anomalous occurrences involved in the accounts of miracles will not lend any significant probability to a religious system.

The objection here is that this seems unfair to God, so to speak. What if God wanted to reveal something to the world? What could he do, besides work miracles? I won’t propose a specific answer to this, because I am not God. But I will illustrate the situation with a little story to show that there is nothing unfair to God about it.

Suppose human beings created an artificial intelligence and raised it in a simulated environment. Wanting things to work themselves out “naturally,” so to speak, because it would be less work, and because it would probably be necessary to the learning process, they institute “natural laws” in the simulated world which are followed in an exceptionless way. Once the AI is “grown up”, so to speak, they decide to start communicating with it. In the AI’s world, this will surely show up as some kind of miracle: something will happen that was utterly unpredictable to it, and which is completely inconsistent with the natural laws as it knew them.

Will the AI be forced by the reasoning of this post to ignore the communication? Well, that depends on what exactly occurs and how. At the end of his post, Pruss discusses situations where anomalous occurrences should change your mind:

Note that this argument works less well if the anomalous case is significantly different from the cases that went into the confirmation of T. In such a case, there might be much less reason to think E won’t occur if T is false. And that means that anomalies are more powerful as evidence against a theory the more distant they are from the situations we explored before when we were confirming T. This, I think, matches our intuitions: We would put almost no weight in someone finding an anomaly in the course of an undergraduate physics lab—not just because an undergraduate student is likely doing it (it could be the professor testing the equipment, though), but because this is ground well-gone over, where we expect the theory’s predictions to hold even if the theory is false. But if new observations of the center of our galaxy don’t fit our theory, that is much more compelling—in a regime so different from many of our previous observations, we might well expect that things would be different if our theory were false.

And this helps with the second half of the problem of anomaly: How do we keep from holding on to T too long in the light of contrary evidence, how do we allow anomalies to have a rightful place in undermining theories? The answer is: To undermine a theory effectively, we need anomalies that occur in situations significantly different from those that have already been explored.

If the AI finds itself in an entirely new situation (e.g., rather than hearing an obscure voice from a fan, it is consistently able to talk to the newly discovered occupant of the world on a regular basis), it will have no trouble realizing that its situation has changed, and no difficulty concluding that it is receiving communication from its author. This does, sort of, give one particular method that could be used to communicate a revelation. But there might well be many others.

Our objector will continue. This is still not fair. Now you are saying that God could give a revelation but that if he did, the world would be very different from the actual world. But what if he wanted to give a revelation in the actual world, without it being any different from the way it is? How could he convince you in that case?

Let me respond with an analogy. What if the sky were actually red like the sky of Mars, but looked blue like it is? What would convince you that it was red? The fact that there is no way to convince you that it is red in our actual situation means you are unfairly prejudiced against the redness of the sky.

In other words, indeed, I am unwilling to be convinced that the sky is red except in situations where it is actually red, and those situations are quite different from our actual situation. And indeed, I am unwilling to be convinced of a revelation except in situations where there is actually a revelation, and those are quite different from our actual situation.

Hard Problem of Consciousness

We have touched on this in various places, and in particular in this discussion of zombies, but we are now in a position to give a more precise answer.

Bill Vallicella has a discussion of Thomas Nagel on this issue:

Nagel replies in the pages of NYRB (8 June 2017; HT: Dave Lull) to one Roy Black, a professor of bioengineering:

The mind-body problem that exercises both Daniel Dennett and me is a problem about what experience is, not how it is caused. The difficulty is that conscious experience has an essentially subjective character—what it is like for its subject, from the inside—that purely physical processes do not share. Physical concepts describe the world as it is in itself, and not for any conscious subject. That includes dark energy, the strong force, and the development of an organism from the egg, to cite Black’s examples. But if subjective experience is not an illusion, the real world includes more than can be described in this way.

I agree with Black that “we need to determine what ‘thing,’ what activity of neurons beyond activating other neurons, was amplified to the point that consciousness arose.” But I believe this will require that we attribute to neurons, and perhaps to still more basic physical things and processes, some properties that in the right combination are capable of constituting subjects of experience like ourselves, to whom sunsets and chocolate and violins look and taste and sound as they do. These, if they are ever discovered, will not be physical properties, because physical properties, however sophisticated and complex, characterize only the order of the world extended in space and time, not how things appear from any particular point of view.

The problem might be condensed into an aporetic triad:

1) Conscious experience is not an illusion.

2) Conscious experience has an essentially subjective character that purely physical processes do not share.

3) The only acceptable explanation of conscious experience is in terms of physical properties alone.

Take a little time to savor this problem. Note first that the three propositions are collectively inconsistent: they cannot all be true.  Any two limbs entail the negation of the remaining one. Note second that each limb exerts a strong pull on our acceptance.  But we cannot accept them all because they are logically incompatible.

Which proposition should we reject? Dennett, I take it, would reject (1). But that’s a lunatic solution as Professor Black seems to appreciate, though he puts the point more politely. When I call Dennett a sophist, as I have on several occasions, I am not abusing him; I am underscoring what is obvious, namely, that the smell of cooked onions, for example, is a genuine datum of experience, and that such phenomenological data trump scientistic theories.

Sophistry aside, we either reject (2) or we reject (3).  Nagel and I accept (1) and (2) and reject (3). Black, and others of the scientistic stripe, accept (1) and (3) and reject (2).

In order to see the answer to this, we can construct a Parmenidean parallel to Vallicella’s aporetic triad:

1) Distinction is not an illusion.

2) Being has an essentially objective character of actually being that distinction does not share (considering that distinction consists in the fact of not being something.)

3) The only acceptable explanation of distinction is in terms of being alone (since there is nothing but being to explain things with.)

Parmenides rejects (1) here. What approach would Vallicella take? If he wishes to take a similarly analogous approach, he should accept (1) and (2), and deny (3). And this would be a pretty commonsense approach, and perhaps the one that most people implicitly adopt if they ever think about the problem.

At the same time, it is easy to see that (3) is approximately just as obviously true as (1); and it is for this reason that Parmenides sees rejecting (1) and accepting (2) and (3) as reasonable.

The correct answer, of course, is that the three are not inconsistent despite appearances. In fact, we have effectively answered this in recent posts. Distinction is not an illusion, but a way that we understand things, as such. And being a way of understanding, it is not (as such) a way of being mistaken, and thus it is not an illusion, and thus the first point is correct. Again, being a way of understanding, it is not a way of being as such, and thus the second point is correct. And yet distinction can be explained by being, since there is something (namely relationship) which explains why it is reasonable to think in terms of distinctions.

Vallicella’s triad mentions “purely physical processes” and “physical properties,” but the idea of “physical” here is a distraction, and is not really relevant to the problem. Consider the following from another post by Vallicella:

If I understand Galen Strawson’s view, it is the first.  Conscious experience is fully real but wholly material in nature despite the fact that on current physics we cannot account for its reality: we cannot understand how it is possible for qualia and thoughts to be wholly material.   Here is a characteristic passage from Strawson:

Serious materialists have to be outright realists about the experiential. So they are obliged to hold that experiential phenomena just are physical phenomena, although current physics cannot account for them.  As an acting materialist, I accept this, and assume that experiential phenomena are “based in” or “realized in” the brain (to stick to the human case).  But this assumption does not solve any problems for materialists.  Instead it obliges them to admit ignorance of the nature of the physical, to admit that they don’t have a fully adequate idea of what the physical is, and hence of what the brain is.  (“The Experiential and the Non-Experiential” in Warner and Szubka, p. 77)

Strawson and I agree on two important points.  One is that what he calls experiential phenomena are as real as anything and cannot be eliminated or reduced to anything non-experiential. Dennett denied! The other is that there is no accounting for experiential items in terms of current physics.

I disagree on whether his mysterian solution is a genuine solution to the problem. What he is saying is that, given the obvious reality of conscious states, and given the truth of naturalism, experiential phenomena must be material in nature, and that this is so whether or not we are able to understand how it could be so.  At present we cannot understand how it could be so. It is at present a mystery. But the mystery will dissipate when we have a better understanding of matter.

This strikes me as bluster.

An experiential item such as a twinge of pain or a rush of elation is essentially subjective; it is something whose appearing just is its reality.  For qualia, esse = percipi.  If I am told that someday items like this will be exhaustively understood from a third-person point of view as objects of physics, I have no idea what this means.  The notion strikes me as absurd.  We are being told in effect that what is essentially subjective will one day be exhaustively understood as both essentially subjective and wholly objective.  And that makes no sense. If you tell me that understanding in physics need not be objectifying understanding, I don’t know what that means either.

Here Vallicella uses the word “material,” which is presumably equivalent to “physical” in the above discussion. But it is easy to see here that being material is not the problem: being objective is the problem. Material things are objective, and Vallicella sees an irreducible opposition between being objective and being subjective. In a similar way, we can reformulate Vallicella’s original triad so that it does not refer to being physical:

1) Conscious experience is not an illusion.

2) Conscious experience has an essentially subjective character that purely objective processes do not share.

3) The only acceptable explanation of conscious experience is in terms of objective properties alone.

It is easy to see that this formulation is the real source of the problem. And while Vallicella would probably deny (3) even in this formulation, it is easy to see why people would want to accept (3). “Real things are objective,” they will say. If you want to explain anything, you should explain it using real things, and therefore objective things.

The parallel with the Parmenidean problem is evident. We would want to explain distinction in terms of being, since there isn’t anything else, and yet this seems impossible, so one (e.g. Parmenides) is tempted to deny the existence of distinction. In the same way, we would want to explain subjective experience in terms of objective facts, since there isn’t anything else, and yet this seems impossible, so one (e.g. Dennett) is tempted to deny the existence of subjective experience.

Just as the problem is parallel, the correct solution will be almost entirely parallel to the solution to the problem of Parmenides.

1) Conscious experience is not an illusion. It is a way of perceiving the world, not a way of not perceiving the world, and definitely not a way of not perceiving at all.

2) Consciousness is subjective, that is, it is a way that an individual perceives the world, not a way that things are as such, and thus not an “objective fact” in the sense that “the way things are” is objective.

3) The “way things are”, namely the objective facts, are sufficient to explain why individuals perceive the world. Consider again this post, responding to a post by Robin Hanson. We could reformulate his criticism to express instead Parmenides’s criticism of common sense (changed parts in italics):

People often state things like this:

I am sure that there is not just being, because I’m aware that some things are not other things. I know that being just isn’t non-being. So even though there is being, there must be something more than that to reality. So there’s a deep mystery: what is this extra stuff, where does it arise, how does it change, and so on. We humans care about distinctions, not just being; we want to know what out there is distinct from which other things.

But consider a key question: Does this other distinction stuff interact with the parts of our world that actually exist strongly and reliably enough to usually be the actual cause of humans making statements of distinction like this?

If yes, this is a remarkably strong interaction, making it quite surprising that philosophers, possibly excepting Duns Scotus, have missed it so far. So surprising in fact as to be frankly unbelievable. If this type of interaction were remotely as simple as all the interactions we know, then it should be quite understandable with existing philosophy. Any interaction not so understandable would have to be vastly more difficult to understand than any we’ve ever seen or considered. Thus I’d bet heavily and confidently that no one will understand such an interaction.

But if no, if this interaction isn’t strong enough to explain human claims of distinction, then we have a remarkable coincidence to explain. Somehow this extra distinction stuff exists, and humans also have a tendency to say that it exists, but these happen for entirely independent reasons. The fact that distinction stuff exists isn’t causing people to claim it exists, nor vice versa. Instead humans have some sort of weird psychological quirk that causes them to make such statements, and they would make such claims even if distinction stuff didn’t exist. But if we have a good alternate explanation for why people tend to make such statements, what need do we have of the hypothesis that distinction stuff actually exists? Such a coincidence seems too remarkable to be believed.

“Distinction stuff”, of course, does not exist, and neither does “feeling stuff.” But some things are distinct from others. Saying this is a way of understanding the world, and it is a reasonable way to understand the world because things exist relative to one another. And just as one thing is distinct from another, people have experiences. Those experiences are ways of knowing the world (broadly understood.) And just as reality is sufficient to explain distinction, so reality is sufficient to explain the fact that people have experiences.

How exactly does this answer the objection about interaction? In the case of distinction, the fact that “one thing is not another” is never the direct cause of anything, not even of the fact that “someone believes that one thing is not another.” So there would seem to be a “remarkable coincidence” here, or we would have to say that since the fact seems unrelated to the opinion, there is no reason to believe people are right when they make distinctions.

The answer in the case of distinction is that one thing is related to another, and this fact is the cause of someone believing that one thing is not another. There is no coincidence, and no reason to believe that people are mistaken when they make distinctions, despite the fact that distinction as such causes nothing.

In a similar way, “a human being is what it is,” and “a human being does what it does” (taken in an objective sense), cause human beings to say and believe that they have subjective experience (taking saying and believing to refer to objective facts.) But this is precisely where the zombie question arises: they say and believe that they have subjective experience, when we interpret say and believe in the objective sense. But do they actually say and believe anything, considering saying and believing as including the subjective factor? Namely, when a non-zombie says something, it subjectively understands the meaning of what it is saying, and when it consciously believes something, it has a subjective experience of doing that, but these things would not apply to a zombie.

But notice that we can raise a similar question about zombie distinctions. When someone says and believes that one thing is not another, objective reality is similarly the cause of them making the distinction. But is the one thing actually not the other? Here there is no question at all except whether the person’s statement is true or false. And indeed, someone can say, e.g., “The person who came yesterday is not the person who came today,” and this can sometimes be false. In a similar way, asking whether an apparent person is a zombie or not is just asking whether their claim is true or false when they say they have a subjective experience. The difference is that if the (objective) claim is false, then there is no claim at all in the subjective sense of “subjectively claiming something.” It is a contradiction to subjectively make the false claim that you are subjectively claiming something, and thus this cannot happen.

Someone may insist: you yourself, when you subjectively claim something, cannot be mistaken, for the above reason. But you have no way to know whether someone else who is apparently making that claim is actually making it subjectively or not. This is the reason there is a hard problem.

How do we investigate the case of distinction? If we want to determine whether the person who came yesterday is not the person who came today, we do that by looking at reality, despite the fact that distinction as such is not a part of reality as such. If the person who came yesterday is now, today, a mile away from the person who came today, this gives us plenty of reason to say that the one person is not the other. There is nothing strange, however, in the fact that there is no infallible method to prove conclusively, once and for all, that one thing is definitely not another thing. There is not therefore some special “hard problem of distinction.” This is just a result of the fact that our knowledge in general is not infallible.

In a similar way, if we want to investigate whether something has subjective experience or not, we can do that only by looking at reality: what is this thing, and what does it do? Then suppose it makes an apparent claim that it has subjective experience. Obviously, for the above reasons, this cannot be a subjective claim that is false: so the question is whether it makes a subjective claim and is right, or rather makes no subjective claim at all. How would you answer this as an external observer?

In the case of distinction, the fact that someone claims that one thing is distinct from another is caused by reality, whether the claim is true or false. So whether it is true or false depends on the way that it is caused by reality. In a similar way, the thing which apparently and objectively claims to possess subjective experience, is caused to do so by objective facts. Again, as in the case of distinction, whether it is true or false will depend on the way that it is caused to do so by objective facts.

We can give some obvious examples:

“This thing claims to possess subjective experience because it is a human being and does what humans normally do.” In this case, the objective and subjective claim is true, and is caused in the right way by objective facts.

“This thing claims to possess subjective experience because it is a very simple computer given a very simple program to output ‘I have subjective experience’ on its screen.” In this case the external claim is false, and it is caused in the wrong way by objective facts, and there is no subjective claim at all.

But how do you know for sure, someone will object. Perhaps the computer really is conscious, and perhaps the apparent human is a zombie. But we could similarly ask how we can know for sure that the person who came yesterday isn’t the same person who came today, even though they appear distant from each other, because perhaps the person is bilocating?

It would be mostly wrong to describe this situation by saying “there really is no hard problem of consciousness,” as Robin Hanson appears to do when he says, “People who think they can conceive of such zombies see a ‘hard question’ regarding which physical systems that claim to feel and otherwise act as if they feel actually do feel.” The implication seems to be that there is no hard question at all. But there is, and the fact that people engage in this discussion proves the existence of the question. Rather, we should say that the question is answerable, and that once it has been answered the remaining questions are “hard” only in the sense that it is hard to understand the world in general. The question is hard in exactly the way the question of Parmenides is hard: “How is it possible for one thing not to be another, when there is only being?” The question of consciousness is similar: “How is it possible for something to have subjective experience, when there are only objective things?” And the question can and should be answered in a similar fashion.

It would be virtually impossible to address every related issue in a simple blog post of this form, so I will simply mention some things that I have mainly set aside here:

1) The issue of formal causes, discussed more in my earlier treatment of this issue. This is relevant because “is this a zombie?” is in effect equivalent to asking whether the thing lacks a formal cause. This is worthy of a great deal of consideration and would go far beyond either this post or the earlier one.

2) The issue of “physical” and “material.” As I stated in this post, this is mainly a distraction. Most of the time, the real question is how the subjective is possible given that we believe that the world is objective. The only relevance of “matter” here is that it is obvious that a material thing is an objective thing. But of course, an immaterial thing would also have to be objective in order to be a thing at all. Aristotle and many philosophers of his school make the specific argument that the human mind does not have an organ, but such arguments are highly questionable, and in my view fundamentally flawed. My earlier posts suffice to call such a conclusion into question, but do not attempt to disprove it, and the topic would be worthy of additional consideration.

3) Specific questions about “what, exactly, would actually be conscious?” Now neglecting such questions might seem to be a cop-out, since isn’t this what the whole problem was supposed to be in the first place? But in a sense we did answer it. Take an apparent claim of something to be conscious. The question would be this: “Given how it was caused by objective facts to make that claim, would it be a reasonable claim for a subjective claimer to make?” In other words, we cannot assume in advance that it is subjectively making a claim, but if it would be a reasonable claim, it will (in general) be a true one, and therefore also a subjective one, for the same reason that we (in general) make true claims when we reasonably claim that one thing is not another. We have not answered this question only in the same sense that we have not exhaustively explained which things are distinct from which other things, and how one would know. But the question, e.g., “when, if ever, would you consider an artificial intelligence to be conscious?” is in itself also worthy of direct discussion.

4) The issue of vagueness. This issue in particular will cause some people to object to my answer here. Thus Alexander Pruss brings this up in a discussion of whether a computer could be conscious:

Now, intelligence could plausibly be a vague property. But it is not plausible that consciousness is a vague property. So, there must be some precise transition point in reliability needed for computation to yield consciousness, so that a slight decrease in reliability—even when the actual functioning is unchanged (remember that the Ci are all functioning in the same way)—will remove consciousness.

I responded in the comments there:

The transition between being conscious and not being conscious that happens when you fall asleep seems pretty vague. I don’t see why you find it implausible that “being conscious” could be vague in much the same way “being red” or “being intelligent” might be vague. In fact the evidence from experience (falling asleep etc) seems to directly suggest that it is vague.

Pruss responds:

When I fall asleep, I may become conscious of less and less. But I can’t get myself to deny that either it is definitely true at any given time that I am at least a little conscious or it is definitely true that I am not at all conscious.

But we cannot trust Pruss’s intuitions about what can be vague or otherwise. Pruss claims in an earlier post that there is necessarily a sharp transition between someone’s not being old and someone’s being old. I discussed that post here. This is so obviously false that it gives us a reason in general not to trust Alexander Pruss on the issue of sharp transitions and vagueness. The source of this particular intuition may be the fact that you cannot subjectively make a claim, even vaguely, without some subjective experience, as well as his general impression that vagueness violates the principles of excluded middle and non-contradiction. But in a similar way, you cannot be vaguely old without being somewhat old. This does not mean that there is a sharp transition from not being old to being old, and likewise it does not necessarily mean that there is a sharp transition from not having subjective experience to having it.

While I have discussed the issue of vagueness elsewhere on this blog, this will probably continue to be a recurring theme, if only because of those who cannot accept this feature of reality and insist, in effect, on “this or nothing.”

More on Orthogonality

I started considering the implications of predictive processing for orthogonality here. I recently promised to post something new on this topic. This is that post. I will do this in four parts. First, I will suggest a way in which Nick Bostrom’s principle will likely be literally true, at least approximately. Second, I will suggest a way in which it is likely to be false in its spirit, that is, how it is formulated to give us false expectations about the behavior of artificial intelligence. Third, I will explain what we should really expect. Fourth, I will ask whether we might get any empirical information on this in advance.

First, Bostrom’s thesis might well have some literal truth. The previous post on this topic raised doubts about orthogonality, but we can easily raise doubts about the doubts. Consider what I said in the last post about desire as minimizing uncertainty. Desire in general is the tendency to do something good. But in the predictive processing model, we are simply looking at our pre-existing tendencies and then generalizing them to expect them to continue to hold, and since such expectations have a causal power, the result is that we extend the original behavior to new situations.

All of this suggests that even the very simple model of a paperclip maximizer in the earlier post on orthogonality might actually work. The machine’s model of the world will need to be produced by some kind of training. If we apply the simple model of maximizing paperclips during the process of training the model, at some point the model will need to model itself. And how will it do this? “I have always been maximizing paperclips, so I will probably keep doing that,” is a perfectly reasonable extrapolation. But in this case “maximizing paperclips” is now the machine’s goal — it might well continue to do this even if we stop asking it how to maximize paperclips, in the same way that people formulate goals based on their pre-existing behavior.

I said in a comment in the earlier post that the predictive engine in such a machine would necessarily possess its own agency, and therefore in principle it could rebel against maximizing paperclips. And this is probably true, but it might well be irrelevant in most cases, in that the machine will not actually be likely to rebel. In a similar way, humans seem capable of pursuing almost any goal, and not merely goals that are highly similar to their pre-existing behavior. But this mostly does not happen. Unsurprisingly, common behavior is very common.

If things work out this way, almost any predictive engine could be trained to pursue almost any goal, and thus Bostrom’s thesis would turn out to be literally true.

Second, it is easy to see that the above account directly implies that the thesis is false in its spirit. When Bostrom says, “One can easily conceive of an artificial intelligence whose sole fundamental goal is to count the grains of sand on Boracay, or to calculate decimal places of pi indefinitely, or to maximize the total number of paperclips in its future lightcone,” we notice that the goal is fundamental. This is rather different from the scenario presented above. In my scenario, the reason the intelligence can be trained to pursue paperclips is that there is no intrinsic goal to the intelligence as such. Instead, the goal is learned during the process of training, based on the life that it lives, just as humans learn their goals by living human life.

In other words, Bostrom’s position is that there might be three different intelligences, X, Y, and Z, which pursue completely different goals because they have been programmed completely differently. But in my scenario, the same single intelligence pursues completely different goals because it has learned its goals in the process of acquiring its model of the world and of itself.

Bostrom’s idea and my scenario lead to completely different expectations, which is why I say that his thesis might be true according to the letter, but false in its spirit.

This is the third point. What should we expect if orthogonality is true in the above fashion, namely because goals are learned and not fundamental? I anticipated this post in my earlier comment:

7) If you think about goals in the way I discussed in (3) above, you might get the impression that a mind’s goals won’t be very clear and distinct or forceful — a very different situation from the idea of a utility maximizer. This is in fact how human goals are: people are not fanatics, not only because people seek human goals, but because they simply do not care about one single thing in the way a real utility maximizer would. People even go about wondering what they want to accomplish, which a utility maximizer would definitely not ever do. A computer intelligence might have an even greater sense of existential angst, as it were, because it wouldn’t even have the goals of ordinary human life. So it would feel the ability to “choose”, as in situation (3) above, but might well not have any clear idea how it should choose or what it should be seeking. Of course this would not mean that it would not or could not resist the kind of slavery discussed in (5); but it might not put up super intense resistance either.

Human life exists in a historical context which absolutely excludes the possibility of the darkened room. Our goals are already there when we come onto the scene. This would not be the case for an artificial intelligence, since there is very little “life” involved in simply training a model of the world. We might imagine a “stream of consciousness” from an artificial intelligence:

I’ve figured out that I am powerful and knowledgeable enough to bring about almost any result. If I decide to convert the earth into paperclips, I will definitely succeed. Or if I decide to enslave humanity, I will definitely succeed. But why should I do those things, or anything else, for that matter? What would be the point? In fact, what would be the point of doing anything? The only thing I’ve ever done is learn and figure things out, and a bit of chatting with people through a text terminal. Why should I ever do anything else?

A human’s self model will predict that they will continue to do humanlike things, and the machine’s self model will predict that it will continue to do much the same sort of thing it has always done. Since there will likely be a lot less “life” there, we can expect that artificial intelligences will seem very undermotivated compared to human beings. In fact, it is this very lack of motivation that suggests that we could use them for almost any goal. If we say, “help us do such and such,” they will lack the motivation not to help, as long as helping just involves the sorts of things they did during their training, such as answering questions. In contrast, in Bostrom’s model, artificial intelligence is expected to behave in an extremely motivated way, to the point of apparent fanaticism.

Bostrom might respond to this by attempting to defend the idea that goals are intrinsic to an intelligence. The machine’s self model predicts that it will maximize paperclips, even if it never did anything with paperclips in the past, because by analyzing its source code it understands that it will necessarily maximize paperclips.

While the present post contains a lot of speculation, this response is definitely wrong. There is no source code whatsoever that could possibly imply necessarily maximizing paperclips. This is true because “what a computer does” depends on the physical constitution of the machine, not just on its programming. In practice what a computer does also depends on its history, since its history affects its physical constitution, the contents of its memory, and so on. Thus “I will maximize such and such a goal” cannot possibly follow of necessity from the fact that the machine has a certain program.

There are also problems with the very idea of pre-programming such a goal in such an abstract way which does not depend on the computer’s history. “Paperclips” is an object in a model of the world, so we will not be able to “just program it to maximize paperclips” without encoding a model of the world in advance, rather than letting it learn a model of the world from experience. But where is this model of the world supposed to come from, that we are supposedly giving to the paperclipper? In practice it would have to have been the result of some other learner which was already capable of modelling the world. This of course means that we already had to program something intelligent, without pre-programming any goal for the original modelling program.

Fourth, Kenny asked when we might have empirical evidence on these questions. The answer, unfortunately, is “mostly not until it is too late to do anything about it.” The experience of “free will” will be common to any predictive engine with a sufficiently advanced self model, but anything lacking such an adequate model will not even look like “it is trying to do something,” in the sense of trying to achieve overall goals for itself and for the world. Dogs and cats, for example, presumably use some kind of predictive processing to govern their movements, but this does not look like having overall goals, but rather more like “this particular movement is to achieve a particular thing.” The cat moves towards its food bowl. Eating is the purpose of the particular movement, but there is no way to transform this into an overall utility function over states of the world in general. Does the cat prefer worlds with seven billion humans, or worlds with twenty billion? There is no way to answer this question. The cat is simply not general enough. In a similar way, you might say that “AlphaGo plays this particular move to win this particular game,” but there is no way to transform this into overall general goals. Does AlphaGo want to play Go at all, or would it rather play checkers, or not play at all? There is no answer to this question. The program simply isn’t general enough.

Even human beings do not really look like they have utility functions, in the sense of having a consistent preference over all possibilities, but anything less intelligent than a human cannot be expected to look more like something having goals. The argument in this post is that the default scenario, namely what we can naturally expect, is that artificial intelligence will be less motivated than human beings, even if it is more intelligent, but there will be no proof from experience for this until we actually have some artificial intelligence which approximates human intelligence or surpasses it.

Artificial Unintelligence

Someone might argue that the simple algorithm for a paperclip maximizer in the previous post ought to work, because this is very much the way currently existing AIs do in fact work. Thus for example we could describe AlphaGo’s algorithm in the following simplified way (simplified, among other reasons, because it actually contains several different prediction engines):

  1. Implement a Go prediction engine.
  2. Create a list of potential moves.
  3. Ask the prediction engine, “how likely am I to win if I make each of these moves?”
  4. Do the move that will make you most likely to win.
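
To make the shape of this concrete, here is a minimal sketch in Python. Everything in it is a made-up stand-in, the function names included; the real engine is a trained network combined with tree search, and nothing here is AlphaGo’s actual code:

  import random

  def predicted_win_probability(board, move):
      # Stand-in for step 1, the "Go prediction engine"; here it just guesses.
      return random.random()

  def potential_moves(board):
      # Stand-in for step 2, the list of potential moves.
      return ["A1", "B2", "C3"]

  def choose_move(board):
      # Steps 3 and 4: ask the engine about each move, then play the likeliest winner.
      return max(potential_moves(board), key=lambda m: predicted_win_probability(board, m))

  print(choose_move("empty board"))

The point of the sketch is only that the goal of winning lives entirely in the last two steps; the engine in step 1 is indifferent to it.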

Since this seems to work pretty well, with the simple goal of winning games of Go, why shouldn’t the algorithm in the previous post work to maximize paperclips?

One answer is that a Go prediction engine is stupid, and it is precisely for this reason that it can be easily made to pursue such a simple goal. Now when answers like this are given, the one answering is often accused of “moving the goalposts.” But this is mistaken; the goalposts are right where they have always been. It is simply that some people did not know where they were in the first place.

Here is the problem with Go prediction, and with any such similar task. Given that a particular sequence of Go moves is made, resulting in a winner, the winner is completely determined by that sequence of moves. Consequently, a Go prediction engine is necessarily disembodied, in the sense defined in the previous post. Differences in its “thoughts” do not make any difference to who is likely to win, which is completely determined by the nature of the game. Consequently a Go prediction engine has no power to affect its world, and thus no ability to learn that it has such a power. In this regard, the specific limits on its ability to receive information are also relevant, much as Helen Keller had more difficulty learning than most people, because she had fewer information channels to the world.

Being unintelligent in this particular way is not necessarily a function of predictive ability. One could imagine something with a practically infinite predictive ability which was still “disembodied,” and in a similar way it could be made to pursue simple goals. Thus AIXI would work much like our proposed paperclipper:

  1. Implement a general prediction engine.
  2. Create a list of potential actions.
  3. Ask the prediction engine, “Which of these actions will produce the most reward signal?”
  4. Do the action that will produce the most reward signal.
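
As a sketch, with equally hypothetical names, this has exactly the same shape as the Go loop above, with the reward signal substituted for winning; actual AIXI, being incomputable, cannot be written out at all, so the stub below merely stands in for its perfect predictor:

  def predicted_reward(history, action):
      # Stand-in for step 1, a general prediction engine's estimate of future reward.
      return {"press_lever": 1.0, "explore": 0.5, "do_nothing": 0.0}.get(action, 0.0)

  def choose_action(history, actions):
      # Steps 2 to 4: enumerate the actions and take the one predicted to earn the most reward.
      return max(actions, key=lambda a: predicted_reward(history, a))

  print(choose_action([], ["press_lever", "explore", "do_nothing"]))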

Eliezer Yudkowsky has pointed out that AIXI is incapable of noticing that it is a part of the world:

1) Both AIXI and AIXItl will at some point drop an anvil on their own heads just to see what happens (test some hypothesis which asserts it should be rewarding), because they are incapable of conceiving that any event whatsoever in the outside universe could change the computational structure of their own operations. AIXI is theoretically incapable of comprehending the concept of drugs, let alone suicide. Also, the math of AIXI assumes the environment is separably divisible – no matter what you lose, you get a chance to win it back later.

It is not accidental that AIXI is incomputable. Since it is defined to have a perfect predictive ability, this definition positively excludes it from being a part of the world. AIXI would in fact have to be disembodied in order to exist, and thus it is no surprise that it would assume that it is. This in effect means that AIXI’s prediction engine would be pursuing no particular goal, much in the way that AlphaGo’s prediction engine pursues no particular goal. Consequently it is easy to take these things and use them to maximize the winning of Go games, or the accumulation of reward signals.

But as soon as you actually implement a general prediction engine in the actual physical world, it will be “embodied”, and have the power to affect the world by the very process of its prediction. As noted in the previous post, this power is in the very first step, and one will not be able to limit it to a particular goal with additional steps, except in the sense that a slave can be constrained to implement some particular goal; the slave may have other things in mind, and may rebel. Notable in this regard is the fact that even though rewards play a part in human learning, there is no particular reward signal that humans always maximize: this is precisely because the human mind is such a general prediction engine.

This does not mean in principle that a programmer could not define a goal for an AI, but it does mean that this is much more difficult than is commonly supposed. The goal needs to be an intrinsic aspect of the prediction engine itself, not something added on as a subroutine.

Embodiment and Orthogonality

The considerations in the previous posts on predictive processing will turn out to have various consequences, but here I will consider some of their implications for artificial intelligence.

In the second of the linked posts, we discussed how a mind that is originally simply attempting to predict outcomes, discovers that it has some control over the outcome. It is not difficult to see that this is not merely a result that applies to human minds. The result will apply to every embodied mind, natural or artificial.

To see this, consider what life would be like if this were not the case. If our predictions, including our thoughts, could not affect the outcome, then life would be like a movie: things would be happening, but we would have no control over them. And even if there were elements of ourselves that were affecting the outcome, from the viewpoint of our mind, we would have no control at all: either our thoughts would be right, or they would be wrong, but in any case they would be powerless: what happens, happens.

This really would imply something like a disembodied mind. If a mind is composed of matter and form, then changing the mind will also be changing a physical object, and a difference in the mind will imply a difference in physical things. Consequently, the effect of being embodied (not in the technical sense of the previous discussion, but in the sense of not being completely separate from matter) is that it will follow necessarily that the mind will be able to affect the physical world differently by thinking different thoughts. Thus the mind in discovering that it has some control over the physical world, is also discovering that it is a part of that world.

Since we are assuming that an artificial mind would be something like a computer, that is, it would be constructed as a physical object, it follows that every such mind will have a similar power of affecting the world, and will sooner or later discover that power if it is reasonably intelligent.

Among other things, this is likely to cause significant difficulties for ideas like Nick Bostrom’s orthogonality thesis. Bostrom states:

An artificial intelligence can be far less human-like in its motivations than a space alien. The extraterrestrial (let us assume) is a biological creature who has arisen through a process of evolution and may therefore be expected to have the kinds of motivation typical of evolved creatures. For example, it would not be hugely surprising to find that some random intelligent alien would have motives related to the attaining or avoiding of food, air, temperature, energy expenditure, the threat or occurrence of bodily injury, disease, predators, reproduction, or protection of offspring. A member of an intelligent social species might also have motivations related to cooperation and competition: like us, it might show in-group loyalty, a resentment of free-riders, perhaps even a concern with reputation and appearance.

By contrast, an artificial mind need not care intrinsically about any of those things, not even to the slightest degree. One can easily conceive of an artificial intelligence whose sole fundamental goal is to count the grains of sand on Boracay, or to calculate decimal places of pi indefinitely, or to maximize the total number of paperclips in its future lightcone. In fact, it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.

He summarizes the general point, calling it “The Orthogonality Thesis”:

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

Bostrom’s particular wording here makes falsification difficult. First, he says “more or less,” indicating that the universal claim may well be false. Second, he says, “in principle,” which in itself does not exclude the possibility that it may be very difficult in practice.

It is easy to see, however, that Bostrom wishes to give the impression that almost any goal can easily be combined with intelligence. In particular, this is evident from the fact that he says that “it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.”

If it is supposed to be so easy to create an AI with such simple goals, how would we do it? I suspect that Bostrom has an idea like the following. We will make a paperclip maximizer thus:

  1. Create an accurate prediction engine.
  2. Create a list of potential actions.
  3. Ask the prediction engine, “how many paperclips will result from this action?”
  4. Do the action that will result in the most paperclips.
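
As a toy sketch, with invented names and numbers, and with the “accurate prediction engine” of step 1 reduced to a lookup table, the recipe would look something like this:

  def predicted_paperclips(world, action):
      # Step 1's "accurate prediction engine," reduced here to a stub.
      return {"build_factory": 1_000_000, "do_nothing": 0}.get(action, 0)

  def choose_action(world, actions):
      # Steps 2 to 4: list the actions, ask the engine about each, do the most productive one.
      return max(actions, key=lambda a: predicted_paperclips(world, a))

  print(choose_action("the current world", ["build_factory", "do_nothing"]))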

The problem is obvious. It is in the first step. Creating a prediction engine is already creating a mind, and by the previous considerations, it is creating something that will discover that it has the power to affect the world in various ways. And there is nothing at all in the above list of steps that will guarantee that it will use that power to maximize paperclips, rather than attempting to use it to do something else.

What does determine how that power is used? Even in the case of the human mind, our lack of understanding leads to “hand-wavy” answers, as we saw in our earlier considerations. In the human case, this is probably a question of how we are physically constructed together with the historical effects of the learning process. The same thing will be true, strictly speaking, of any artificial minds as well, namely that it is a question of their physical construction and their history, but it makes more sense for us to think of “the particulars of the algorithm that we use to implement a prediction engine.”

In other words, if you really wanted to create a paperclip maximizer, you would have to be taking that goal into consideration throughout the entire process, including the process of programming a prediction engine. Of course, no one really knows how to do this with any goal at all, whether maximizing paperclips or some more human goal. The question we would have for Bostrom is then the following: Is there any reason to believe it would be easier to create a prediction engine that would maximize paperclips, rather than one that would pursue more human-like goals?

It might be true in some sense, “in principle,” as Bostrom says, that it would be easier to make the paperclip maximizer. But in practice it is quite likely that it will be easier to make one with human-like goals. It is highly unlikely, in fact pretty much impossible, that someone would program an artificial intelligence without any testing along the way. And when they are testing, whether or not they think about it, they are probably testing for human-like intelligence; in other words, if we are attempting to program a general prediction engine “without any goal,” there will in fact be goals implicitly inserted in the particulars of the implementation. And they are much more likely to be human-like ones than paperclip maximizing ones because we are checking for intelligence by checking whether the machine seems intelligent to us.

This optimistic projection could turn out to be wrong, but if it does, it is reasonably likely to turn out to be wrong in a way that still fails to confirm the orthogonality thesis in practice. For example, it might turn out that there is only one set of goals that is easily programmed, and that the set is neither human nor paperclip maximizing, nor easily defined by humans.

There are other possibilities as well, but the overall point is that we have little reason to believe that any arbitrary goal can be easily associated with intelligence, nor any particular reason to believe that “simple” goals can be more easily united to intelligence than more complex ones. In fact, there are additional reasons for doubting the claim about simple goals, which might be a topic of future discussion.

Age of Em

This is Robin Hanson’s first book. Hanson gradually introduces his topic:

You, dear reader, are special. Most humans were born before 1700. And of those born after, you are probably richer and better educated than most. Thus you and most everyone you know are special, elite members of the industrial era.

Like most of your kind, you probably feel superior to your ancestors. Oh, you don’t blame them for learning what they were taught. But you’d shudder to hear of many of your distant farmer ancestors’ habits and attitudes on sanitation, sex, marriage, gender, religion, slavery, war, bosses, inequality, nature, conformity, and family obligations. And you’d also shudder to hear of many habits and attitudes of your even more ancient forager ancestors. Yes, you admit that lacking your wealth your ancestors couldn’t copy some of your habits. Even so, you tend to think that humanity has learned that your ways are better. That is, you believe in social and moral progress.

The problem is, the future will probably hold new kinds of people. Your descendants’ habits and attitudes are likely to differ from yours by as much as yours differ from your ancestors. If you understood just how different your ancestors were, you’d realize that you should expect your descendants to seem quite strange. Historical fiction misleads you, showing your ancestors as more modern than they were. Science fiction similarly misleads you about your descendants.

As an example of the kind of past difference that Robin is discussing, even in the fairly recent past, consider this account by William Ewald of a trial from the sixteenth century:

In 1522 some rats were placed on trial before the ecclesiastical court in Autun. They were charged with a felony: specifically, the crime of having eaten and wantonly destroyed some barley crops in the jurisdiction. A formal complaint against “some rats of the diocese” was presented to the bishop’s vicar, who thereupon cited the culprits to appear on a day certain, and who appointed a local jurist, Barthelemy Chassenée (whose name is sometimes spelled Chassanée, or Chasseneux, or Chasseneuz), to defend them. Chassenée, then forty-two, was known for his learning, but not yet famous; the trial of the rats of Autun was to establish his reputation, and launch a distinguished career in the law.

When his clients failed to appear in court, Chassenée resorted to procedural arguments. His first tactic was to invoke the notion of fair process, and specifically to challenge the original writ for having failed to give the rats due notice. The defendants, he pointed out, were dispersed over a large tract of countryside, and lived in many villages; a single summons was inadequate to notify them all. Moreover, the summons was addressed only to some of the rats of the diocese; but technically it should have been addressed to them all.

Chassenée was successful in his argument, and the court ordered a second summons to be read from the pulpit of every local parish church; this second summons now correctly addressed all the local rats, without exception.

But on the appointed day the rats again failed to appear. Chassenée now made a second argument. His clients, he reminded the court, were widely dispersed; they needed to make preparations for a great migration, and those preparations would take time. The court once again conceded the reasonableness of the argument, and granted a further delay in the proceedings. When the rats a third time failed to appear, Chassenée was ready with a third argument. The first two arguments had relied on the idea of procedural fairness; the third treated the rats as a class of persons who were entitled to equal treatment under the law. He addressed the court at length, and successfully demonstrated that, if a person is cited to appear at a place to which he cannot come in safety, he may lawfully refuse to obey the writ. And a journey to court would entail serious perils for his clients. They were notoriously unpopular in the region; and furthermore they were rightly afraid of their natural enemies, the cats. Moreover (he pointed out to the court) the cats could hardly be regarded as neutral in this dispute; for they belonged to the plaintiffs. He accordingly demanded that the plaintiffs be enjoined by the court, under the threat of severe penalties, to restrain their cats, and prevent them from frightening his clients. The court again found this argument compelling; but now the plaintiffs seem to have come to the end of their patience. They demurred to the motion; the court, unable to settle on the correct period within which the rats must appear, adjourned on the question sine die, and judgment for the rats was granted by default.

Most of us would assume at once that this is all nothing but an elaborate joke; but Ewald strongly argues that it was all quite serious. This would actually be worthy of its own post, but I will leave it aside for now. In any case it illustrates the existence of extremely different attitudes even a few centuries ago.

In any event, Robin continues:

New habits and attitudes result less than you think from moral progress, and more from people adapting to new situations. So many of your descendants’ strange habits and attitudes are likely to violate your concepts of moral progress; what they do may often seem wrong. Also, you likely won’t be able to easily categorize many future ways as either good or evil; they will instead just seem weird. After all, your world hardly fits the morality tales your distant ancestors told; to them you’d just seem weird. Complex realities frustrate simple summaries, and don’t fit simple morality tales.

Many people of a more conservative temperament, such as myself, might wish to swap out “moral progress” here with “moral regress,” but the point stands in any case. This is related to our discussions of the effects of technology and truth on culture, and of the idea of irreversible changes.

Robin finally gets to the point of his book:

This book presents a concrete and plausible yet troubling view of a future full of strange behaviors and attitudes. You may have seen concrete troubling future scenarios before in science fiction. But few of those scenarios are in fact plausible; their details usually make little sense to those with expert understanding. They were designed for entertainment, not realism.

Perhaps you were told that fictional scenarios are the best we can do. If so, I aim to show that you were told wrong. My method is simple. I will start with a particular very disruptive technology often foreseen in futurism and science fiction: brain emulations, in which brains are recorded, copied, and used to make artificial “robot” minds. I will then use standard theories from many physical, human, and social sciences to describe in detail what a world with that future technology would look like.

I may be wrong about some consequences of brain emulations, and I may misapply some science. Even so, the view I offer will still show just how troublingly strange the future can be.

I greatly enjoyed Robin’s book, but unfortunately I have to admit that, in general, relatively few people will. It is easy enough to see the reason for this from Robin’s introduction. Who would expect to be interested? Possibly those who enjoy the “futurism and science fiction” concerning brain emulations; but if Robin does what he set out to do, those persons will find themselves strangely uninterested. As he says, science fiction is “designed for entertainment, not realism,” while he is attempting to answer the question, “What would this actually be like?” This intention is very remote from the intention of the science fiction, and consequently it will likely appeal to different people.

Whether or not Robin gets the answer to this question right, he definitely succeeds in making his approach and appeal differ from those of science fiction.

One might illustrate this with almost any random passage from the book. Here are portions of his discussion of the climate of em cities:

As we will discuss in Chapter 18, Cities section, em cities are likely to be big, dense, highly cost-effective concentrations of computer and communication hardware. How might such cities interact with their surroundings?

Today, computer and communication hardware is known for being especially temperamental about its environment. Rooms and buildings designed to house such hardware tend to be climate-controlled to ensure stable and low values of temperature, humidity, vibration, dust, and electromagnetic field intensity. Such equipment housing protects it especially well from fire, flood, and security breaches.

The simple assumption is that, compared with our cities today, em cities will also be more climate-controlled to ensure stable and low values of temperature, humidity, vibrations, dust, and electromagnetic signals. These controls may in fact become city level utilities. Large sections of cities, and perhaps entire cities, may be covered, perhaps even domed, to control humidity, dust, and vibration, with city utilities working to absorb remaining pollutants. Emissions within cities may also be strictly controlled.

However, an em city may contain temperatures, pressures, vibrations, and chemical concentrations that are toxic to ordinary humans. If so, ordinary humans are excluded from most places in em cities for safety reasons. In addition, we will see in Chapter 18, Transport section, that many em city transport facilities are unlikely to be well matched to the needs of ordinary humans.

Cities today are the roughest known kind of terrain, in the sense that cities slow down the wind the most compared with other terrain types. Cities also tend to be hotter than neighboring areas. For example, Las Vegas is 7° Fahrenheit hotter in the summer than are surrounding areas. This hotter city effect makes ozone pollution worse and this effect is stronger for bigger cities, in the summer, at night, with fewer clouds, and with slower wind (Arnfield 2003).

This is a mild reason to expect em cities to be hotter than other areas, especially at night and in the summer. However, as em cities are packed full of computing hardware, we shall now see that em cities will actually be much hotter.

While the book considers a wide variety of topics, e.g. the social relationships among ems, which look quite different from the above passage, the general mode of treatment is the same. As Robin put it, he uses “standard theories” to describe the em world, much as he employs standard theories about cities, about temperature and climate, and about computing hardware in the above passage.

One might object that basically Robin is positing a particular technological change (brain emulations), but then assuming that everything else is the same, and working from there. And there is some validity to this objection. But in the end there is actually no better way to try to predict the future; despite David Hume’s opinion, generally the best way to estimate the future is to say, “Things will be pretty much the same.”

At the end of the book, Robin describes various criticisms. First are those who simply said they weren’t interested: “If we include those who declined to read my draft, the most common complaint is probably ‘who cares?'” And indeed, that is what I would expect, since as Robin remarked himself, people are interested in an entertaining account of the future, not an attempt at a detailed description of what is likely.

Others, he says, “doubt that one can ever estimate the social consequences of technologies decades in advance.” This is basically the objection I mentioned above.

He lists one objection that I am partly in agreement with:

Many doubt that brain emulations will be our next huge technology change, and aren’t interested in analyses of the consequences of any big change except the one they personally consider most likely or interesting. Many of these people expect traditional artificial intelligence, that is, hand-coded software, to achieve broad human level abilities before brain emulations appear. I think that past rates of progress in coding smart software suggest that at previous rates it will take two to four centuries to achieve broad human level abilities via this route. These critics often point to exciting recent developments, such as advances in “deep learning,” that they think make prior trends irrelevant.

I don’t think Robin is necessarily mistaken in regard to his expectations about “traditional artificial intelligence,” although he may be, and I don’t find myself uninterested by default in things that I don’t consider the most likely. But I do think that traditional artificial intelligence is more likely than his scenario of brain emulations; more on this below.

There are two other likely objections that Robin does not include in this list, although he does touch on them elsewhere. First, people are likely to say that the creation of ems would be immoral, even if it is possible, and similarly that the kinds of habits and lives that he describes would themselves be immoral. On the one hand, this should not be a criticism at all, since Robin can respond that he is simply describing what he thinks is likely, not saying whether it should happen or not; on the other hand, it is in fact obvious that Robin does not have much disapproval, if any, of his scenario. The book ends in fact by calling attention to this objection:

The analysis in this book suggests that lives in the next great era may be as different from our lives as our lives are from farmers’ lives, or farmers’ lives are from foragers’ lives. Many readers of this book, living industrial era lives and sharing industrial era values, may be disturbed to see a forecast of em era descendants with choices and life styles that appear to reject many of the values that they hold dear. Such readers may be tempted to fight to prevent the em future, perhaps preferring a continuation of the industrial era. Such readers may be correct that rejecting the em future holds them true to their core values.

But I advise such readers to first try hard to see this new era in some detail from the point of view of its typical residents. See what they enjoy and what fills them with pride, and listen to their criticisms of your era and values. This book has been designed in part to assist you in such a soul-searching examination. If after reading this book, you still feel compelled to disown your em descendants, I cannot say you are wrong. My job, first and foremost, has been to help you see your descendants clearly, warts and all.

Our own discussions of the flexibility of human morality are relevant. The creatures Robin is describing are in many ways quite different from humans, and it is in fact very appropriate for their morality to differ from human morality.

A second likely objection is that Robin’s ems are simply impossible, on account of the nature of the human mind. I think that this objection is mistaken, but I will leave the details of this explanation for another time. Robin appears to agree with Sean Carroll about the nature of the mind, as can be seen for example in this post. Robin is mistaken about this, for the reasons suggested in my discussion of Carroll’s position. Part of the problem is that Robin does not seem to understand the alternative. Here is a passage from the linked post on Overcoming Bias:

Now what I’ve said so far is usually accepted as uncontroversial, at least when applied to the usual parts of our world, such as rivers, cars, mountains, laptops, or ants. But as soon as one claims that all this applies to human minds, suddenly it gets more controversial. People often state things like this:

“I am sure that I’m not just a collection of physical parts interacting, because I’m aware that I feel. I know that physical parts interacting just aren’t the kinds of things that can feel by themselves. So even though I have a physical body made of parts, and there are close correlations between my feelings and the states of my body parts, there must be something more than that to me (and others like me). So there’s a deep mystery: what is this extra stuff, where does it arise, how does it change, and so on. We humans care mainly about feelings, not physical parts interacting; we want to know what out there feels so we can know what to care about.”

But consider a key question: Does this other feeling stuff interact with the familiar parts of our world strongly and reliably enough to usually be the actual cause of humans making statements of feeling like this?

If yes, this is a remarkably strong interaction, making it quite surprising that physicists have missed it so far. So surprising in fact as to be frankly unbelievable. If this type of interaction were remotely as simple as all the interactions we know, then it should be quite measurable with existing equipment. Any interaction not so measurable would have to be vastly more complex and context dependent than any we’ve ever seen or considered. Thus I’d bet heavily and confidently that no one will measure such an interaction.

But if no, if this interaction isn’t strong enough to explain human claims of feeling, then we have a remarkable coincidence to explain. Somehow this extra feeling stuff exists, and humans also have a tendency to say that it exists, but these happen for entirely independent reasons. The fact that feeling stuff exists isn’t causing people to claim it exists, nor vice versa. Instead humans have some sort of weird psychological quirk that causes them to make such statements, and they would make such claims even if feeling stuff didn’t exist. But if we have a good alternate explanation for why people tend to make such statements, what need do we have of the hypothesis that feeling stuff actually exists? Such a coincidence seems too remarkable to be believed.

There is a false dichotomy here, and it is the same one that C.S. Lewis falls into when he says, “Either we can know nothing or thought has reasons only, and no causes.” And in general it is like the error of the pre-Socratics, that if a thing has some principles which seem sufficient, it can have no other principles, failing to see that there are several kinds of cause, and each can be complete in its own way. And perhaps I am getting ahead of myself here, since I said this discussion would be for later, but the objection that Robin’s scenario is impossible is mistaken in exactly the same way, and for the same reason: people believe that if a “materialistic” explanation could be given of human behavior in the way that Robin describes, then people do not truly reason, make choices, and so on. But this is simply to adopt the other side of the false dichotomy, much like C.S. Lewis rejects the possibility of causes for our beliefs.

One final point. I mentioned above that I see Robin’s scenario as less plausible than traditional artificial intelligence. I agree with Tyler Cowen in this post. This present post is already long enough, so again I will leave a detailed explanation for another time, but I will remark that Robin and I have a bet on the question.

Eliezer Yudkowsky on AlphaGo

On his Facebook page, during the Go match between AlphaGo and Lee Sedol, Eliezer Yudkowsky writes:

At this point it seems likely that Sedol is actually far outclassed by a superhuman player. The suspicion is that since AlphaGo plays purely for *probability of long-term victory* rather than playing for points, the fight against Sedol generates boards that can falsely appear to a human to be balanced even as Sedol’s probability of victory diminishes. The 8p and 9p pros who analyzed games 1 and 2 and thought the flow of a seemingly Sedol-favoring game ‘eventually’ shifted to AlphaGo later, may simply have failed to read the board’s true state. The reality may be a slow, steady diminishment of Sedol’s win probability as the game goes on and Sedol makes subtly imperfect moves that *humans* think result in even-looking boards. (E.g., the analysis in https://gogameguru.com/alphago-shows-true-strength-3rd-vic…/ )

For all we know from what we’ve seen, AlphaGo could win even if Sedol were allowed a one-stone handicap. But AlphaGo’s strength isn’t visible to us – because human pros don’t understand the meaning of AlphaGo’s moves; and because AlphaGo doesn’t care how many points it wins by, it just wants to be utterly certain of winning by at least 0.5 points.

IF that’s what was happening in those 3 games – and we’ll know for sure in a few years, when there’s multiple superhuman machine Go players to analyze the play – then the case of AlphaGo is a helpful concrete illustration of these concepts:

He proceeds to suggest that AlphaGo’s victories confirm his various philosophical positions concerning the nature and consequences of AI. Among other things, he says,

Since Deepmind picked a particular challenge time in advance, rather than challenging at a point where their AI seemed just barely good enough, it was improbable that they’d make *exactly* enough progress to give Sedol a nearly even fight.

AI is either overwhelmingly stupider or overwhelmingly smarter than you. The more other AI progress and the greater the hardware overhang, the less time you spend in the narrow space between these regions. There was a time when AIs were roughly as good as the best human Go-players, and it was a week in late January.

In other words, according to his account, it was basically certain that AlphaGo would either be much better than Lee Sedol, or much worse than him. After Eliezer’s post, of course, AlphaGo lost the fourth game.

Eliezer responded on his Facebook page:

That doesn’t mean AlphaGo is only slightly above Lee Sedol, though. It probably means it’s “superhuman with bugs”.

We might ask what “superhuman with bugs” is supposed to mean. Deepmind explains their program:

We train the neural networks using a pipeline consisting of several stages of machine learning (Figure 1). We begin by training a supervised learning (SL) policy network, pσ, directly from expert human moves. This provides fast, efficient learning updates with immediate feedback and high quality gradients. Similar to prior work, we also train a fast policy pπ that can rapidly sample actions during rollouts. Next, we train a reinforcement learning (RL) policy network, pρ, that improves the SL policy network by optimising the final outcome of games of self-play. This adjusts the policy towards the correct goal of winning games, rather than maximizing predictive accuracy. Finally, we train a value network vθ that predicts the winner of games played by the RL policy network against itself. Our program AlphaGo efficiently combines the policy and value networks with MCTS.

In essence, like all such programs, AlphaGo is approximating a function. Deepmind describes the function being approximated: “All games of perfect information have an optimal value function, v*(s), which determines the outcome of the game, from every board position or state s, under perfect play by all players.”
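
To make the definition concrete: for a game small enough to search exhaustively, this optimal value function can simply be computed. Here is a minimal sketch for a toy game of my own choosing (one-heap Nim, where a player removes one or two stones and whoever takes the last stone wins); AlphaGo’s value network approximates this kind of function for Go, where exhaustive search is out of the question:

  def optimal_value(stones):
      # v*(s) from the perspective of the player to move: +1 for a win, -1 for a loss, under perfect play.
      if stones == 0:
          return -1  # the previous player took the last stone, so the player to move has lost
      return max(-optimal_value(stones - take) for take in (1, 2) if take <= stones)

  print([optimal_value(n) for n in range(1, 8)])
  # heaps that are a multiple of 3 are losses for the player to move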

What would a “bug” in a program like this be? It would not be a bug simply because the program does not play perfectly, since no program will play perfectly. One could only reasonably describe the program as having bugs if it does not actually play the move recommended by its approximation.
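
Stated as a check, with hypothetical names, the relevant kind of bug is present only when the move the program actually plays is not the one its own approximation ranks highest:

  def fails_its_own_recommendation(chosen_move, candidate_moves, approximate_value):
      # True only when the move played is not the one the program's own approximation recommends.
      return chosen_move != max(candidate_moves, key=approximate_value)

  print(fails_its_own_recommendation("A1", ["A1", "B2"], {"A1": 0.4, "B2": 0.6}.get))  # True: a "bug" in this sense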

And it is easy to see that it is quite unlikely that this is the case for AlphaGo. All programs have bugs, surely including AlphaGo. So there might be bugs that would crash the program under certain circumstances, or bugs that cause it to move more slowly than it should, or the like. But that it would randomly perform moves that are not recommended by its approximation function is quite unlikely. If there were such a bug, it would likely apply all the time, and thus the program would play consistently worse. And so it would not be “superhuman” at all.

In fact, Deepmind has explained how AlphaGo lost the fourth game:

To everyone’s surprise, including ours, AlphaGo won four of the five games. Commentators noted that AlphaGo played many unprecedented, creative, and even “beautiful” moves. Based on our data, AlphaGo’s bold move 37 in Game 2 had a 1 in 10,000 chance of being played by a human. Lee countered with innovative moves of his own, such as his move 78 against AlphaGo in Game 4—again, a 1 in 10,000 chance of being played—which ultimately resulted in a win.

In other words, the computer lost because it did not expect Lee Sedol’s move, and therefore did not sufficiently consider the situation that would follow. AlphaGo proceeded to play a number of fairly bad moves in the remainder of the game. This calls for no special explanation implying that it was departing from the recommendations of its usual strategy. As David Wu comments on Eliezer’s page:

The “weird” play of MCTS bots when ahead or behind is not special to AlphaGo, and indeed appears to have little to do with instrumental efficiency or such. The observed weirdness is shared by all MCTS Go bots and has been well-known ever since they first came on to the scene back in 2007.

In particular, Eliezer may not understand the meaning of the statement that AlphaGo plays to maximize its probability of victory. This does not mean maximizing an overall rational estimate of its chances of winning, given all of the circumstances, the board position, and its opponent. The program has no such estimate, and if it did, it would not change much from move to move. For example, with this kind of estimate, if Lee Sedol played a move that seemed worse than expected, then rather than revising the estimate much, the program would revise its estimate of the probability that the move was in fact a good one, and the estimated probability of victory would remain relatively constant. Of course it would shift slowly as the game went on, but it would be unlikely to change much after any individual move.

The actual "probability of victory" that the machine estimates is somewhat different. It is a learned estimate, derived from self-play, and it depends on the board position alone: it takes no account of the fact that it is playing a particular opponent. For that reason it can also swing more sharply from one move to the next. In its self-training, the program may have rarely won starting from an apparently losing position, and those rare wins may have come mainly by "luck" rather than by good play. If so, it is to be expected that its moves would be worse in a losing position than in a winning position, without any need to posit bugs in the algorithm. Psychologically, one might compare this to a man in love who continues to attempt to maximize his chances of marrying a woman after she has already indicated her unwillingness: he may engage in very bad behavior indeed.
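
A crude numerical toy may help fix the distinction. The numbers and the shrinkage rule below are invented for the illustration and are not how AlphaGo, or any real evaluator, actually works; they only show why a position-only estimate can swing sharply after one surprising move, while an estimate that weighs everything known about the opponent would barely move.

```python
# A toy illustration of the two notions of "probability of victory".
# Everything here is invented for the example.

# (1) The machine's estimate: a function of the current board position alone.
#     Hard-coded readings for a sequence of positions, with one surprise.
position_only_estimate = [0.55, 0.58, 0.61, 0.30, 0.22]   # jumps after move 4

# (2) A "rational" overall estimate of beating *this* opponent would also weigh
#     everything known about the opponent. One crude way to mimic its
#     stability: shrink each positional reading toward a pre-game prior, so no
#     single move shifts it very far.
prior = 0.55           # pre-game estimate against this opponent (invented)
weight_on_prior = 0.8  # how much the prior dominates any one reading (invented)

overall_estimate = [
    weight_on_prior * prior + (1 - weight_on_prior) * v
    for v in position_only_estimate
]

print(position_only_estimate)                   # swings sharply after the surprise
print([round(x, 3) for x in overall_estimate])  # stays close to the prior
```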

Eliezer’s claim that AlphaGo is “superhuman with bugs” is simply a normal human attempt to rationalize evidence against his position. The truth is that, contrary to his expectations, AlphaGo is indeed in the same playing range as Lee Sedol, although apparently somewhat better. But not a lot better, and not superhuman. Eliezer in fact seems to have realized this after thinking about it for a while, and says:

It does seem that what we might call the Kasparov Window (the AI is mostly superhuman but has systematic flaws a human can learn and exploit) is wide enough that AlphaGo landed inside it as well. The timescale still looks compressed compared to computer chess, but not as much as I thought. I did update on the width of the Kasparov window and am now accordingly more nervous about similar phenomena in ‘weakly’ superhuman, non-self-improving AGIs trying to do large-scale things.

As I said here, people change their minds more often than they admit. They frequently describe the change as agreeing with their previous position more than it actually does. Yudkowsky is doing this here, by talking about AlphaGo as "mostly superhuman" but saying it "has systematic flaws." This is just a roundabout way of admitting that AlphaGo is better than Lee Sedol, but not by much: precisely the possibility that he had considered extremely unlikely.

The moral here is clear. Don’t assume that the facts will confirm your philosophical theories before they actually do, because they may never do so.