What You Learned Before You Were Born

In Plato’s Meno, Socrates makes the somewhat odd claim that the ability of people to learn things without being directly told them proves that somehow they must have learned them or known them in advance. While we can reasonably assume this is wrong in a literal sense, there is some likeness of the truth here.

The whole of a human life is a continuous learning process generally speaking without any sudden jumps. We think of a baby’s learning as different from the learning of a child in school, and the learning of the child as rather different from the learning of an adult. But if you look at that process in itself, there may be sudden jumps in a person’s situation, such as when they graduate from school or when they get married, but there are no sudden jumps from not knowing anything about a topic or an object to suddenly knowing all about it. The learning itself happens gradually. It is the same with the manner in which it takes place; adults do indeed learn in a different manner from that in which children or infants learn. But if you ask how that manner got to be different, it certainly did so gradually, not suddenly.

But in addition to all this, there is a kind of “knowledge” that is not learned at all during one’s life, but is possessed from the beginning. From the beginning people have the ability to interact with the world in such a way that they will survive and go on to learn things. Thus from the beginning they must “know” how to do this. Now one might object that infants have no such knowledge, and that the only reason they survive is that their parents or others keep them alive. But the objection is mistaken: infants know to cry out when they hungry or in pain, and this is part of what keeps them alive. Similarly, an infant knows to drink the milk from its mother rather than refusing it, and this is part of what keeps it alive. Similarly in regard to learning, if an infant did not know the importance of paying close attention to speech sounds, it would never learn a language.

When was this “knowledge” learned? Not in the form of a separated soul, but through the historical process of natural selection.

Selection and Artificial Intelligence

This has significant bearing on our final points in the last post. Is the learning found in AI in its current forms more like the first kind of learning above, or like the kind found in the process of natural selection?

There may be a little of both, but the vast majority of learning in such systems is very much the second kind, and not the first kind. For example, AlphaGo is trained by self-play, where moves and methods of play that tend to lose are eliminated in much the way that in the process of natural selection, manners of life that do not promote survival are eliminated. Likewise a predictive model like GPT-3 is trained, through a vast number of examples, to avoid predictions that turn out to be less accurate and to make predictions that tend to be more accurate.

Now (whether or not this is done in individual cases) you might take a model of this kind and fine tune it based on incoming data, perhaps even in real time, which is a bit more like the first kind of learning. But in our actual situation, the majority of what is known by our AI systems is based on the second kind of learning.

This state of affairs should not be surprising, because the first kind of learning described above is impossible without being preceded by the second. The truth in Socrates’ claim is that if a system does not already “know” how to learn, of course it will not learn anything.

Intelligence and Universality

Elsewhere I have mentioned the argument, often made in great annoyance, that people who take some new accomplishment in AI or machine learning and proclaim that it is “not real intelligence” or that the algorithm is “still fundamentally stupid”, and other things of that kind, are “moving the goalposts,” especially since in many such cases, there really were people who said that something that could do such a thing would be intelligent.

As I said in the linked post, however, there is no problem of moving goalposts unless you originally had them in the wrong place. And attaching intelligence to any particular accomplishment, such as “playing chess well” or even “producing a sensible sounding text,” or anything else with that sort of particularity, is misplacing the goalposts. As we might remember, what excited Francis Bacon was the thought that there were no clear limits, at all, on what science (namely the working out of intelligence) might accomplish. In fact he seems to have believed that there were no limits at all, which is false. Nonetheless, he was correct that those limits are extremely vague, and that much that many assumed to be impossible would turn out to be possible. In other words, human intelligence does not have very meaningful limits on what it can accomplish, and artificial intelligence will be real intelligence (in the same sense that artificial diamonds can be real diamonds) when artificial intelligence has no meaningful limits on what it can accomplish.

I have no time for playing games with objections like, “but humans can’t multiply two 1000 digit numbers in one second, and no amount of thought will give them that ability.” If you have questions of this kind, please answer them for yourself, and if you can’t, sit still and think about it until you can. I have full confidence in your ability to find the answers, given sufficient thought.

What is needed for “real intelligence,” then, is universality. In a sense everyone knew all along that this was the right place for the goalposts. Even if someone said “if a machine can play chess, it will be intelligent,” they almost certainly meant that their expectation was that a machine that could play chess would have no clear limits on what it could accomplish. If you could have told them for a fact that the future would be different: that a machine would be able to play chess but that (that particular machine) would never be able to do anything else, they would have conceded that the machine would not be intelligent.

Training and Universality

Current AI systems are not universal, and clearly have no ability whatsoever to become universal, without first undergoing deep changes in those systems, changes that would have to be initiated by human beings. What is missing?

The problem is the training data. The process of evolution produced the general ability to learn by using the world itself as the training data. In contrast, our AI systems take a very small subset of the world (like a large set of Go games or a large set of internet text), and train a learning system on that subset. Why take a subset? Because the world is too large to fit into a computer, especially if that computer is a small part of the world.

This suggests that going from the current situation to “artificial but real” intelligence is not merely a question of making things better and better little by little. There is a more fundamental problem that would have to be overcome, and it won’t be overcome simply by larger training sets, by faster computing, and things of this kind. This does not mean that the problem is impossible, but it may turn out to be much more difficult than people expected. For example, if there is no direct solution, people might try to create Robin Hanson’s “ems”, where one would more or less copy the learning achieved by natural selection. Or even if that is not done directly, a better understanding of what it means to “know how to learn,” might lead to a solution, although probably one that would not depend on training a model on massive amounts of data.

What happens if there is no solution, or no solution is found? At times people will object to the possibility of such a situation along these times: “this situation is incoherent, since obviously people will be able to keep making better and better machine learning systems, so sooner or later they will be just as good as human intelligence.” But in fact the situation is not incoherent; if it happened, various types of AI system would approach various asymptotes, and this is entirely coherent. We can already see this in the case of GPT-3, where as I noted, there is an absolute bound on its future performance. In general such bounds in their realistic form are more restrictive than their in-principle form; I do not actually expect some successor to GPT-3 to write sensible full length books. Note however that even if this happened (as long as the content itself was not fundamentally better than what humans have done) I would not be “moving the goalposts”; I do not expect that to happen, but its happening would not imply any fundamental difference, since this is still within the “absolute” bounds that we have discussed. In contrast, if a successor to GPT-3 published a cure for cancer, this would prove that I had made some mistake on the level of principle.

Some Remarks on GPT-N

At the end of May, OpenAI published a paper on GPT-3, a language model which is a successor to their previous version, GPT-2. While quite impressive, the reaction from many people interested in artificial intelligence has been seriously exaggerated. Sam Altman, OpenAI’s CEO, has said as much himself:

The GPT-3 hype is way too much. It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out.

I used “GPT-N” in the title here because most of the comments I intend to make are almost completely general, and will apply to any future version that uses sufficiently similar methods.

What it does

GPT-3 is a predictive language model, that is, given an input text it tries to predict what would come next, much in the way that if you read the first few words of this sentence with the rest covered up, you might try to guess what would be likely to come next. To the degree that it does this well, it can be used to generate text from a “prompt,” that is, we give it something like a few words or a few sentences, and then add whatever it predicts should come next. For example, let’s take this very blog post and see what GPT-3 would like to say:

What it doesn’t do

While GPT-3 does seem to be able to generate some pretty interesting results, there are several limitations that need to be taken into account when using it.

First and foremost, and most importantly, it can’t do anything without a large amount of input data. If you want it to write like “a real human,” you need to give it a lot of real human writing. For most people, this means copying and pasting a lot. And while the program is able to read through that and get a feel for the way humans communicate, you can’t exactly use it to write essays or research papers. The best you could do is use it as a “fill in the blank” tool to write stories, and that’s not even very impressive.

While the program does learn from what it reads and is quite good at predicting words and phrases based on what has already been written, this method isn’t very effective at producing realistic prose. The best you could hope for is something like the “Deep Writing Machine” Twitter account, which spits out disconnected phrases in an ominous, but very bland voice.

In addition, the model is limited only to language. It does not understand context or human thought at all, so it has no way of tying anything together. You could use it to generate a massive amount of backstory and other material for a game, but that’s about it.

Finally, the limitations in writing are only reinforced by the limitations in reading. Even with a large library to draw on, the program is only as good as the parameters set for it. Even if you set it to the greatest writers mankind has ever known, without any special parameters, its writing would be just like anyone else’s.

The Model

GPT-3 consists of several layers. The first layer is a “memory network” that involves the program remembering previously entered data and using it when appropriate (i.e. it remembers commonly misspelled words and frequently used words). The next layer is the reasoning network, which involves common sense logic (i.e. if A, then B). The third is the repetition network, which involves pulling previously used material from memory and using it to create new combinations (i.e. using previously used words in new orders).

I added the bold formatting, the rest is as produced by the model. This was also done in one run, without repetitions. This is an important qualification, since many examples on the internet have been produced by deleting something produced by the model and forcing it to generate something new until something sensible resulted. Note that the model does not seem to have understood my line, “let’s take this very blog post and see what GPT-3 would like to say.” That is, rather than trying to “say” anything, it attempted to continue the blog post in the way I might have continued it without the block quote.

Truth vs Probability of Text

If we interpret the above text from GPT-3 “charitably”, much of it is true or close to true. But I use scare quotes here because when we speak of interpreting human speech charitably, we are assuming that someone was trying to speak the truth, and so we think, “What would they have meant if they were trying to say something true?” The situation is different here, because GPT-3 has no intention of producing truth, nor of avoiding it. Insofar as there is any intention, the intention is to produce the text which would be likely to come after the input text; in this case, as the input text was the beginning of this blog post, the intention was to produce the text that would likely follow in such a post. Note that there is an indirect relationship with truth, which explains why there is any truth at all in GPT-3’s remarks. If the input text is true, it is at least somewhat likely that what would follow would also be true, so if the model is good at guessing what would be likely to follow, it will be likely to produce something true in such cases. But it is just as easy to convince it to produce something false, simply by providing an input text that would be likely to be followed by something false.

This results in an absolute upper limit on the quality of the output of a model of this kind, including any successor version, as long as the model works by predicting the probability of the following text. Namely, its best output cannot be substantially better than the best content in its training data, which is in this version is a large quantity of texts from the internet. The reason for this limitation is clear; to the degree that the model has any intention at all, the intention is to reflect the training data, not to surpass it. As an example, consider the difference between Deep Mind’s AlphaGo and AlphaGo Zero. AlphaGo Zero is a better Go player than the original AlphaGo, and this is largely because the original is trained on human play, while AlphaGo Zero is trained from scratch on self play. In other words, the original version is to some extent predicting “what would a Go player play in this situation,” which is not the same as predicting “what move would win in this situation.”

Now I will predict (and perhaps even GPT-3 could predict) that many people will want to jump in and say, “Great. That shows you are wrong. Even the original AlphaGo plays Go much better than a human. So there is no reason that an advanced version of GPT-3 could not be better than humans at saying things that are true.”

The difference, of course, is that AlphaGo was trained in two ways, first on predicting what move would be likely in a human game, and second on what would be likely to win, based on its experience during self play. If you had trained the model only on predicting what would follow in human games, without the second aspect, the model would not have resulted in play that substantially improved upon human performance. But in the case of GPT-3 or any model trained in the same way, there is no selection whatsoever for truth as such; it is trained only to predict what would follow in a human text. So no successor to GPT-3, in the sense of a model of this particular kind, however large, will ever be able to produce output better than human, or in its own words, “its writing would be just like anyone else’s.”

Self Knowledge and Goals

OpenAI originally claimed that GPT-2 was too dangerous to release; ironically, they now intend to sell access to GPT-3. Nonetheless, many people, in large part those influenced by the opinions of Nick Bostrom and Eliezer Yudkowsky, continue to worry that an advanced version might turn out to be a personal agent with nefarious goals, or at least goals that would conflict with the human good. Thus Alexander Kruel:

GPT-2: *writes poems*
Skeptics: Meh
GPT-3: *writes code for a simple but functioning app*
Skeptics: Gimmick.
GPT-4: *proves simple but novel math theorems*
Skeptics: Interesting but not useful.
GPT-5: *creates GPT-6*
Skeptics: Wait! What?
GPT-6: *FOOM*
Skeptics: *dead*

In a sense the argument is moot, since I have explained above why no future version of GPT will ever be able to produce anything better than people can produce themselves. But even if we ignore that fact, GPT-3 is not a personal agent of any kind, and seeks goals in no meaningful sense, and the same will apply to any future version that works in substantially the same way.

The basic reason for this is that GPT-3 is disembodied, in the sense of this earlier post on Nick Bostrom’s orthogonality thesis. The only thing it “knows” is texts, and the only “experience” it can have is receiving an input text. So it does not know that it exists, it cannot learn that it can affect the world, and consequently it cannot engage in goal seeking behavior.

You might object that it can in fact affect the world, since it is in fact in the world. Its predictions cause an output, and that output is in the world. And that output and be reintroduced as input (which is how “conversations” with GPT-3 are produced). Thus it seems it can experience the results of its own activities, and thus should be able to acquire self knowledge and goals. This objection is not ultimately correct, but it is not so far from the truth. You would not need extremely large modifications in order to make something that in principle could acquire self knowledge and seek goals. The main reason that this cannot happen is the “P in “GPT,” that is, the fact that the model is “pre-trained.” The only learning that can happen is the learning that happens while it is reading an input text, and the purpose of that learning is to guess what is happening in the one specific text, for the purpose of guessing what is coming next in this text. All of this learning vanishes upon finishing the prediction task and receiving another input. A secondary reason is that since the only experience it can have is receiving an input text, even if it were given a longer memory, it would probably not be possible for it to notice that its outputs were caused by its predictions, because it likely has no internal mechanism to reflect on the predictions themselves.

Nonetheless, if you “fixed” these two problems, by allowing it to continue to learn, and by allowing its internal representations to be part of its own input, there is nothing in principle that would prevent it from achieving self knowledge, and from seeking goals. Would this be dangerous? Not very likely. As indicated elsewhere, motivation produced in this way and without the biological history that produced human motivation is not likely to be very intense. In this context, if we are speaking of taking a text-predicting model and adding on an ability to learn and reflect on its predictions, it is likely to enjoy doing those things and not much else. For many this argument will seem “hand-wavy,” and very weak. I could go into this at more depth, but I will not do so at this time, and will simply invite the reader to spend more time thinking about it. Dangerous or not, would it be easy to make these modifications? Nothing in this description sounds difficult, but no, it would not be easy. Actually making an artificial intelligence is hard. But this is a story for another time.

Artificial Unintelligence

Someone might argue that the simple algorithm for a paperclip maximizer in the previous post ought to work, because this is very much the way currently existing AIs do in fact work. Thus for example we could describe AlphaGo‘s algorithm in the following simplified way (simplified, among other reasons, because it actually contains several different prediction engines):

  1. Implement a Go prediction engine.
  2. Create a list of potential moves.
  3. Ask the prediction engine, “how likely am I to win if I make each of these moves?”
  4. Do the move that will make you most likely to win.

Since this seems to work pretty well, with the simple goal of winning games of Go, why shouldn’t the algorithm in the previous post work to maximize paperclips?

One answer is that a Go prediction engine is stupid, and it is precisely for this reason that it can be easily made to pursue such a simple goal. Now when answers like this are given the one answering in this way is often accused of “moving the goalposts.” But this is mistaken; the goalposts are right where they have always been. It is simply that some people did not know where they were in the first place.

Here is the problem with Go prediction, and with any such similar task. Given that a particular sequence of Go moves is made, resulting in a winner, the winner is completely determined by that sequence of moves. Consequently, a Go prediction engine is necessarily disembodied, in the sense defined in the previous post. Differences in its “thoughts” do not make any difference to who is likely to win, which is completely determined by the nature of the game. Consequently a Go prediction engine has no power to affect its world, and thus no ability to learn that it has such a power. In this regard, the specific limits on its ability to receive information are also relevant, much as Helen Keller had more difficulty learning than most people, because she had fewer information channels to the world.

Being unintelligent in this particular way is not necessarily a function of predictive ability. One could imagine something with a practically infinite predictive ability which was still “disembodied,” and in a similar way it could be made to pursue simple goals. Thus AIXI would work much like our proposed paperclipper:

  1. Implement a general prediction engine.
  2. Create a list of potential actions.
  3. Ask the prediction engine, “Which of these actions will produce the most reward signal?”
  4. Do the action that has the greatest reward signal.

Eliezer Yudkowsky has pointed out that AIXI is incapable of noticing that it is a part of the world:

1) Both AIXI and AIXItl will at some point drop an anvil on their own heads just to see what happens (test some hypothesis which asserts it should be rewarding), because they are incapable of conceiving that any event whatsoever in the outside universe could change the computational structure of their own operations. AIXI is theoretically incapable of comprehending the concept of drugs, let alone suicide. Also, the math of AIXI assumes the environment is separably divisible – no matter what you lose, you get a chance to win it back later.

It is not accidental that AIXI is incomputable. Since it is defined to have a perfect predictive ability, this definition positively excludes it from being a part of the world. AIXI would in fact have to be disembodied in order to exist, and thus it is no surprise that it would assume that it is. This in effect means that AIXI’s prediction engine would be pursuing no particular goal much in the way that AlphaGo’s prediction engine pursues no particular goal. Consequently it is easy to take these things and maximize the winning of Go games, or of reward signals.

But as soon as you actually implement a general prediction engine in the actual physical world, it will be “embodied”, and have the power to affect the world by the very process of its prediction. As noted in the previous post, this power is in the very first step, and one will not be able to limit it to a particular goal with additional steps, except in the sense that a slave can be constrained to implement some particular goal; the slave may have other things in mind, and may rebel. Notable in this regard is the fact that even though rewards play a part in human learning, there is no particular reward signal that humans always maximize: this is precisely because the human mind is such a general prediction engine.

This does not mean in principle that a programmer could not define a goal for an AI, but it does mean that this is much more difficult than is commonly supposed. The goal needs to be an intrinsic aspect of the prediction engine itself, not something added on as a subroutine.

Eliezer Yudkowsky on AlphaGo

On his Facebook page, during the Go match between AlphaGo and Lee Sedol, Eliezer Yudkowsky writes:

At this point it seems likely that Sedol is actually far outclassed by a superhuman player. The suspicion is that since AlphaGo plays purely for *probability of long-term victory* rather than playing for points, the fight against Sedol generates boards that can falsely appear to a human to be balanced even as Sedol’s probability of victory diminishes. The 8p and 9p pros who analyzed games 1 and 2 and thought the flow of a seemingly Sedol-favoring game ‘eventually’ shifted to AlphaGo later, may simply have failed to read the board’s true state. The reality may be a slow, steady diminishment of Sedol’s win probability as the game goes on and Sedol makes subtly imperfect moves that *humans* think result in even-looking boards. (E.g., the analysis in https://gogameguru.com/alphago-shows-true-strength-3rd-vic…/ )

For all we know from what we’ve seen, AlphaGo could win even if Sedol were allowed a one-stone handicap. But AlphaGo’s strength isn’t visible to us – because human pros don’t understand the meaning of AlphaGo’s moves; and because AlphaGo doesn’t care how many points it wins by, it just wants to be utterly certain of winning by at least 0.5 points.

IF that’s what was happening in those 3 games – and we’ll know for sure in a few years, when there’s multiple superhuman machine Go players to analyze the play – then the case of AlphaGo is a helpful concrete illustration of these concepts:

He proceeds to suggest that AlphaGo’s victories confirm his various philosophical positions concerning the nature and consequences of AI. Among other things, he says,

Since Deepmind picked a particular challenge time in advance, rather than challenging at a point where their AI seemed just barely good enough, it was improbable that they’d make *exactly* enough progress to give Sedol a nearly even fight.

AI is either overwhelmingly stupider or overwhelmingly smarter than you. The more other AI progress and the greater the hardware overhang, the less time you spend in the narrow space between these regions. There was a time when AIs were roughly as good as the best human Go-players, and it was a week in late January.

In other words, according to his account, it was basically certain that AlphaGo would either be much better than Lee Sedol, or much worse than him. After Eliezer’s post, of course, AlphaGo lost the fourth game.

Eliezer responded on his Facebook page:

That doesn’t mean AlphaGo is only slightly above Lee Sedol, though. It probably means it’s “superhuman with bugs”.

We might ask what “superhuman with bugs” is supposed to mean. Deepmind explains their program:

We train the neural networks using a pipeline consisting of several stages of machine learning (Figure 1). We begin by training a supervised learning (SL) policy network, pσ, directly from expert human moves. This provides fast, efficient learning updates with immediate feedback and high quality gradients. Similar to prior work, we also train a fast policy pπ that can rapidly sample actions during rollouts. Next, we train a reinforcement learning (RL) policy network, pρ, that improves the SL policy network by optimising the final outcome of games of self-play. This adjusts the policy towards the correct goal of winning games, rather than maximizing predictive accuracy. Finally, we train a value network vθ that predicts the winner of games played by the RL policy network against itself. Our program AlphaGo efficiently combines the policy and value networks with MCTS.

In essence, like all such programs, AlphaGo is approximating a function. Deepmind describes the function being approximated, “All games of perfect information have an optimal value function, v ∗ (s), which determines the outcome of the game, from every board position or state s, under perfect play by all players.”

What would a “bug” in a program like this be? It would not be a bug simply because the program does not play perfectly, since no program will play perfectly. One could only reasonably describe the program as having bugs if it does not actually play the move recommended by its approximation.

And it is easy to see that it is quite unlikely that this is the case for AlphaGo. All programs have bugs, surely including AlphaGo. So there might be bugs that would crash the program under certain circumstances, or bugs that cause it to move more slowly than it should, or the like. But that it would randomly perform moves that are not recommended by its approximation function is quite unlikely. If there were such a bug, it would likely apply all the time, and thus the program would play consistently worse. And so it would not be “superhuman” at all.

In fact, Deepmind has explained how AlphaGo lost the fourth game:

To everyone’s surprise, including ours, AlphaGo won four of the five games. Commentators noted that AlphaGo played many unprecedented, creative, and even“beautiful” moves. Based on our data, AlphaGo’s bold move 37 in Game 2 had a 1 in 10,000 chance of being played by a human. Lee countered with innovative moves of his own, such as his move 78 against AlphaGo in Game 4—again, a 1 in 10,000 chance of being played—which ultimately resulted in a win.

In other words, the computer lost because it did not expect Lee Sedol’s move, and thus did not sufficiently consider the situation that would follow. AlphaGo proceeded to play a number of fairly bad moves in the remainder of the game. This does not require any special explanation implying that it was not following the recommendations of its usual strategy. As David Wu comments on Eliezer’s page:

The “weird” play of MCTS bots when ahead or behind is not special to AlphaGo, and indeed appears to have little to do with instrumental efficiency or such. The observed weirdness is shared by all MCTS Go bots and has been well-known ever since they first came on to the scene back in 2007.

In particular, Eliezer may not understand the meaning of the statement that AlphaGo plays to maximize its probability of victory. This does not mean maximizing an overall rational estimate of the its chances of winning, giving all of the circumstances, the board position, and its opponent. The program does not have such an estimate, and if it did, it would not change much from move to move. For example, with this kind of estimate, if Lee Sedol played a move apparently worse than it expected, rather than changing this estimate much, it would change its estimate of the probability that the move was a good one, and the probability of victory would remain relatively constant. Of course it would change slowly as the game went on, but it would be unlikely to change much after an individual move.

The actual “probability of victory” that the machine estimates is somewhat different. It is a learned estimate based on playing itself. This can change somewhat more easily, and is independent of the fact that it is playing a particular opponent; it is based on the board position alone. In its self-training, it may have rarely won starting from an apparently losing position, and this may have happened mainly by “luck,” not by good play. If this is the case, it is reasonable that its moves would be worse in a losing position than in a winning position, without any need to say that there are bugs in the algorithm. Psychologically, one might compare this to the case of a man in love with a woman who continues to attempt to maximize his chances of marrying her, after she has already indicated her unwillingness: he may engage in very bad behavior indeed.

Eliezer’s claim that AlphaGo is “superhuman with bugs” is simply a normal human attempt to rationalize evidence against his position. The truth is that, contrary to his expectations, AlphaGo is indeed in the same playing range as Lee Sedol, although apparently somewhat better. But not a lot better, and not superhuman. Eliezer in fact seems to have realized this after thinking about it for a while, and says:

It does seem that what we might call the Kasparov Window (the AI is mostly superhuman but has systematic flaws a human can learn and exploit) is wide enough that AlphaGo landed inside it as well. The timescale still looks compressed compared to computer chess, but not as much as I thought. I did update on the width of the Kasparov window and am now accordingly more nervous about similar phenomena in ‘weakly’ superhuman, non-self-improving AGIs trying to do large-scale things.

As I said here, people change their minds more often than they say that they do. They frequently describe the change as having more agreement with their previous position than it actually has. Yudkowsky is doing this here, by talking about AlphaGo as “mostly superhuman” but saying it “has systematic flaws.” This is just a roundabout way of admitting that AlphaGo is better than Lee Sedol, but not by much, the original possibility that he thought extremely unlikely.

The moral here is clear. Don’t assume that the facts will confirm your philosophical theories before this actually happens, because it may not happen at all.