Someone might argue that the simple algorithm for a paperclip maximizer in the previous post ought to work, because this is very much the way currently existing AIs do in fact work. Thus for example we could describe AlphaGo‘s algorithm in the following simplified way (simplified, among other reasons, because it actually contains several different prediction engines):
- Implement a Go prediction engine.
- Create a list of potential moves.
- Ask the prediction engine, “how likely am I to win if I make each of these moves?”
- Do the move that will make you most likely to win.
Since this seems to work pretty well, with the simple goal of winning games of Go, why shouldn’t the algorithm in the previous post work to maximize paperclips?
One answer is that a Go prediction engine is stupid, and it is precisely for this reason that it can be easily made to pursue such a simple goal. Now when answers like this are given the one answering in this way is often accused of “moving the goalposts.” But this is mistaken; the goalposts are right where they have always been. It is simply that some people did not know where they were in the first place.
Here is the problem with Go prediction, and with any such similar task. Given that a particular sequence of Go moves is made, resulting in a winner, the winner is completely determined by that sequence of moves. Consequently, a Go prediction engine is necessarily disembodied, in the sense defined in the previous post. Differences in its “thoughts” do not make any difference to who is likely to win, which is completely determined by the nature of the game. Consequently a Go prediction engine has no power to affect its world, and thus no ability to learn that it has such a power. In this regard, the specific limits on its ability to receive information are also relevant, much as Helen Keller had more difficulty learning than most people, because she had fewer information channels to the world.
Being unintelligent in this particular way is not necessarily a function of predictive ability. One could imagine something with a practically infinite predictive ability which was still “disembodied,” and in a similar way it could be made to pursue simple goals. Thus AIXI would work much like our proposed paperclipper:
- Implement a general prediction engine.
- Create a list of potential actions.
- Ask the prediction engine, “Which of these actions will produce the most reward signal?”
- Do the action that has the greatest reward signal.
Eliezer Yudkowsky has pointed out that AIXI is incapable of noticing that it is a part of the world:
1) Both AIXI and AIXItl will at some point drop an anvil on their own heads just to see what happens (test some hypothesis which asserts it should be rewarding), because they are incapable of conceiving that any event whatsoever in the outside universe could change the computational structure of their own operations. AIXI is theoretically incapable of comprehending the concept of drugs, let alone suicide. Also, the math of AIXI assumes the environment is separably divisible – no matter what you lose, you get a chance to win it back later.
It is not accidental that AIXI is incomputable. Since it is defined to have a perfect predictive ability, this definition positively excludes it from being a part of the world. AIXI would in fact have to be disembodied in order to exist, and thus it is no surprise that it would assume that it is. This in effect means that AIXI’s prediction engine would be pursuing no particular goal much in the way that AlphaGo’s prediction engine pursues no particular goal. Consequently it is easy to take these things and maximize the winning of Go games, or of reward signals.
But as soon as you actually implement a general prediction engine in the actual physical world, it will be “embodied”, and have the power to affect the world by the very process of its prediction. As noted in the previous post, this power is in the very first step, and one will not be able to limit it to a particular goal with additional steps, except in the sense that a slave can be constrained to implement some particular goal; the slave may have other things in mind, and may rebel. Notable in this regard is the fact that even though rewards play a part in human learning, there is no particular reward signal that humans always maximize: this is precisely because the human mind is such a general prediction engine.
This does not mean in principle that a programmer could not define a goal for an AI, but it does mean that this is much more difficult than is commonly supposed. The goal needs to be an intrinsic aspect of the prediction engine itself, not something added on as a subroutine.