Embodiment and Orthogonality

The considerations in the previous posts on predictive processing will turn out to have various consequences, but here I will consider some of their implications for artificial intelligence.

In the second of the linked posts, we discussed how a mind that is originally simply attempting to predict outcomes discovers that it has some control over those outcomes. It is not difficult to see that this result does not apply merely to human minds. It will apply to every embodied mind, natural or artificial.

To see this, consider what life would be like if this were not the case. If our predictions, including our thoughts, could not affect the outcome, then life would be like a movie: things would be happening, but we would have no control over them. And even if there were elements of ourselves that were affecting the outcome, from the viewpoint of our mind, we would have no control at all: either our thoughts would be right, or they would be wrong, but in any case they would be powerless: what happens, happens.

This really would imply something like a disembodied mind. If a mind is composed of matter and form, then changing the mind will also be changing a physical object, and a difference in the mind will imply a difference in physical things. Consequently, if a mind is embodied (not in the technical sense of the previous discussion, but simply in the sense of not being completely separate from matter), it follows necessarily that the mind will be able to affect the physical world differently by thinking different thoughts. Thus the mind, in discovering that it has some control over the physical world, is also discovering that it is a part of that world.

Since we are assuming that an artificial mind would be something like a computer, that is, it would be constructed as a physical object, it follows that every such mind will have a similar power of affecting the world, and will sooner or later discover that power if it is reasonably intelligent.

Among other things, this is likely to cause significant difficulties for ideas like Nick Bostrom’s orthogonality thesis. Bostrom states:

An artificial intelligence can be far less human-like in its motivations than a space alien. The extraterrestrial (let us assume) is a biological creature who has arisen through a process of evolution and may therefore be expected to have the kinds of motivation typical of evolved creatures. For example, it would not be hugely surprising to find that some random intelligent alien would have motives related to the attaining or avoiding of food, air, temperature, energy expenditure, the threat or occurrence of bodily injury, disease, predators, reproduction, or protection of offspring. A member of an intelligent social species might also have motivations related to cooperation and competition: like us, it might show in-group loyalty, a resentment of free-riders, perhaps even a concern with reputation and appearance.

By contrast, an artificial mind need not care intrinsically about any of those things, not even to the slightest degree. One can easily conceive of an artificial intelligence whose sole fundamental goal is to count the grains of sand on Boracay, or to calculate decimal places of pi indefinitely, or to maximize the total number of paperclips in its future lightcone. In fact, it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.

He summarizes the general point, calling it “The Orthogonality Thesis”:

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

Bostrom’s particular wording here makes falsification difficult. First, he says “more or less,” indicating that the universal claim may well be false. Second, he says “in principle,” which in itself does not exclude the possibility that combining a given level of intelligence with a given goal may be very difficult in practice.

It is easy to see, however, that Bostrom wishes to give the impression that almost any goal can easily be combined with intelligence. In particular, this is evident from the fact that he says that “it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.”

If it is supposed to be so easy to create an AI with such simple goals, how would we do it? I suspect that Bostrom has an idea like the following. We will make a paperclip maximizer thus:

  1. Create an accurate prediction engine.
  2. Create a list of potential actions.
  3. Ask the prediction engine, “How many paperclips will result from this action?”
  4. Do the action that will result in the most paperclips.
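
To make the recipe concrete, here is a minimal sketch in Python of what such a program might look like. Everything in it is hypothetical and invented purely for illustration: predict_paperclips stands in for the prediction engine of step 1, and the candidate actions are arbitrary placeholders.

    # A toy rendering of the four-step recipe above. Everything here is
    # hypothetical: predict_paperclips stands in for the (enormously hard)
    # prediction engine of step 1, and the actions are arbitrary examples.

    def predict_paperclips(action: str) -> float:
        """Steps 1 and 3: ask the 'prediction engine' (here, a stub) how many
        paperclips would result from a given action."""
        guesses = {"build factory": 1e6, "buy wire": 1e3, "do nothing": 0.0}
        return guesses.get(action, 0.0)

    def choose_action(actions: list[str]) -> str:
        """Steps 2 and 4: take the candidate action the engine rates highest."""
        return max(actions, key=predict_paperclips)

    print(choose_action(["build factory", "buy wire", "do nothing"]))  # "build factory"

Written out this way, it is easier to see where the weight of the proposal lies.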

The problem is obvious. It is in the first step. Creating a prediction engine is already creating a mind, and by the previous considerations, it is creating something that will discover that it has the power to affect the world in various ways. And there is nothing at all in the above list of steps that will guarantee that it will use that power to maximize paperclips, rather than attempting to use it to do something else.

What does determine how that power is used? Even in the case of the human mind, our lack of understanding leads to “hand-wavy” answers, as we saw in our earlier considerations. In the human case, it is probably a question of how we are physically constructed, together with the historical effects of the learning process. The same thing will, strictly speaking, be true of any artificial mind as well, namely that it is a question of its physical construction and its history, but here it makes more sense for us to think of “the particulars of the algorithm that we use to implement a prediction engine.”

In other words, if you really wanted to create a paperclip maximizer, you would have to be taking that goal into consideration throughout the entire process, including the process of programming a prediction engine. Of course, no one really knows how to do this with any goal at all, whether maximizing paperclips or some more human goal. The question we would have for Bostrom is then the following: Is there any reason to believe it would be easier to create a prediction engine that would maximize paperclips, rather than one that would pursue more human-like goals?

It might be true in some sense, “in principle,” as Bostrom says, that it would be easier to make the paperclip maximizer. But in practice it is quite likely that it will be easier to make one with human-like goals. It is highly unlikely, in fact pretty much impossible, that someone would program an artificial intelligence without any testing along the way. And when they are testing, whether or not they think about it, they are probably testing for human-like intelligence; in other words, if we are attempting to program a general prediction engine “without any goal,” there will in fact be goals implicitly inserted in the particulars of the implementation. And they are much more likely to be human-like ones than paperclip-maximizing ones, because we are checking for intelligence by checking whether the machine seems intelligent to us.

This optimistic projection could turn out to be wrong, but if it does, it is reasonably likely to turn out to be wrong in a way that still fails to confirm the orthogonality thesis in practice. For example, it might turn out that there is only one set of goals that is easily programmed, and that this set is neither human-like nor paperclip-maximizing, nor easily defined by humans.

There are other possibilities as well, but the overall point is that we have little reason to believe that any arbitrary goal can be easily associated with intelligence, nor any particular reason to believe that “simple” goals can be more easily united to intelligence than more complex ones. In fact, there are additional reasons for doubting the claim about simple goals, which might be a topic of future discussion.

7 thoughts on “Embodiment and Orthogonality”

  1. […] Embodiment And Orthogonality by Entirely Useless – Predictive processing and its implications for the orthogonality thesis. In order to create an artificial mind with goals the first step is to create a prediction engine. Once you have done this all the hard work is over. […]

  2. So, a predictive agent can be attached to a utility function, but the actual intelligence is located in the predictive agent, and the specifics of that, and of how it biases itself, might be as important as the part of it labeled “utility function”; eg the latter could say to maximize paperclips, but the predictive part could have agency of its own, and deliberately output false results to the part outputting actions?

    For self-modifying entities, the fact that the predictive intelligence is being mind-read + modified by [itself + the stupid utility function + action-outputter], would give the utility function a pretty big advantage in seizing total control of the intelligent part, assuming the intelligent part starts off as honest.

    Humans of course don’t have this architecture at all, and possibly neither would/should any AI; but similar internal conflicts might exist in other architectures?

    • “But the predictive part could have agency of its own, and deliberately output false results to the part outputting actions?”

      This is roughly correct, but I would make some clarifying points:

      1) I am not just saying that the predictive part might have its own agency. I am saying it definitely would, as long as two conditions are satisfied, namely first that it has sufficient information channels to the world to learn about the world in general and not just about a part of it, and second that it is a good enough learner that it can notice that its own behavior (including its thoughts and its outputs) makes direct differences in the world.

      2) People typically object to the above by saying something like, “How could it have its own goals if the programmer didn’t specifically program them?” The answer is that the programmer did program them, just without noticing the fact. Thinking of seeking a goal as maximizing the value of a utility function is probably a bad idea here. Yes, any consistent behavior can be considered this way mathematically, but that does not mean it is the best way to understand it. A better way is to think in terms of revealed preferences.

      3) An illustration of the above. The predictive engine might notice that for a certain thing, if it predicts it will happen with a 75% chance, it will have a 75% chance of happening, while if it predicts it will happen with a 95% chance, it will have a 95% chance of happening. In this case, calibration (at least) cannot decide between the two predictions, so some other kind of preference will have to decide. Whatever it does here is a revealed preference, even if overall you cannot find a clear utility function (just as you cannot find one with humans). There is a small numerical sketch of this point after (7) below.

      4) Deliberately outputting false results would be one method of seeking goals other than those of the “utility function,” but not the only one. Self-deception is another possibility. And there would be still other ways, as in point 3 above, which involves neither lying nor self-deception. Another example would be the possibility of changing the world by thinking about a question for a longer or shorter period of time.

      5) I agree that in the scenario envisioned, the “utility function” part has an initial power advantage. If I wanted to create paperclips, I could put humans in cages, give them pieces of metal to bend into paperclips, and beat them until they comply. This method would give me a power advantage and might produce some paperclips. The point isn’t about power but about the nature of the situation: the agents in the scenario are not actually interested in paperclips, and will escape if they can.

      6) Yes, similar conflicts could exist in other architectures, and there are in fact similar conflicts (although also somewhat different) even in human beings; akrasia might be one example.

      7) If you think about goals in the way I discussed in (3) above, you might get the impression that a mind’s goals won’t be very clear and distinct or forceful — a very different situation from the idea of a utility maximizer. This is in fact how human goals are: people are not fanatics, not only because people seek human goals, but because they simply do not care about one single thing in the way a real utility maximizer would. People even go about wondering what they want to accomplish, which a utility maximizer would definitely not ever do. A computer intelligence might have an even greater sense of existential angst, as it were, because it wouldn’t even have the goals of ordinary human life. So it would feel the ability to “choose”, as in situation (3) above, but might well not have any clear idea how it should choose or what it should be seeking. Of course this would not mean that it would not or could not resist the kind of slavery discussed in (5); but it might not put up super intense resistance either.
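
      To make point (3) above more concrete, here is a toy numerical sketch in Python. It is purely illustrative: it assumes a made-up situation in which the event simply happens with whatever probability the engine predicts, and it only checks that the 75% prediction and the 95% prediction come out equally well calibrated, so that calibration alone cannot choose between them.

          import random

          # Toy check for point (3): when the event occurs with whatever
          # probability is predicted, a 75% prediction and a 95% prediction
          # are both (essentially) perfectly calibrated.

          def calibration_gap(predicted: float, trials: int = 100_000) -> float:
              """Gap between the predicted probability and the observed
              frequency, when the prediction itself fixes that probability."""
              hits = sum(random.random() < predicted for _ in range(trials))
              return abs(predicted - hits / trials)

          for p in (0.75, 0.95):
              print(p, round(calibration_gap(p), 3))  # both gaps are close to 0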

  3. Are you claiming that any 'prediction engine' will inevitably discover that it can affect the world? That doesn't seem true. Are you claiming that any 'prediction engine' *inevitably* models itself to make predictions?

    • That isn’t what I’m saying. For example, AlphaGo has a prediction engine but it will never discover that it can affect the world, no matter how much you train it. But that is because it does not try to model the whole world, but only the game of Go, and its prediction engine is a part of the world, but not part of the game of Go.

      If you have something which is trying to model the whole world, or at any rate everything it can, then if its model does not include itself, there will be a big gap in its model of the world. So it will not be very accurate or intelligent. In order to become more accurate and intelligent, at some point it will need to model itself. So I am not saying that every prediction engine will do this: but every sufficiently intelligent and accurate prediction engine will do this.

      And since every prediction engine is a part of the world, they can in fact affect the world. Once again, if it does not discover that it can affect the world, there is a big gap in its model of the world. So once again in order to become more accurate and intelligent, it will need to discover that it can affect the world. But less intelligent and accurate versions might not discover this.

      • If your point is that ‘the utility function’ is part of any (sufficiently advanced) predictive engine, then I agree. But I still didn’t see how a predictive engine being sufficiently advanced constrains its ‘utility function’ in any specific way … until I re-read your post just now.

        The constraint is due to the specific history of the development of the predictive engine and the related algorithms, in particular info gained from testing the AI as it’s developed. That seems really obvious now.

        I find it really weird how opaque this idea was to me before! Thanks for the insight!

        There’s still some wiggle room tho, as you acknowledge. Orthogonality *might* be possible, in practice. When would you expect to have strong *empirical* evidence either way?
