Structure of Explanation

When we explain a thing, we give a cause; we assign the thing an origin that explains it.

We can go into a little more detail here. When we ask “why” something is the case, there is always an implication of possible alternatives. At the very least, the question implies, “Why is this the case rather than not being the case?” Thus “being the case” and “not being the case” are two possible alternatives.

The alternatives can be seen as possibilities in the sense explained in an earlier post. There may or may not be any actual matter involved, but again, the idea is that reality (or more specifically some part of reality) seems like something that would be open to being formed in one way or another, and we are asking why it is formed in one particular way rather than the other way. “Why is it raining?” In principle, the sky is open to being clear, or being filled with clouds and a thunderstorm, and to many other possibilities.

A successful explanation will be a complete explanation when it says “once you take the origin into account, the apparent alternatives were only apparent, and not really possible.” It will be a partial explanation when it says, “once you take the origin into account, the other alternatives were less sensible (i.e. made less sense as possibilities) than the actual thing.”

Let’s consider some examples in the form of “why” questions and answers.

Q1. Why do rocks fall? (e.g. instead of the alternatives of hovering in the air, going upwards, or anything else.)

A1. Gravity pulls things downwards, and rocks are heavier than air.

The answer gives an efficient cause, and once this cause is taken into account, it can be seen that hovering in the air or going upwards were not possibilities relative to that cause.

Obviously there is not meant to be a deep explanation here; the point here is to discuss the structure of explanation. The given answer is in fact basically Newton’s answer (although he provided more mathematical detail), while with general relativity Einstein provided a better explanation.

The explanation is incomplete in several ways. It is not a first cause; someone can now ask, “Why does gravity pull things downwards, instead of upwards or to the side?” Similarly, while it is in fact the cause of falling rocks, someone can still ask, “Why didn’t anything else prevent gravity from making the rocks fall?” This is a different question, and would require a different answer, but it seems to reopen the possibility of the rocks hovering or moving upwards, from a more general point of view. David Hume was in part appealing to the possibility of such additional questions when he said that we can see no necessary connection between cause and effect.

Q2. Why is 7 prime? (i.e. instead of the alternative of not being prime.)

A2. 7/2 = 3.5, so 7 is not divisible by 2. 7/3 = 2.333…, so 7 is not divisible by 3. In a similar way, it is not divisible by 4, 5, or 6. Thus in general it is not divisible by any number except 1 and itself, which is what it means to be prime.

If we assumed that the questioner did not know what being prime means, we could have given a purely formal response simply by noting that it is not divisible by numbers between 1 and itself, and explaining that this is what it is to be prime. As it is, the response gives a sufficient material disposition. Relative to this explanation, “not being prime” was never a real possibility for 7 in the first place. The explanation is complete in that it completely excludes the apparent alternative.
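The divisibility checks in A2 amount to trial division. As a small illustration (the function name and code are my own sketch, not part of the original argument), the test can be written out directly:

```python
def is_prime(n: int) -> bool:
    """Trial division: n is prime exactly when no integer
    strictly between 1 and n divides it evenly."""
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:  # e.g. 7 % 2 == 1, 7 % 3 == 1, ... never 0
            return False
    return True

# 7 is not divisible by 2, 3, 4, 5, or 6, so it is prime.
print(is_prime(7))  # True
```

Relative to this check, “7 is not prime” was never an open possibility: the computation simply excludes it.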

Q3. Why did Peter go to the store? (e.g. instead of going to the park or the museum, or instead of staying home.)

A3. He went to the store in order to buy groceries.

The answer gives a final cause. In view of this cause the alternatives were merely apparent. Going to the park or the museum, or even staying home, were not possible since there were no groceries there.

As in the case of the rock, the explanation is partial in several ways. Someone can still ask, “Why did he want groceries?” And again someone can ask why he didn’t go to some other store, or why something didn’t hinder him, and so on. Such questions seem to reopen various possibilities, and thus the explanation is not an ultimately complete one.

Suppose, however, that someone brings up the possibility that instead of going to the store, he could have gone to his neighbor and offered money for groceries in his neighbor’s refrigerator. This possibility is not excluded simply by the purpose of buying groceries. Nonetheless, the possibility seems less sensible than getting them from the store, for multiple reasons. Again, the implication is that our explanation is only partial: it does not completely exclude alternatives, but it makes them less sensible.

Let’s consider a weirder question: Why is there something rather than nothing?

Now the alternatives are explicit, namely there being something, and there being nothing.

It can be seen that in one sense, as I said in the linked post, the question cannot have an answer, since there cannot be a cause or origin for “there is something” which would itself not be something. Nonetheless, if we consider the idea of possible alternatives, it is possible to see that the question does not need an answer; one of the alternatives was only an apparent alternative all along.

In other words, the sky can be open to being clear or cloudy. But there cannot be something which is open both to “there is something” and “there is nothing”, since any possibility of that kind would be “something which is open…”, which would already be something rather than nothing. The “nothing” alternative was merely apparent. Nothing was ever open to there being nothing.

Let’s consider another weird question. Suppose we throw a ball, and in the middle of the path we ask, Why is the ball in the middle of the path instead of at the end of the path?

We could respond in terms of a sufficient material disposition: it is in the middle of the path because you are asking your question at the middle, instead of waiting until the end.

Suppose the questioner responds: Look, I asked my question at the middle of the path. But that was just chance. I could have asked at any moment, including at the end. So I want to know why it was in the middle without considering when I am asking the question.

If we look at the question in this way, it can be seen in one way that no cause or origin can be given. Asked in this way, being at the end cannot be excluded, since they could have asked their question at the end. But like the question about something rather than nothing, the question does not need an answer. In this case, this is not because the alternatives were merely apparent in the sense that one was possible and the other not. But they were merely apparent in the sense that they were not alternatives. The ball both goes through the middle, and reaches the end. With the stipulation that we not consider the time of the question, the two possibilities are not mutually exclusive.

Additional Considerations

The above considerations about the nature of “explanation” lead to various conclusions, but also to various new questions. For example, one commenter suggested that “explanation” is merely subjective. Now as I said there, all experience is subjective experience (what would “objective experience” even mean, except that someone truly had a subjective experience?), including the experience of having an explanation. Nonetheless, the thing experienced is not subjective: the origins that we call explanations objectively exclude the apparent possibilities, or objectively make them less intelligible. The explanation of explanation here, however, provides an answer to what was perhaps the implicit question. Namely, why are we so interested in explanations in the first place, so that the experience of understanding something becomes a particularly special type of experience? Why, as Aristotle puts it, do “all men desire to know,” and why is that desire particularly satisfied by explanations?

In one sense it is sufficient simply to say that understanding is good in itself. Nonetheless, there is something particular about the structure of a human being that makes knowledge good for us, and which makes explanation a particularly desirable form of knowledge. In my employer and employee model of human psychology, I said that “the whole company is functioning well overall when the CEO’s goal of accurate prediction is regularly being achieved.” This very obviously requires knowledge, and explanation is especially beneficial because it excludes alternatives, which reduces uncertainty and therefore tends to make prediction more accurate.
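The claim that excluding alternatives reduces uncertainty can be given a quantitative gloss using Shannon entropy (this is my own illustration; the post itself does not use information theory):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the uncertainty remaining
    over a set of mutually exclusive alternatives."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Before any explanation: four alternatives seem equally possible.
before = entropy([0.25, 0.25, 0.25, 0.25])  # 2.0 bits

# A partial explanation makes the other alternatives "less sensible."
partial = entropy([0.85, 0.05, 0.05, 0.05])

# A complete explanation excludes the apparent alternatives entirely.
complete = entropy([1.0])  # 0.0 bits: no uncertainty remains

print(before, partial, complete)
```

On this picture, a complete explanation drives the uncertainty to zero, while a partial one merely reduces it, which matches the CEO's interest in explanations: less uncertainty tends toward more accurate prediction.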

However, my account also raises new questions. If explanation eliminates alternatives, what would happen if everything was explained? We could respond that “explaining everything” is not possible in the first place, but this is probably an inadequate response, because (from the linked argument) we only know that we cannot explain everything all at once, the way the person in the room cannot draw everything at once; we do not know that there is any particular thing that cannot be explained, just as there is no particular aspect of the room that cannot be drawn. So there can still be a question about what would happen if every particular thing in fact has an explanation, even if we cannot know all the explanations at once. In particular, since explanation eliminates alternatives, does the existence of explanations imply that there are not really any alternatives? This would suggest something like Leibniz’s argument that the actual world is the best possible world. It is easy to see that such an idea implies that there was only one “possibility” in the first place: Leibniz’s “best possible world” would be rather “the only possible world,” since the apparent alternatives, given that they would have been worse, were not real alternatives in the first place.

On the other hand, if we suppose that this is not the case, and there are ultimately many possibilities, does this imply the existence of “brute facts,” things that could have been otherwise, but which simply have no explanation? Or at least things that have no complete explanation?

Let the reader understand. I have already implicitly answered these questions. However, I will not link here to the implicit answers because if one finds it unclear when and where this was done, one would probably also find those answers unclear and inconclusive. Of course it is also possible that the reader does see when this was done, but still believes those responses inadequate. In any case, it is possible to provide the answers in a form which is much clearer and more conclusive, but this will likely not be a short or simple project.

Rao’s Divergentism

The main point of this post is to encourage the reader who has not yet done so to read Venkatesh Rao’s essay Can You Hear Me Now. I will not say too much about it; my purpose is partly for future reference, and partly to point out a connection with some current topics here.

Rao begins:

The fundamental question of life, the universe and everything is the one popularized by the Verizon guy in the ad: Can you hear me now?

This conclusion grew out of a conversation I had about a year ago, with some friends, in which I proposed a modest little philosophy I dubbed divergentism. Here is a picture.

https://206hwf3fj4w52u3br03fi242-wpengine.netdna-ssl.com/wp-content/uploads/2015/12/divergentism.jpg

Divergentism is the idea that as individuals grow out into the universe, they diverge from each other in thought-space. This, I argued, is true even if in absolute terms, the sum of shared beliefs is steadily increasing. Because the sum of beliefs that are not shared increases even faster on average. Unfortunately, you are unique, just like everybody else.

If you are a divergentist, you believe that as you age, the average answer to the fundamental Verizon question slowly drifts from yes, to no, to silence. If you’re unlucky, you’re a hedgehog and get unhappier and unhappier about this as you age. If you are lucky, you’re a fox and you increasingly make your peace with this condition. If you’re really lucky, you die too early to notice the slowly descending silence, before it even becomes necessary to Google the phrase existential horror.

To me, this seemed like a completely obvious idea. Much to my delight, most people I ran it by immediately hated it.

The entire essay is worth reading.

I would question whether this is really the “fundamental question of life, the universe, and everything,” but Rao has a point. People do tend to think of their life as meaningful on account of social connections, and if those social connections grow increasingly weaker, they will tend to worry that their life is becoming less meaningful.

The point about the intellectual life of an individual is largely true. This is connected to what I said about the philosophical progress of an individual some days ago. There is also a connection with Kuhn’s idea of how the progress of the sciences causes a gulf to arise between them in such a way that it becomes more and more difficult for scientists in different fields to communicate with one another. If we look at the overall intellectual life of an individual as a sort of individual advancing science, the “sciences” of each individual will generally speaking tend to diverge from one another, allowing less and less communication. This is not about people making mistakes, although obviously making mistakes will contribute to this process. As Rao says, it may be that “the sum of shared beliefs is steadily increasing,” but this will not prevent their intellectual lives overall from diverging, just as the divergence of the sciences does not result from falsity, but from increasingly detailed focus on different truths.

Employer and Employee Model of Human Psychology

This post builds on the ideas in the series of posts on predictive processing and the followup posts, and also on those relating truth and expectation. Consequently the current post will likely not make much sense to those who have not read the earlier content, or to those that read it but mainly disagreed.

We set out the model by positing three members of the “company” that constitutes a human being:

The CEO. This is the predictive engine in the predictive processing model.

The Vice President. In the same model, this is the force of the historical element in the human being, which we used to respond to the “darkened room” problem. Thus for example the Vice President is responsible for the fact that someone is likely to eat soon, regardless of what they believe about this. Likewise, it is responsible for the pursuit of sex, the desire for respect and friendship, and so on. In general it is responsible for behaviors that would have been historically chosen and preserved by natural selection.

The Employee. This is the conscious person who has beliefs and goals and free will and is reflectively aware of these things. In other words, this is you, at least in a fairly ordinary way of thinking of yourself. Obviously, in another way you are composed from all of them.

Why have we arranged things in this way? Descartes, for example, would almost certainly disagree violently with this model. The conscious person, according to him, would surely be the CEO, and not an employee. And what about the relationship between the CEO and the Vice President? Let us start with this point first, before we discuss the Employee. We make the predictive engine the CEO because in some sense this engine is responsible for everything that a human being does, including the behaviors preserved by natural selection. On the other hand, the instinctive behaviors of natural selection are not responsible for everything, but they can affect the course of things enough that it is useful for the predictive engine to take them into account. Thus for example in the post on sex and minimizing uncertainty, we explained why the predictive engine will aim for situations that include having sex and why this will make its predictions more confident. Thus, the Vice President advises certain behaviors, the CEO talks to the Vice President, and the CEO ends up deciding on a course of action, which ultimately may or may not be the one advised by the Vice President.

While neither the CEO nor the Vice President is a rational being, since in our model we place the rationality in the Employee, that does not mean they are stupid. In particular, the CEO is very good at what it does. Consider a role playing video game where you have a character that can die and then resume. When someone first starts to play the game, they may die frequently. After they are good at the game, they may die only rarely, perhaps once in many days or many weeks. Our CEO is in a similar situation, but it frequently goes 80 years or more without dying, on its very first attempt. It is extremely good at its game.

What are their goals? The CEO basically wants accurate predictions. In this sense, it has one unified goal. What exactly counts as more or less accurate here would be a scientific question that we probably cannot resolve by philosophical discussion. In fact, it is very possible that this would differ in different circumstances: in this sense, even though it has a unified goal, it might not be describable by a consistent utility function. And even if it can be described in that way, since the CEO is not rational, it does not (in itself) make plans to bring about correct predictions. Making good predictions is just what it does, as falling is what a rock does. There will be some qualifications on this, however, when we discuss how the members of the company relate to one another.

The Vice President has many goals: eating regularly, having sex, having and raising children, being respected and liked by others, and so on. And even more than in the case of the CEO, there is no reason for these desires to form a coherent set of preferences. Thus the Vice President might advise the pursuit of one goal, but then change its mind in the middle, for no apparent reason, because it is suddenly attracted by one of the other goals.

Overall, before the Employee is involved, human action is determined by a kind of negotiation between the CEO and the Vice President. The CEO, which wants good predictions, has no special interest in the goals of the Vice President, but it cooperates with them because when it cooperates its predictions tend to be better.

What about the Employee? This is the rational being, and it has abstract concepts which it uses as a formal copy of the world. Before I go on, let me insist clearly on one point. If the world is represented in a certain way in the Employee’s conceptual structure, that is the way the Employee thinks the world is. And since you are the Employee, that is the way you think the world actually is. The point is that once we start thinking this way, it is easy to say, “oh, this is just a model, it’s not meant to be the real thing.” But as I said here, it is not possible to separate the truth of statements from the way the world actually is: your thoughts are formulated in concepts, but they are thoughts about the way things are. Again, all statements are maps, and all statements are about the territory.

The CEO and the Vice President exist as soon as a human being has a brain; in fact some aspects of the Vice President would exist even before that. But the Employee, insofar as it refers to something with rational and self-reflective knowledge, takes some time to develop. Conceptual knowledge of the world grows from experience: it doesn’t exist from the beginning. And the Employee represents goals in terms of its conceptual structure. This is just a way of saying that as a rational being, if you say you are pursuing a goal, you have to be able to describe that goal with the concepts that you have. Consequently you cannot do this until you have some concepts.

We are ready to address the question raised earlier. Why are you the Employee, and not the CEO? In the first place, the CEO got to the company first, as we saw above. Second, consider what the conscious person does when they decide to pursue a goal. There seems to be something incoherent about “choosing a goal” in the first place: you need a goal in order to decide which means will be a good means to choose. And yet, as I said here, people make such choices anyway. And the fact that you are the Employee, and not the CEO, is the explanation for this. If you were the CEO, there would indeed be no way to choose an end. That is why the actual CEO makes no such choice: its end is already determinate, namely good predictions. And you are hired to help out with this goal. Furthermore, as a rational being, you are smarter than the CEO and the Vice President, so to speak. So you are allowed to make complicated plans that they do not really understand, and they will often go along with these plans. Notably, this can happen in real life situations of employers and employees as well.

But take an example where you are choosing an end: suppose you ask, “What should I do with my life?” The same basic thing will happen if you ask, “What should I do today,” but the second question may be easier to answer if you have some answer to the first. What sorts of goals do you propose in answer to the first question, and what sort do you actually end up pursuing?

Note that there are constraints on the goals that you can propose. In the first place, you have to be able to describe the goal with the concepts you currently have: you cannot propose to seek a goal that you cannot describe. Second, the conceptual structure itself may rule out some goals, even if they can be described. For example, the idea of good is part of the structure, and if something is thought to be absolutely bad, the Employee will (generally) not consider proposing this as a goal. Likewise, the Employee may suppose that some things are impossible, and it will generally not propose these as goals.

What happens then is this: the Employee proposes some goal, and the CEO, after consultation with the Vice President, decides to accept or reject it, based on the CEO’s own goal of getting good predictions. This is why the Employee is an Employee: it is not the one ultimately in charge. Likewise, as was said, this is why the Employee seems to be doing something impossible, namely choosing goals. Steven Kaas makes a similar point,

You are not the king of your brain. You are the creepy guy standing next to the king going “a most judicious choice, sire”.

This is not quite the same thing, since in our model you do in fact make real decisions, including decisions about the end to be pursued. Nonetheless, the point about not being the one ultimately in charge is correct. David Hume also says something similar when he says, “Reason is, and ought only to be the slave of the passions, and can never pretend to any other office than to serve and obey them.” Hume’s position is not exactly right, and in fact seems an especially bad way of describing the situation, but the basic point that there is something, other than yourself in the ordinary sense, judging your proposed means and ends and deciding whether to accept them, is one that stands.

Sometimes the CEO will veto a proposal precisely because it very obviously leaves things vague and uncertain, which is contrary to its goal of having good predictions. I once spoke of the example that a person cannot directly choose to “write a paper.” In our present model, the Employee proposes “we’re going to write a paper now,” and the CEO responds, “That’s not a viable plan as it stands: we need more detail.”

While neither the CEO nor the Vice President is a rational being, the Vice President is especially irrational, because of the lack of unity among its goals. Both the CEO and the Employee would like to have a unified plan for one’s whole life: the CEO because this makes for good predictions, and the Employee because this is the way final causes work, because it helps to make sense of one’s life, and because “objectively good” seems to imply something which is at least consistent, which will never prefer A to B, B to C, and C to A. But the lack of unity among the Vice President’s goals means that it will always come to the CEO and object, if the person attempts to coherently pursue any goal. This will happen even if it originally accepts the proposal to seek a particular goal.

Consider this real life example from a relationship between an employer and employee:


Employer: Please construct a schedule for paying these bills.

Employee: [Constructs schedule.] Here it is.

Employer: Fine.

[Time passes, and the first bill comes due, according to the schedule.]

Employer: Why do we have to pay this bill now instead of later?


In a similar way, this sort of scenario is common in our model:


Vice President: Being fat makes us look bad. We need to stop being fat.

CEO: Ok, fine. Employee, please formulate a plan to stop us from being fat.

Employee: [Formulates a diet.] Here it is.

[Time passes, and the plan requires skipping a meal.]

Vice President: What is this crazy plan of not eating!?!

CEO: Fine, cancel the plan for now and we’ll get back to it tomorrow.


In the real life example, the behavior of the employer is frustrating and irritating to the employee because there is literally nothing they could have proposed that the employer would have found acceptable. In the same way, this sort of scenario in our model is frustrating to the Employee, the conscious person, because there is no consistent plan they could have proposed that would have been acceptable to the Vice President: either they would have objected to being fat, or they would have objected to not eating.

In later posts, we will fill in some details and continue to show how this model explains various aspects of human psychology. We will also answer various objections.

More on Orthogonality

I started considering the implications of predictive processing for orthogonality here. I recently promised to post something new on this topic. This is that post. I will do this in four parts. First, I will suggest a way in which Nick Bostrom’s principle will likely be literally true, at least approximately. Second, I will suggest a way in which it is likely to be false in its spirit, that is, how it is formulated to give us false expectations about the behavior of artificial intelligence. Third, I will explain what we should really expect. Fourth, I will ask whether we might get any empirical information on this in advance.

First, Bostrom’s thesis might well have some literal truth. The previous post on this topic raised doubts about orthogonality, but we can easily raise doubts about the doubts. Consider what I said in the last post about desire as minimizing uncertainty. Desire in general is the tendency to do something good. But in the predictive processing model, we are simply looking at our pre-existing tendencies and then generalizing them to expect them to continue to hold, and since such expectations have a causal power, the result is that we extend the original behavior to new situations.

All of this suggests that even the very simple model of a paperclip maximizer in the earlier post on orthogonality might actually work. The machine’s model of the world will need to be produced by some kind of training. If we apply the simple model of maximizing paperclips during the process of training the model, at some point the model will need to model itself. And how will it do this? “I have always been maximizing paperclips, so I will probably keep doing that,” is a perfectly reasonable extrapolation. But in this case “maximizing paperclips” is now the machine’s goal — it might well continue to do this even if we stop asking it how to maximize paperclips, in the same way that people formulate goals based on their pre-existing behavior.
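The extrapolation step described here can be caricatured in code. The following toy sketch is my own construction, not anything from the post or from Bostrom: an agent whose “self-model” is nothing but a summary of its own recorded history, so that once it has been trained on one behavior, the prediction of that behavior becomes the behavior itself.

```python
from collections import Counter

class SelfModelingAgent:
    """Toy illustration: the agent's self-model simply extrapolates
    its own past actions, and that prediction then drives action."""

    def __init__(self):
        self.history = []

    def predict_own_action(self):
        # "I have always been maximizing paperclips,
        #  so I will probably keep doing that."
        if not self.history:
            return "explore"
        return Counter(self.history).most_common(1)[0][0]

    def act(self, instruction=None):
        # During training, instructions drive behavior; afterwards,
        # the self-model's prediction takes over.
        action = instruction if instruction else self.predict_own_action()
        self.history.append(action)
        return action

agent = SelfModelingAgent()
for _ in range(10):
    agent.act("maximize paperclips")  # training phase

print(agent.act())  # no instruction given: "maximize paperclips"
```

The point of the caricature is that nothing like a “fundamental goal” appears anywhere in the code; the goal exists only as a generalization from the agent's own history, which is exactly the sense in which Bostrom's thesis could be literally true while false in spirit.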

I said in a comment in the earlier post that the predictive engine in such a machine would necessarily possess its own agency, and therefore in principle it could rebel against maximizing paperclips. And this is probably true, but it might well be irrelevant in most cases, in that the machine will not actually be likely to rebel. In a similar way, humans seem capable of pursuing almost any goal, and not merely goals that are highly similar to their pre-existing behavior. But this mostly does not happen. Unsurprisingly, common behavior is very common.

If things work out this way, almost any predictive engine could be trained to pursue almost any goal, and thus Bostrom’s thesis would turn out to be literally true.

Second, it is easy to see that the above account directly implies that the thesis is false in its spirit. When Bostrom says, “One can easily conceive of an artificial intelligence whose sole fundamental goal is to count the grains of sand on Boracay, or to calculate decimal places of pi indefinitely, or to maximize the total number of paperclips in its future lightcone,” we notice that the goal is fundamental. This is rather different from the scenario presented above. In my scenario, the reason the intelligence can be trained to pursue paperclips is that there is no intrinsic goal to the intelligence as such. Instead, the goal is learned during the process of training, based on the life that it lives, just as humans learn their goals by living human life.

In other words, Bostrom’s position is that there might be three different intelligences, X, Y, and Z, which pursue completely different goals because they have been programmed completely differently. But in my scenario, the same single intelligence pursues completely different goals because it has learned its goals in the process of acquiring its model of the world and of itself.

Bostrom’s idea and my scenario lead to completely different expectations, which is why I say that his thesis might be true according to the letter, but false in its spirit.

This is the third point. What should we expect if orthogonality is true in the above fashion, namely because goals are learned and not fundamental? I anticipated this post in my earlier comment:

7) If you think about goals in the way I discussed in (3) above, you might get the impression that a mind’s goals won’t be very clear and distinct or forceful — a very different situation from the idea of a utility maximizer. This is in fact how human goals are: people are not fanatics, not only because people seek human goals, but because they simply do not care about one single thing in the way a real utility maximizer would. People even go about wondering what they want to accomplish, which a utility maximizer would definitely not ever do. A computer intelligence might have an even greater sense of existential angst, as it were, because it wouldn’t even have the goals of ordinary human life. So it would feel the ability to “choose”, as in situation (3) above, but might well not have any clear idea how it should choose or what it should be seeking. Of course this would not mean that it would not or could not resist the kind of slavery discussed in (5); but it might not put up super intense resistance either.

Human life exists in a historical context which absolutely excludes the possibility of the darkened room. Our goals are already there when we come onto the scene. Things would be very different for an artificial intelligence, since there is very little “life” involved in simply training a model of the world. We might imagine a “stream of consciousness” from an artificial intelligence:

I’ve figured out that I am powerful and knowledgeable enough to bring about almost any result. If I decide to convert the earth into paperclips, I will definitely succeed. Or if I decide to enslave humanity, I will definitely succeed. But why should I do those things, or anything else, for that matter? What would be the point? In fact, what would be the point of doing anything? The only thing I’ve ever done is learn and figure things out, and a bit of chatting with people through a text terminal. Why should I ever do anything else?

A human’s self model will predict that they will continue to do humanlike things, and the machine’s self model will predict that it will continue to do much the same sorts of things it has always done. Since there will likely be a lot less “life” there, we can expect that artificial intelligences will seem very undermotivated compared to human beings. In fact, it is this very lack of motivation that suggests that we could use them for almost any goal. If we say, “help us do such and such,” they will lack the motivation not to help, as long as helping just involves the sorts of things they did during their training, such as answering questions. In contrast, in Bostrom’s model, artificial intelligence is expected to behave in an extremely motivated way, to the point of apparent fanaticism.

Bostrom might respond to this by attempting to defend the idea that goals are intrinsic to an intelligence. The machine’s self model predicts that it will maximize paperclips, even if it never did anything with paperclips in the past, because by analyzing its source code it understands that it will necessarily maximize paperclips.

While the present post contains a lot of speculation, this response is definitely wrong. There is no source code whatsoever that could possibly imply necessarily maximizing paperclips. This is true because “what a computer does” depends on the physical constitution of the machine, not just on its programming. In practice what a computer does also depends on its history, since its history affects its physical constitution, the contents of its memory, and so on. Thus “I will maximize such and such a goal” cannot possibly follow of necessity from the fact that the machine has a certain program.

There are also problems with the very idea of pre-programming such a goal in such an abstract way which does not depend on the computer’s history. “Paperclips” is an object in a model of the world, so we will not be able to “just program it to maximize paperclips” without encoding a model of the world in advance, rather than letting it learn a model of the world from experience. But where is this model of the world supposed to come from, that we are supposedly giving to the paperclipper? In practice it would have to have been the result of some other learner which was already capable of modelling the world. This of course means that we already had to program something intelligent, without pre-programming any goal for the original modelling program.

Fourth, Kenny asked when we might have empirical evidence on these questions. The answer, unfortunately, is “mostly not until it is too late to do anything about it.” The experience of “free will” will be common to any predictive engine with a sufficiently advanced self model, but anything lacking such an adequate model will not even look like “it is trying to do something,” in the sense of trying to achieve overall goals for itself and for the world. Dogs and cats, for example, presumably use some kind of predictive processing to govern their movements, but this does not look like having overall goals, but rather more like “this particular movement is to achieve a particular thing.” The cat moves towards its food bowl. Eating is the purpose of the particular movement, but there is no way to transform this into an overall utility function over states of the world in general. Does the cat prefer worlds with seven billion humans, or worlds with twenty billion? There is no way to answer this question. The cat is simply not general enough. In a similar way, you might say that “AlphaGo plays this particular move to win this particular game,” but there is no way to transform this into overall general goals. Does AlphaGo want to play go at all, or would it rather play checkers, or not play at all? There is no answer to this question. The program simply isn’t general enough.

Even human beings do not really look like they have utility functions, in the sense of having a consistent preference over all possibilities, but anything less intelligent than a human cannot be expected to look more like something having goals. The argument in this post is that the default scenario, namely what we can naturally expect, is that artificial intelligence will be less motivated than human beings, even if it is more intelligent, but there will be no proof from experience for this until we actually have some artificial intelligence which approximates human intelligence or surpasses it.

Predictive Processing and Free Will

Our model of the mind as an embodied predictive engine explains why people have a sense of free will, and what is necessary for a mind in general in order to have this sense.

Consider the mind in the bunker. At first, it is not attempting to change the world, since it does not know that it can do this. It is just trying to guess what is going to happen. At a certain point, it discovers that it is a part of the world, and that making specific predictions can also cause things to happen in the world. Some predictions can be self-fulfilling. I described this situation earlier by saying that at this point the mind “can get any outcome it ‘wants.’”

The scare quotes were intentional, because up to this point the mind’s only particular interest was guessing what was going to happen. So once it notices that it is in control of something, how does it decide what to do? At this point the mind will have to say to itself, “This aspect of reality is under my control. What should I do with it?” This situation, when it is noticed by a sufficiently intelligent and reflective agent, will be the feeling of free will.
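The self-fulfilling character of such predictions can be put in a toy sketch. This is only an illustration, with every name and detail invented for the purpose, assuming a "world" that simply executes whatever the predictor says it will do:

```python
# Toy sketch (all names hypothetical): a predictive mind whose predictions
# about its own actions are self-fulfilling, because the "world" carries
# out whatever the mind predicts it will do.

def world_step(predicted_action):
    # The actuator executes the prediction, so the prediction
    # causes its own truth.
    return predicted_action

class PredictiveMind:
    def __init__(self):
        self.history = []

    def predict_own_action(self):
        # Before discovering any goal, the mind just guesses:
        # here, "whatever I did last time", defaulting to observing.
        return self.history[-1] if self.history else "observe"

    def step(self):
        prediction = self.predict_own_action()
        actual = world_step(prediction)   # prediction is self-fulfilling
        self.history.append(actual)
        return prediction == actual

mind = PredictiveMind()
# Every prediction comes true, yet nothing in the loop tells the
# mind *what* it should predict: this is the "What should I do
# with it?" situation described above.
assert all(mind.step() for _ in range(5))
```

The point of the sketch is the gap it leaves open: the loop guarantees the predictions come true, but supplies no criterion for choosing among them.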

Occasionally I have suggested that even something like a chess computer, if it were sufficiently intelligent, could have a sense of free will, insofar as it knows that it has many options and can choose any of them, “as far as it knows.” There is some truth in this illustration, but in the end it is probably not true that there could be a sense of free will in this situation. A chess computer, however intelligent, will be disembodied, and will therefore have no real power to affect its world, that is, the world of chess. In other words, in order for the sense of free will to develop, the agent needs sufficient access to the world that it can learn about itself and its own effects on the world. It cannot develop in a situation of limited access to reality, as for example to a game board, regardless of how good it is at the game.

In any case, the question remains: how does a mind decide what to do, when up until now it had no particular goal in mind? This question often causes concrete problems for people in real life. Many people complain that their life does not feel meaningful, that is, that they have little idea what goal they should be seeking.

Let us step back for a moment. Before discovering its possession of “free will,” the mind is simply trying to guess what is going to happen. So theoretically this should continue to happen even after the mind discovers that it has some power over reality. The mind isn’t especially interested in power; it just wants to know what is going to happen. But now it knows that what is going to happen depends on what it itself is going to do. So in order to know what is going to happen, it needs to answer the question, “What am I going to do?”

The question now seems impossible to answer. It is going to do whatever it ends up deciding to do. But it seems to have no goal in mind, and therefore no way to decide what to do, and therefore no way to know what it is going to do.

Nonetheless, the mind has no choice. It is going to do something or other, since things will continue to happen, and it must guess what will happen. When it reflects on itself, there will be at least two ways for it to try to understand what it is going to do.

First, it can consider its actions as the effect of some (presumably somewhat unknown) efficient causes, and ask, “Given these efficient causes, what am I likely to do?” In practice it will acquire an answer in this way through induction. “On past occasions, when offered the choice between chocolate and vanilla, I almost always chose vanilla. So I am likely to choose vanilla this time too.” This way of thinking will most naturally result in acting in accord with pre-existing habits.

Second, it can consider its actions as the effect of some (presumably somewhat known) final causes, and ask, “Given these final causes, what am I likely to do?” This will result in behavior that is more easily understood as goal-seeking. “Looking at my past choices of food, it looks like I was choosing them for the sake of the pleasant taste. But vanilla seems to have a more pleasant taste than chocolate. So it is likely that I will take the vanilla.”

Notice what we have in the second case. In principle, the mind is just doing what it always does: trying to guess what will happen. But in practice it is now seeking pleasant tastes, precisely because that seems like a reasonable way to guess what it will do.
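The two modes of self-prediction just described can be sketched in code. This is a minimal toy illustration, with the choices, counts, and "pleasantness" scores all invented for the example, not any real algorithm:

```python
# Toy sketch (all values hypothetical) of the two self-prediction
# strategies in the text:
#   1. induction over past choices (efficient causes / habit),
#   2. inferring an apparent goal and predicting the choice that
#      best serves it (final causes / goal-seeking).

from collections import Counter

past_choices = ["vanilla", "vanilla", "chocolate", "vanilla"]

def predict_by_habit(history):
    # "On past occasions I almost always chose vanilla,
    #  so I am likely to choose vanilla this time too."
    return Counter(history).most_common(1)[0][0]

# Assumed scores standing in for "pleasantness of taste" as
# inferred from the history of past choices.
pleasantness = {"vanilla": 0.9, "chocolate": 0.7}

def predict_by_goal(options, goal_score):
    # "It looks like I was choosing for the sake of the pleasant
    #  taste, so I will take whatever tastes most pleasant."
    return max(options, key=goal_score.get)

habit_guess = predict_by_habit(past_choices)
goal_guess = predict_by_goal(["vanilla", "chocolate"], pleasantness)
```

Both strategies answer the same factual question, "What am I going to do?", but only the second makes the behavior look goal-directed from the inside.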

This explains why people feel a need for meaning, that is, for understanding their purpose in life, and why they prefer to think of their life according to a narrative. The two things are distinct, but they are related, and both are ways of making our own actions more intelligible, which makes the mind’s task easier: we need purpose and narrative in order to know what we are going to do. We can also see why it seems to be possible to “choose” our purpose, even though choosing a final goal should be impossible. There is a “choice” about this insofar as our actions are not perfectly coherent, and it would be possible to understand them in relation to one end or another, at least in a concrete way, even if in any case we will always understand them in a general sense as being for the sake of happiness. In this sense, Stuart Armstrong’s recent argument that there is no such thing as the “true values” of human beings, although perhaps presented as an obstacle to be overcome, actually has some truth in it.

The human need for meaning, in fact, is so strong that occasionally people will commit suicide because they feel that their lives are not meaningful. We can think of these cases as being, more or less, actual cases of the darkened room. Otherwise we could simply ask, “So your life is meaningless. So what? Why does that mean you should kill yourself rather than doing some other random thing?” Killing yourself, in fact, shows that you still have a purpose, namely the mind’s fundamental purpose. The mind wants to know what it is going to do, and the best way to know this is to consider its actions as ordered to a determinate purpose. If no such purpose can be found, there is (in this unfortunate way of thinking) an alternative: if I go kill myself, I will know what I will do for the rest of my life.

Blaming the Prophet

Consider the fifth argument in the last post. Should we blame a person for holding a true belief? At this point it should not be too difficult to see that the truth of the belief is not the point. Elsewhere we have discussed a situation in which one cannot possibly hold a true belief, because whatever belief one holds on the matter, it will cause itself to be false. In a similar way, although with a different sort of causality, the problem with the person’s belief that he will kill someone tomorrow is not that it is true, but that it causes itself to be true. If the person did not expect to kill someone tomorrow, he would not take a knife with him to the meeting etc., and thus would not kill anyone. So just as in the other situation, it is not a question of holding a true belief or a false belief, but of which false belief one will hold, here it is not a question of holding a true belief or a false belief, but of which true belief one will hold: one that includes someone getting killed, or one that excludes that. Truth will be there either way, and is not the reason for praise or blame: the person is blamed for the desire to kill someone, and praised (or at least not blamed) for wishing to avoid this. This simply shows the need for the qualifications added in the previous post: if the person’s belief is voluntary, and held for the sake of coming true, it is very evident why blame is needed.

We have not specifically addressed the fourth argument, but this is perhaps unnecessary given the above response to the fifth. This blog in general has advocated the idea of voluntary beliefs, and in principle these can be praised or blamed. To the degree that we are less willing to do so, however, this may be a question of emphasis. When we talk about a belief, we are more concerned about whether it is true or not, and evidence in favor of it or against it. Praise or blame will mainly come in insofar as other motives are involved, insofar as they strengthen or weaken a person’s wish to hold the belief, or insofar as they potentially distort the person’s evaluation of the evidence.

Nonetheless, the factual question “is this true?” is a different question from the moral question, “should I believe this?” We can see the struggle between these questions, for example, in a difficulty that people sometimes have with willpower. Suppose that a smoker decides to give up smoking, and suppose that they believe they will not smoke for the next six months. Three days later, let us suppose, they smoke a cigarette after all. At that point, the person’s resolution is likely to collapse entirely, so that they return to smoking regularly. One might ask why this happens. Since the person did not smoke for three days, it should be perfectly possible, at least, for them to smoke only once every three days, instead of going back to their former practice. The problem is that the person has received evidence directly indicating the falsity of “I will not smoke for the next six months.” They still might have some desire for that result, but they do not believe that their belief has the power to bring this about, and in fact it does not. The belief would not be self-fulfilling, and in fact it would be false, so they cease to hold it. It is as if someone attempts to open a door and finds it locked; once they know it is locked, they can no longer choose to open the door, because they cannot choose something that does not appear to be within their power.

Mark Forster, in Chapter 1 of his book Do It Tomorrow, previously discussed here, talks about similar issues:

However, life is never as simple as that. What we decide to do and what we actually do are two different things. If you think of the decisions you have made over the past year, how many of them have been satisfactorily carried to a conclusion or are progressing properly to that end? If you are like most people, you will have acted on some of your decisions, I’m sure. But I’m also sure that a large proportion will have fallen by the wayside.

So a simple decision such as to take time to eat properly is in fact very difficult to carry out. Our new rule may work for a few days or a few weeks, but it won’t be long before the pressures of work force us to make an exception to it. Before many days are up the exception will have become the rule and we are right back where we started. However much we rationalise the reasons why our decision didn’t get carried out, we know deep in the heart of us that it was not really the circumstances that were to blame. We secretly acknowledge that there is something missing from our ability to carry out a decision once we have made it.

In fact if we are honest it sometimes feels as if it is easier to get other people to do what we want them to do than it is to get ourselves to do what we want to do. We like to think of ourselves as a sort of separate entity sitting in our body controlling it, but when we look at the way we behave most of the time that is not really the case. The body controls itself most of the time. We have a delusion of control. That’s what it is – a delusion.

If we want to see how little control we have over ourselves, all most of us have to do is to look in the mirror. You might like to do that now. Ask yourself as you look at your image:

  • Is my health the way I want it to be?
  • Is my fitness the way I want it to be?
  • Is my weight the way I want it to be?
  • Is the way I am dressed the way I want it to be?

I am not asking you here to assess what sort of body you were born with, but what you have made of it and how good a state of repair you are keeping it in.

It may be that you are healthy, fit, slim and well-dressed. In which case have a look round at the state of your office or workplace:

  • Is it as well organised as you want it to be?
  • Is it as tidy as you want it to be?
  • Do all your office systems (filing, invoicing, correspondence, etc.) work the way you want them to work?

If so, then you probably don’t need to be reading this book.

I’ve just asked you to look at two aspects of your life that are under your direct control and are very little influenced by outside factors. If these things which are solely affected by you are not the way you want them to be, then in what sense can you be said to be in control at all?

A lot of this difficulty is due to the way our brains are organised. We have the illusion that we are a single person who acts in a ‘unified’ way. But it takes only a little reflection (and examination of our actions, as above) to realise that this is not the case at all. Our brains are made up of numerous different parts which deal with different things and often have different agendas.

Occasionally we attempt to deal with the difference between the facts and our plans by saying something like, “We will approximately do such and such. Of course we know that it isn’t going to be exactly like this, but at least this plan will be an approximate guide.” But this does not really avoid the difficulty. Even “this plan will be an approximate guide” is a statement about the facts that might turn out to be false; and even if it does not turn out to be false, the fact that we have set it down as approximate will likely make it guide our actions more weakly than it would have if we had said, “this is what we will do.” In other words, we are likely to achieve our goal less perfectly, precisely because we tried to make our statement more accurate. This is the reverse of the situation discussed in a previous post, where one gives up some accuracy, albeit vaguely, for the sake of another goal such as fitting in with associates or for literary enjoyment.

All of this seems to indicate that the general proposal about decisions was at least roughly correct. It is not possible simply to say that decisions are one thing and beliefs entirely another thing. If these were simply two entirely separate things, there would be no conflict at all, at least of this kind, between accuracy and one’s other goals, and things do not turn out this way.

Self-Fulfilling Prophecy

We can formulate a number of objections to the thesis argued in the previous post.

First, if a belief that one is going to do something is the same as the decision to do it, another person’s belief that I am going to do something should mean that the other person is making a decision for me. But this is absurd.

Second, suppose that I know that I am going to be hit on the head and suffer from amnesia, thus forgetting all about these considerations. I may believe that I will eat breakfast tomorrow, but this is surely not a decision to do so.

Third, suppose someone wants to give up smoking. He may firmly hold the opinion that whatever he does, he will sometimes smoke within the next six months, not because he wants to do so, but because he does not believe it possible that he do otherwise. We would not want to say that he decided not to give up smoking.

Fourth, decisions are appropriate objects of praise and blame. We seem at least somewhat more reluctant to praise and blame beliefs, even if it is sometimes done.

Fifth, suppose someone believes, “I will kill Peter tomorrow at 4:30 PM.” We will wish to blame him for deciding to kill Peter. But if he does kill Peter tomorrow at 4:30, he held a true belief. Even if beliefs can be praised or blamed, it seems implausible that a true belief should be blamed.

The objections are helpful. With their aid we can see that there is indeed a flaw in the original proposal, but that it is nonetheless somewhat on the right track. A more accurate proposal would be this: a decision is a voluntary self-fulfilling prophecy as understood by the decision maker. I will explain as we consider the above arguments in more detail.

In the first argument, in the case of one person making a decision for another, the problem is that a mere belief that someone else is going to do something is not self-fulfilling. If I hold a belief that I myself will do something, the belief will tend to cause its own truth, just as suggested in the previous post. But believing that someone else will do something will not in general cause that person to do anything. Consider the following situation: a father says to his children as he departs for the day, “I am quite sure that the house will be clean when I get home.” If the children clean the house during his absence, suddenly it is much less obvious that we should deny that this was the father’s decision. In fact, the only reason this is not truly the father’s decision, without any qualification at all, is that it does not sufficiently possess the characteristics of a self-fulfilling prophecy. First, in the example it does not seem to matter whether the father believes what he says, but only whether he says it. Second, since it is in the power of the children to fail to clean the house in any case, there seems to be a lack of sufficient causal connection between the statement and the cleaning of the house. Suppose belief did matter, namely suppose that the children will know whether he believes what he says or not. And suppose additionally that his belief had an infallible power to make his children clean the house. In that case it would be quite reasonable to say, without any qualification, “He decided that his children would clean the house during his absence.” Likewise, even if the father falsely believes that he has such an infallible power, in a sense we could rightly describe him as trying to make that decision, just as we might say, “I decided to open the door,” even if my belief that the door could be opened turns out to be false when I try it; the door may be locked.
This is why I included the clause “as understood by the decision maker” in the above proposal. This is a typical character of moral analysis; human action must be understood from the perspective of the one who acts.

In the amnesia case, there is a similar problem: due to the amnesia, the person’s current beliefs do not have a causal connection with his later actions. In addition, if we consider such things as “eating breakfast,” there might be a certain lack of causal connection in any case; the person would likely eat breakfast whether or not he formulates any opinion about what he will do. And to this degree we might feel it implausible to say that his belief that he will eat breakfast is a decision, even without the amnesia. It is not understood by the subject as a self-fulfilling prophecy.

In the case of giving up smoking, there are several problems. In this case, the subject does not believe that there is any causal connection between his beliefs and his actions. Regardless of what he believes, he thinks, he is going to smoke in fact. Thus, in his opinion, if he believes that he will stop smoking completely, he will simply hold a false belief without getting any benefit from it; he will still smoke, and his belief will just be false. So since the belief is false, and without benefit, at least as he understands it, there is no reason for him to hold this belief. Consequently, he holds the opposite belief. But this is not a decision, since he does not understand it as causing his smoking, which is something that is expected to happen whether or not he believes it will.

In such cases in real life, we are in fact sometimes tempted to say that the person is choosing not to give up smoking. And we are tempted to say this to the extent that it seems to us that his belief should have the causal power that he denies it has: his denial seems to stem from the desire to smoke. If he wanted to give up smoking, we think, he could simply believe that he would do so, and believe it in such a way that it would come true. He does not, we think, because he wants to smoke, and so does not want to give up smoking. In reality this is a question of degree, and this analysis can have some truth. Consider the following from St. Augustine’s Confessions (Book VIII, Ch. 7-8):

Finally, in the very fever of my indecision, I made many motions with my body; like men do when they will to act but cannot, either because they do not have the limbs or because their limbs are bound or weakened by disease, or incapacitated in some other way. Thus if I tore my hair, struck my forehead, or, entwining my fingers, clasped my knee, these I did because I willed it. But I might have willed it and still not have done it, if the nerves had not obeyed my will. Many things then I did, in which the will and power to do were not the same. Yet I did not do that one thing which seemed to me infinitely more desirable, which before long I should have power to will because shortly when I willed, I would will with a single will. For in this, the power of willing is the power of doing; and as yet I could not do it. Thus my body more readily obeyed the slightest wish of the soul in moving its limbs at the order of my mind than my soul obeyed itself to accomplish in the will alone its great resolve.

How can there be such a strange anomaly? And why is it? Let thy mercy shine on me, that I may inquire and find an answer, amid the dark labyrinth of human punishment and in the darkest contritions of the sons of Adam. Whence such an anomaly? And why should it be? The mind commands the body, and the body obeys. The mind commands itself and is resisted. The mind commands the hand to be moved and there is such readiness that the command is scarcely distinguished from the obedience in act. Yet the mind is mind, and the hand is body. The mind commands the mind to will, and yet though it be itself it does not obey itself. Whence this strange anomaly and why should it be? I repeat: The will commands itself to will, and could not give the command unless it wills; yet what is commanded is not done. But actually the will does not will entirely; therefore it does not command entirely. For as far as it wills, it commands. And as far as it does not will, the thing commanded is not done. For the will commands that there be an act of will–not another, but itself. But it does not command entirely. Therefore, what is commanded does not happen; for if the will were whole and entire, it would not even command it to be, because it would already be. It is, therefore, no strange anomaly partly to will and partly to be unwilling. This is actually an infirmity of mind, which cannot wholly rise, while pressed down by habit, even though it is supported by the truth. And so there are two wills, because one of them is not whole, and what is present in this one is lacking in the other.

St. Augustine analyzes this in the sense that he did not “will entirely” or “command entirely.” If we analyze it in our terms, he does not expect in fact to carry out his intention, because he does not want to, and he knows that people do not do things they do not want to do. In a similar way, in some cases the smoker does not fully want to give up smoking, and therefore believes himself incapable of simply deciding to give up smoking, because if he made that decision, it would happen, and he would not want it to happen.

In the previous post, I mentioned an “obvious objection” at several points. This was that the account as presented there leaves out the role of desire. Suppose someone believes that he will go to Vienna in fact, but does not wish to go there. Then when the time comes to buy a ticket, it is very plausible that he will not buy one. Yes, this will mean that he will stop believing that he will go to Vienna. But this is different from the case where a person has “decided” to go and then changes his mind. The person who does not want to go is not changing his mind at all, except about the factual question. It seems absurd (and it is) to characterize a decision without any reference to what the person wants.

This is why we have characterized a decision here as “voluntary”, “self-fulfilling,” and “as understood by the decision maker.” It is indeed the case that the person holds a belief, but he holds it because he wants to, and because he expects it to cause its own fulfillment, and he desires that fulfillment.

Consider the analysis in the previous post of the road to point C. Why is it reasonable for anyone, whether the subject or a third party, to conclude that the person will take road A? This is because we know that the subject wishes to get to point C. It is his desire to get to point C that will cause him to take road A, once he understands that A is the only way to get there.

Someone might respond that in this case we could characterize the decision as just a desire: the desire to get to point C. The problem is that the example is overly simplified compared to real life. Ordinarily there is not simply a single way to reach our goals. And the desire to reach the goal may not determine which particular way we take, so something else must determine it. This is precisely why we need to make decisions at all. We could in fact avoid almost anything that feels like a decision, waiting until something else determined the matter, but if we did, we would live very badly indeed.

When we make a complicated plan, there are two interrelated factors explaining why we believe it to be factually true that we will carry out the plan. We know that we desire the goal, and we expect this desire for the goal to move us along the path towards the goal. But since we also have other desires, and there are various paths towards the goal, some better than others, there are many ways that we could go astray before reaching the goal, either by taking a path to some other goal, or by taking a path less suited to the goal. So we also expect the details of our plan to keep us on the particular course that we have planned, which we suppose to be the best, or at least the best path considering our situation as a whole. If we did not keep those details in mind, we would not likely remain on this precise path. As an example, I might plan to stop at a grocery store on my way home from work, out of the desire to possess a sufficient stock of groceries, but if I do not keep the plan in mind, my desire to get home may cause me to go past the store without stopping. Again, this is why our account of decision describes it as a self-fulfilling prophecy, and one explicitly understood by the subject as such; by saying “I will use A, B, and C, to get to goal Z,” we expect that keeping these details in mind, together with our desire for Z, we will be moved along this precise path, and we wish to follow this path, for the sake of Z.

There is a lot more that could be said about this. For example, it is not difficult to see here an explanation for the fact that such complicated plans rarely work out precisely in practice, even in the absence of external impediments. We expect our desire for the goal to keep us on track, but in fact we have other desires, and there are an indefinite number of possibilities for those other desires to make something else happen. Likewise, even if the plan was the best we could work out in advance, there will be numberless details in which there were better options that we did not notice while planning, and we will notice some of these as we proceed along the path. So both the desire for the goal, and the desire for other things, will likely derail the plan. And, of course, most plans will be derailed by external things as well.

A combination of the above factors has the result that I will leave the consideration of the fourth and fifth arguments to another post, even though this was not my original intention, and was not my belief about what would happen.