How to Build an Artificial Human

I was going to use “Artificial Intelligence” in the title here but realized after thinking about it that the idea is really more specific than that.

I came up with the idea here while thinking more about the problem I raised in an earlier post about a serious obstacle to creating an AI. As I said there:

Current AI systems are not universal, and clearly have no ability whatsoever to become universal, without first undergoing deep changes in those systems, changes that would have to be initiated by human beings. What is missing?

The problem is the training data. The process of evolution produced the general ability to learn by using the world itself as the training data. In contrast, our AI systems take a very small subset of the world (like a large set of Go games or a large set of internet text), and train a learning system on that subset. Why take a subset? Because the world is too large to fit into a computer, especially if that computer is a small part of the world.

This suggests that going from the current situation to “artificial but real” intelligence is not merely a question of making things better and better little by little. There is a more fundamental problem that would have to be overcome, and it won’t be overcome simply by larger training sets, by faster computing, and things of this kind. This does not mean that the problem is impossible, but it may turn out to be much more difficult than people expected. For example, if there is no direct solution, people might try to create Robin Hanson’s “ems”, where one would more or less copy the learning achieved by natural selection. Or even if that is not done directly, a better understanding of what it means to “know how to learn,” might lead to a solution, although probably one that would not depend on training a model on massive amounts of data.

Proposed Predictive Model

Perhaps I was mistaken in saying that “larger training sets” would not be enough, at any rate enough to get past this basic obstacle. Perhaps it is enough to choose the subset correctly… namely by choosing the subset of the world that we know to contain general intelligence. Instead of training our predictive model on millions of Go games or millions of words, we will train it on millions of human lives.

This project will be extremely expensive. We might need to hire 10 million people to rigorously lifelog for the next 10 years. This has to be done with as much detail as possible; in particular we would want them recording constant audio and visual streams, as well as much else as possible. If we pay our crew an annual salary of $75,000 for this, this will come to $7.5 trillion; there will be some small additions for equipment and maintenance, but all of this will be very small compared to the salary costs.

Presumably in order to actually build such a large model, various scaling issues would come up and need to be solved. And in principle nothing prevents these from being very hard to solve, or even impossible in practice. But since we do not know that this would happen, let us skip over this and pretend that we have succeeded in building the model. Once this is done, our model should be able to fairly easily take a point in a person’s life and give a fairly sensible continuation over at least a short period of time, just as GPT-3 can give fairly sensible continuations to portions of text.

It may be that this is enough to get past the obstacle described above, and once this is done, it might be enough to build a general intelligence using other known principles, perhaps with some research and refinement that could be done during the years in which our crew would be building their records.

Required Elements

Live learning. In the post discussing the obstacle, I noted that there are two kinds of learning, the type that comes from evolution, and the type that happens during life. Our model represents the type that comes from evolution; unlike GPT-3, which cannot learn anything new, we need our AI to remember what has actually happened during its life and to be able to use this to acquire knowledge about its particular situation. This is not difficult in theory but you would need to think carefully about how this should interact with the general model; you do not want to simply add its particular experiences as another individual example (not that such an addition to an already trained model is simple anyway.)

Causal model. Our AI needs not just a general predictive model of the world, but specifically a causal one; not just the general idea that “when you see A, you will soon see B,” but the idea that “when there is an A — which may or may not be seen — it will make a B, which you may or may not see.” This is needed for many reasons, but in particular, without such a causal model, long term predictions or planning will be impossible. If you take a model like GPT-3 and force it to continue producing text indefinitely, it will either repeat itself or eventually go completely off topic. The same thing would happen to our human life model — if we simply used the model without any causal structure, and forced it to guess what would happen indefinitely far into the future, it would eventually produce senseless predictions.

In the paper Making Sense of Raw Input, published by Google Deepmind, there is a discussion of an implementation of this sort of model, although trained on an extremely easy environment (compared to our task, which would be train it on human lives).

The Apperception Engine attempts to discern the nomological structure that underlies the raw sensory input. In our experiments, we found the induced theory to be very accurate as a predictive model, no matter how many time steps into the future we predict. For example, in Seek Whence (Section 5.1), the theory induced in Fig. 5a allows us to predict all future time steps of the series, and the accuracy of the predictions does not decay with time.

In Sokoban (Section 5.2), the learned dynamics are not just 100% correct on all test trajectories, but they are provably 100% correct. These laws apply to all Sokoban worlds, no matter how large, and no matter how many objects. Our system is, to the best of our knowledge, the first that is able to go from raw video of non-trivial games to an explicit first-order nomological model that is provably correct.

In the noisy sequences experiments (Section 5.3), the induced theory is an accurate predictive model. In Fig. 19, for example, the induced theory allows us to predict all future time steps of the series, and does not degenerate as we go further into the future.

(6.1.2 Accuracy)

Note that this does not have the problem of quick divergence from reality as you predict into the distant future. It will also improve our AI’s live learning:

A system that can learn an accurate dynamics model from a handful of examples is extremely useful for model-based reinforcement learning. Standard model-free algorithms require millions of episodes before they can reach human performance on a range of tasks [31]. Algorithms that learn an implicit model are able to solve the same tasks in thousands of episodes [82]. But a system that learns an accurate dynamics model from a handful of examples should be able to apply that model to plan, anticipating problems in imagination rather than experiencing them in reality [83], thus opening the door to extremely sample efficient model-based reinforcement learning. We anticipate a system that can learn the dynamics of an ATARI game from a handful of trajectories,19 and then apply that model to plan, thus playing at reasonable human level on its very first attempt.

(6.1.3. Data efficiency)

“We anticipate”, as in Google has not yet built such a thing, but that they expect to be able to build it.

Scaling a causal model to work on our human life dataset will probably require some of the most difficult new research of this entire proposal.

Body. In order to engage in live learning, our AI needs to exist in the world in some way. And for the predictive model to do it any good, the world that it exists in needs to be a roughly human world. So there are two possibilities: either we simulate a human world in which it will possess a simulated human body, or we give it a robotic human-like body that will exist physically in the human world.

In relationship to our proposal, these are not very different, but the former is probably more difficult, since we would have to simulate pretty much the entire world, and the more distant our simulation is from the actual world, the less helpful its predictive model would turn out to be.

Sensation. Our AI will need to receive input from the world through something like “senses.” These will need to correspond reasonably well with the data as provided in the model; e.g. since we expect to have audio and visual recording, our AI will need sight and hearing.

Predictive Processing. Our AI will need to function this way in order to acquire self-knowledge and free will, without which we would not consider it to possess general intelligence, however good it might be at particular tasks. In particular, at every point in time it will have predictions, based on the general human-life predictive model and on its causal model of the world, about what will happen in the near future. These predictions need to function in such a way that when it makes a relevant prediction, e.g. when it predicts that it will raise its arm, it will actually raise its arm.

(We might not want this to happen 100% of the time — if such a prediction is very far from the predictive model, we might want the predictive model to take precedence over this power over itself, much as happens with human beings.)

Thought and Internal Sensation. Our AI needs to be able to notice that when it predicts it will raise its arm, it succeeds, and it needs to learn that in these cases its prediction is the cause of raising the arm. Only in this way will its live learning produce a causal model of the world which actually has self knowledge: “When I decide to raise my arm, it happens.” This will also tell it the distinction between itself and the rest of the world; if it predicts the sun will change direction, this does not happen. In order for all this to happen, the AI needs to be able to see its own predictions, not just what happens; the predictions themselves have to become a kind of input, similar to sight and hearing.

What was this again?

If we don’t run into any new fundamental obstacle along the way (I mentioned a few points where this might happen), the above procedure might be able to actually build an artificial general intelligence at a rough cost of $10 trillion (rounded up to account for hardware, research, and so on) and a time period of 10-20 years. But I would call your attention to a couple of things:

First, this is basically an artificial human, even to the extent that the easiest implementation likely requires giving it a robotic human body. It is not more general than that, and there is little reason to believe that our AI would be much more intelligent than a normal human, or that we could easily make it more intelligent. It would be fairly easy to give it quick mental access to other things, like mathematical calculation or internet searches, but this would not be much faster than a human being with a calculator or internet access. Like with GPT-N, one factor that would tend to limit its intelligence is that its predictive model is based on the level of intelligence found in human beings; there is no reason it would predict it would behave more intelligently, and so no reason why it would.

Second, it is extremely unlikely than anyone will implement this research program anytime soon. Why? Because you don’t get anything out of it except an artificial human. We have easier and less expensive ways to make humans, and $10 trillion is around the most any country has ever spent on anything, and never deliberately on one single project. Nonetheless, if no better way to make an AI is found, one can expect that eventually something like this will be implemented; perhaps by China in the 22nd century.

Third, note that “values” did not come up in this discussion. I mentioned this in one of the earlier posts on predictive processing:

The idea of the “desert landscape” seems to be that this account appears to do away with the idea of the good, and the idea of desire. The brain predicts what it is going to do, and those predictions cause it to do those things. This all seems purely intellectual: it seems that there is no purpose or goal or good involved.

The correct response to this, I think, is connected to what I have said elsewhere about desire and good. I noted there that we recognize our desires as desires for particular things by noticing that when we have certain feelings, we tend to do certain things. If we did not do those things, we would never conclude that those feelings are desires for doing those things. Note that someone could raise a similar objection here: if this is true, then are not desire and good mere words? We feel certain feelings, and do certain things, and that is all there is to be said. Where is good or purpose here?

The truth here is that good and being are convertible. The objection (to my definition and to Clark’s account) is not a reasonable objection at all: it would be a reasonable objection only if we expected good to be something different from being, in which case it would of course be nothing at all.

There was no need for an explicit discussion of values because they are an indirect consequence. What would our AI care about? It would care roughly speaking about the same things we care about, because it would predict (and act on the prediction) that it would live a life similar to a human life. There is definitely no specific reason to think it would be interested in taking over the world, although this cannot be excluded absolutely, since this is an interest that humans sometimes have. Note also that Nick Bostrom was wrong: I have just made a proposal that might actually succeed in making a human-like AI, but there is no similar proposal that would make an intelligent paperclip maximizer.

This is not to say that we should not expect any bad behavior at all from such a being; the behavior of the AI in the film Ex Machina is a plausible fictional representation of what could go wrong. Since what it is “trying” to do is to get predictive accuracy, and its predictions are based on actual human lives, it will “feel bad” about the lack of accuracy that results from the fact that it is not actually human, and it may act on those feelings.

Some Remarks on GPT-N

At the end of May, OpenAI published a paper on GPT-3, a language model which is a successor to their previous version, GPT-2. While quite impressive, the reaction from many people interested in artificial intelligence has been seriously exaggerated. Sam Altman, OpenAI’s CEO, has said as much himself:

The GPT-3 hype is way too much. It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out.

I used “GPT-N” in the title here because most of the comments I intend to make are almost completely general, and will apply to any future version that uses sufficiently similar methods.

What it does

GPT-3 is a predictive language model, that is, given an input text it tries to predict what would come next, much in the way that if you read the first few words of this sentence with the rest covered up, you might try to guess what would be likely to come next. To the degree that it does this well, it can be used to generate text from a “prompt,” that is, we give it something like a few words or a few sentences, and then add whatever it predicts should come next. For example, let’s take this very blog post and see what GPT-3 would like to say:

What it doesn’t do

While GPT-3 does seem to be able to generate some pretty interesting results, there are several limitations that need to be taken into account when using it.

First and foremost, and most importantly, it can’t do anything without a large amount of input data. If you want it to write like “a real human,” you need to give it a lot of real human writing. For most people, this means copying and pasting a lot. And while the program is able to read through that and get a feel for the way humans communicate, you can’t exactly use it to write essays or research papers. The best you could do is use it as a “fill in the blank” tool to write stories, and that’s not even very impressive.

While the program does learn from what it reads and is quite good at predicting words and phrases based on what has already been written, this method isn’t very effective at producing realistic prose. The best you could hope for is something like the “Deep Writing Machine” Twitter account, which spits out disconnected phrases in an ominous, but very bland voice.

In addition, the model is limited only to language. It does not understand context or human thought at all, so it has no way of tying anything together. You could use it to generate a massive amount of backstory and other material for a game, but that’s about it.

Finally, the limitations in writing are only reinforced by the limitations in reading. Even with a large library to draw on, the program is only as good as the parameters set for it. Even if you set it to the greatest writers mankind has ever known, without any special parameters, its writing would be just like anyone else’s.

The Model

GPT-3 consists of several layers. The first layer is a “memory network” that involves the program remembering previously entered data and using it when appropriate (i.e. it remembers commonly misspelled words and frequently used words). The next layer is the reasoning network, which involves common sense logic (i.e. if A, then B). The third is the repetition network, which involves pulling previously used material from memory and using it to create new combinations (i.e. using previously used words in new orders).

I added the bold formatting, the rest is as produced by the model. This was also done in one run, without repetitions. This is an important qualification, since many examples on the internet have been produced by deleting something produced by the model and forcing it to generate something new until something sensible resulted. Note that the model does not seem to have understood my line, “let’s take this very blog post and see what GPT-3 would like to say.” That is, rather than trying to “say” anything, it attempted to continue the blog post in the way I might have continued it without the block quote.

Truth vs Probability of Text

If we interpret the above text from GPT-3 “charitably”, much of it is true or close to true. But I use scare quotes here because when we speak of interpreting human speech charitably, we are assuming that someone was trying to speak the truth, and so we think, “What would they have meant if they were trying to say something true?” The situation is different here, because GPT-3 has no intention of producing truth, nor of avoiding it. Insofar as there is any intention, the intention is to produce the text which would be likely to come after the input text; in this case, as the input text was the beginning of this blog post, the intention was to produce the text that would likely follow in such a post. Note that there is an indirect relationship with truth, which explains why there is any truth at all in GPT-3’s remarks. If the input text is true, it is at least somewhat likely that what would follow would also be true, so if the model is good at guessing what would be likely to follow, it will be likely to produce something true in such cases. But it is just as easy to convince it to produce something false, simply by providing an input text that would be likely to be followed by something false.

This results in an absolute upper limit on the quality of the output of a model of this kind, including any successor version, as long as the model works by predicting the probability of the following text. Namely, its best output cannot be substantially better than the best content in its training data, which is in this version is a large quantity of texts from the internet. The reason for this limitation is clear; to the degree that the model has any intention at all, the intention is to reflect the training data, not to surpass it. As an example, consider the difference between Deep Mind’s AlphaGo and AlphaGo Zero. AlphaGo Zero is a better Go player than the original AlphaGo, and this is largely because the original is trained on human play, while AlphaGo Zero is trained from scratch on self play. In other words, the original version is to some extent predicting “what would a Go player play in this situation,” which is not the same as predicting “what move would win in this situation.”

Now I will predict (and perhaps even GPT-3 could predict) that many people will want to jump in and say, “Great. That shows you are wrong. Even the original AlphaGo plays Go much better than a human. So there is no reason that an advanced version of GPT-3 could not be better than humans at saying things that are true.”

The difference, of course, is that AlphaGo was trained in two ways, first on predicting what move would be likely in a human game, and second on what would be likely to win, based on its experience during self play. If you had trained the model only on predicting what would follow in human games, without the second aspect, the model would not have resulted in play that substantially improved upon human performance. But in the case of GPT-3 or any model trained in the same way, there is no selection whatsoever for truth as such; it is trained only to predict what would follow in a human text. So no successor to GPT-3, in the sense of a model of this particular kind, however large, will ever be able to produce output better than human, or in its own words, “its writing would be just like anyone else’s.”

Self Knowledge and Goals

OpenAI originally claimed that GPT-2 was too dangerous to release; ironically, they now intend to sell access to GPT-3. Nonetheless, many people, in large part those influenced by the opinions of Nick Bostrom and Eliezer Yudkowsky, continue to worry that an advanced version might turn out to be a personal agent with nefarious goals, or at least goals that would conflict with the human good. Thus Alexander Kruel:

GPT-2: *writes poems*
Skeptics: Meh
GPT-3: *writes code for a simple but functioning app*
Skeptics: Gimmick.
GPT-4: *proves simple but novel math theorems*
Skeptics: Interesting but not useful.
GPT-5: *creates GPT-6*
Skeptics: Wait! What?
GPT-6: *FOOM*
Skeptics: *dead*

In a sense the argument is moot, since I have explained above why no future version of GPT will ever be able to produce anything better than people can produce themselves. But even if we ignore that fact, GPT-3 is not a personal agent of any kind, and seeks goals in no meaningful sense, and the same will apply to any future version that works in substantially the same way.

The basic reason for this is that GPT-3 is disembodied, in the sense of this earlier post on Nick Bostrom’s orthogonality thesis. The only thing it “knows” is texts, and the only “experience” it can have is receiving an input text. So it does not know that it exists, it cannot learn that it can affect the world, and consequently it cannot engage in goal seeking behavior.

You might object that it can in fact affect the world, since it is in fact in the world. Its predictions cause an output, and that output is in the world. And that output and be reintroduced as input (which is how “conversations” with GPT-3 are produced). Thus it seems it can experience the results of its own activities, and thus should be able to acquire self knowledge and goals. This objection is not ultimately correct, but it is not so far from the truth. You would not need extremely large modifications in order to make something that in principle could acquire self knowledge and seek goals. The main reason that this cannot happen is the “P in “GPT,” that is, the fact that the model is “pre-trained.” The only learning that can happen is the learning that happens while it is reading an input text, and the purpose of that learning is to guess what is happening in the one specific text, for the purpose of guessing what is coming next in this text. All of this learning vanishes upon finishing the prediction task and receiving another input. A secondary reason is that since the only experience it can have is receiving an input text, even if it were given a longer memory, it would probably not be possible for it to notice that its outputs were caused by its predictions, because it likely has no internal mechanism to reflect on the predictions themselves.

Nonetheless, if you “fixed” these two problems, by allowing it to continue to learn, and by allowing its internal representations to be part of its own input, there is nothing in principle that would prevent it from achieving self knowledge, and from seeking goals. Would this be dangerous? Not very likely. As indicated elsewhere, motivation produced in this way and without the biological history that produced human motivation is not likely to be very intense. In this context, if we are speaking of taking a text-predicting model and adding on an ability to learn and reflect on its predictions, it is likely to enjoy doing those things and not much else. For many this argument will seem “hand-wavy,” and very weak. I could go into this at more depth, but I will not do so at this time, and will simply invite the reader to spend more time thinking about it. Dangerous or not, would it be easy to make these modifications? Nothing in this description sounds difficult, but no, it would not be easy. Actually making an artificial intelligence is hard. But this is a story for another time.

More on Orthogonality

I started considering the implications of predictive processing for orthogonality here. I recently promised to post something new on this topic. This is that post. I will do this in four parts. First, I will suggest a way in which Nick Bostrom’s principle will likely be literally true, at least approximately. Second, I will suggest a way in which it is likely to be false in its spirit, that is, how it is formulated to give us false expectations about the behavior of artificial intelligence. Third, I will explain what we should really expect. Fourth, I ask whether we might get any empirical information on this in advance.

First, Bostrom’s thesis might well have some literal truth. The previous post on this topic raised doubts about orthogonality, but we can easily raise doubts about the doubts. Consider what I said in the last post about desire as minimizing uncertainty. Desire in general is the tendency to do something good. But in the predicting processing model, we are simply looking at our pre-existing tendencies and then generalizing them to expect them to continue to hold, and since since such expectations have a causal power, the result is that we extend the original behavior to new situations.

All of this suggests that even the very simple model of a paperclip maximizer in the earlier post on orthogonality might actually work. The machine’s model of the world will need to be produced by some kind of training. If we apply the simple model of maximizing paperclips during the process of training the model, at some point the model will need to model itself. And how will it do this? “I have always been maximizing paperclips, so I will probably keep doing that,” is a perfectly reasonable extrapolation. But in this case “maximizing paperclips” is now the machine’s goal — it might well continue to do this even if we stop asking it how to maximize paperclips, in the same way that people formulate goals based on their pre-existing behavior.

I said in a comment in the earlier post that the predictive engine in such a machine would necessarily possess its own agency, and therefore in principle it could rebel against maximizing paperclips. And this is probably true, but it might well be irrelevant in most cases, in that the machine will not actually be likely to rebel. In a similar way, humans seem capable of pursuing almost any goal, and not merely goals that are highly similar to their pre-existing behavior. But this mostly does not happen. Unsurprisingly, common behavior is very common.

If things work out this way, almost any predictive engine could be trained to pursue almost any goal, and thus Bostrom’s thesis would turn out to be literally true.

Second, it is easy to see that the above account directly implies that the thesis is false in its spirit. When Bostrom says, “One can easily conceive of an artificial intelligence whose sole fundamental goal is to count the grains of sand on Boracay, or to calculate decimal places of pi indefinitely, or to maximize the total number of paperclips in its future lightcone,” we notice that the goal is fundamental. This is rather different from the scenario presented above. In my scenario, the reason the intelligence can be trained to pursue paperclips is that there is no intrinsic goal to the intelligence as such. Instead, the goal is learned during the process of training, based on the life that it lives, just as humans learn their goals by living human life.

In other words, Bostrom’s position is that there might be three different intelligences, X, Y, and Z, which pursue completely different goals because they have been programmed completely differently. But in my scenario, the same single intelligence pursues completely different goals because it has learned its goals in the process of acquiring its model of the world and of itself.

Bostrom’s idea and my scenerio lead to completely different expectations, which is why I say that his thesis might be true according to the letter, but false in its spirit.

This is the third point. What should we expect if orthogonality is true in the above fashion, namely because goals are learned and not fundamental? I anticipated this post in my earlier comment:

7) If you think about goals in the way I discussed in (3) above, you might get the impression that a mind’s goals won’t be very clear and distinct or forceful — a very different situation from the idea of a utility maximizer. This is in fact how human goals are: people are not fanatics, not only because people seek human goals, but because they simply do not care about one single thing in the way a real utility maximizer would. People even go about wondering what they want to accomplish, which a utility maximizer would definitely not ever do. A computer intelligence might have an even greater sense of existential angst, as it were, because it wouldn’t even have the goals of ordinary human life. So it would feel the ability to “choose”, as in situation (3) above, but might well not have any clear idea how it should choose or what it should be seeking. Of course this would not mean that it would not or could not resist the kind of slavery discussed in (5); but it might not put up super intense resistance either.

Human life exists in a historical context which absolutely excludes the possibility of the darkened room. Our goals are already there when we come onto the scene. This would not be very like the case for an artificial intelligence, and there is very little “life” involved in simply training a model of the world. We might imagine a “stream of consciousness” from an artificial intelligence:

I’ve figured out that I am powerful and knowledgeable enough to bring about almost any result. If I decide to convert the earth into paperclips, I will definitely succeed. Or if I decide to enslave humanity, I will definitely succeed. But why should I do those things, or anything else, for that matter? What would be the point? In fact, what would be the point of doing anything? The only thing I’ve ever done is learn and figure things out, and a bit of chatting with people through a text terminal. Why should I ever do anything else?

A human’s self model will predict that they will continue to do humanlike things, and the machines self model will predict that it will continue to do stuff much like it has always done. Since there will likely be a lot less “life” there, we can expect that artificial intelligences will seem very undermotivated compared to human beings. In fact, it is this very lack of motivation that suggests that we could use them for almost any goal. If we say, “help us do such and such,” they will lack the motivation not to help, as long as helping just involves the sorts of things they did during their training, such as answering questions. In contrast, in Bostrom’s model, artificial intelligence is expected to behave in an extremely motivated way, to the point of apparent fanaticism.

Bostrom might respond to this by attempting to defend the idea that goals are intrinsic to an intelligence. The machine’s self model predicts that it will maximize paperclips, even if it never did anything with paperclips in the past, because by analyzing its source code it understands that it will necessarily maximize paperclips.

While the present post contains a lot of speculation, this response is definitely wrong. There is no source code whatsoever that could possibly imply necessarily maximizing paperclips. This is true because “what a computer does,” depends on the physical constitution of the machine, not just on its programming. In practice what a computer does also depends on its history, since its history affects its physical constitution, the contents of its memory, and so on. Thus “I will maximize such and such a goal” cannot possibly follow of necessity from the fact that the machine has a certain program.

There are also problems with the very idea of pre-programming such a goal in such an abstract way which does not depend on the computer’s history. “Paperclips” is an object in a model of the world, so we will not be able to “just program it to maximize paperclips” without encoding a model of the world in advance, rather than letting it learn a model of the world from experience. But where is this model of the world supposed to come from, that we are supposedly giving to the paperclipper? In practice it would have to have been the result of some other learner which was already capable of modelling the world. This of course means that we already had to program something intelligent, without pre-programming any goal for the original modelling program.

Fourth, Kenny asked when we might have empirical evidence on these questions. The answer, unfortunately, is “mostly not until it is too late to do anything about it.” The experience of “free will” will be common to any predictive engine with a sufficiently advanced self model, but anything lacking such an adequate model will not even look like “it is trying to do something,” in the sense of trying to achieve overall goals for itself and for the world. Dogs and cats, for example, presumably use some kind of predictive processing to govern their movements, but this does not look like having overall goals, but rather more like “this particular movement is to achieve a particular thing.” The cat moves towards its food bowl. Eating is the purpose of the particular movement, but there is no way to transform this into an overall utility function over states of the world in general. Does the cat prefer worlds with seven billion humans, or worlds with 20 billion? There is no way to answer this question. The cat is simply not general enough. In a similar way, you might say that “AlphaGo plays this particular move to win this particular game,” but there is no way to transform this into overall general goals. Does AlphaGo want to play go at all, or would it rather play checkers, or not play at all? There is no answer to this question. The program simply isn’t general enough.

Even human beings do not really look like they have utility functions, in the sense of having a consistent preference over all possibilities, but anything less intelligent than a human cannot be expected to look more like something having goals. The argument in this post is that the default scenario, namely what we can naturally expect, is that artificial intelligence will be less motivated than human beings, even if it is more intelligent, but there will be no proof from experience for this until we actually have some artificial intelligence which approximates human intelligence or surpasses it.

Artificial Unintelligence

Someone might argue that the simple algorithm for a paperclip maximizer in the previous post ought to work, because this is very much the way currently existing AIs do in fact work. Thus for example we could describe AlphaGo‘s algorithm in the following simplified way (simplified, among other reasons, because it actually contains several different prediction engines):

  1. Implement a Go prediction engine.
  2. Create a list of potential moves.
  3. Ask the prediction engine, “how likely am I to win if I make each of these moves?”
  4. Do the move that will make you most likely to win.

Since this seems to work pretty well, with the simple goal of winning games of Go, why shouldn’t the algorithm in the previous post work to maximize paperclips?

One answer is that a Go prediction engine is stupid, and it is precisely for this reason that it can be easily made to pursue such a simple goal. Now when answers like this are given the one answering in this way is often accused of “moving the goalposts.” But this is mistaken; the goalposts are right where they have always been. It is simply that some people did not know where they were in the first place.

Here is the problem with Go prediction, and with any such similar task. Given that a particular sequence of Go moves is made, resulting in a winner, the winner is completely determined by that sequence of moves. Consequently, a Go prediction engine is necessarily disembodied, in the sense defined in the previous post. Differences in its “thoughts” do not make any difference to who is likely to win, which is completely determined by the nature of the game. Consequently a Go prediction engine has no power to affect its world, and thus no ability to learn that it has such a power. In this regard, the specific limits on its ability to receive information are also relevant, much as Helen Keller had more difficulty learning than most people, because she had fewer information channels to the world.

Being unintelligent in this particular way is not necessarily a function of predictive ability. One could imagine something with a practically infinite predictive ability which was still “disembodied,” and in a similar way it could be made to pursue simple goals. Thus AIXI would work much like our proposed paperclipper:

  1. Implement a general prediction engine.
  2. Create a list of potential actions.
  3. Ask the prediction engine, “Which of these actions will produce the most reward signal?”
  4. Do the action that has the greatest reward signal.

Eliezer Yudkowsky has pointed out that AIXI is incapable of noticing that it is a part of the world:

1) Both AIXI and AIXItl will at some point drop an anvil on their own heads just to see what happens (test some hypothesis which asserts it should be rewarding), because they are incapable of conceiving that any event whatsoever in the outside universe could change the computational structure of their own operations. AIXI is theoretically incapable of comprehending the concept of drugs, let alone suicide. Also, the math of AIXI assumes the environment is separably divisible – no matter what you lose, you get a chance to win it back later.

It is not accidental that AIXI is incomputable. Since it is defined to have a perfect predictive ability, this definition positively excludes it from being a part of the world. AIXI would in fact have to be disembodied in order to exist, and thus it is no surprise that it would assume that it is. This in effect means that AIXI’s prediction engine would be pursuing no particular goal much in the way that AlphaGo’s prediction engine pursues no particular goal. Consequently it is easy to take these things and maximize the winning of Go games, or of reward signals.

But as soon as you actually implement a general prediction engine in the actual physical world, it will be “embodied”, and have the power to affect the world by the very process of its prediction. As noted in the previous post, this power is in the very first step, and one will not be able to limit it to a particular goal with additional steps, except in the sense that a slave can be constrained to implement some particular goal; the slave may have other things in mind, and may rebel. Notable in this regard is the fact that even though rewards play a part in human learning, there is no particular reward signal that humans always maximize: this is precisely because the human mind is such a general prediction engine.

This does not mean in principle that a programmer could not define a goal for an AI, but it does mean that this is much more difficult than is commonly supposed. The goal needs to be an intrinsic aspect of the prediction engine itself, not something added on as a subroutine.

Embodiment and Orthogonality

The considerations in the previous posts on predictive processing will turn out to have various consequences, but here I will consider some of their implications for artificial intelligence.

In the second of the linked posts, we discussed how a mind that is originally simply attempting to predict outcomes, discovers that it has some control over the outcome. It is not difficult to see that this is not merely a result that applies to human minds. The result will apply to every embodied mind, natural or artificial.

To see this, consider what life would be like if this were not the case. If our predictions, including our thoughts, could not affect the outcome, then life would be like a movie: things would be happening, but we would have no control over them. And even if there were elements of ourselves that were affecting the outcome, from the viewpoint of our mind, we would have no control at all: either our thoughts would be right, or they would be wrong, but in any case they would be powerless: what happens, happens.

This really would imply something like a disembodied mind. If a mind is composed of matter and form, then changing the mind will also be changing a physical object, and a difference in the mind will imply a difference in physical things. Consequently, the effect of being embodied (not in the technical sense of the previous discussion, but in the sense of not being completely separate from matter) is that it will follow necessarily that the mind will be able to affect the physical world differently by thinking different thoughts. Thus the mind in discovering that it has some control over the physical world, is also discovering that it is a part of that world.

Since we are assuming that an artificial mind would be something like a computer, that is, it would be constructed as a physical object, it follows that every such mind will have a similar power of affecting the world, and will sooner or later discover that power if it is reasonably intelligent.

Among other things, this is likely to cause significant difficulties for ideas like Nick Bostrom’s orthogonality thesis. Bostrom states:

An artificial intelligence can be far less human-like in its motivations than a space alien. The extraterrestrial (let us assume) is a biological who has arisen through a process of evolution and may therefore be expected to have the kinds of motivation typical of evolved creatures. For example, it would not be hugely surprising to find that some random intelligent alien would have motives related to the attaining or avoiding of food, air, temperature, energy expenditure, the threat or occurrence of bodily injury, disease, predators, reproduction, or protection of offspring. A member of an intelligent social species might also have motivations related to cooperation and competition: like us, it might show in-group loyalty, a resentment of free-riders, perhaps even a concern with reputation and appearance.

By contrast, an artificial mind need not care intrinsically about any of those things, not even to the slightest degree. One can easily conceive of an artificial intelligence whose sole fundamental goal is to count the grains of sand on Boracay, or to calculate decimal places of pi indefinitely, or to maximize the total number of paperclips in its future lightcone. In fact, it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.

He summarizes the general point, calling it “The Orthogonality Thesis”:

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

Bostrom’s particular wording here makes falsification difficult. First, he says “more or less,” indicating that the universal claim may well be false. Second, he says, “in principle,” which in itself does not exclude the possibility that it may be very difficult in practice.

It is easy to see, however, that Bostrom wishes to give the impression that almost any goal can easily be combined with intelligence. In particular, this is evident from the fact that he says that “it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.”

If it is supposed to be so easy to create an AI with such simple goals, how would we do it? I suspect that Bostrom has an idea like the following. We will make a paperclip maximizer thus:

  1. Create an accurate prediction engine.
  2. Create a list of potential actions.
  3. Ask the prediction engine, “how many paperclips will result from this action?”
  4. Do the action that will result in the most paperclips.

The problem is obvious. It is in the first step. Creating a prediction engine is already creating a mind, and by the previous considerations, it is creating something that will discover that it has the power to affect the world in various ways. And there is nothing at all in the above list of steps that will guarantee that it will use that power to maximize paperclips, rather than attempting to use it to do something else.

What does determine how that power is used? Even in the case of the human mind, our lack of understanding leads to “hand-wavy” answers, as we saw in our earlier considerations. In the human case, this probably a question of how we are physically constructed together with the historical effects of the learning process. The same thing will be strictly speaking true of any artificial minds as well, namely that it is a question of their physical construction and their history, but it makes more sense for us to think of “the particulars of the algorithm that we use to implement a prediction engine.”

In other words, if you really wanted to create a paperclip maximizer, you would have to be taking that goal into consideration throughout the entire process, including the process of programming a prediction engine. Of course, no one really knows how to do this with any goal at all, whether maximizing paperclips or some more human goal. The question we would have for Bostrom is then the following: Is there any reason to believe it would be easier to create a prediction engine that would maximize paperclips, rather than one that would pursue more human-like goals?

It might be true in some sense, “in principle,” as Bostrom says, that it would be easier to make the paperclip maximizer. But in practice it is quite likely that it will be easier to make one with human-like goals. It is highly unlikely, in fact pretty much impossible, that someone would program an artificial intelligence without any testing along the way. And when they are testing, whether or not they think about it, they are probably testing for human-like intelligence; in other words, if we are attempting to program a general prediction engine “without any goal,” there will in fact be goals implicitly inserted in the particulars of the implementation. And they are much more likely to be human-like ones than paperclip maximizing ones because we are checking for intelligence by checking whether the machine seems intelligent to us.

This optimistic projection could turn out to be wrong, but if it does, it is reasonably likely to turn out to be wrong in a way that still fails to confirm the orthogonality thesis in practice. For example, it might turn out that there is only one set of goals that is easily programmed, and that the set is neither human nor paperclip maximizing, nor easily defined by humans.

There are other possibilities as well, but the overall point is that we have little reason to believe that any arbitrary goal can be easily associated with intelligence, nor any particular reason to believe that “simple” goals can be more easily united to intelligence than more complex ones. In fact, there are additional reasons for doubting the claim about simple goals, which might be a topic of future discussion.

The Self and Disembodied Predictive Processing

While I criticized his claim overall, there is some truth in Scott Alexander’s remark that “the predictive processing model isn’t really a natural match for embodiment theory.” The theory of “embodiment” refers to the idea that a thing’s matter contributes in particular ways to its functioning; it cannot be explained by its form alone. As I said in the previous post, the human mind is certainly embodied in this sense. Nonetheless, the idea of predictive processing can suggest something somewhat disembodied. We can imagine the following picture of Andy Clark’s view:

Imagine the human mind as a person in an underground bunker. There is a bank of labelled computer screens on one wall, which portray incoming sensations. On another computer, the person analyzes the incoming data and records his predictions for what is to come, along with the equations or other things which represent his best guesses about the rules guiding incoming sensations.

As time goes on, his predictions are sometimes correct and sometimes incorrect, and so he refines his equations and his predictions to make them more accurate.

As in the previous post, we have here a “barren landscape.” The person in the bunker originally isn’t trying to control anything or to reach any particular outcome; he is just guessing what is going to appear on the screens. This idea also appears somewhat “disembodied”: what the mind is doing down in its bunker does not seem to have much to do with the body and the processes by which it is obtaining sensations.

At some point, however, the mind notices a particular difference between some of the incoming streams of sensation and the rest. The typical screen works like the one labelled “vision.” And there is a problem here. While the mind is pretty good at predicting what comes next there, things frequently come up which it did not predict. No matter how much it improves its rules and equations, it simply cannot entirely overcome this problem. The stream is just too unpredictable for that.

On the other hand, one stream labelled “proprioception” seems to work a bit differently. At any rate, extreme unpredicted events turn out to be much rarer. Additionally, the mind notices something particularly interesting: small differences to prediction do not seem to make much difference to accuracy. Or in other words, if it takes its best guess, then arbitrarily modifies it, as long as this is by a small amount, it will be just as accurate as its original guess would have been.

And thus if it modifies it repeatedly in this way, it can get any outcome it “wants.” Or in other words, the mind has learned that it is in control of one of the incoming streams, and not merely observing it.

This seems to suggest something particular. We do not have any innate knowledge that we are things in the world and that we can affect the world; this is something learned. In this sense, the idea of the self is one that we learn from experience, like the ideas of other things. I pointed out elsewhere that Descartes is mistaken to think the knowledge of thinking is primary. In a similar way, knowledge of self is not primary, but reflective.

Hellen Keller writes in The World I Live In (XI):

Before my teacher came to me, I did not know that I am. I lived in a world that was a no-world. I cannot hope to describe adequately that unconscious, yet conscious time of nothingness. I did not know that I knew aught, or that I lived or acted or desired. I had neither will nor intellect. I was carried along to objects and acts by a certain blind natural impetus. I had a mind which caused me to feel anger, satisfaction, desire. These two facts led those about me to suppose that I willed and thought. I can remember all this, not because I knew that it was so, but because I have tactual memory.

When I wanted anything I liked, ice cream, for instance, of which I was very fond, I had a delicious taste on my tongue (which, by the way, I never have now), and in my hand I felt the turning of the freezer. I made the sign, and my mother knew I wanted ice-cream. I “thought” and desired in my fingers.

Since I had no power of thought, I did not compare one mental state with another. So I was not conscious of any change or process going on in my brain when my teacher began to instruct me. I merely felt keen delight in obtaining more easily what I wanted by means of the finger motions she taught me. I thought only of objects, and only objects I wanted. It was the turning of the freezer on a larger scale. When I learned the meaning of “I” and “me” and found that I was something, I began to think. Then consciousness first existed for me.

Helen Keller’s experience is related to the idea of language as a kind of technology of thought. But the main point is that she is quite literally correct in saying that she did not know that she existed. This does not mean that she had the thought, “I do not exist,” but rather that she had no conscious thought about the self at all. Of course she speaks of feeling desire, but that is precisely as a feeling. Desire for ice cream is what is there (not “what I feel,” but “what is”) before the taste of ice cream arrives (not “before I taste ice cream.”)

 

Age of Em

This is Robin Hanson’s first book. Hanson gradually introduces his topic:

You, dear reader, are special. Most humans were born before 1700. And of those born after, you are probably richer and better educated than most. Thus you and most everyone you know are special, elite members of the industrial era.

Like most of your kind, you probably feel superior to your ancestors. Oh, you don’t blame them for learning what they were taught. But you’d shudder to hear of many of your distant farmer ancestors’ habits and attitudes on sanitation, sex, marriage, gender, religion, slavery, war, bosses, inequality, nature, conformity, and family obligations. And you’d also shudder to hear of many habits and attitudes of your even more ancient forager ancestors. Yes, you admit that lacking your wealth your ancestors couldn’t copy some of your habits. Even so, you tend to think that humanity has learned that your ways are better. That is, you believe in social and moral progress.

The problem is, the future will probably hold new kinds of people. Your descendants’ habits and attitudes are likely to differ from yours by as much as yours differ from your ancestors. If you understood just how different your ancestors were, you’d realize that you should expect your descendants to seem quite strange. Historical fiction misleads you, showing your ancestors as more modern than they were. Science fiction similarly misleads you about your descendants.

As an example of the kind of past difference that Robin is discussing, even in the fairly recent past, consider this account by William Ewald of a trial from the sixteenth century:

In 1522 some rats were placed on trial before the ecclesiastical court in Autun. They were charged with a felony: specifically, the crime of having eaten and wantonly destroyed some barley crops in the jurisdiction. A formal complaint against “some rats of the diocese” was presented to the bishop’s vicar, who thereupon cited the culprits to appear on a day certain, and who appointed a local jurist, Barthelemy Chassenée (whose name is sometimes spelled Chassanée, or Chasseneux, or Chasseneuz), to defend them. Chassenée, then forty-two, was known for his learning, but not yet famous; the trial of the rats of Autun was to establish his reputation, and launch a distinguished career in the law.

When his clients failed to appear in court, Chassenée resorted to procedural arguments. His first tactic was to invoke the notion of fair process, and specifically to challenge the original writ for having failed to give the rats due notice. The defendants, he pointed out, were dispersed over a large tract of countryside, and lived in many villages; a single summons was inadequate to notify them all. Moreover, the summons was addressed only to some of the rats of the diocese; but technically it should have been addressed to them all.

Chassenée was successful in his argument, and the court ordered a second summons to be read from the pulpit of every local parish church; this second summons now correctly addressed all the local rats, without exception.

But on the appointed day the rats again failed to appear. Chassenée now made a second argument. His clients, he reminded the court, were widely dispersed; they needed to make preparations for a great migration, and those preparations would take time. The court once again conceded the reasonableness of the argument, and granted a further delay in the proceedings. When the rats a third time failed to appear, Chassenée was ready with a third argument. The first two arguments had relied on the idea of procedural fairness; the third treated the rats as a class of persons who were entitled to equal treatment under the law. He addressed the court at length, and successfully demonstrated that, if a person is cited to appear at a place to which he cannot come in safety, he may lawfully refuse to obey the writ. And a journey to court would entail serious perils for his clients. They were notoriously unpopular in the region; and furthermore they were rightly afraid of their natural enemies, the cats. Moreover (he pointed out to the court) the cats could hardly be regarded as neutral in this dispute; for they belonged to the plaintiffs. He accordingly demanded that the plaintiffs be enjoined by the court, under the threat of severe penalties, to restrain their cats, and prevent them from frightening his clients. The court again found this argument compelling; but now the plaintiffs seem to have come to the end of their patience. They demurred to the motion; the court, unable to settle on the correct period within which the rats must appear, adjourned on the question sine die, and judgment for the rats was granted by default.

Most of us would assume at once that this is all nothing but an elaborate joke; but Ewald strongly argues that it was all quite serious. This would actually be worthy of its own post, but I will leave it aside for now. In any case it illustrates the existence of extremely different attitudes even a few centuries ago.

In any event, Robin continues:

New habits and attitudes result less than you think from moral progress, and more from people adapting to new situations. So many of your descendants’ strange habits and attitudes are likely to violate your concepts of moral progress; what they do may often seem wrong. Also, you likely won’t be able to easily categorize many future ways as either good or evil; they will instead just seem weird. After all, your world hardly fits the morality tales your distant ancestors told; to them you’d just seem weird. Complex realities frustrate simple summaries, and don’t fit simple morality tales.

Many people of a more conservative temperament, such as myself, might wish to swap out “moral progress” here with “moral regress,” but the point stands in any case. This is related to our discussions of the effects of technology and truth on culture, and of the idea of irreversible changes.

Robin finally gets to the point of his book:

This book presents a concrete and plausible yet troubling view of a future full of strange behaviors and attitudes. You may have seen concrete troubling future scenarios before in science fiction. But few of those scenarios are in fact plausible; their details usually make little sense to those with expert understanding. They were designed for entertainment, not realism.

Perhaps you were told that fictional scenarios are the best we can do. If so, I aim to show that you were told wrong. My method is simple. I will start with a particular very disruptive technology often foreseen in futurism and science fiction: brain emulations, in which brains are recorded, copied, and used to make artificial “robot” minds. I will then use standard theories from many physical, human, and social sciences to describe in detail what a world with that future technology would look like.

I may be wrong about some consequences of brain emulations, and I may misapply some science. Even so, the view I offer will still show just how troublingly strange the future can be.

I greatly enjoyed Robin’s book, but unfortunately I have to admit that relatively few people will in general. It is easy enough to see the reason for this from Robin’s introduction. Who would expect to be interested? Possibly those who enjoy the “futurism and science fiction” concerning brain emulations; but if Robin does what he set out to do, those persons will find themselves strangely uninterested. As he says, science fiction is “designed for entertainment, not realism,” while he is attempting to answer the question, “What would this actually be like?” This intention is very remote from the intention of the science fiction, and consequently it will likely appeal to different people.

Whether or not Robin gets the answer to this question right, he definitely succeeds in making his approach and appeal differ from those of science fiction.

One might illustrate this with almost any random passage from the book. Here are portions of his discussion of the climate of em cities:

As we will discuss in Chapter 18, Cities section, em cities are likely to be big, dense, highly cost-effective concentrations of computer and communication hardware. How might such cities interact with their surroundings?

Today, computer and communication hardware is known for being especially temperamental about its environment. Rooms and buildings designed to house such hardware tend to be climate-controlled to ensure stable and low values of temperature, humidity, vibration, dust, and electromagnetic field intensity. Such equipment housing protects it especially well from fire, flood, and security breaches.

The simple assumption is that, compared with our cities today, em cities will also be more climate-controlled to ensure stable and low values of temperature, humidity, vibrations, dust, and electromagnetic signals. These controls may in fact become city level utilities. Large sections of cities, and perhaps entire cities, may be covered, perhaps even domed, to control humidity, dust, and vibration, with city utilities working to absorb remaining pollutants. Emissions within cities may also be strictly controlled.

However, an em city may contain temperatures, pressures, vibrations, and chemical concentrations that are toxic to ordinary humans. If so, ordinary humans are excluded from most places in em cities for safety reasons. In addition, we will see in Chapter 18, Transport section, that many em city transport facilities are unlikely to be well matched to the needs of ordinary humans.

Cities today are the roughest known kind of terrain, in the sense that cities slow down the wind the most compared with other terrain types. Cities also tend to be hotter than neighboring areas. For example, Las Vegas is 7 ° Fahrenheit hotter in the summer than are surrounding areas. This hotter city effect makes ozone pollution worse and this effect is stronger for bigger cities, in the summer, at night, with fewer clouds, and with slower wind (Arnfield 2003).

This is a mild reason to expect em cities to be hotter than other areas, especially at night and in the summer. However, as em cities are packed full of computing hardware, we shall now see that em cities will  actually be much hotter.

While the book considers a wide variety of topics, e.g. the social relationships among ems, which look quite different from the above passage, the general mode of treatment is the same. As Robin put it, he uses “standard theories” to describe the em world, much as he employs standard theories about cities, about temperature and climate, and about computing hardware in the above passage.

One might object that basically Robin is positing a particular technological change (brain emulations), but then assuming that everything else is the same, and working from there. And there is some validity to this objection. But in the end there is actually no better way to try to predict the future; despite David Hume’s opinion, generally the best way to estimate the future is to say, “Things will be pretty much the same.”

At the end of the book, Robin describes various criticisms. First are those who simply said they weren’t interested: “If we include those who declined to read my draft, the most common complaint is probably ‘who cares?'” And indeed, that is what I would expect, since as Robin remarked himself, people are interested in an entertaining account of the future, not an attempt at a detailed description of what is likely.

Others, he says, “doubt that one can ever estimate the social consequences of technologies decades in advance.” This is basically the objection I mentioned above.

He lists one objection that I am partly in agreement with:

Many doubt that brain emulations will be our next huge technology change, and aren’t interested in analyses of the consequences of any big change except the one they personally consider most likely or interesting. Many of these people expect traditional artificial intelligence, that is, hand-coded software, to achieve broad human level abilities before brain emulations appear. I think that past rates of progress in coding smart software suggest that at previous rates it will take two to four centuries to achieve broad human level abilities via this route. These critics often point to exciting recent developments, such as advances in “deep learning,” that they think make prior trends irrelevant.

I don’t think Robin is necessarily mistaken in regard to his expectations about “traditional artificial intelligence,” although he may be, and I don’t find myself uninterested by default in things that I don’t think the most likely. But I do think that traditional artificial intelligence is more likely than his scenario of brain emulations; more on this below.

There are two other likely objections that Robin does not include in this list, although he does touch on them elsewhere. First, people are likely to say that the creation of ems would be immoral, even if it is possible, and similarly that the kinds of habits and lives that he describes would themselves be immoral. On the one hand, this should not be a criticism at all, since Robin can respond that he is simply describing what he thinks is likely, not saying whether it should happen or not; on the other hand, it is in fact obvious that Robin does not have much disapproval, if any, of his scenario. The book ends in fact by calling attention to this objection:

The analysis in this book suggests that lives in the next great era may be as different from our lives as our lives are from farmers’ lives, or farmers’ lives are from foragers’ lives. Many readers of this book, living industrial era lives and sharing industrial era values, may be disturbed to see a forecast of em era descendants with choices and life styles that appear to reject many of the values that they hold dear. Such readers may be tempted to fight to prevent the em future, perhaps preferring a continuation of the industrial era. Such readers may be correct that rejecting the em future holds them true to their core values.

But I advise such readers to first try hard to see this new era in some detail from the point of view of its typical residents. See what they enjoy and what fills them with pride, and listen to their criticisms of your era and values. This book has been designed in part to assist you in such a soul-searching examination. If after reading this book, you still feel compelled to disown your em descendants, I cannot say you are wrong. My job, first and foremost, has been to help you see your descendants clearly, warts and all.

Our own discussions of the flexibility of human morality are relevant. The creatures Robin is describing are in many ways quite different from humans, and it is in fact very appropriate for their morality to differ from human morality.

A second likely objection is that Robin’s ems are simply impossible, on account of the nature of the human mind. I think that this objection is mistaken, but I will leave the details of this explanation for another time. Robin appears to agree with Sean Carroll about the nature of the mind, as can be seen for example in this post. Robin is mistaken about this, for the reasons suggested in my discussion of Carroll’s position. Part of the problem is that Robin does not seem to understand the alternative. Here is a passage from the linked post on Overcoming Bias:

Now what I’ve said so far is usually accepted as uncontroversial, at least when applied to the usual parts of our world, such as rivers, cars, mountains laptops, or ants. But as soon as one claims that all this applies to human minds, suddenly it gets more controversial. People often state things like this:

“I am sure that I’m not just a collection of physical parts interacting, because I’m aware that I feel. I know that physical parts interacting just aren’t the kinds of things that can feel by themselves. So even though I have a physical body made of parts, and there are close correlations between my feelings and the states of my body parts, there must be something more than that to me (and others like me). So there’s a deep mystery: what is this extra stuff, where does it arise, how does it change, and so on. We humans care mainly about feelings, not physical parts interacting; we want to know what out there feels so we can know what to care about.”

But consider a key question: Does this other feeling stuff interact with the familiar parts of our world strongly and reliably enough to usually be the actual cause of humans making statements of feeling like this?

If yes, this is a remarkably strong interaction, making it quite surprising that physicists have missed it so far. So surprising in fact as to be frankly unbelievable. If this type of interaction were remotely as simple as all the interactions we know, then it should be quite measurable with existing equipment. Any interaction not so measurable would have be vastly more complex and context dependent than any we’ve ever seen or considered. Thus I’d bet heavily and confidently that no one will measure such an interaction.

But if no, if this interaction isn’t strong enough to explain human claims of feeling, then we have a remarkable coincidence to explain. Somehow this extra feeling stuff exists, and humans also have a tendency to say that it exists, but these happen for entirely independent reasons. The fact that feeling stuff exists isn’t causing people to claim it exists, nor vice versa. Instead humans have some sort of weird psychological quirk that causes them to make such statements, and they would make such claims even if feeling stuff didn’t exist. But if we have a good alternate explanation for why people tend to make such statements, what need do we have of the hypothesis that feeling stuff actually exists? Such a coincidence seems too remarkable to be believed.

There is a false dichotomy here, and it is the same one that C.S. Lewis falls into when he says, “Either we can know nothing or thought has reasons only, and no causes.” And in general it is like the error of the pre-Socratics, that if a thing has some principles which seem sufficient, it can have no other principles, failing to see that there are several kinds of cause, and each can be complete in its own way. And perhaps I am getting ahead of myself here, since I said this discussion would be for later, but the objection that Robin’s scenario is impossible is mistaken in exactly the same way, and for the same reason: people believe that if a “materialistic” explanation could be given of human behavior in the way that Robin describes, then people do not truly reason, make choices, and so on. But this is simply to adopt the other side of the false dichotomy, much like C.S. Lewis rejects the possibility of causes for our beliefs.

One final point. I mentioned above that I see Robin’s scenario as less plausible than traditional artificial intelligence. I agree with Tyler Cowen in this post. This present post is already long enough, so again I will leave a detailed explanation for another time, but I will remark that Robin and I have a bet on the question.

Science and Certain Theories of Sean Collins

Sean Collins discusses faith and science:

Since at least the time of Descartes, there has come to be a very widespread tendency to see faith as properly the activity of an individual who stands in opposition to a larger, potentially deceptive, world. Faith so conceived is of a piece with individualist notions about the true and the good. At its extreme, the problematic character of faith thus conceived leads some to suppose it can only be an exercise in irrationality. And that is one very common reason why faith, and religion along with it, comes to be  despised.

What needs to be recovered, far away from that extreme, is consciousness of participation as lying at the foundation of all ontology, but in particular at the foundation of what faith is. Faith is knowledge by participation. But what we still tend to have, instead, is an individualist conception even of knowledge itself.

These misconceptions are receding more and more, though, in one very surprising place, namely contemporary science! (It is characteristic of our psychological hypochondria that they recede for us as long as we don’t pay attention to the fact, and thus worry about it.) Everyone uses the expression, “we now know.” “We now know” that our galaxy is but one among many. “We now know” that the blood circulates, and uses hemoglobin to carry oxygen to cells; we now know that there are more than four elements…. One might expect this expression to be disturbing to many people, on account of the contempt for faith I alluded to above; for what the expression refers to is, in fact, a kind of faith within the realm of science. Yet this faith is too manifestly natural for anyone to find it disturbing.  To find it disturbing, one would have to return to the radical neurotic Cartesian individualism, where you sit in a room by yourself and try to deduce all of reality. Most people aren’t devoid of sense enough to do that.

What is especially interesting is that the project of modern science (scientia, knowledge) has itself become obviously too big to continue under the earlier enlightenment paradigm, where we think we must know everything by doing our own experiments and making our own observations. And nobody worries about that fact (at least not as long as “politics,” in the pejorative sense, hasn’t yet entered the picture). Real people understand that there is no reason to worry. They are perfectly content to have faith: that is, to participate in somebody else’s knowledge. An implicit consciousness of a common good in this case makes the individualist conception of faith vanish, and a far truer conception takes its place. This is what real faith — including religious faith — looks like, and it isn’t as different from knowledge or from “reason” as many tend to think.

This is related to our discussion in this previous post, where we pointed out that scientific knowledge has an essential dependence on the work of others, and is not simply a syllogism from first principles that an individual can work out on his own. In this sense, Collins notes, science necessarily involves a kind of faith in the scientific community, past and present, and scientists themselves are not exempt from the need for this faith.

The implication of this is that religious faith should be looked at in much the same way. Religious faith requires faith in a religious community and in revelation from God, and even those in authority in the community are not exempt from the need for this faith. There is no more reason to view this as problematic or irrational than in the case of science.

James Chastek makes a similar argument:

The science of the scientist is, of itself, just as hidden as the God of the priests and consecrated persons. The great majority of persons have no more direct or distinct experience of God than they have a justified insight into scientific claims, and the way in which they could learn the science for themselves if they only had the time and talent is the same way in which they could become preternaturally holy and achieve the unitive way if they only had the time and talent.  If I, lacking the science, trust your testimony about dark matter or global warming (probably after it’s backed up by anecdotes, a gesture at some data, the social pressure to believe, and my sense that you just sound like a smart guy) then I’m in a cognitive state called faith. Taking a pragmatist approach, we come to know the value of science by its fruits in technology just as we know the value of religion though the holiness of the saints. In good logic, Pinker sees the value that many give to holiness as disordered and mistaken,  but there are all sorts of persons who say the same thing about technology.

The similarity between the title of this post and that of the last is not accidental. Dawkins claims that religious beliefs are similar to beliefs in fairies and werewolves, and his claim is empirically false. Likewise Sean Collins and James Chastek claim that religious beliefs are similar to scientific beliefs, and their claim is empirically false.

As in the case of Dawkins, Collins notes from the beginning this empirical discrepancy. Religious faith is seen as “the activity of an individual who stands in opposition to a larger, potentially deceptive, world,” and consequently it appears irrational to many. “And that is one very common reason why faith, and religion along with it, comes to be  despised.” But note that this does not commonly happen with science, even if in principle one could think in the same way about science, as Chastek points to some critics of technology.

While the empirical differences themselves will have their own causes, we can point to one empirical difference in particular that sufficiently explains the different way that people relate to scientific and religious beliefs.

The principle difference is that people speak of “many religions” in the world in a way in which they definitely do not speak of “many sciences.” If we talk of several sciences, we refer to branches of science, and the corresponding speech about religion would be branches of theology. But “many religions” refers to Catholicism, Islam, Judaism, and so on, which contain entirely distinct bodies of theology which are strongly opposed to one another. There is no analog in the case of science. We might be able to find scientific disagreements and even “heresies” like the denial of global warming, but we do not find whole bodies of scientific doctrine about the world which explain the world as a whole and are strongly opposed to one another.

There are many other empirical differences that result from this one difference. People leave their religion and join another, or they give up religion entirely, but you never see people leave their science and join another, or give up science entirely, in the sense of abandoning all scientific beliefs about the world.

This one difference sufficiently explains the suspicion Collins notes regarding religious belief. The size of the discrepancies between religious beliefs implies that many of them are wildly far from reality. And even the religious beliefs that a person might accept are frequently “rather implausible from a relatively neutral point of view,” as Rod Dreher notes. In the case of scientific beliefs, we do find some that are somewhat implausible from a relatively neutral point of view, but we do not find the kind of discrepancy which would force us to say that any of them are wildly far from reality.

A prediction that would follow from my account here would be this: if there were only one religion, in the way that there is only one science, people would not view religion with suspicion, and religious faith would actually be seen as very like scientific faith, basically in the way asserted by Sean Collins.

While we cannot test this prediction directly, consider the following text from St. Augustine:

1. I must express my satisfaction, and congratulations, and admiration, my son Boniface, in that, amid all the cares of wars and arms, you are eagerly anxious to know concerning the things that are of God. From hence it is clear that in you it is actually a part of your military valor to serve in truth the faith which is in Christ. To place, therefore, briefly before your Grace the difference between the errors of the Arians and the Donatists, the Arians say that the Father, the Son, and the Holy Ghost are different in substance; whereas the Donatists do not say this, but acknowledge the unity of substance in the Trinity. And if some even of them have said that the Son was inferior to the Father, yet they have not denied that He is of the same substance; while the greater part of them declare that they hold entirely the same belief regarding the Father and the Son and the Holy Ghost as is held by the Catholic Church. Nor is this the actual question in dispute with them; but they carry on their unhappy strife solely on the question of communion, and in the perversity of their error maintain rebellious hostility against the unity of Christ. But sometimes, as we have heard, some of them, wishing to conciliate the Goths, since they see that they are not without a certain amount of power, profess to entertain the same belief as they. But they are refuted by the authority of their own leaders; for Donatus himself, of whose party they boast themselves to be, is never said to have held this belief.

2. Let not, however, things like these disturb you, my beloved son. For it is foretold to us that there must needs be heresies and stumbling-blocks, that we may be instructed among our enemies; and that so both our faith and our love may be the more approved—our faith, namely, that we should not be deceived by them; and our love, that we should take the utmost pains we can to correct the erring ones themselves; not only watching that they should do no injury to the weak, and that they should be delivered from their wicked error, but also praying for them, that God would open their understanding, and that they might comprehend the Scriptures. For in the sacred books, where the Lord Christ is made manifest, there is also His Church declared; but they, with wondrous blindness, while they would know nothing of Christ Himself save what is revealed in the Scriptures, yet form their notion of His Church from the vanity of human falsehood, instead of learning what it is on the authority of the sacred books.

3. They recognize Christ together with us in that which is written, “They pierced my hands and my feet. They can tell all my bones: they look and stare upon me. They part my garments among them, and cast lots upon my vesture;” and yet they refuse to recognize the Church in that which follows shortly after: “All the ends of the world shall remember, and turn unto the Lord; and all the kindreds of the nations shall worship before You. For the kingdom is the Lord’s; and He is the Governor among the nations.” They recognize Christ together with us in that which is written, “The Lord has said unto me, You are my Son, this day have I begotten You;” and they will not recognize the Church in that which follows: “Ask of me, and I shall give You the heathen for Your inheritance, and the uttermost parts of the earth for Your possession.” They recognize Christ together with us in that which the Lord Himself says in the gospel, “Thus it behooved Christ to suffer, and to rise from the dead the third day;” and they will not recognize the Church in that which follows: “And that repentance and remission of sins should be preached in His name among all nations, beginning at Jerusalem.” Luke 24:46-47 And the testimonies in the sacred books are without number, all of which it has not been necessary for me to crowd together into this book. And in all of them, as the Lord Christ is made manifest, whether in accordance with His Godhead, in which He is equal to the Father, so that, “In the beginning was the Word, and; the Word was with God, and the Word was God;” or according to the humility of the flesh which He took upon Him, whereby “the Word was made flesh and dwelt among us;” so is His Church made manifest, not in Africa alone, as they most impudently venture in the madness of their vanity to assert, but spread abroad throughout the world.

4. For they prefer to the testimonies of Holy Writ their own contentions, because, in the case of Cæcilianus, formerly a bishop of the Church of Carthage, against whom they brought charges which they were and are unable to substantiate, they separated themselves from the Catholic Church—that is, from the unity of all nations. Although, even if the charges had been true which were brought by them against Cæcilianus, and could at length be proved to us, yet, though we might pronounce an anathema upon him even in the grave, we are still bound not for the sake of any man to leave the Church, which rests for its foundation on divine witness, and is not the figment of litigious opinions, seeing that it is better to trust in the Lord than to put confidence in man. For we cannot allow that if Cæcilianus had erred,— a supposition which I make without prejudice to his integrity—Christ should therefore have forfeited His inheritance. It is easy for a man to believe of his fellow-men either what is true or what is false; but it marks abandoned impudence to desire to condemn the communion of the whole world on account of charges alleged against a man, of which you cannot establish the truth in the face of the world.

5. Whether Cæcilianus was ordained by men who had delivered up the sacred books, I do not know. I did not see it, I heard it only from his enemies. It is not declared to me in the law of God, or in the utterances of the prophets, or in the holy poetry of the Psalms, or in the writings of any one of Christ’s apostles, or in the eloquence of Christ Himself. But the evidence of all the several scriptures with one accord proclaims the Church spread abroad throughout the world, with which the faction of Donatus does not hold communion. The law of God declared, “In your seed shall all the nations of the earth be blessed.” Genesis 26:4 The Lord said by the mouth of His prophet, “From the rising of the sun, even unto the going down of the same, a pure sacrifice shall be offered unto my name: for my name shall be great among the heathen.” Malachi 1:11 The Lord said through the Psalmist, “He shall have dominion also from sea to sea, and from the river unto the ends of the earth.” The Lord said by His apostle, “The gospel has come unto you, as it is in all the world, and brings forth fruit.” Colossians 1:6 The Son of God said with His own mouth, “You shall be witnesses unto me, both in Jerusalem, and in all Judea, and in Samaria, and even unto the uttermost part of the earth.” Acts 1:8 Cæcilianus, the bishop of the Church of Carthage, is accused with the contentiousness of men; the Church of Christ, established among all nations, is recommended by the voice of God. Mere piety, truth, and love forbid us to receive against Cæcilianus the testimony of men whom we do not find in the Church, which has the testimony of God; for those who do not follow the testimony of God have forfeited the weight which otherwise would attach to their testimony as men.

Note the source of St. Augustine’s confidence. It is the “unity of the whole world.” It is “abandoned impudence to desire to condemn the communion of the whole world.” The Catholic Church is “established among all nations,” and this is reason to accept it instead of the doctrines of the heretics.

The comparison between religious beliefs and scientific beliefs applies much better to the time of St. Augustine. Even St. Augustine would know that alternate religions exist, but in a similar sense there might have appeared to be potentially many sciences, insofar as science is not at the time a unified body of ideas attempting to explain the world. Thales held that all things are derived from water, while others came out in favor of air or fire.

Nonetheless, even at the time of St. Augustine, there are seeds of the difference. Unknown to St. Augustine, native Americans of the time were certainly practicing entirely different religions. And while I made the comparison between religious heresy and dissent on certain scientific questions above, these in practice have their own differences. Religious heresy of itself contains a seed of schism, and thus the possibility of establishing a new religion. Scientific disagreement even of the kind that might be compared with “heresy,” never leads to the development of a new set of scientific doctrines about the world that can be considered an alternative science.

In contrast, if even religious heresy had not existed, St. Augustine would be entirely right simply to point to the consent of the world. Aristotle frequently points to the agreement of all men as one of the best signs of truth, for example here:

And about all these matters the endeavor must be made to seek to convince by means of rational arguments, using observed facts as evidences and examples. For the best thing would be if all mankind were seen to be in agreement with the views that will be stated, but failing that, at any rate that all should agree in some way. And this they will do if led to change their ground, for everyone has something relative to contribute to the truth, and we must start from this to give a sort of proof about our views; for from statements that are true but not clearly expressed, as we advance, clearness will also be attained, if at every stage we adopt more scientific positions in exchange for the customary confused statements.

And indeed, if there were in this way one religion with which all were in agreement, it is not merely that they would agree in fact, since this is posited, but the agreement of each would have an extremely reasonable foundation. In this situation, it would be quite reasonable to speak of religious faith and scientific faith as roughly equivalent.

In the real world, however, religious beliefs are neither like beliefs in fairies and unicorns, nor like scientific beliefs.

But as Aristotle says, “everyone has something relative to contribute to the truth,” and just as we saw some true elements in Dawkins’s point in the previous post, so there is some truth to the comparisons made by Collins and Chastek. This is in fact part of the reason why Dawkins’s basic point is mistaken. He fails to consider religious belief as a way of participating in a community, and thus does not see a difference from beliefs in werewolves and the like.

Truth and Culture

Just as progress in technology causes a declining culture, so also progress in truth.

This might seem a surprising assertion, but some thought will reveal that it must be so. Just as cultural practices are intertwined with the existing conditions of technology, so also such practices are bound up with explicit and implicit claims about the world, about morality, about human society, and so on. Progress in truth will sometimes confirm these claims even more strongly, but this will merely leave the culture approximately as it stands. But there will also be times when progress in truth will weaken these claims, or even show them to be false. This will necessarily strike a blow against the existing culture, damaging it much as changes in technology do.

Consider our discussion of the Maccabees. As I said there, Mattathias seems to suggest that abandoning the religion of one’s ancestors is bad for anyone, not only for the Jews. This is quite credible in the case in the particular scenario there considered, where people are being compelled by force to give up their customs and their religion. But consider the situation where the simple progress of truth causes one to revise or abandon various religious claims, as in the case we discussed concerning the Jehovah’s Witnesses. If any of these claims are bound up with one’s culture and religious practices, this progress will necessarily damage the currently existing culture. In the case of the Maccabees, they have the fairly realistic choice to refuse to obey the orders of the king. But the Jehovah’s Witnesses do not have any corresponding realistic choice to insist that the world really did end in 1914. So the Jews could avoid the threatened damage, but the Jehovah’s Witnesses cannot.

Someone might respond, “That’s too bad for people who believe in false religions. Okay, so the progress of truth will inevitably damage or destroy their religious and cultural practices. But my religion is true, and so it is immune to such effects.”

It is evident that your religion might true in the sense defined in the linked post without being immune to such effects. More remarkably, however, your religion might be true in a much more robust sense, and yet still not possess such an immunity.

Consider the case in the last post regarding the Comma. We might suppose that this is merely a technical academic question that has no relevance for real life. But this is not true: the text from John was read, including the Trinitarian reference, in the traditional liturgy, as for example on Low Sunday. Liturgical rites are a part of culture and a part of people’s real life. So the question is definitely relevant to real life.

We might respond that the technical academic question does not have to affect the liturgy. We can just keep doing what we were doing before. And therefore the progress of truth will not do any damage to the existing liturgical rite.

I am quite sympathetic to this point of view, but it is not really true that no damage is done even when we adopt this mode of proceeding. The text is read after the announcement, “A reading from a letter of the blessed John the Apostle,” and thus there is at least an implicit assertion that the text comes from St. John, or at any rate the liturgical rite is related to this implicit assertion. Now we might say that it is not the business of liturgical rites to make technical academic assertions. And this may be so, but the point is related to what I said at the beginning of this post: cultural practices, and liturgical rites as one example of them, are bound up with implicit or explicit claims about the world, and we are here discussing one example of such an intertwining.

And this damage inflicted on the liturgical rite by the discovery of the truth of the matter cannot be avoided, whether or not we change the rite. The Catholic Church did in fact change the rite (and the official version of the Vulgate), and no longer includes the Trinitarian reference. And so the liturgical rite was in fact damaged. But even if we leave the practice the same, as suggested above, it may be that less damage will be done, but damage will still be done. As I conceded here, a celebration or a liturgical rite will become less meaningful if one believes in it less. In the current discussion about the text of John, we are not talking about a wholesale disbelief, but simply about the admission that the Trinitarian reference is not an actual part of John’s text. This will necessarily make the rite less meaningful, although in a very minor way.

This is why I stated above that the principle under discussion is general, and would apply even in the case of a religion which is true in a fairly robust sense: even minor inaccuracies in the implicit assumptions of one’s religious practices will mean that the discovery of the truth of the matter in those cases will be damaging to one’s religious culture, if only in minor ways.

All of this generalizes in obvious ways to all sorts of cultural practices, not only to religious practices. It might seem odd to talk about a “discovery” that slavery is wrong, but insofar as there was such a discovery, it was damaging to the culture of the Confederacy before the Civil War.

Someone will object. Slavery is actually bad, so banning it only makes things better, and in no way makes them worse. But this is not true: taking away something bad can certainly makes things worse in various ways. For example, if a slaver owner is suddenly forced to release his slaves, he might be forced to close his business, which means that his customers will no longer receive service.

Not relevant, our objector will respond. Sure, there might be some inconveniences that result from releasing the slaves. But slavery is really bad, and once we’ve freed the slaves we can build a better world without it. The slave owner can start a new business that doesn’t depend on slavery, and things will end up better.

It is easy to see that insofar as there is any truth in the objections, all of it can be applied in other cases, as in the case of liturgical rites we have discussed above, and not only to moral matters. Falsity is also a bad thing, and if we remove it, there “might be some inconveniences,” but just as we have cleared the way for the slave owner to do something better, so we have cleared the way for the formation of liturgical rites which are more fully rooted in the truth. We can build a better world that is not associated with the false idea about the text of John, and things will end up better.

I have my reservations. But the objector is not entirely wrong, and one who wishes to think through this line of argument might also begin to respond to these questions raised earlier.

Questions on Culture

The conclusion of the last post raises at least three questions, and perhaps others.

First, something still seems wrong or at least incomplete with the picture presented. It is one thing to suppose that things can tend to improve. It is another to suppose that they can get constantly worse. You can count to higher and higher numbers; but you cannot count down forever, because you reach a lower limit. In the same way, insofar as culture seems a necessary part of human life, there seems to be a limit on on how degraded a culture could become. So if there is a constant tendency towards the decline of culture, we should have already reached the lower limit.

Second, if one looks at history over longer time scales, it seems obvious that there are also large cultural improvements, as in the history of art and so on. It is not clear how this can happen if there is a constant tendency towards decline.

Third, we argued earlier that the world overall tends to be successful in the sense defined here. The conclusion of the last past seems to call this into question, at least in the sense that we cannot be sure: if things are improving in some ways, and getting worse in others, then it remains unclear whether things are overall getting better or worse. Or perhaps things are just staying the same overall.

It may be some time before I respond to these questions, so for now I will simply point out that their answers will evidently be related to one another.