Bias vs. Variance

Scott Fortmann-Roe explains the difference between error due to bias and error due to variance:

  • Error due to Bias: The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict. Of course you only have one model so talking about expected or average prediction values might seem a little strange. However, imagine you could repeat the whole model building process more than once: each time you gather new data and run a new analysis creating a new model. Due to randomness in the underlying data sets, the resulting models will have a range of predictions. Bias measures how far off in general these models’ predictions are from the correct value.
  • Error due to Variance: The error due to variance is taken as the variability of a model prediction for a given data point. Again, imagine you can repeat the entire model building process multiple times. The variance is how much the predictions for a given point vary between different realizations of the model.

Later in the essay, he suggests that there is a natural tendency to overemphasize minimizing bias:

A gut feeling many people have is that they should minimize bias even at the expense of variance. Their thinking goes that the presence of bias indicates something basically wrong with their model and algorithm. Yes, they acknowledge, variance is also bad but a model with high variance could at least predict well on average, at least it is not fundamentally wrong.

This is mistaken logic. It is true that a high variance and low bias model can perform well in some sort of long-run average sense. However, in practice modelers are always dealing with a single realization of the data set. In these cases, long run averages are irrelevant, what is important is the performance of the model on the data you actually have and in this case bias and variance are equally important and one should not be improved at an excessive expense to the other.

Fortmann-Roe is concerned here with bias and variance in a precise mathematical sense, relative to the project of fitting a curve to a set of data points. However, his point can be generalized far beyond curve fitting, to interpreting and understanding the world overall. Tyler Cowen makes such a generalized point:

Arnold Kling summarizes Robin’s argument:

If you have a cause, then other people probably disagree with you (if nothing else, they don’t think your cause is as important as you do). When other people disagree with you, they are usually more right than you think they are. So you could be wrong. Before you go and attach yourself to this cause, shouldn’t you try to reduce the chances that you are wrong? Ergo, shouldn’t you work on trying to overcome bias? Therefore, shouldn’t overcoming bias be your number one cause?

Here is Robin’s very similar statement.  I believe these views are tautologically true and they simply boil down to saying that any complaint can be expressed as a concern about error of some kind or another.  I cannot disagree with this view, for if I do, I am accusing Robin of being too biased toward eliminating bias, thus reaffirming that bias is in fact the real problem.

I find it more useful to draw an analogy with statistics.  Biased estimators are one problem but not the only problem.  There is also insufficient data, lazy researchers, inefficient estimators, and so on.  Then I don’t see why we should be justified in holding a strong preference for overcoming bias, relative to other ends.

Tyler is arguing, for example, that someone may be in error because he is biased, but he can also be in error because he is too lazy to seek out the truth, and it may be more important in a particular case to overcome laziness than to overcome bias.

This is true, no doubt, but we can make a stronger point: In the mathematical discussion of bias and variance, insisting on a completely unbiased model will result in a very high degree of variance, with the nearly inevitable consequence of a higher overall error rate. Thus, for example, we can create a polynomial which will go through every point of the data exactly. Such a method of predicting data is completely unbiased. Nonetheless, such a model tends to be highly inaccurate in predicting new data due to its very high variance: the exact curve is simply too sensitive to the exact points found in the original data. In a similar way, even in the more general non-mathematical case, we will likely find that insisting on a completely unbiased method will result in greater error overall: the best way to find the truth may be to adopt a somewhat simplified model, just as in the mathematical case it is best not to try to fit the data exactly. Simplifying the model will introduce some bias, but it will also reduce variance.
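The mathematical point can be illustrated with a small sketch. The data source below (a line plus noise) and all the specific numbers are made up for illustration: fitting a polynomial through every data point gives zero error on the original data, but the simplified straight line is typically the better predictor of new data.

```python
import random

random.seed(0)

# Toy "world": y = 2x plus noise. Gather one data set to fit models on,
# and a second, fresh data set to test them on.
def noisy(x):
    return 2 * x + random.gauss(0, 0.3)

x_tr = [i / 4 for i in range(5)]
y_tr = [noisy(x) for x in x_tr]
x_te = [random.random() for _ in range(5)]
y_te = [noisy(x) for x in x_te]

# "Unbiased" model: the Lagrange polynomial through every training point.
# By construction it has zero error on the data it was fit to.
def lagrange(xs, ys):
    def p(x):
        total = 0.0
        for i, xi in enumerate(xs):
            term = ys[i]
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# Simplified (biased) model: a least-squares straight line.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

exact = lagrange(x_tr, y_tr)
line = fit_line(x_tr, y_tr)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(mse(exact, x_tr, y_tr))  # 0.0: it reproduces the original data exactly
# On fresh data the exact curve's error is no longer zero, and the
# simpler line is typically (not always) the more accurate predictor.
print(mse(exact, x_te, y_te))
print(mse(line, x_te, y_te))
```

The exact polynomial's training error is zero by construction; its error on fresh data reflects its sensitivity to the noise in the original points.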

To the best of my knowledge, no one has a demonstrably perfect method of adopting the best model, even in the mathematical case. Much less, therefore, can we come up with a perfect trade-off between bias and variance in the general case. We can simply use our best judgment. But we have some reason for thinking that there must be some such trade-off, just as there is in the mathematical case.

The Actual Infinite

There are good reasons to think that actual infinities are possible in the real world. In the first place, while the size and shape of the universe are not settled issues, the generally accepted theory fits better with the idea that the universe is physically infinite than with the idea that it is finite.

Likewise, the universe is certainly larger than the observable universe, which is about 93 billion light years in diameter. Suppose you have a probability distribution which assigns a nonzero probability to the claim that the universe is physically infinite. Then as you exclude ever larger finite sizes, no consistent probability distribution can avoid driving the probability of an infinite universe to 100% in the limit. And if someone had assigned a reasonable probability distribution before modern physical science existed, it would very likely have been one that makes the probability of an infinite universe very high by the time the universe was confirmed to be its present size. Therefore we too should think that the universe is very probably infinite. In principle, this argument is capable of refuting even purported demonstrations of the impossibility of an actual infinite, since there is at least some small chance that these purported demonstrations are all wrong.
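The shape of this argument can be made concrete with a toy calculation. The prior below is entirely hypothetical (half the probability on an infinite universe, the rest spread geometrically over finite sizes); the point is only that once finite sizes are progressively excluded, the conditional probability of an infinite universe approaches 100%.

```python
from fractions import Fraction

# Hypothetical prior: probability 1/2 that the universe is infinite, with
# the remaining 1/2 spread geometrically over finite "sizes" 1, 2, 3, ...
P_INFINITE = Fraction(1, 2)

def p_size(n):
    # P(size = n) = (1/2) * (1/2)**n, which sums to 1/2 over all n >= 1.
    return Fraction(1, 2 ** (n + 1))

def p_infinite_given_larger_than(n):
    # Condition on the evidence "the universe is larger than size n":
    # exclude every finite size up to n and renormalize.
    p_finite_left = Fraction(1, 2) - sum(p_size(k) for k in range(1, n + 1))
    return P_INFINITE / (P_INFINITE + p_finite_left)

print(float(p_infinite_given_larger_than(1)))   # 2/3
print(float(p_infinite_given_larger_than(10)))  # 1024/1025, already ~0.999
print(float(p_infinite_given_larger_than(30)))  # within one part in a billion of 1
```

Any other consistent prior that gives infinity a nonzero weight behaves the same way in the limit; only the rate at which the probability climbs depends on the particular distribution.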

Likewise, almost everyone accepts the possibility of an infinite future. Even the heat death of the universe would not prevent the passage of infinite time, and a religious view of the future also generally implies the passage of infinite future time. Even if heaven is supposed to be outside time in principle, in practice there would still be an infinite number of future human acts. If eternalism or something similar is true, then an infinite future in itself implies an actual infinite. And even if such a theory is not true, it is likely that a potentially infinite future implies the possibility of an actual infinite, because any problematic or paradoxical results from an actual infinite can likely be imitated in some way in the case of an infinite future.

On the other hand, there are good reasons to think that actual infinities are not possible in the real world. Positing infinities results in paradoxical or contradictory results in very many cases, and the simplest and therefore most likely way to explain this is to admit that infinities are simply impossible in general, even in the cases where we have not yet verified this fact.

An actual infinite also seems to imply an infinite regress in causality, and such a regress is impossible. We can see this by considering the material cause. Suppose the universe is physically infinite, and contains an infinite number of stars and planets. Then the universe is composed of the solar system together with the rest of the universe. But the rest of the universe will be composed of another stellar system together with the remainder, and so on. So there will be an infinite regress of material causality, which is just as impossible with material causality as with any other kind of causality.

Something similar is implied by St. Thomas’s argument against an infinite multitude:

This, however, is impossible; since every kind of multitude must belong to a species of multitude. Now the species of multitude are to be reckoned by the species of numbers. But no species of number is infinite; for every number is multitude measured by one. Hence it is impossible for there to be an actually infinite multitude, either absolute or accidental.

We can look at this in terms of our explanation of defining numbers. This explanation works only for finite numbers, and an infinite number could not be defined in such a way, precisely because it would result in an infinite regress. This leads us back to the first argument above against infinities: an infinity is intrinsically undefined and unintelligible, and for that reason leads to paradoxes. Someone might say that something unintelligible cannot be understood but is not impossible; but this is no different from Bertrand Russell saying that there is no reason for things not to come into being from nothing, without a cause. Such a position is unreasonable and untrue.

Spinoza’s Geometrical Ethics

Benedict Spinoza, admiring the certainty of geometry, writes his Ethics Demonstrated in Geometrical Order in a manner imitating that of Euclid’s Elements.

Omitting his definitions and axioms for the moment, we can look at his proofs. Thus we have the first:

1: A substance is prior in nature to its states. This is evident from D3 and D5.

The two definitions are of “substance” and “mode,” which latter he equates with “state of a substance.” However, neither definition explains “prior in nature,” nor is this found in any of the other definitions and axioms.

Thus his argument does not follow. But we can grant that the claim is fairly reasonable in any case, and would follow according to many reasonable definitions of “prior in nature,” and according to reasonable axioms.

He proceeds to his second proof:

2: Two substances having different attributes have nothing in common with one another. This is also evident from D3. For each ·substance· must be in itself and be conceived through itself, which is to say that the concept of the one doesn’t involve the concept of the other.

D3 and D4 (which must be used here although he does not cite it explicitly in the proof) say:

D3: By ‘substance’ I understand: what is in itself and is conceived through itself, i.e. that whose concept doesn’t have to be formed out of the concept of something else. D4: By ‘attribute’ I understand: what the intellect perceives of a substance as constituting its essence.

Thus when he speaks of “substances having different attributes,” he means ones which are intellectually perceived as being different in their essence.

Once again, however, “have nothing in common” is not found in his definitions. It does occur once in his axioms, namely in A5:

A5: If two things have nothing in common, they can’t be understood through one another—that is, the concept of one doesn’t involve the concept of the other.

The axiom is pretty reasonable, at least taken in a certain way. If there is no idea common to the ideas of two things, the idea of one won’t be included in the idea of the other. But Spinoza is attempting to draw the conclusion that “if two substances have different attributes, i.e. are different in essence, then they have nothing in common.” But this does not seem to follow from a reasonable understanding of D3 and D4, nor from the definitions together with the axioms. “Dog” and “cat” might be substances, and the idea of dog does not include that of cat, nor cat the idea of dog, but they have “animal” in common. So his conclusion is not evident from the definition, nor does it follow logically from his definitions and axioms, nor does it seem to be true.

And this is only the second supposed proof out of 36 in part 1 of his book.

I would suggest that there are at least two problems with his whole project. First, Spinoza knows where he wants to get, and it is not somewhere good. Among other things, he is aiming for proposition 14:

14: God is the only substance that can exist or be conceived.

This is closely related to proposition 2: if two different things have nothing in common, then it is impossible for more than one thing to exist, since otherwise existence would be something common to various things.

Proposition 14 is absolutely false taken in any reasonable way. Consequently, since Spinoza is absolutely determined to arrive at a false proposition, he will necessarily employ falsehoods or logical mistakes along the way.

There is a second problem with his project. Geometry speaks about a very limited portion of reality. For this reason it is possible to come to most of its conclusions using a limited variety of definitions and axioms. But ethics and metaphysics, the latter of which is the actual topic of his first book, are much wider in scope. Consequently, if you want to say much that is relevant about them, it is impossible in principle to proceed from a small number of axioms and definitions. A small number of axioms and definitions will necessarily include only a small number of terms, and speaking about ethics and metaphysics requires a large number of terms. For example, suppose I wanted to prove everything on this blog using the method of definitions and axioms. Since I have probably used thousands of terms, hundreds or thousands of definitions and axioms would be required. There would simply be no other way to get the desired conclusions. We saw even in the first few proofs that Spinoza runs into exactly this problem: he wants to speak about a very broad subject, but he wants to start with just a few definitions and axioms.

And if you do employ hundreds of axioms, of course, there is very little chance that anyone is going to grant all of them. They will at least argue that some of them might be mistaken, and thus your proofs will lose the complete certainty that you were looking for from the geometrical method.

 

Numbering The Good

The book Theory of Games and Economic Behavior, by John Von Neumann and Oskar Morgenstern, contains a formal mathematical theory of value. In the first part of the book they discuss some objections to such a project, as well as explaining why they are hopeful about it:

1.2.2. It is not that there exists any fundamental reason why mathematics should not be used in economics. The arguments often heard that because of the human element, of the psychological factors etc., or because there is allegedly no measurement of important factors, mathematics will find no application, can all be dismissed as utterly mistaken. Almost all these objections have been made, or might have been made, many centuries ago in fields where mathematics is now the chief instrument of analysis. This “might have been” is meant in the following sense: Let us try to imagine ourselves in the period which preceded the mathematical or almost mathematical phase of the development in physics, that is the 16th century, or in chemistry and biology, that is the 18th century. Taking for granted the skeptical attitude of those who object to mathematical economics in principle, the outlook in the physical and biological sciences at these early periods can hardly have been better than that in economics, mutatis mutandis, at present.

As to the lack of measurement of the most important factors, the example of the theory of heat is most instructive; before the development of the mathematical theory the possibilities of quantitative measurements were less favorable there than they are now in economics. The precise measurements of the quantity and quality of heat (energy and temperature) were the outcome and not the antecedents of the mathematical theory. This ought to be contrasted with the fact that the quantitative and exact notions of prices, money and the rate of interest were already developed centuries ago.

A further group of objections against quantitative measurements in economics, centers around the lack of indefinite divisibility of economic quantities. This is supposedly incompatible with the use of the infinitesimal calculus and hence (!) of mathematics. It is hard to see how such objections can be maintained in view of the atomic theories in physics and chemistry, the theory of quanta in electrodynamics, etc., and the notorious and continued success of mathematical analysis within these disciplines.

This project requires the possibility of treating the value of things as a numerically measurable quantity. Calling this value “utility”, they discuss the difficulty of this idea:

3.1.2. Historically, utility was first conceived as quantitatively measurable, i.e. as a number. Valid objections can be and have been made against this view in its original, naive form. It is clear that every measurement, or rather every claim of measurability, must ultimately be based on some immediate sensation, which possibly cannot and certainly need not be analyzed any further. In the case of utility the immediate sensation of preference, of one object or aggregate of objects as against another, provides this basis. But this permits us only to say when for one person one utility is greater than another. It is not in itself a basis for numerical comparison of utilities for one person nor of any comparison between different persons. Since there is no intuitively significant way to add two utilities for the same person, the assumption that utilities are of non-numerical character even seems plausible. The modern method of indifference curve analysis is a mathematical procedure to describe this situation.

They note however that the original situation was no different with the idea of quantitatively measuring heat:

3.2.1. All this is strongly reminiscent of the conditions existent at the beginning of the theory of heat: that too was based on the intuitively clear concept of one body feeling warmer than another, yet there was no immediate way to express significantly by how much, or how many times, or in what sense.

Beginning the derivation of their particular theory, they say:

3.3.2. Let us for the moment accept the picture of an individual whose system of preferences is all-embracing and complete, i.e. who, for any two objects or rather for any two imagined events, possesses a clear intuition of preference.

More precisely we expect him, for any two alternative events which are put before him as possibilities, to be able to tell which of the two he prefers.

It is a very natural extension of this picture to permit such an individual to compare not only events, but even combinations of events with stated probabilities.

By a combination of two events we mean this: Let the two events be denoted by B and C and use, for the sake of simplicity, the probability 50%-50%. Then the “combination” is the prospect of seeing B occur with a probability of 50% and (if B does not occur) C with the (remaining) probability of 50%. We stress that the two alternatives are mutually exclusive, so that no possibility of complementarity and the like exists. Also, that an absolute certainty of the occurrence of either B or C exists.

To restate our position. We expect the individual under consideration to possess a clear intuition whether he prefers the event A to the 50-50 combination of B or C, or conversely. It is clear that if he prefers A to B and also to C, then he will prefer it to the above combination as well; similarly, if he prefers B as well as C to A, then he will prefer the combination too. But if he should prefer A to, say B, but at the same time C to A, then any assertion about his preference of A against the combination contains fundamentally new information. Specifically: If he now prefers A to the 50-50 combination of B and C, this provides a plausible base for the numerical estimate that his preference of A over B is in excess of his preference of C over A.

If this standpoint is accepted, then there is a criterion with which to compare the preference of C over A with the preference of A over B. It is well known that thereby utilities, or rather differences of utilities, become numerically measurable. That the possibility of comparison between A, B, and C only to this extent is already sufficient for a numerical measurement of “distances” was first observed in economics by Pareto. Exactly the same argument has been made, however, by Euclid for the position of points on a line; in fact it is the very basis of his classical derivation of numerical distances.

It is important to note that the things being assigned values are described as events. They should not be considered to be actions or choices, or at any rate, only insofar as actions or choices are themselves events that happen in the world. This is important because a person might very well think, “It would be better if A happened than if B happened. But making A happen is vicious, while making B happen is virtuous, so I will make B happen.” He prefers A as an outcome, but the actions which cause these events do not line up, in their moral value, with the external value of the outcomes. Of course, just as the person says that A happening is a better outcome than B happening, he can say that “choosing to make B happen” is a better outcome than “choosing to make A happen.” So in this sense there is nothing to exclude actions from being included in this system of value. But they can only be included insofar as actions themselves are events that happen in the world.

Von Neumann and Morgenstern continue:

The introduction of numerical measures can be achieved even more directly if use is made of all possible probabilities. Indeed: Consider three events, C, A, B, for which the order of the individual’s preferences is the one stated. Let a be a real number between 0 and 1, such that A is exactly equally desirable with the combined event consisting of a chance of probability 1 – a for B and the remaining chance of probability a for C. Then we suggest the use of a as a numerical estimate for the ratio of the preference of A over B to that of C over B.

So for example, suppose that C is an orange (or as an event, eating an orange), A is eating a plum, and B is eating an apple. The person prefers the orange to the plum, and the plum to the apple. The person prefers a combination of a 20% chance of an apple and an 80% chance of an orange to a plum, while he prefers a plum to a combination of a 40% chance of an apple and a 60% chance of an orange. Since this indicates that his preference changes sides at some point, we suppose that this happens at a 30% chance of an apple and a 70% chance of an orange. All the combinations giving more than a 70% chance of the orange, he prefers to the plum; and he prefers the plum to all the combinations giving less than a 70% chance of the orange. The authors are suggesting that if we assign numerical values to the plum, the apple, and the orange, we should do this in such a way that the difference between the values of the plum and the apple, divided by the difference between the values of the orange and the apple, should be 0.7.

The basic intuition here is that since the combinations of various probabilities of the orange and apple vary continuously from (100% orange, 0% apple) to (0% orange, 100% apple), the various combinations should go continuously through every possible value between the value of the orange and the value of the apple. Since we are passing through those values by changing a probability, they are suggesting mapping that probability directly onto a value. Thus if the value of the orange is 1 and the value of the apple is 0, we say that the value of the plum is 0.7, because the plum is basically equivalent in value to a combination of a 70% chance of the orange and a 30% chance of the apple.
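The arithmetic of the fruit example can be written out directly. The scale u(orange) = 1, u(apple) = 0 is an arbitrary choice (values of this kind are only fixed up to a linear transformation, so fixing two of them is free):

```python
# Arbitrary scale: the orange gets 1, the apple gets 0.
u_orange, u_apple = 1.0, 0.0

# The indifference point from the example: the plum is exactly as good as
# a 70% chance of the orange combined with a 30% chance of the apple.
p = 0.7
u_plum = p * u_orange + (1 - p) * u_apple
print(u_plum)  # 0.7

# This matches the ratio described above: the difference between the
# values of the plum and the apple, divided by the difference between
# the values of the orange and the apple, comes out to 0.7.
ratio = (u_plum - u_apple) / (u_orange - u_apple)
print(ratio)  # 0.7
```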

Working this out formally in the later parts of the book, they show that if a person’s preferences satisfy certain fairly reasonable axioms, it is possible to assign a numerical value to each of the things he has preferences about, and these values are uniquely determined up to a linear transformation.
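The uniqueness claim can be illustrated with a small check: applying any positive-slope affine transformation to the utilities leaves every comparison between lotteries unchanged. The particular numbers below are arbitrary.

```python
# Arbitrary utilities for three outcomes, and an arbitrary positive-slope
# affine transformation u' = a*u + b.
u = {"A": 1.0, "B": 0.4, "C": 0.0}
a, b = 3.0, 5.0
u2 = {k: a * v + b for k, v in u.items()}

def expected(utils, lottery):
    # lottery maps each outcome to its probability
    return sum(pr * utils[o] for o, pr in lottery.items())

sure_a = {"A": 1.0}           # A for certain
coin = {"B": 0.5, "C": 0.5}   # a 50-50 combination of B and C

# The ranking of the two prospects is the same on either scale.
before = expected(u, sure_a) > expected(u, coin)
after = expected(u2, sure_a) > expected(u2, coin)
print(before, after)  # True True
```

Since expected value is linear, the transformation passes through the expectation, so every such comparison is preserved; this is why the scale and zero point of a utility function carry no information.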

I will not describe the axioms themselves here, although they are described in the book, as well as perhaps more simply elsewhere.

Note that according to this system, if you want to know the value of a combination, e.g. (60% chance of A and 40% chance of B), the value will always be 0.6 × (value of A) + 0.4 × (value of B). The authors comment on this result:

3.7.1. At this point it may be well to stop and to reconsider the situation. Have we not shown too much? We can derive from the postulates (3:A)-(3:C) the numerical character of utility in the sense of (3:2:a) and (3:1:a), (3:1:b) in 3.5.1.; and (3:1:b) states that the numerical values of utility combine (with probabilities) like mathematical expectations! And yet the concept of mathematical expectation has been often questioned, and its legitimateness is certainly dependent upon some hypothesis concerning the nature of an “expectation.” Have we not then begged the question? Do not our postulates introduce, in some oblique way, the hypotheses which bring in the mathematical expectation?

More specifically: May there not exist in an individual a (positive or negative) utility of the mere act of “taking a chance,” of gambling, which the use of the mathematical expectation obliterates?

The objection is this: according to this system of value, if something has a value v, and something else has the double value 2v, the person should consider getting the thing with value v to be exactly equal to a deal where he has a 50% chance of getting the thing with value 2v and a 50% chance of getting nothing. That seems objectionable because many people would prefer the certainty of getting something to a situation where there is a good chance of getting nothing, even if there is also a chance of getting something more valuable. So for example, if you were now offered the choice of $100,000 directly, or $200,000 if you flip a coin and get heads and nothing if you get tails, you would probably not only prefer the $100,000, but prefer it to a very high degree.

Morgenstern and Von Neumann continue:

How did our axioms (3:A)-(3:C) get around this possibility?

As far as we can see, our postulates (3:A)-(3:C) do not attempt to avoid it. Even that one which gets closest to excluding a “utility of gambling” (3:C:b) (cf. its discussion in 3.6.2.), seems to be plausible and legitimate, unless a much more refined system of psychology is used than the one now available for the purposes of economics. The fact that a numerical utility, with a formula amounting to the use of mathematical expectations, can be built upon (3:A)-(3:C), seems to indicate this: We have practically defined numerical utility as being that thing for which the calculus of mathematical expectations is legitimate. Since (3:A)-(3:C) secure that the necessary construction can be carried out, concepts like a “specific utility of gambling” cannot be formulated free of contradiction on this level.

“We have practically defined numerical utility as being that thing for which the calculus of mathematical expectations is legitimate.” In other words, the reason for the strange result is that calling a value “double” very nearly simply means that a 50% chance of that value, and a 50% chance of nothing, is considered equal to the original value which was to be doubled.

Considering the case of the $100,000 and $200,000, perhaps it is not so strange after all, even if we think of value in the terms of Von Neumann and Morgenstern. You are benefited if you receive $100,000. But if you receive $100,000, and then another $100,000, how much benefit do you get from the second gift? Just as much? Not at all. The first gift will almost certainly make a much bigger change in your life than the second gift. So even by ordinary standards, getting $200,000 is not twice as valuable as getting $100,000, but less than twice as valuable.
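This diminishing value of money can be sketched with an assumed concave utility function. The square root here is purely illustrative, not anything Von Neumann and Morgenstern propose:

```python
import math

# Assumed (purely illustrative) utility of money: the square root, so
# each additional dollar is worth a little less than the one before it.
def u(dollars):
    return math.sqrt(dollars)

certain = u(100_000)                    # take the sure $100,000
gamble = 0.5 * u(200_000) + 0.5 * u(0)  # coin flip for $200,000 or nothing

print(certain > gamble)  # True: the sure thing wins under this utility
```

Under any such concave utility, $200,000 is worth less than twice $100,000, so the coin flip has a lower expected utility than the certain gift, which is exactly the intuition in the paragraph above.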

There might be something such that it would have exactly twice the value of $100,000 for you in the Von Neumann-Morgenstern sense. If you care about money enough, perhaps $300,000, or $1,000,000. If so, then you would consider the deal where you flip a coin for this amount of money just as good (considered in advance) as directly receiving $100,000. If you don’t care enough about money for such a thing to be true, there will be something else that you do consider to have twice the value, or more, in this sense. For example, if you have a brother dying of cancer, you would probably prefer that he have a 50% chance of survival, to receiving the $100,000. This means that in the relevant sense, you consider the survival of your brother to have more than double the value of $100,000.

This system of value does not in fact prevent one from assigning a “specific utility of gambling,” even within the system, as long as the fact that I am gambling or not is considered as a distinct event which is an additional result. If the only value that matters is money, then it is indeed a contradiction to speak of a specific utility of gambling. But if I care both about money and about whether I am gambling or not, there is no contradiction.
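A sketch of this point, with made-up numbers: if “having gambled” is treated as part of the outcome, it can carry its own disutility without any contradiction.

```python
# Made-up numbers: money is valued linearly, and the mere act of gambling
# carries its own small disutility for this particular agent.
GAMBLING_PENALTY = -0.2

def value(money, gambled):
    return money / 100 + (GAMBLING_PENALTY if gambled else 0.0)

sure = value(100, gambled=False)                     # 1.0
bet = 0.5 * value(200, True) + 0.5 * value(0, True)  # 0.9 - 0.1 = 0.8

# With linear money-utility alone the two would be exactly equal;
# the difference comes entirely from the "specific utility of gambling."
print(sure > bet)  # True
```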

Something else is implied by all of this, something which is frequently not noticed. Suppose you have a choice of two events in this way. One of them is something that you would want or would like, as small or big as you like. It could be having a nice day at the beach, or $100, or whatever you please. The other is a deal where you have a virtual certainty of getting nothing, and a very small probability of some extremely large reward. For example, it may be that your brother dying of cancer is also on the road to hell. The second event is to give your brother a chance of one in a googolplex of attaining eternal salvation.

Of course, the second event here is worthless. Nobody is going to do anything or give up anything for the sake of such a deal. What this implies is this: if a numerical value is assigned to something in the Von Neumann-Morgenstern manner, no matter what that thing is, that value must be low enough (in comparison to other values) that it won’t have any significant value after it is divided by a googolplex.

In other words, even eternal salvation does not have an infinite value, but a finite value (measured in this way), and low enough that it can be made worthless by enough division.

If we consider the value to express how much we care about something, then this actually makes intuitive sense, because we do not care infinitely about anything, not even about things which might be themselves infinite.

Pascal, in his wager, assumes a probability of 50% for God and for the truth of religious beliefs, and seems to assume a certainty of salvation, given that you accept those beliefs and that they happen to be true. He also seems to assume a certain loss of salvation, if you do not accept those beliefs and they happen to be true, and that nothing in particular will happen if the beliefs are not true.

These assumptions are not very reasonable, considered as actual probability assignments and actual expectations of what is going to happen. However, some set of assignments will be reasonable, and this will certainly affect the reasonableness of the wager. If the probability of success is too low, the wager will be unreasonable, just as above we noted that it would be unreasonable to accept the deal concerning your brother. On the other hand, if the probability of success is high enough, it may well be reasonable to take the deal.
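The dependence on the probability assignment can be put in the simplest terms. This is only a schematic sketch, treating the wager as a finite cost paid for a probability of a finite reward, per the earlier argument that even salvation receives a finite value in this system; the specific numbers are stand-ins.

```python
# Schematic only: cost and reward are stand-in finite values, not
# anything Pascal specifies. The wager pays off in expectation exactly
# when p * reward exceeds the cost.
def wager_is_reasonable(p, cost, reward):
    return p * reward > cost

print(wager_is_reasonable(1e-100, cost=1.0, reward=1e6))  # False: too improbable
print(wager_is_reasonable(0.5, cost=1.0, reward=1e6))     # True
```

The threshold is p > cost / reward: below it the wager is unreasonable, as in the case of the brother above, and above it the wager may well be reasonable.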

Erroneous Responses to Pascal

Many arguments which are presented against accepting Pascal’s wager are mistaken, some of them in obvious ways. For example, the argument is made that the multiplicity of religious beliefs or potential religious beliefs invalidates the wager:

But Pascal’s argument is seriously flawed. The religious environment that Pascal lived in was simple. Belief and disbelief only boiled down to two choices: Roman Catholicism and atheism. With a finite choice, his argument would be sound. But on Pascal’s own premise that God is infinitely incomprehensible, then in theory, there would be an infinite number of possible theologies about God, all of which are equally probable.

First, let us look at the more obvious possibilities we know of today – possibilities that were either unknown to, or ignored by, Pascal. In the Calvinistic theological doctrine of predestination, it makes no difference what one chooses to believe since, in the final analysis, who actually gets rewarded is an arbitrary choice of God. Furthermore we know of many more gods of many different religions, all of which have different schemes of rewards and punishments. Given that there are more than 2,500 gods known to man, and given Pascal’s own assumptions that one cannot comprehend God (or gods), then it follows that, even the best case scenario (i.e. that God exists and that one of the known Gods and theologies happen to be the correct one) the chances of making a successful choice is less than one in 2,500.

Second, Pascal’s negative theology does not exclude the possibility that the true God and true theology is not one that is currently known to the world. For instance it is possible to think of a God who rewards, say, only those who purposely step on sidewalk cracks. This sounds absurd, but given the premise that we cannot understand God, this possible theology cannot be dismissed. In such a case, the choice of what God to believe would be irrelevant as one would be rewarded on a premise totally distinct from what one actually believes. Furthermore as many atheist philosophers have pointed out, it is also possible to conceive of a deity who rewards intellectual honesty, a God who rewards atheists with eternal bliss simply because they dared to follow where the evidence leads – that given the available evidence, no God exists! Finally we should also note that given Pascal’s premise, it is possible to conceive of a God who is evil and who punishes the good and rewards the evil.

Thus Pascal’s call for us not to consider the evidence but to simply believe on prudential grounds fails.

There is an attempt here to base the response on Pascal’s mistaken claim that the probability of the existence of God (and of Catholic doctrine as a whole) is 50%. This would presumably be because we can know nothing about theological truth. According to this, the website reasons that all possible theological claims should be equally probable, and consequently one will be in any case very unlikely to find the truth, and therefore very unlikely to attain the eternal reward, using Pascal’s apparent assumption that only believers in a specific theology can attain the reward.

The problem with this is that it argues from Pascal’s mistaken assumptions (as well as changing them in unjustified ways), while in reality the effectiveness of the wager does not depend precisely on these assumptions. If there is a 10% chance that God exists, and the rest is true as Pascal states it, it would still seem to be a good bet that God exists, in terms of the practical consequences. You will probably be wrong, but the gain if you are right will be so great that it will almost certainly outweigh the probable loss.
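The arithmetic here can be sketched with a toy expected-value calculation. All the numbers below are illustrative stand-ins: an enormous finite payoff in place of an infinite one, a unit cost for wagering, and the 10% probability from the paragraph above.

```python
# Toy expected-value sketch of the wager. Payoff magnitudes are arbitrary
# placeholders, not anything asserted by Pascal or by the text.
p_god = 0.10                 # Pascal assumed 50%; even 10% suffices here
reward_if_right = 10**9      # finite stand-in for the enormous gain
cost_of_wagering = 1         # stand-in for the probable loss if wrong

ev_wager = p_god * reward_if_right - (1 - p_god) * cost_of_wagering
ev_no_wager = 0

print(ev_wager > ev_no_wager)  # True: the rare gain swamps the probable loss
```

The point survives any choice of finite numbers, so long as the reward is large enough relative to its improbability.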

In reality different theologies are not equally probable, and there will be one which is most probable. Theologies such as the “God who rewards atheism”, which do not have any actual proponents, have very little evidence for them, since they do not even have the evidence resulting from a claim. One cannot expect that two differing positions will randomly have exactly the same amount of evidence for them, so one theology will have more evidence than any other. And even if it did not have overall a probability of more than 50%, it could still be a good bet, given the possibility of the reward, and better than any of the other potential wagers.

The argument is also made that once one admits an infinite reward, it is not possible to distinguish between actions with differing values. This is described here:

If you regularly brush your teeth, there is some chance you will go to heaven and enjoy infinite bliss. On the other hand, there is some chance you will enjoy infinite heavenly bliss even if you do not brush your teeth. Therefore the expectation of brushing your teeth (infinity plus a little extra due to oral health = infinity) is the same as that of not brushing your teeth (infinity minus a bit due to cavities and gingivitis = infinity), from which it follows that dental hygiene is not a particularly prudent course of action. In fact, as soon as we allow infinite utilities, decision theory tells us that any course of action is as good as any other (Duff 1986). Hence we have a reductio ad absurdum against decision theory, at least when it’s extended to infinite cases.

As actually applied, someone might argue that even if the God who rewards atheism is less probable than the Christian God, the expected utility of being Christian or atheist will be infinite in each case, and therefore one will not be a more reasonable choice than another. Some people actually seem to believe that this is a good response, but it is not. The problem here is that decision theory is a mathematical formalism and does not have to correspond precisely with real life. The mathematics does not work when infinity is introduced, but this does not mean there cannot be such an infinity in reality, nor that the two choices would be equal in reality. It simply means you have not chosen the right mathematics to express the situation. To see this clearly, consider the following situation.

You are in a room with two exits, a green door and a red door. The green door has a known probability of 99% of leading to an eternal heaven, and a known probability of 1% of leading to an eternal hell. The red door has a known probability of 99% of leading to an eternal hell, and a known probability of 1% of leading to an eternal heaven.

The point is that if your mathematics says that going out the red door is just as good as going out the green door, your mathematics is wrong. The correct solution is to go out the green door.
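Replacing the infinities with finite stand-ins, however large, restores the asymmetry that a naive infinite expected-value calculation erases. A minimal sketch, with arbitrary placeholder magnitudes for heaven and hell:

```python
# The two-door choice in expected-value terms. HEAVEN and HELL are any
# finite stand-ins for the eternal outcomes; the magnitudes are arbitrary.
HEAVEN, HELL = 10**6, -10**6

green = 0.99 * HEAVEN + 0.01 * HELL   # 99% heaven, 1% hell
red = 0.01 * HEAVEN + 0.99 * HELL     # 1% heaven, 99% hell

print(green > red)  # True: the green door is the correct choice
```

Naive arithmetic with infinities would call both doors "infinite expected heaven plus infinite expected hell" and declare them equal; the finite version shows what the right mathematics must say.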

I would consider all such arguments, namely that all religious beliefs are equally probable, that being rewarded for atheism is as probable as being rewarded for Christianity, or that all infinite expectations are equal, to be examples of not very serious thinking. These arguments are not only wrong. They are obviously wrong, and obviously motivated by the desire not to believe. Earlier I quoted Thomas Nagel on the fear of religion. After the quoted passage, he continues:

My guess is that this cosmic authority problem is not a rare condition and that it is responsible for much of the scientism and reductionism of our time. One of the tendencies it supports is the ludicrous overuse of evolutionary biology to explain everything about life, including everything about the human mind. Darwin enabled modern secular culture to heave a great collective sigh of relief, by apparently providing a way to eliminate purpose, meaning, and design as fundamental features of the world. Instead they become epiphenomena, generated incidentally by a process that can be entirely explained by the operation of the nonteleological laws of physics on the material of which we and our environments are all composed. There might still be thought to be a religious threat in the existence of the laws of physics themselves, and indeed the existence of anything at all— but it seems to be less alarming to most atheists.

This is a somewhat ridiculous situation.

This fear of religion is very likely the cause of such unreasonable responses. Scott Alexander notes in this comment that such explanations are mistaken:

I find all of the standard tricks used against Pascal’s Wager intellectually unsatisfying because none of them are at the root of my failure to accept it. Yes, it might be a good point that there could be an “atheist God” who punishes anyone who accepts Pascal’s Wager. But even if a super-intelligent source whom I trusted absolutely informed me that there was definitely either the Catholic God or no god at all, I feel like I would still feel like Pascal’s Wager was a bad deal. So it would be dishonest of me to say that the possibility of an atheist god “solves” Pascal’s Wager.

The same thing is true for a lot of the other solutions proposed. Even if this super-intelligent source assured me that yes, if there is a God He will let people into Heaven even if their faith is only based on Pascal’s Wager, that if there is a God He will not punish you for your cynical attraction to incentives, and so on, and re-emphasized that it was DEFINITELY either the Catholic God or nothing, I still wouldn’t happily become a Catholic.

Whatever the solution, I think it’s probably the same for Pascal’s Wager, Pascal’s Mugging, and the Egyptian mummy problem I mentioned last month. Right now, my best guess for that solution is that there are two different answers to two different questions:

Why do we believe Pascal’s Wager is wrong? Scope insensitivity. Eternity in Hell doesn’t sound that much worse, to our brains, than a hundred years in Hell, and we quite rightly wouldn’t accept Pascal’s Wager to avoid a hundred years in Hell. Pascal’s Mugger killing 3^^^3 people doesn’t sound too much worse than him killing 3,333 people, and we quite rightly wouldn’t give him a dollar to get that low a probability of killing 3,333 people.

Why is Pascal’s Wager wrong? From an expected utility point of view, it’s not. In any particular world, not accepting Pascal’s Wager has a 99.999…% chance of leading to a higher payoff. But averaged over very large numbers of possible worlds, accepting Pascal’s Wager or Pascal’s Mugging will have a higher payoff, because of that infinity going into the averages. It’s too bad that doing the rational thing leads to a lower payoff in most cases, but as everyone who’s bought fire insurance and not had their house catch on fire knows, sometimes that happens.

I realize that this position commits me, so far as I am rational, to becoming a theist. But my position that other people are exactly equal in moral value to myself commits me, so far as I am rational, to giving almost all my salary to starving Africans who would get a higher marginal value from it than I do, and I don’t do that either.

While a far more reasonable response, there is wishful thinking going on here as well, with the assumption that the probability that a body of religious beliefs is true as a whole is extremely small. This will not generally speaking be the case, or at any rate it will not be as small as he suggests, once the evidence derived from the claim itself is taken into account, just as it is not extremely improbable that a particular book is mostly historical, even though if one considered the statements contained in the book as a random conjunction, one would suppose it to be very improbable.

Remote From My Senses

Earlier we saw that opinions about things more remote from the senses are more likely to be influenced by motives apart from truth. However, even if in principle a thing would have many obvious empirical consequences, it is possible that those consequences are quite unclear to me, or perhaps those consequences could only be seen by others. In such a case the matter may be remote from the senses in a personal way; I do not personally see how it would make a difference to me either way, or it can make such a difference to others, but not to me.

For example, Fermat’s Last Theorem was proven by Andrew Wiles in 1994. If the theorem were false, in principle this would surely have empirical consequences. But the proof is complex enough that this is basically a theoretical rather than a practical statement. Someone who is not a mathematician, or anyone who has not verified the proof for himself, simply has to trust mathematicians as a body about the fact that the proof is valid. Even those mathematicians who have verified the proof for themselves are most likely more confident that it is true based on their trust in the community of mathematicians than on their own effort to verify it. If I am a mathematician who has verified it, I could easily have made a mistake. But it would be less likely that the same or similar mistakes were made by every single mathematician who tried.

In a sense, then, Fermat’s Last Theorem is somewhat remote from the senses for every individual person, including mathematicians. So why do we not see widespread disagreement about it, disagreement of the kind we see in politics and religion?

If Fermat’s Last Theorem were false, this would require either a conspiracy theory, or a quasi-conspiracy theory.

The conspiracy theory, of course, would be that mathematicians as a body know that Fermat’s Last Theorem is false, but do not want everyone else to know this, so they claim that they have verified the proof and found it valid, while in reality there are flaws in it and they know about them.

The quasi-conspiracy theory would be that mathematicians as a body believe that Fermat’s Last Theorem is true, but that they consistently fail in their attempt to verify the proof. There is a mistake in it, but each time someone tries to verify it, they fail to notice the mistake.

The reason to call this a quasi-conspiracy theory is that the most reasonable way for this to happen is if mathematicians as a body have motivations similar to the mathematicians in the case of the actual conspiracy, motivations that cause them to behave in much the same ways in practice.

We can see this by considering a case where you would have an actual conspiracy. Suppose a seven year old child is told by his parents that Santa Claus is the one who brings presents on Christmas Eve. The child believes them. When he speaks with his playmates, they tell him the same thing. If he notices something odd, his parents explain it away. He asks other adults about it, and they say the same thing.

The adults as a body are deceiving the child about the fact that Santa Claus does not exist, and they are doing this by means of an actual conspiracy. They know there is no Santa Claus, but they are working together to ensure that the child believes that there is one.

What is necessary for this to happen? It is necessary that the adults have a motive quite remote from truth for wishing the child to believe that there is a Santa Claus, and it is on account of this motive that they engage in the conspiracy.

In a similar way, suppose that mathematicians as a body were deluded about Fermat’s Last Theorem. Since they are actually deluded, there is no actual conspiracy. But how did this happen? Why do they all make mistakes when they try to verify the theorem? In principle it might simply be that the question is very hard, and there is a mistake that is extremely difficult to notice. And in reality, this may be the only likely way for this to happen in the case of mathematics. But in other cases, there may be a more plausible mechanism to generate consistent mistakes, and this is wishful thinking of one kind or another. If mathematicians as a body want Fermat’s Last Theorem to be true and to be a settled question, they may carelessly overlook mistakes in the proof, in order to say that it is true. Technically they are not making a deliberate mistake. But in practice it is the lack of care about truth, and the interest in something opposed to truth, which makes them act as a body to deceive others, just as an actual conspiracy does.

Scientists as a body believe that the theory of evolution is true, and that it is very certain. Wikipedia illustrates this:

The Discovery Institute announced that over 700 scientists had expressed support for intelligent design as of February 8, 2007. This prompted the National Center for Science Education to produce a “light-hearted” petition called “Project Steve” in support of evolution. Only scientists named “Steve” or some variation (such as Stephen, Stephanie, and Stefan) are eligible to sign the petition. It is intended to be a “tongue-in-cheek parody” of the lists of alleged “scientists” supposedly supporting creationist principles that creationist organizations produce. The petition demonstrates that there are more scientists who accept evolution with a name like “Steve” alone (over 1370) than there are in total who support intelligent design.

But there are many, like Fr. Brian Harrison, who think that the scientists are wrong about this. The considerations of this post make clear why it is possible for someone to believe this. If Fr. Harrison is right, scientists as a body would be engaging in a quasi-conspiracy. Many scientists are atheists, and perhaps they would like evolution to be true because they think it makes atheism more plausible. Perhaps such motivations, together with the motive of sticking together with other scientists, sufficiently explain why scientists are misinterpreting the evidence to support evolution, even though it does not actually support it.

If I have not studied the evidence for evolution myself, this argument is much more plausible than the same claim about Fermat’s Last Theorem, simply because there is no actually plausible motive in the mathematical case. But if there were a plausible motive, one would be likely to see such quasi-conspiracy theories about mathematical claims as well.

Quick to Hear and Slow to Speak

St. James says in 1:19-20 of his letter, “Let every man be quick to hear, slow to speak, slow to anger, for the anger of man does not work the righteousness of God.”

What does he mean? How is it possible for every man to be quick to hear and slow to speak? A conversation needs to have an approximately equal amount of listening and speaking. If each of two conversational partners insists on listening instead of speaking, the conversation will go nowhere. Whenever one is speaking, the other should be listening, and if one is listening, the other must be speaking, since it is not listening if neither of the two is saying anything.

The reference to anger is a clue. St. James is speaking of our natural tendencies, and saying that the natural tendency to anger is excessive and must be resisted. Likewise, we tend to have more of a desire to speak than to listen. We would rather explain our own position than listen to that of another. In order for a conversation to go well, each of the partners should restrain his own desire to express his own opinion, in order to listen to the other. This does not imply anything impossible any more than restraining anger is impossible; people have a naturally excessive desire to express themselves in the same way that they have a naturally excessive tendency to become angry. Thus St. James is not against a conversation which is equally composed of listening and speaking; but he is saying that such a conversation requires restraint on both parties to the conversation. A conversation without such restraint leads to situations where someone thinks, “he’s not listening to me,” which then leads precisely to the anger that St. James is opposing.

Robert Aumann has a paper, “Agreeing to Disagree”, which mathematically demonstrates that people who have the same prior probability distribution and follow the laws of probability cannot have different posterior probabilities regarding any matter, assuming that their opinions on the matter are common knowledge between them. He begins his paper:

If two people have the same priors, and their posteriors for a given event A are common knowledge, then these posteriors must be equal. This is so even though they may base their posteriors on quite different information. In brief, people with the same priors cannot agree to disagree.

We publish this observation with some diffidence, since once one has the appropriate framework, it is mathematically trivial.

The implication is something like this: one person may believe that there is a 50% chance it will rain tomorrow. Another person, having access to other information, such as having seen the weather channel, thinks that there is a 70% chance of rain. Currently these estimates are not common knowledge. But if the two people converse until they both know each other’s current opinion (which will possibly no longer be 50% and 70%), they must agree on the probability of rain, given that they have the same prior distribution.
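This back-and-forth can be simulated in the style of the iterated-dialogue version of Aumann's result due to Geanakoplos and Polemarchakis. Everything below is a hypothetical toy setup (four equally likely states, an arbitrary event, made-up information partitions), chosen only to show the posteriors converging:

```python
from fractions import Fraction

# Toy Aumann-style dialogue. Four equally likely states under a common
# prior; each agent privately knows only which cell of his partition
# contains the true state. All choices here are illustrative.
states = {0, 1, 2, 3}
prior = {s: Fraction(1, 4) for s in states}
A = {0, 3}            # the event under discussion ("rain tomorrow")
true_state = 0

part1 = [{0, 1}, {2, 3}]        # agent 1's private information
part2 = [{0, 1, 2}, {3}]        # agent 2's private information

def cell_posterior(c):
    """P(A | the true state lies in cell c), under the common prior."""
    return sum(prior[s] for s in c & A) / sum(prior[s] for s in c)

def my_posterior(partition):
    c = next(c for c in partition if true_state in c)
    return cell_posterior(c)

def refine(partition, other, announced):
    # Hearing the other's announced posterior rules out any of his cells
    # that would have produced a different announcement.
    consistent = set().union(*(c for c in other if cell_posterior(c) == announced))
    return [c & consistent for c in partition if c & consistent]

q1, q2 = my_posterior(part1), my_posterior(part2)
while q1 != q2:
    part1, part2 = refine(part1, part2, q2), refine(part2, part1, q1)
    q1, q2 = my_posterior(part1), my_posterior(part2)
print(q1, q2)  # 1/2 1/2
```

Here the agents begin by announcing 1/2 and 1/3, refine their information from each other's announcements, and end up agreeing, as the theorem requires for a finite state space with a common prior.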

There are several reasons why this does not apply to real human beings. First of all, people do not have an actual prior probability distribution; such a distribution means having an estimate of the original probability of every possible statement, and obviously people do not actually have such a thing. So not having a prior distribution at all, they cannot possibly have the same prior distribution.

Second, the theorem presumes that each of the two knows that each of the two is reasonable in exactly the sense required, namely having such a prior and updating on it according to the laws of probability. In real life no one does this, even apart from the fact that they do not have such a prior.

Various extensions of the theorem have been published by others, some of which come closer to having a bearing on real human beings. Possibly I will consider some of these results in the future. Even without such extensions, however, Aumann’s result does have some relationship with real disagreements.

We have all had good conversations and bad conversations when we disagreed with someone, and it is not so difficult to recognize the difference. In the best conversations, we may have actually come to partial or even full agreement, even if not exactly on the original position of either partner. In the worst conversations, neither partner budged, and both concluded that the other was being stubborn and unreasonable. Possibly the conversation descended to the point of anger, insults and attributing bad will to the other. And on the other hand we have also had conversations which were somewhat in the middle between these two extremes.

These facts are related to Aumann’s result because his result is that reasonable conversational partners must end up agreeing, this being understood in a simplified mathematical sense. Because of the simplifications it does not strictly apply to real life, but something like it is also true in real life, and we can see that in our experiences with conversations with others involving disagreements. In other words, basically whenever we get to the point where neither partner will budge, we begin to think that someone is being stubborn and at least somewhat unreasonable.

St. James is explaining how to avoid the bad conversations and have the good conversations. And that is by being “quick to hear.” It is a question of listening to the other. And basically that implies asking the question, “How is this right, in what way is it true?” If someone approaches a conversation with the idea that he is going to prove that the other is wrong, the other will get the impression that he is not being listened to. And this impression, in this case, is basically correct. A person sees what he is saying as true, not as false, so one who does not see how it could be true does not even understand it. If you say something, I do not even understand you, until I see a way that what you are saying could be so. And on the other hand, if I do approach a conversation with the idea of seeing what is true in the position that is in disagreement with mine, the conversation will be far more likely to end up as one of the good conversations, and far more likely to end in agreement.

Often even if a person is wrong in his conclusion, part of that conclusion is correct, and it is important for someone speaking with him to acknowledge the part that is correct before criticizing the part that is wrong. And on the other hand, even if a person’s conclusion is completely wrong, insofar as that is possible, there will always be some evidence for his conclusion. It is important to acknowledge that evidence, rather than simply pointing out the evidence against his conclusion.

Extraordinary Claims and Extraordinary Evidence

Marcello Truzzi states in an article, On the Extraordinary: An Attempt At Clarification, “An extraordinary claim requires extraordinary proof.” This was later restated by Carl Sagan as, “Extraordinary claims require extraordinary evidence.” This is frequently used to argue against things such as “gods, ghosts, the paranormal, and UFOs.”

However, this kind of argument, at least as it is usually made, neglects to take into account the fact that claims themselves are already evidence.

Here is one example: while writing this article, I used an online random number generator to pick a random integer between one and a billion inclusive. The number was 422,819,208.

Suppose we evaluate my claim with the standard that extraordinary claims require extraordinary evidence, and neglect to consider the evidence contained within the claim itself. In this case, given that I did in fact pick a number in the manner stated, the probability that the number would be 422,819,208 is one in a billion. So readers should respond, “Either he didn’t pick the number in the manner stated, or the number was not 422,819,208. The probability that both of those were true is one in a billion. I simply don’t believe him.”

There is obviously a problem here, since in fact I did pick the number in the way stated, and that was actually the number. And the problem is precisely leaving out of consideration the evidence contained within the claim itself. Given that I make a claim that I picked a random number between one and a billion, the probability that I would claim 422,819,208 in particular is approximately one in a billion. So when you see me claim that I picked that number, you are seeing evidence (namely the fact that I am making the claim) which is very unlikely in itself. The fact that I made that claim is much more likely, however, if I actually picked that number, rather than some other number. Thus the very fact that I made the claim is strong evidence that I did pick the number 422,819,208 rather than some other number.
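A rough Bayes calculation makes this concrete. The one-in-a-billion structure comes from the example above; the 99% honesty rate is a hypothetical assumption added for illustration:

```python
from fractions import Fraction

# How much the claim itself shifts the probability that the generator
# really gave 422,819,208. The honesty rate is a made-up assumption.
N = 10**9
prior = Fraction(1, N)                  # P(the generator gave 422,819,208)
p_claim_given_true = Fraction(99, 100)  # I report the number I actually got
# Claiming 422,819,208 falsely requires both a false report and landing on
# this particular number out of the remaining ones:
p_claim_given_false = Fraction(1, 100) * Fraction(1, N - 1)

posterior = (p_claim_given_true * prior) / (
    p_claim_given_true * prior + p_claim_given_false * (1 - prior))
print(float(posterior))  # 0.99
```

Under these assumptions, the bare fact of the claim lifts a one-in-a-billion prior to 99%: the claim is itself extraordinary evidence.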

In this sense, extraordinary claims are already extraordinary evidence, and thus do not require some special justification.

However, we can consider another case, a hypothetical one. Suppose that in the above paragraphs, instead of the number 422,819,208, I had used the number 500,000,000, claiming that this was in fact the number that I got from the random number generator.

In that case you might have found the argument much less credible. Why?

Assuming that I did in fact pick the number randomly, the probability of picking 422,819,208 is one in a billion. And again, assuming that I did in fact pick the number randomly, the probability of picking 500,000,000 is one in a billion. So no difference here.

But both of those assume that I did pick the number randomly. And if I did not, the probabilities would not be the same. Instead, the fact that simpler things are more probable would come into play. At least with the language and notation that we are actually using, the number 500,000,000 is much simpler than the number 422,819,208. Consequently, assuming that I picked a number non-randomly and then told you about it, it is significantly more probable than one in a billion that I would pick the number 500,000,000, and thus less probable than one in a billion that I would pick 422,819,208. (This is why I said above that the probability of the claim was only approximately one in a billion; in fact it is even less than that.)

For that reason, if I had actually claimed to have picked 500,000,000, you might well have concluded that the most reasonable explanation of the facts was that I did not actually use the random number generator, or that it had malfunctioned, rather than that the number was actually picked randomly.

This is relevant to the kinds of things where the postulate that “extraordinary claims require extraordinary evidence” is normally used. Consider the claim, “I was down in the graveyard at midnight last night and saw a ghost there.”

How often have you personally seen a ghost? Probably never, and even if you have, surely not many times. And if so, seeing a ghost is not exactly an everyday occurrence. Considered in itself, therefore, this is an improbable occurrence, and if we evaluated the claim without considering the evidence included within the claim itself, we would simply assume the account is mistaken.

However, part of the reason that we know that seeing ghosts is not a common event is that people do not often make such claims. Apparently 18% of Americans say that they have seen a ghost at one time or another. But this still means that 82% of Americans have never seen one, and even most of the 18% presumably do not mean to say that it has happened often. So this would still leave seeing ghosts as a pretty rare event. Consider how it would be if 99.5% of people said they had seen ghosts, but you personally had never seen one. Instead of thinking that seeing ghosts is rare, you would likely think that you were just unlucky (or lucky, as the case may be.)

Instead of this situation, however, seeing ghosts is rare, and claiming to see ghosts is also rare. This implies that the claim to have seen a ghost is already extraordinary evidence that a person in fact saw a ghost, just as my claiming to have picked 422,819,208 was extraordinary evidence that I actually picked that number.

Nonetheless, there is a difference between the case of the ghost and the case of the number between one and a billion. We already know that there are exactly one billion numbers between one and a billion inclusive. So given that I pick a number within this range, the probability of each number must be on average one in a billion. If it is more probable that I would pick certain numbers, such as 500,000,000, it must be less probable that I would pick others, such as 422,819,208. We don’t have an equivalent situation with the case of the ghost, because we don’t know in advance how often people actually see ghosts. Even if we can find an exact measure of how often people claim to see ghosts, that will not tell us how often people lie or are mistaken about it. Thus although we can say that claiming to see a ghost is good evidence of someone actually having seen a ghost, we don’t know in advance whether or not the evidence is good enough. It is “extraordinary evidence,” but is it extraordinary enough? Or in other words, is claiming to have seen a ghost more like claiming to have picked 422,819,208, or is it more like claiming to have picked 500,000,000?

That remains undetermined, at least by the considerations which we have given here. But unless you have good reasons to suspect that seeing ghosts is significantly more rare than claiming to see a ghost, it is misguided to dismiss such claims as requiring some special evidence apart from the claim itself.

100%

Given that probability is a formalization of subjective degree of belief, it would be reasonable to consider absolute subjective certainty to correspond to a probability of 100%. Likewise, being absolutely certain that something is false would correspond to assigning it a probability of 0%.

According to Bayes’ theorem, if something has a probability of 100%, that must remain unchanged no matter what evidence is observed, as long as that evidence has a nonzero probability of being observed. If the probability of the evidence being observed is 0%, then Bayes’ formula results in a division by zero. This happens because a probability of 0% should mean that it is impossible for this evidence to come up; its arrival indicates that one was simply wrong to claim that there was no chance of it, and that a different probability should have been assigned.
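A minimal sketch of both points in code, with bayes_update as a direct transcription of Bayes' theorem and arbitrary likelihood values:

```python
# P(H | E) = P(E | H) P(H) / P(E), with P(E) expanded by total probability.
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e  # raises ZeroDivisionError if p_e == 0

# However damning the evidence, a prior of 1 yields a posterior of 1:
print(bayes_update(1.0, 0.0001, 0.9999))  # 1.0
# And if the prior rules the evidence out entirely, the formula breaks:
# bayes_update(0.0, 0.5, 0.0)  -> ZeroDivisionError
```

The prior of 1 zeroes out the alternative hypothesis in the denominator, so the update can never move, which is exactly why claiming 100% commits you to explaining away any contrary evidence forever.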

The fact that logical consistency requires a probability of 100% to remain permanently fixed, no matter what happens, implies that it is generally a bad idea to claim such certainty, even in cases where you have absolute objective certainty such as mathematical demonstration. Thus in the previously cited anecdote about prime numbers, if SquallMage claimed to be absolutely certain that 51 was a prime number, he should never admit that it is not, not even after dividing it by 3 and getting 17. Instead, he should claim that there is a mistake in the derivation showing that it is not prime. Since this is absurd, it follows that in fact he should never have assigned a 100% probability to the claim that the number was prime. And since there was subjectively probably not much difference between 41 and 51 for him at the time, with respect to the claim, neither should he have claimed a 100% probability that 41 was prime.

Mathematics and the Laws of Nature

In his essay The Unreasonable Effectiveness of Mathematics in the Natural Sciences, Eugene Wigner says, “The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve.” But in reality, it can be proved that a physical world — a world which has an order of place, with one part beside another, and an order of time, with one thing before another — must of necessity either follow mathematical natural laws, or it must be more or less intentionally designed in order to avoid this.

For example, suppose we attempt to determine how long it takes a ball to fall a certain distance. We do not need any particularly exact method to measure distances; for example, we could be measuring a fall of ten feet, taking foot in the presumably original sense of “the length of an adult human foot,” despite the noisiness of this measure. Nor do we need any particularly exact method to measure time; we could for example measure time in blinks. Something took 10 blinks if it took so long that I blinked 10 times before it was over. This would be even noisier than measuring in feet. But the point is that it does not matter how exact or inexact the measures are. If we have a world with place and time in it, we can find ways to make such measurements, even if they are inexact ones. Nor again do we need a way to get an extremely precise measure in blinks or in feet or in whatever of the physical quantity we are measuring; it is enough if we get a best estimate.

Now suppose we repeatedly measure, in some such way, how long it takes for a ball to fall a certain distance. After we have made many measurements, we can add them together and divide by the total number of measurements, getting an average amount of time for the fall. The question that arises is this: as we increase the number of measurements indefinitely, will that average converge to a finite value? or will it diverge to infinity or go back and forth infinitely many times?
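As a hypothetical illustration of this averaging, suppose the “true” fall time is a quarter of a blink and each individual measurement carries a large random error; all the numbers here are assumptions chosen for the sketch:

```python
import random

random.seed(0)      # fixed seed, so the illustration is reproducible

TRUE_TIME = 0.25    # hypothetical "true" fall time, in blinks
NOISE = 0.1         # crude measurement error, in blinks

total = 0.0
running_averages = []
for n in range(1, 100_001):
    # each measurement is the true time plus a large random error
    measurement = TRUE_TIME + random.uniform(-NOISE, NOISE)
    total += measurement
    running_averages.append(total / n)

# despite very noisy individual measurements, the average settles down
print(running_averages[9])    # after 10 measurements: still rough
print(running_averages[-1])   # after 100,000: very close to 0.25
```

Even with an error of nearly half the quantity being measured, the running average converges, which is the point: the exactness of the instrument does not matter.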

Evidently it will not diverge to infinity, since the individual measurements do not grow without bound. It is difficult to see any reason in principle why it could not go back and forth infinitely many times; for example, the average fall time might tend toward 1/4 of a blink for a long time, then start tending toward 1/5 of a blink for a long time, and then go back to 1/4, and so on. But we should notice the kind of pattern that is necessary in order for this to happen. Suppose the average is 1/4 of a blink after 100 measurements. In order to get the average down to 1/5, there must be a great many measurements at 1/5 or below, or at least a good number of measurements very far below 1/5. And the more measurements we have taken to get the average, the more such especially low measures are needed. So if we are at an average of 1/4 of a blink after 1,000,000 measurements, this average will be very stable, and it will require an extremely long, more or less continuous series of especially low measurements to get the average down to 1/5 again. And the length of the “especially low” or “especially high” series which is needed to move the average will increase each time we want to move it again. In other words, in order to get the average to go back and forth infinitely many times, we need a rather pathological series of measurements, namely one that looks as though it was designed intentionally to prevent the series from converging to an average value.
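How stable such an average becomes can be made concrete with a little algebra. If the average stands at current_avg after n measurements, the number k of consecutive measurements at some especially low value needed to bring it down to target_avg satisfies (n·current_avg + k·low_value) / (n + k) = target_avg. A sketch with illustrative numbers:

```python
def pulls_needed(n, current_avg, target_avg, low_value):
    """How many consecutive measurements equal to low_value it would take
    to drag an average of current_avg over n measurements down to target_avg.

    Derived by solving (n*current_avg + k*low_value) / (n + k) = target_avg
    for k; assumes low_value < target_avg < current_avg.
    """
    return n * (current_avg - target_avg) / (target_avg - low_value)

for n in (100, 10_000, 1_000_000):
    # dragging the average from 1/4 down toward 1/5 takes about 4*n
    # consecutive readings at the especially low value 0.2
    print(n, pulls_needed(n, 0.25, 0.21, 0.20))
```

For these numbers k = 4n: a million-measurement average sitting at 1/4 needs four million consecutive readings at 1/5 merely to be dragged down to 0.21, which is exactly the sort of deliberately contrived series the argument describes.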

Thus the “natural” result, when things are not designed to prevent convergence to an average, is that such measures of distance and time and basically anything else we might think of measuring, like “how much food does an adult eat in a year”, will always converge to an average value as we increase the number of measurements indefinitely. Given this result, it follows that it is possible to express the behavior of the physical world using mathematical laws.

Several things however do not necessarily follow from this:

It does not follow that such laws cannot have “exceptions”, since they are only statistical laws from the beginning, and thus are only expected to work approximately. So it is not possible to rule out miracles in the way supposed by David Hume.

It also does not follow that such laws have to be particularly simple. A simpler law will be more likely than a more complex one, for the reasons given in a previous post, but theoretically the laws governing a falling body could have 500 variables, which would be simpler than ones having 50,000 variables. In practice, however, this does not tend to be the case, or at least we can find extremely good approximate laws with very few variables. It may simply be that in order to have a world with animals in it, the world needs to be fairly predictable to them, and this may require that fairly simple laws work at least as a good approximation. But a mathematical demonstration of this would be extremely difficult, if it turns out to be possible at all.