Scott Fortmann-Roe explains the difference between error due to bias and error due to variance:
- Error due to Bias: The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict. Of course you only have one model so talking about expected or average prediction values might seem a little strange. However, imagine you could repeat the whole model building process more than once: each time you gather new data and run a new analysis creating a new model. Due to randomness in the underlying data sets, the resulting models will have a range of predictions. Bias measures how far off in general these models’ predictions are from the correct value.
- Error due to Variance: The error due to variance is taken as the variability of a model prediction for a given data point. Again, imagine you can repeat the entire model building process multiple times. The variance is how much the predictions for a given point vary between different realizations of the model.
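To make the repeated-model-building thought experiment concrete, here is a minimal simulation sketch in Python (my own illustration, not Fortmann-Roe's code; the true function, noise level, and polynomial degrees are assumptions chosen purely for illustration). It repeatedly draws a fresh noisy data set, fits a new model each time, and estimates the bias and variance of the predictions at a single test point:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    # Assumed "correct value" we are trying to predict (illustrative choice).
    return np.sin(x)

def simulate_one_model(degree, n_points=20, noise_sd=0.3):
    # One realization of the whole model-building process:
    # gather a new noisy data set and fit a new model to it.
    x = rng.uniform(0, 2 * np.pi, n_points)
    y = true_function(x) + rng.normal(0, noise_sd, n_points)
    return np.polyfit(x, y, degree)

def bias_and_variance_at(x0, degree, n_repeats=2000):
    # Predictions for the same point x0 across many realizations of the model.
    preds = np.array([np.polyval(simulate_one_model(degree), x0)
                      for _ in range(n_repeats)])
    bias = preds.mean() - true_function(x0)   # how far off these models are "in general"
    variance = preds.var()                    # how much the predictions vary between realizations
    return bias, variance

for degree in (1, 5):
    b, v = bias_and_variance_at(x0=1.0, degree=degree)
    print(f"degree {degree}: bias^2 = {b**2:.4f}, variance = {v:.4f}")
```

With these assumed settings, the straight-line fit will typically show more bias and less variance at the test point than the degree-5 fit, which is exactly the trade-off at issue in what follows.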
Later in the essay, he suggests that there is a natural tendency to overemphasize minimizing bias:
A gut feeling many people have is that they should minimize bias even at the expense of variance. Their thinking goes that the presence of bias indicates something basically wrong with their model and algorithm. Yes, they acknowledge, variance is also bad but a model with high variance could at least predict well on average, at least it is not fundamentally wrong.
This is mistaken logic. It is true that a high variance and low bias model can perform well in some sort of long-run average sense. However, in practice modelers are always dealing with a single realization of the data set. In these cases, long run averages are irrelevant, what is important is the performance of the model on the data you actually have and in this case bias and variance are equally important and one should not be improved at an excessive expense to the other.
Fortmann-Roe is concerned here with bias and variance in a precise mathematical sense, relative to the project of fitting a curve to a set of data points. However, his point can be generalized well beyond curve fitting, to interpreting and understanding the world overall. Tyler Cowen makes just such a generalization:
Arnold Kling summarizes Robin’s argument:
If you have a cause, then other people probably disagree with you (if nothing else, they don’t think your cause is as important as you do). When other people disagree with you, they are usually more right than you think they are. So you could be wrong. Before you go and attach yourself to this cause, shouldn’t you try to reduce the chances that you are wrong? Ergo, shouldn’t you work on trying to overcome bias? Therefore, shouldn’t overcoming bias be your number one cause?
Here is Robin’s very similar statement. I believe these views are tautologically true and they simply boil down to saying that any complaint can be expressed as a concern about error of some kind or another. I cannot disagree with this view, for if I do, I am accusing Robin of being too biased toward eliminating bias, thus reaffirming that bias is in fact the real problem.
I find it more useful to draw an analogy with statistics. Biased estimators are one problem but not the only problem. There is also insufficient data, lazy researchers, inefficient estimators, and so on. Then I don’t see why we should be justified in holding a strong preference for overcoming bias, relative to other ends.
Tyler is arguing that someone may be in error because he is biased, but he may also be in error because, for example, he is too lazy to seek out the truth; and in a particular case it may be more important to overcome the laziness than the bias.
This is true, no doubt, but we can make a stronger point: in the mathematical discussion of bias and variance, insisting on a completely unbiased model typically produces a very high degree of variance, with the nearly inevitable consequence of greater overall error. For example, we can construct a polynomial that passes through every point of the data exactly. Such a method of predicting the data is completely unbiased. Nonetheless, such a model tends to be highly inaccurate in predicting new data because of its very high variance: the fitted curve is simply too sensitive to the particular points found in the original data. In a similar way, even in the more general non-mathematical case, we will likely find that insisting on a completely unbiased method results in greater error overall: the best way to find the truth may be to adopt a somewhat simplified model, just as in the mathematical case it is best not to try to fit the data exactly. Simplifying the model will introduce some bias, but it will also reduce variance.
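As a concrete sketch of this point (again my own illustration, with an assumed true curve and noise level, not taken from the sources quoted above): a polynomial of degree n−1 fitted to n training points passes through every one of them exactly, yet it will usually predict fresh data much worse than a deliberately simplified low-degree fit.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_function(x):
    return np.sin(x)  # assumed underlying truth, unknown to the modeler

n = 10
x_train = np.linspace(0, 2 * np.pi, n)
y_train = true_function(x_train) + rng.normal(0, 0.3, n)

# "Unbiased" model: a degree n-1 polynomial interpolates every training point exactly.
exact_fit = np.polyfit(x_train, y_train, n - 1)

# Simplified model: a low-degree polynomial that cannot pass through every point.
simple_fit = np.polyfit(x_train, y_train, 3)

# Compare both on new data drawn from the same process.
x_test = rng.uniform(0, 2 * np.pi, 1000)
y_test = true_function(x_test) + rng.normal(0, 0.3, 1000)

for name, coeffs in [("exact interpolation", exact_fit), ("degree-3 fit", simple_fit)]:
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"{name}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The interpolating fit drives the training error to essentially zero, but its swings between and beyond the training points typically give it a far larger test error than the simpler model, whose small bias buys a large reduction in variance.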
To the best of my knowledge, no one has a demonstrably perfect method of choosing the best model, even in the mathematical case. Much less, therefore, can we come up with a perfect trade-off between bias and variance in the general case. We can only use our best judgment. But we have reason to think that some such trade-off must exist, just as it does in the mathematical case.