Simon Newcomb (1835-1909), the Canadian-American astronomer and mathematician, published in 1881 a "Note on the Frequency of Use of the Different Digits in Natural Numbers." For Newcomb, natural numbers were those occurring "in nature," i.e. the kind of numbers one would run into in the course of everyday life. He discovered, for example, that not all the digits (1, 2, ..., 9) occur with the same frequency in the first place of such a number; he formulated a law (see below) and gave a rough proof which I will attempt to present. This law was rediscovered by Frank Benford ("The law of anomalous numbers," 1938) and is now somewhat unfairly known as "Benford's Law." A mathematically sound and complete proof was published by Theodore Hill in 1995.
To get some experimental feeling for the phenomenon, I looked at all the numbers given as numerals in the first 15 pages of the New York Times for Saturday, February 21, 2009. I omitted dates and advertisements, and repeats (in the same context, in captions or in tables). For each of those 213 numbers I recorded the first digit, and tabulated the data as follows:
digit | occurrences | frequency |
1 | 56 | .26 |
2 | 48 | .23 |
3 | 27 | .13 |
4 | 20 | .09 |
5 | 30 | .14 |
6 | 11 | .05 |
7 | 8 | .04 |
8 | 9 | .04 |
9 | 4 | .02 |
For some of the flavor of Newcomb's "natural number" concept, here are the 8 numbers from this set with initial digit 7:
page | number | reference |
A3 | 71 | age of Jane Fonda |
A10 | 70,000 | illegal gambling proceeds, Wilkes-Barre, Pa. |
A12 | 787 billion | U.S. economic stimulus package, 2/09 |
A12 | 7,365.67 | Dow Jones Industrial Average, 2/20/09 |
A14 | 7.2 | magnitude of hypothetical earthquake |
A14 | 744,000 | population of San Francisco |
A14 | 71 | age of Senator Roland W. Burris |
A15 | 70,000 | low-end starting salary for butler, New York City |
Clearly the distribution of first digits is far from uniform. Newcomb tells us how he was led to his discovery: "That the ten digits do not occur with equal frequency must be evident to anyone making much use of logarithmic tables, and noticing how much faster the first pages wear out than the last ones. The first significant figure is oftener 1 than any other digit, and the frequency diminishes up to 9." The place where he noticed the phenomenon gave him a clue to its explanation, which he formulated thus:
The law of probability of the occurrence of numbers is such that all mantissae of their logarithms are equally probable.
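As a worked consequence, offered as a sketch rather than as Newcomb's own computation: if the mantissa of log10 x is equally likely to fall anywhere in [0, 1), then x has first digit d exactly when that mantissa lies in the interval from log10 d to log10 (d+1), whose length is log10 (1 + 1/d). The Python below tabulates these predicted frequencies next to the newspaper counts given earlier. The names `observed` and `predicted_frequency` are illustrative and do not come from either paper.

```python
import math

# First-digit counts from the newspaper table above (213 numbers in all).
observed = {1: 56, 2: 48, 3: 27, 4: 20, 5: 30, 6: 11, 7: 8, 8: 9, 9: 4}

def predicted_frequency(d):
    """Length of the mantissa interval [log10 d, log10 (d+1)) for first digit d."""
    return math.log10(1 + 1 / d)

total = sum(observed.values())
print("digit  observed  predicted")
for d in range(1, 10):
    print(f"{d:>5}  {observed[d] / total:8.2f}  {predicted_frequency(d):9.2f}")
```

The predicted values run from about .30 for the digit 1 down to about .05 for the digit 9, and they add up to 1 because the nine intervals exactly cover [0, 1).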
Newcomb first argues that all his "natural numbers" are ratios. This makes sense because most natural numbers are given in units, and the number exhibited is the ratio of some measurement to the same measurement taken on some more or less arbitrary token, e.g. the standard kilogram, the solar year. Then he argues that the set of natural numbers must be closed under further formation of ratios, i.e. under multiplication and division. This implies that the set of logarithms of natural numbers is closed under addition and subtraction; and in particular that the set of mantissae of logarithms of natural numbers is closed under addition and subtraction modulo 1, since as in the example above, when a sum of mantissae is greater than 1 the integer part is moved over to the characteristic; and similarly when it is less than -1. In Newcomb's words: "Since these exponents [the mantissae] are formed by casting off all the integers from a series of numbers, we may suppose them arranged around a circle ..." where we can add and subtract them like angles, except modulo 1 instead of modulo 2π.
Next Newcomb asks the question (translated into our notation): Given a number of points on the circle distributed "according to any arbitrary law," choose n of them at random, say s1, s2, ..., sn, and form the sum s1 ± s2 ± ... ± sn (modulo 1). What is the probability that this sum will be contained in a given interval of length ds? And he answers: "It is evident that, whatever may be the original law of arrangement," the set of such sums "will approach to an equal distribution around the circle as n is increased," or, in other words, "the required probability will be equal to ds." That is, the law of probability of the occurrence of numbers is such that all mantissae of their logarithms are equally probable.
This is not evident, but it is plausible. The following figure shows a small simulation of the phenomenon. Here just two "mantissae" s and t, corresponding say to natural numbers m and n, are chosen; the mantissae corresponding to the products m^i n^j are plotted around the circle of numbers modulo 1, for i, j running from 0 to 8. Comparison with the logarithms of numbers starting with 1, 2, etc. suggests an explanation for the distribution of these numbers among natural numbers.
a. An illustration of the equal distribution phenomenon Newcomb refers to. Here two numbers s and t are chosen on the circle of circumference 1 (I took numbers corresponding to angles 41° and 95°); the green angles correspond to all the numbers of the form i s + j t (modulo 1), for i and j integers between 0 and 8. b. The mantissae corresponding to the integers 1, 2, ..., 9. This is the same display that occurs on a circular slide-rule (see below).
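The computation behind panel a can be sketched in a few lines of Python, assuming the two angles are exactly 41 and 95 degrees, so that s = 41/360 and t = 95/360. The sketch forms the 81 points i s + j t (modulo 1) and counts how many land in each first-digit interval from panel b, printing the share next to log10 (1 + 1/d) for comparison. The helper name `first_digit_from_mantissa` is hypothetical.

```python
import math

def first_digit_from_mantissa(mantissa):
    """First digit of 10**mantissa for a mantissa in [0, 1)."""
    return int(10 ** mantissa)

# Angles from the figure caption, in degrees; a full turn (360) is 1 on the circle.
S_DEG, T_DEG = 41, 95

counts = {d: 0 for d in range(1, 10)}
for i in range(9):                     # i, j run from 0 to 8 as in the caption
    for j in range(9):
        # Integer arithmetic modulo 360 avoids rounding before the division.
        mantissa = ((i * S_DEG + j * T_DEG) % 360) / 360
        counts[first_digit_from_mantissa(mantissa)] += 1

for d in range(1, 10):
    share = counts[d] / 81
    print(d, counts[d], f"{share:.2f}", f"{math.log10(1 + 1 / d):.2f}")
```

With only 81 points the agreement can be no better than rough, so this is an illustration of the idea rather than evidence for the law.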
Part of a circular slide-rule designed by John W. Mauchly. Mauchly was one of the designers of the ENIAC, the first large-scale general-purpose electronic computer. There was presumably another, smaller, paper disc with similar gradations that could rotate on top of this one, and probably a rotating pointer for keeping track of locations. Image courtesy of University of Pennsylvania Libraries.
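Newcomb's equal-distribution claim for the sums s1 ± s2 ± ... ± sn can also be probed with random draws. The sketch below is a numerical experiment, not a proof, and its choices are assumptions made for illustration: the "arbitrary law" is taken to be the cube of a uniform random number (which crowds points near 0), the signs are chosen by fair coin flips, and the circle is cut into 10 equal arcs. For each n it reports the largest gap between an arc's observed share of the sums and the uniform share 1/10.

```python
import random

def signed_sum_mod1(rng, n):
    """Sum n skewed points with random signs, reduced modulo 1."""
    total = 0.0
    for _ in range(n):
        point = rng.random() ** 3          # assumed "arbitrary law": crowded near 0
        total += rng.choice((1, -1)) * point
    return total % 1.0                     # Python's % returns a non-negative result

def max_deviation(n, samples=20000, bins=10, seed=0):
    """Largest gap between a bin's observed share and the uniform share 1/bins."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(samples):
        x = signed_sum_mod1(rng, n)
        # min() guards against a result that rounds up to exactly 1.0
        counts[min(int(x * bins), bins - 1)] += 1
    return max(abs(c / samples - 1 / bins) for c in counts)

for n in (1, 2, 4, 8):
    print(n, round(max_deviation(n), 3))
```

In this setup the gap shrinks quickly as n grows, which is consistent with Newcomb's assertion for this particular starting law; other starting laws may converge more slowly or, in degenerate cases, not at all.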
It took more than a hundred years for a satisfactory explanation of Newcomb's observation to appear. The main stumbling block was the lack of a precise mathematical concept corresponding to Newcomb's "natural numbers." Theodore Hill realized that base-invariance was the key property: the uniform distribution of mantissae of natural numbers in any base (not only in base 10); this had already been remarked by Newcomb. As Hill states it, "there is a unique countably-additive base-invariant probability measure on the positive reals."
Theodore P. Hill, Base-invariance implies Benford's law, Proceedings of the A. M. S. 123 (1995) 887-895
Simon Newcomb, Note on the Frequency of Use of the Different Digits in Natural Numbers, American Journal of Mathematics 4 (1881) 39-40