Mathematical statement
More precisely, Benford's law states that the leading digit n (n = 1, ..., 9) occurs with probability log10(n + 1) − log10(n), which gives approximately:
Leading digit | Probability |
---|---|
1 | 30.1 % |
2 | 17.6 % |
3 | 12.5 % |
4 | 9.7 % |
5 | 7.9 % |
6 | 6.7 % |
7 | 5.8 % |
8 | 5.1 % |
9 | 4.6 % |
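The percentages in the table follow directly from the formula. A minimal sketch that recomputes them (the function name `benford_probability` is only an illustrative choice):

```python
import math

def benford_probability(n: int) -> float:
    """Probability that the leading digit equals n (1..9) under Benford's law."""
    return math.log10(n + 1) - math.log10(n)

# Rebuild the table: one probability per leading digit.
table = {n: benford_probability(n) for n in range(1, 10)}
for n, p in table.items():
    print(f"{n} | {100 * p:.1f} %")
```

The nine probabilities sum to 1 because the sum telescopes to log10(10) − log10(1).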
One can also formulate a law for the first two digits: the probability that a number begins with the two-digit block n (n = 10, ..., 99) is log10(n + 1) − log10(n).
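The two-digit law is consistent with the one-digit law: adding up the probabilities of the blocks 10 to 19 should give back the probability of a leading 1. A short sketch checking this numerically (the function name is hypothetical):

```python
import math

def first_two_digits_probability(n: int) -> float:
    """Probability that the first two digits form the block n (10..99)."""
    if not 10 <= n <= 99:
        raise ValueError("n must be a two-digit block between 10 and 99")
    return math.log10(n + 1) - math.log10(n)

# Blocks 10..19 are exactly the numbers whose leading digit is 1.
p_leading_1 = sum(first_two_digits_probability(n) for n in range(10, 20))
# All ninety blocks together must account for every number.
p_total = sum(first_two_digits_probability(n) for n in range(10, 100))

print(p_leading_1, math.log10(2))
print(p_total)
```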
Explanation
That the leading digit 1 should in general be more common than the other digits can be understood as follows. Start counting from 1: 1, 2, 3, ... When you reach 9, every digit has appeared as a leading digit equally often. But from 10 to 19 every number has the leading digit 1, so 1 gets a huge head start. Only when you reach 99 have all digits again appeared equally often. Then 1 gets another huge head start from 100 to 199. And so it continues: 1 is always in the lead, except when the count stops exactly at 9, 99, 999, 9999, ..., where all digits are tied.
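The counting argument can be checked by brute force. The sketch below (helper names are illustrative) computes the share of the numbers 1, ..., N that start with a given digit, for a few stopping points N:

```python
def leading_digit(k: int) -> int:
    """Leading decimal digit of a positive integer."""
    while k >= 10:
        k //= 10
    return k

def share_of_leading_digit(limit: int, digit: int) -> float:
    """Fraction of the integers 1..limit whose leading digit is `digit`."""
    hits = sum(1 for k in range(1, limit + 1) if leading_digit(k) == digit)
    return hits / limit

for limit in (9, 19, 99, 199, 999, 1999):
    print(limit, round(share_of_leading_digit(limit, 1), 3))
```

The share of leading 1s drops back to 1/9 at 9, 99, 999, ... and rises above one half at 19, 199, 1999, ..., so it never falls below the share of any other digit.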
Somewhat more precisely, suppose (capital) X is a random variable whose probability of being equal to any positive integer (lower-case) x is a constant times x^(−s), where s > 1. The constant must then be 1/ζ(s), where ζ is the Riemann zeta function (see zeta distribution). The probability that the first digit of X is n approaches log10(n + 1) − log10(n) as s approaches 1.
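This limit can be explored numerically. The sketch below is one possible approximation, not a standard routine: it sums x^(−s) exactly over numbers with up to four digits and, as an assumption of the sketch, replaces each longer block of numbers by its integral plus an endpoint correction, which turns the remaining sum into geometric series. Normalising by the total plays the role of dividing by ζ(s).

```python
import math

def zeta_first_digit_probabilities(s: float, exact_decades: int = 4) -> list[float]:
    """Approximate P(first digit of X = d), d = 1..9, when P(X = x) is proportional to x**(-s)."""
    eps = s - 1.0
    K = exact_decades
    weights = []
    for d in range(1, 10):
        # Exact sum over all x with leading digit d and at most K digits.
        w = sum(
            x ** -s
            for k in range(K)
            for x in range(d * 10**k, (d + 1) * 10**k)
        )
        # Blocks with more than K digits: integral of x**(-s) over each block,
        # summed over all remaining decades as a geometric series ...
        integral_tail = (d ** -eps - (d + 1) ** -eps) / eps * 10 ** (-K * eps) / (1 - 10 ** -eps)
        # ... plus the endpoint correction (f(a) - f(b)) / 2 for each block.
        endpoint_tail = (d ** -s - (d + 1) ** -s) / 2 * 10 ** (-K * s) / (1 - 10 ** -s)
        weights.append(w + integral_tail + endpoint_tail)
    total = sum(weights)  # approximates zeta(s)
    return [w / total for w in weights]

benford = [math.log10(d + 1) - math.log10(d) for d in range(1, 10)]
for s in (2.0, 1.1, 1.001):
    probs = zeta_first_digit_probabilities(s)
    print(s, round(probs[0], 4), "vs", round(benford[0], 4))
```

For s well above 1 the digit 1 is heavily over-represented, because X = 1 alone carries much of the probability; the gap to the Benford value shrinks as s moves towards 1.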
The precise form of Benford's law can be explained if one assumes that the logarithms of the numbers are uniformly distributed; this means that a number is for instance just as likely to be between 100 and 1000 (logarithm between 2 and 3) as it is between 10,000 and 100,000 (logarithm between 4 and 5). For many sets of numbers, especially ones that grow geometrically such as incomes and stock prices, this is a reasonable assumption.
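The link between uniformly distributed logarithms and the table of probabilities can be illustrated by simulation. In this sketch the range of six decades, the sample size and the seed are arbitrary assumptions:

```python
import math
import random
from collections import Counter

def sample_leading_digits(count: int, decades: int = 6, seed: int = 0) -> Counter:
    """Leading digits of numbers whose base-10 logarithm is uniform on [0, decades)."""
    rng = random.Random(seed)
    digits = Counter()
    for _ in range(count):
        log_value = rng.uniform(0, decades)
        # The fractional part of the logarithm determines the leading digit.
        mantissa = 10 ** (log_value - math.floor(log_value))  # lies in [1, 10)
        digits[min(int(mantissa), 9)] += 1  # clamp guards against rounding up to 10.0
    return digits

counts = sample_leading_digits(100_000)
for d in range(1, 10):
    observed = counts[d] / 100_000
    expected = math.log10(d + 1) - math.log10(d)
    print(d, round(observed, 3), round(expected, 3))
```

A number has leading digit d exactly when the fractional part of its logarithm lies between log10(d) and log10(d + 1), an interval of length log10(d + 1) − log10(d).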
Note that for numbers drawn from many distributions, for example IQ scores, human heights or other variables following normal distributions, the law is not valid. However, if one "mixes" numbers from those distributions, as occurs for example when taking numbers from newspaper articles, Benford's law reappears. This can be proven mathematically: if one repeatedly "randomly" chooses a probability distribution and then randomly chooses a number according to that distribution, the resulting list of numbers will obey Benford's law.
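The contrast between a single distribution and a mixture can be seen in a toy simulation. Everything specific here is an assumption made for illustration: the height parameters (mean 170, standard deviation 10), the three distribution families, and in particular the choice to draw each distribution's scale at random across five orders of magnitude. It is a sketch of the idea, not a reproduction of the hypotheses of the theorem.

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """Leading decimal digit of a positive real number."""
    exponent = math.floor(math.log10(x))
    # Clamp to 1..9 to absorb floating-point rounding at powers of ten.
    return min(9, max(1, int(x / 10.0 ** exponent)))

def digit_shares(values) -> dict[int, float]:
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

rng = random.Random(1)

# One fixed normal distribution (hypothetical heights in cm).
heights = [rng.gauss(170, 10) for _ in range(50_000)]

def draw_from_random_distribution() -> float:
    """Pick a distribution at random, then draw one number from it."""
    scale = 10 ** rng.uniform(0, 5)  # assumed: scale spread over five decades
    family = rng.choice(("exponential", "uniform", "half-normal"))
    if family == "exponential":
        return rng.expovariate(1 / scale)
    if family == "uniform":
        return rng.uniform(0, scale)
    return abs(rng.gauss(0, scale))

mixed = [draw_from_random_distribution() for _ in range(50_000)]

single_shares = digit_shares(heights)
mixed_shares = digit_shares(mixed)
print("heights, leading 1:", round(single_shares[1], 3))
print("mixture, leading 1:", round(mixed_shares[1], 3))
```

Nearly all simulated heights start with 1, which is far from Benford's law, while the mixture gives a share of leading 1s close to log10(2).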
Applications
Income tax agencies and accounting businesses use Benford's law to spot fraud, as people who make up figures tend to distribute their digits more uniformly.
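One simple way to screen a list of figures is to compare the observed leading-digit counts with the Benford expectation using Pearson's chi-square statistic. This is a sketch under stated assumptions: the choice of test, the 5 % significance level and both example data sets are illustrative and are not taken from any agency's actual procedure.

```python
import math
from collections import Counter

# 95th percentile of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_8DF_5PCT = 15.507

def benford_chi_square(values) -> float:
    """Pearson chi-square statistic of the leading digits of non-zero integers against Benford's law."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = total * (math.log10(d + 1) - math.log10(d))
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic

# Synthetic examples: powers of two are known to follow Benford's law closely,
# while the integers 100..999 have perfectly uniform leading digits.
powers_of_two = [2 ** k for k in range(1000)]
evenly_spread = list(range(100, 1000))

for name, data in (("powers of two", powers_of_two), ("evenly spread", evenly_spread)):
    stat = benford_chi_square(data)
    verdict = "suspicious" if stat > CHI2_CRITICAL_8DF_5PCT else "consistent"
    print(f"{name}: chi-square = {stat:.1f} ({verdict})")
```

A large statistic only flags a data set for closer inspection; many honest data sets, such as the normally distributed quantities mentioned above, do not follow Benford's law either.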
History
The discovery of this fact goes back to 1881, when the American astronomer Simon Newcomb noticed that the first pages of logarithm books (used at that time to perform calculations), the ones containing numbers that started with 1, were much more worn than the other pages. The phenomenon was rediscovered in 1938 by the physicist Frank Benford, who checked it on a wide variety of data sets and was credited with it. In 1996, Ted Hill proved the result about mixed distributions mentioned above.