If you measure almost any naturally occurring quantity, such as the height of a mountain, the length of a river, or the price of a company’s share, the probability that the number begins with a one is about 30 percent. The probability that it begins with a nine, on the other hand, is only about five percent. This phenomenon is known as “Benford’s law” and has frequently been used to expose faked statistics.
Simon Newcomb discovers Benford’s law
In 1881, the Canadian astronomer and mathematician Simon Newcomb noticed a peculiarity that at first seemed trivial. For some of his calculations, he used logarithm tables, i.e. printed tables in which the logarithms of decimal numbers could be looked up. He noticed that the first pages, which covered numbers whose first digit was a one, were more worn than the later ones, and concluded that they must have been used more often. From this observation, he derived the hypothesis that numbers beginning with a one occur more frequently than others. He published his finding in the American Journal of Mathematics. Although his astute observation would later prove groundbreaking, it was ignored and forgotten.
The rediscovery by Frank Benford
Luckily, 57 years later, in 1938, the physicist Frank Benford made the same discovery. He also published it, only this time the hypothesis did not slip through the cracks and was even named after him. Even then, however, it was not widely known, and most statisticians at the time had never heard of Benford’s law. This only changed after the US mathematician Theodore Hill showed how to apply the law, now also known as the Newcomb-Benford law (NBL), to practical problems.
The meaning of Benford’s law
Benford’s law states the probability with which a certain digit occurs at a certain position in a number. For the first digit d, this probability is log10(1 + 1/d), which yields the following frequencies: one (30.1%), two (17.6%), three (12.5%), four (9.7%), five (7.9%), six (6.7%), seven (5.8%), eight (5.1%) and nine (4.6%).
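The frequencies listed here can be reproduced with a few lines of code. The following Python sketch is only an illustration: the function name `benford_digit` is made up for this example, and it assumes the standard textbook formulas, namely log10(1 + 1/d) for the first digit and a sum over all possible leading digit blocks for later positions.

```python
import math


def benford_digit(d: int, position: int = 1) -> float:
    """Probability that digit d appears at the given position (1 = first digit).

    Hypothetical helper for illustration, based on the standard formulas.
    """
    if position < 1:
        raise ValueError("position must be 1 or greater")
    if position == 1:
        if not 1 <= d <= 9:
            raise ValueError("the first digit must be between 1 and 9")
        return math.log10(1 + 1 / d)
    if not 0 <= d <= 9:
        raise ValueError("digit must be between 0 and 9")
    # Sum over every possible block k of digits that can precede position.
    lo = 10 ** (position - 2)
    hi = 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))


# Print the first-digit distribution as percentages.
for d in range(1, 10):
    print(f"{d}: {benford_digit(d):.1%}")
```

The printed first-digit values should round to the percentages given in the text. Calling `benford_digit(0, 2)` gives the probability that the second digit is a zero, which is a little under 12 percent; the distribution flattens out quickly for later positions.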
Naturally, the law is much easier to verify for large data sets than for small ones. In addition, it does not hold for just any data set. But numerical data that arise from natural growth processes usually do follow the so-called Benford distribution.
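The link to growth processes can be checked with a small experiment. The sketch below is a toy example, not real data: it assumes a starting value of 100 that grows by 7 percent per step for 1,000 steps (both numbers are arbitrary choices), and compares the share of each leading digit with the value log10(1 + 1/d) expected under Benford’s law. The helper names are hypothetical.

```python
import math
from collections import Counter


def first_digit(x: float) -> int:
    """Leading significant digit of a positive number."""
    # Scientific notation always starts with the leading digit.
    return int(f"{x:.15e}"[0])


def leading_digit_shares(values):
    """Share of values starting with each digit 1..9."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


# Assumed toy data: steady 7 percent growth, starting at 100.
values = [100 * 1.07 ** n for n in range(1000)]
shares = leading_digit_shares(values)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {shares[d]:.3f}, expected {expected:.3f}")
```

Because the series passes through many orders of magnitude, its leading digits should end up close to the Benford frequencies. A series that stays within a single order of magnitude would not show this pattern.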
Since Benford’s law specifies a frequency distribution of individual digits in the numbers of a large data set, it also makes it possible to check whether numbers may have been manipulated. A frequently cited example is the accounting fraud at US energy company Enron, whose falsified figures are reported to deviate from the Benford distribution. The company falsified its balance sheets several times and was finally exposed in 2001. Today, Benford’s law serves tax investigators and auditors as an important basis for the algorithmic verification of economic data.
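What such an algorithmic check might look like can be sketched in a few lines. The example below is an assumption about one common approach, not a description of what auditors actually use: it compares the observed first-digit counts with the Benford expectation using a chi-square statistic and flags a data set when the statistic exceeds 15.507, the 5 percent critical value for eight degrees of freedom. The function names and both data sets are invented for illustration.

```python
import math
from collections import Counter

# 5 percent critical value of the chi-square distribution, 8 degrees of freedom.
CHI2_CRITICAL_DF8_5PCT = 15.507


def first_digit(x: float) -> int:
    """Leading significant digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of the first digits against Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


def looks_suspicious(values) -> bool:
    """True if the first digits deviate significantly from Benford's law."""
    return benford_chi_square(values) > CHI2_CRITICAL_DF8_5PCT


# Invented data: a steadily growing series and a set with evenly spread digits.
growth = [100 * 1.07 ** n for n in range(1000)]
evenly_spread = [float(d * 100 + k) for d in range(1, 10) for k in range(100)]

print("growth series suspicious:", looks_suspicious(growth))
print("evenly spread digits suspicious:", looks_suspicious(evenly_spread))
```

A flag from a test like this is only a reason to look more closely, not proof of manipulation. Data sets that are small or confined to a narrow range can deviate from Benford’s law for entirely innocent reasons.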