# SectionA_Group4_Vanamamalai(13PGP058)

Today in our class discussion on sampling we came across a law named “Benford’s Law”. As it sounded interesting, I am writing this post to describe what it is about and how it can be applied to detect statistical fraud in data.

Benford’s law states that in many (not all) real-life data sets the leading digits are not uniformly distributed: 1 is the leading digit about 30% of the time, while larger digits occur in that position less frequently, with 9 as the first digit less than 5% of the time. Benford’s Law also gives the expected distribution of digits beyond the first, which approaches a uniform distribution as the digit position increases.

This law has been found to apply to a wide variety of data sets, such as addresses, electricity bills, population sizes and areas. It is most accurate when the values are spread across multiple orders of magnitude.

A set of numbers is said to satisfy Benford’s Law if the leading digit d (d ∈ {1, …, 9}) occurs with probability

$$P(d) = \log_{10}(d + 1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$

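Numerically, the leading digits have the following distribution in Benford’s Law, where d is the leading digit and P(d) is its probability:

| d | P(d) |
|---|------|
| 1 | 30.1% |
| 2 | 17.6% |
| 3 | 12.5% |
| 4 | 9.7% |
| 5 | 7.9% |
| 6 | 6.7% |
| 7 | 5.8% |
| 8 | 5.1% |
| 9 | 4.6% |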
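As a quick check, the distribution can be computed from the standard logarithmic form of the law, P(d) = log10(1 + 1/d). The sketch below is only illustrative; the function name `benford_probability` is my own choice and not part of any library.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that d (1..9) is the leading digit under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Print the expected share of each leading digit as a percentage.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

The nine probabilities add up to 1, because the sum of the logarithms telescopes to log10(10).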

This law can be generalized to any number system with a base b greater than or equal to 2: the leading digit d (d ∈ {1, …, b − 1}) then occurs with probability $\log_b(1 + 1/d)$.
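To illustrate the generalization, here is a small sketch that assumes the usual base-b form of the law, log_b(1 + 1/d) for leading digits 1 to b − 1. The function name `benford_probability_base` is hypothetical.

```python
import math


def benford_probability_base(d: int, base: int = 10) -> float:
    """Probability that d is the leading digit when numbers are written in `base`."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if not 1 <= d <= base - 1:
        raise ValueError("leading digit must be between 1 and base - 1")
    # math.log(x, base) returns the logarithm of x in the given base.
    return math.log(1 + 1 / d, base)


# Compare the share of leading digit 1 in a few number systems.
for base in (2, 8, 10, 16):
    print(f"base {base}: P(1) = {benford_probability_base(1, base):.3f}")
```

In base 2 the only possible leading digit is 1, so its probability is 1 and the law holds trivially.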

Application of Benford’s law:

Benford’s law is used to detect statistical fraud. People who fabricate numerical data usually distribute the leading digits in a way that contradicts the law. Thus in socio-economic, auditing or accounting data we can flag possible fraud by applying Benford’s law and looking for anomalies in the leading digits.
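One common way to look for such anomalies is to compare the observed leading-digit counts with the Benford expectation using a chi-square goodness-of-fit statistic. The sketch below is a minimal version of that idea and not a complete audit procedure. The function names and both sample data sets are made up for illustration; 15.507 is the standard 5% critical value for 8 degrees of freedom.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """First non-zero digit of a non-zero number."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation puts the leading digit first, e.g. '3.14...e+02'.
    return int(f"{abs(x):.16e}"[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of observed leading digits against Benford's Law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


CRITICAL_5_PERCENT = 15.507  # chi-square critical value, 8 degrees of freedom

# Illustrative data only: powers of 2 follow the law closely, while the
# numbers 100..999 have perfectly uniform leading digits.
samples = {
    "powers of 2": [2 ** k for k in range(200)],
    "100..999": list(range(100, 1000)),
}
for name, data in samples.items():
    stat = benford_chi_square(data)
    verdict = "suspicious" if stat > CRITICAL_5_PERCENT else "consistent"
    print(f"{name}: chi-square = {stat:.1f} ({verdict})")
```

A large statistic only says that the leading digits deviate from the law. Whether that points to fraud still has to be checked, since some honest data sets do not follow the law either.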

Benford’s law can be expected to hold when the mean of a distribution is greater than the median, that is, when the distribution is positively skewed. It can also be seen in numbers that arise as the product of two numbers and in distributions containing transactional data.
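The effect of multiplication can be seen in a small simulation. The sketch below assumes factors drawn uniformly between 1 and 10, which is my own choice for illustration; the function name, sample size and seed are likewise arbitrary. A single uniform factor gives roughly equal shares for all leading digits, while products shift the share of the digit 1 towards the Benford value of about 30%.

```python
import math
import random


def leading_digit_shares(n_factors: int, n_samples: int = 20000, seed: int = 0) -> dict:
    """Share of each leading digit among products of uniform(1, 10) factors."""
    rng = random.Random(seed)
    counts = {d: 0 for d in range(1, 10)}
    for _ in range(n_samples):
        product = math.prod(rng.uniform(1, 10) for _ in range(n_factors))
        # Scientific notation puts the leading digit first.
        counts[int(f"{product:.16e}"[0])] += 1
    return {d: c / n_samples for d, c in counts.items()}


# Compare the share of leading digit 1 as more factors are multiplied.
for n_factors in (1, 2, 5):
    shares = leading_digit_shares(n_factors)
    print(f"{n_factors} factor(s): share of leading 1 = {shares[1]:.3f}")
print(f"Benford: {math.log10(2):.3f}")
```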

However, the law cannot be expected to hold for numbers that are assigned sequentially, such as bill numbers, for distributions with a built-in maximum or minimum, or in areas where transactions are not recorded.

Though this law is not universal and does not apply to all real scenarios, it still plays a major role in evaluating the quality of numeric data in statistics.