Performing Benford analysis

Benford analysis counts the number of times each leading digit (1–9) or leading digit combination occurs in a field, and compares the actual count to the expected count.

The expected count, calculated using the Benford formula, provides the Benford distribution. In a naturally occurring set of numbers, the frequency distribution of the actual count of leading digits should approximate the Benford distribution.
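As a rough illustration of how expected counts follow from the Benford formula, the sketch below uses the standard form of the formula, P(d) = log10(1 + 1/d), where d is a leading digit or leading digit combination. The function name and the record count of 1000 are hypothetical and are not part of Analytics.

```python
import math

def benford_expected_counts(n_records, leading_digits=1):
    """Return {digit_combination: expected_count} under the Benford distribution."""
    start = 10 ** (leading_digits - 1)   # 1 for one digit, 10 for two digits
    stop = 10 ** leading_digits          # exclusive upper limit: 10, 100, ...
    # Benford formula: P(d) = log10(1 + 1/d)
    return {d: n_records * math.log10(1 + 1 / d) for d in range(start, stop)}

# Hypothetical data set of 1000 records, one leading digit
for digit, count in benford_expected_counts(1000).items():
    print(digit, round(count, 1))
```

The probabilities for a given number of leading digits sum to 1, so the expected counts always add up to the number of records tested.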

If one or more leading digits or digit combinations in the data being tested deviate significantly from the Benford distribution, it may indicate that the numbers have been manipulated. Deviations may also have simple and reasonable explanations and are not necessarily indicative of manipulation.

What data can I test using Benford analysis?

You should only use Benford analysis for testing numeric data composed of "naturally occurring numbers", such as accounting amounts, transaction amounts, expenses, or address numbers. Benford analysis is not suitable for numeric data that is constrained in any way.

Follow these guidelines for identifying numeric data that is suitable for Benford analysis:

  • Size of the data set: The data set must be large enough to support a valid distribution. Benford analysis may not give reliable results for fewer than 500 records.
  • Leading digit requirement: All numbers from 1 to 9 must have the possibility of occurring as the leading digit.
  • Leading digit combination requirement: All numbers from 0 to 9 must have the possibility of occurring as the second leading digit, and as any additional digits being analyzed.
  • Constrained data: Numeric data that is assigned or generated according to a pre-ordained pattern is not suitable for Benford analysis. For example, do not use Benford to analyze:
    • sequential check or invoice numbers
    • social security numbers or telephone numbers that map to a specific pattern
    • any numbering scheme with a range that prevents certain numbers from appearing
  • Random numbers: Numbers generated by a random number generator are not suitable for Benford analysis.
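Some of the guidelines above can be roughly screened before running the analysis. The sketch below is not an Analytics feature, and the function names are hypothetical. It only checks the record count against the 500-record guideline and reports leading digits that never occur in the data, which can hint that the data is constrained. It cannot prove that a digit is impossible, and it does not detect random or sequential numbering.

```python
def first_digit(value):
    """Return the first non-zero digit of a number, or None if the value is zero."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

def benford_precheck(values, min_records=500):
    """Return a list of warnings; an empty list means no obvious problem was found."""
    nonzero = [v for v in values if first_digit(v) is not None]
    warnings = []
    if len(nonzero) < min_records:
        warnings.append(f"only {len(nonzero)} non-zero records (guideline: {min_records})")
    seen = {first_digit(v) for v in nonzero}
    missing = sorted(set(range(1, 10)) - seen)
    if missing:
        warnings.append(f"leading digits never observed: {missing}")
    return warnings

# Hypothetical values: too few records, and only the leading digit 1 occurs
print(benford_precheck([10, 11, 12.5, 0]))
```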

Usage details

The table below provides details about using the Benford analysis feature in Analytics.

Number of leading digits: You can analyze up to six leading digits. When analyzing four or more leading digits, Benford analysis output must be sent to a file instead of displayed on screen or sent to a printer.
Processing time: Depending on the number of records you are working with, analyzing five or more leading digits may take several minutes. Regardless of how many digits you are analyzing, you can press Esc to terminate the command at any time.
Size of data set: Effective Benford analysis requires large data sets. Analytics displays a warning in the results output when a data set may be too small for the specified number of digits.
Positive and negative values: Anomalous data is more apparent when you analyze positive and negative values separately. You can use a filter to separate the two before beginning your analysis.
Zeros and non-numeric characters: Records with values of zero are ignored, but the number of zero-value records bypassed is reported. Leading zeros, numeric formatting such as decimal points and dollar signs, other non-numeric characters, and records that fail to meet test criteria are also ignored. If the resulting number of digits is less than the number specified, Analytics adds zeros to the right of the result.
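The handling of zeros, formatting characters, and short values described above can be sketched as follows. This is an approximation of the stated rules, not Analytics' actual implementation. The function names are hypothetical, and the input is assumed to be plain text values without scientific notation.

```python
from collections import Counter

def leading_digits(raw, n=2):
    """Return the first n significant digits of raw as an int, or None for a zero value."""
    # Keep digits only: drops dollar signs, commas, decimal points, minus signs, and so on
    digits = "".join(ch for ch in str(raw) if ch in "0123456789")
    digits = digits.lstrip("0")            # leading zeros are ignored
    if not digits:
        return None                        # zero-value record
    return int(digits[:n].ljust(n, "0"))   # pad with zeros on the right if too short

def tally(records, n=2):
    """Count leading digit combinations and the zero-value records bypassed."""
    counts, zeros_bypassed = Counter(), 0
    for raw in records:
        key = leading_digits(raw, n)
        if key is None:
            zeros_bypassed += 1
        else:
            counts[key] += 1
    return counts, zeros_bypassed

# Hypothetical field values
counts, zeros_bypassed = tally(["$1,234.50", "0.00", "7", "0.075", "-12"], n=2)
print(dict(counts), zeros_bypassed)
```

In this sketch "7" is counted under 70 because a zero is added on the right, and "0.075" is counted under 75 because leading zeros are dropped.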

Benford analysis output results

Benford analysis produces the following output results:

Leading Digits: Displays the leading digits that were tested. For example, if you specify one leading digit, the numbers 1 to 9 are displayed. If you specify two leading digits, the numbers 10 to 99 are displayed.
Actual Count: Displays the actual count of each leading digit or leading digit combination in the field.
Expected Count: Displays the expected count of each leading digit or leading digit combination calculated by the Benford formula.
Zstat Ratio: Displays the Z-stat ratio for each digit or digit combination, which measures, in standard deviations, the distance between the actual count and the expected count. For example, a Z-stat ratio of 0.500 represents one-half of a standard deviation.

Lower Bound, Upper Bound (optional): Displays the computed lower and upper bound values for the count of each leading digit or digit combination.

If the actual count of more than one digit or digit combination in the output results falls below the lower bound or above the upper bound, the data may have been manipulated and should be investigated.

Note

The Lower Bound and Upper Bound values are included only if the Include Upper and Lower Bounds checkbox is selected in the Benford dialog box.
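The Z-stat ratio and the bounds check in the output can be illustrated with a short sketch. This page does not state the exact formulas Analytics uses. The z_stat function below assumes the z-statistic commonly used in Benford testing (a binomial standard deviation with a 0.5 continuity correction), and the bounds are taken as given values from the output rather than recomputed. All function names and row values are hypothetical.

```python
import math

def z_stat(actual_count, n_records, digits):
    """Distance between actual and expected count, in standard deviations (assumed formula)."""
    p = math.log10(1 + 1 / digits)        # Benford probability for this digit combination
    expected = n_records * p
    diff = abs(actual_count - expected)
    if diff > 0.5:                        # continuity correction, applied only when smaller than diff
        diff -= 0.5
    return diff / math.sqrt(n_records * p * (1 - p))

def outside_bounds(rows):
    """rows: (digits, actual, lower_bound, upper_bound) tuples read from the output."""
    return [d for d, actual, lower, upper in rows if actual < lower or actual > upper]

def should_investigate(rows):
    """True when more than one digit or digit combination falls outside its bounds."""
    return len(outside_bounds(rows)) > 1

# Hypothetical output rows, for illustration only
rows = [(1, 350, 280, 322), (2, 150, 158, 194), (3, 125, 109, 141)]
print(outside_bounds(rows), should_investigate(rows))
print(round(z_stat(350, 1000, 1), 3))
```

A single digit outside its bounds does not trigger the flag in this sketch, which mirrors the "more than one" wording above.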

Steps

Perform Benford analysis on a field to discover if one or more leading digits or digit combinations deviate significantly from the Benford distribution.