Performing Benford analysis
Benford analysis counts the number of times each leading digit (1–9) or leading digit combination occurs in a field, and compares the actual count to the expected count.
The expected count, calculated using the Benford formula, provides the Benford distribution. In a naturally occurring set of numbers, the frequency distribution of the actual count of leading digits should approximate the Benford distribution.
If one or more leading digits or digit combinations in the data being tested deviate significantly from the Benford distribution, it may indicate that the numbers have been manipulated. Deviations may also have simple and reasonable explanations and are not necessarily indicative of manipulation.
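As an illustration only, the following Python sketch computes expected counts using the commonly published form of the Benford formula, in which the expected proportion for a leading digit or digit combination d is log10(1 + 1/d). The function name `benford_expected_counts` is hypothetical, and the sketch is not a description of how Analytics implements the calculation.

```python
import math

def benford_expected_counts(record_count, leading_digits=1):
    """Return {digit_combination: expected_count} under the Benford distribution.

    Assumes the standard formula P(d) = log10(1 + 1/d).
    """
    start = 10 ** (leading_digits - 1)  # 1 for one digit, 10 for two digits
    stop = 10 ** leading_digits         # exclusive upper limit (10, 100, ...)
    return {d: record_count * math.log10(1 + 1 / d) for d in range(start, stop)}

# Expected counts for the leading digits 1 to 9 in a 1,000-record field
expected = benford_expected_counts(1000)
for digit, count in expected.items():
    print(digit, round(count, 1))
```

The expected counts always sum to the number of records, because the proportions for all digit combinations of a given length add up to 1.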
What data can I test using Benford analysis?
You should only use Benford analysis for testing numeric data composed of "naturally occurring numbers", such as accounting amounts, transaction amounts, expenses, or address numbers. Benford analysis is not suitable for numeric data that is constrained in any way.
Follow these guidelines for identifying numeric data that is suitable for Benford analysis:
- Size of the data set – The data set must be large enough to support a valid distribution. Benford analysis may not give reliable results for fewer than 500 records.
- Leading digit requirement – All numbers from 1 to 9 must have the possibility of occurring as the leading digit.
- Leading digit combination requirement – All numbers from 0 to 9 must have the possibility of occurring as the second leading digit, and as any additional digits being analyzed.
- Constrained data – Numeric data that is assigned or generated according to a pre-ordained pattern is not suitable for Benford analysis. For example, do not use Benford to analyze:
  - sequential check or invoice numbers
  - social security numbers or telephone numbers that map to a specific pattern
  - any numbering scheme with a range that prevents certain numbers from appearing
- Random numbers – Numbers generated by a random number generator are not suitable for Benford analysis.
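Two of the guidelines above lend themselves to a quick pre-check before running the analysis. The Python sketch below is a rough illustration, not part of Analytics: the function names are hypothetical, the 500-record threshold comes from the guideline above, and checking which leading digits were actually observed is only a proxy for whether every digit has the possibility of occurring.

```python
def first_digit(value):
    """Return the first non-zero digit of a number as a string, or None for zero."""
    for ch in repr(abs(value)):
        if ch in "123456789":
            return ch
    return None

def benford_suitability_issues(values, min_records=500):
    """Return a list of concerns; an empty list means neither check raised one."""
    issues = []
    nonzero = [v for v in values if v != 0]
    if len(nonzero) < min_records:
        issues.append(f"only {len(nonzero)} non-zero records (fewer than {min_records})")
    seen = {first_digit(v) for v in nonzero}
    missing = sorted(set("123456789") - seen)
    if missing:
        issues.append("leading digits never observed: " + ", ".join(missing))
    return issues

# A constrained range (100 to 399) fails both checks
print(benford_suitability_issues(list(range(100, 400))))
```

A check like this cannot detect every form of constrained data, such as assigned numbering schemes, so it supplements the guidelines rather than replacing them.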
Usage details
The table below provides details about using the Benford analysis feature in Analytics.
Number of leading digits | You can analyze up to six leading digits. When analyzing four or more leading digits, Benford analysis output must be sent to a file instead of displayed on screen or sent to a printer. |
---|---|
Processing time | Depending on the number of records you are working with, analyzing five or more leading digits may take several minutes. Regardless of how many digits you are analyzing, you can press Esc to terminate the command at any time. |
Size of data set | Effective Benford analysis requires large data sets. Analytics displays a warning in the results output when a data set may be too small for the specified number of digits. |
Positive and negative values | Anomalous data is more apparent when you analyze positive and negative values separately. You can use a filter to separate the two before beginning your analysis. |
Zeros and non-numeric characters | Records with values of zero are ignored, but the number of zero-value records bypassed is reported. Leading zeros, numeric formatting such as decimals and dollar signs, other non-numeric characters, and records that fail to meet test criteria are also ignored. If the resulting number of digits is less than specified, Analytics adds zeros to the right of the result. |
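The handling of zeros, formatting characters, and short values described in the table can be sketched in Python as follows. This is an approximation for illustration, assuming the values arrive as formatted text; the function names are hypothetical and the code does not reproduce Analytics internals.

```python
from collections import Counter

def leading_digits(raw, n):
    """Return the first n significant digits of a formatted amount as an int.

    Returns None when the text contains no non-zero digit (a zero value).
    """
    # Drop formatting such as signs, currency symbols, commas, and decimal points
    digits = "".join(ch for ch in str(raw) if ch in "0123456789")
    digits = digits.lstrip("0")  # leading zeros are ignored
    if not digits:
        return None
    # Pad with zeros on the right when fewer than n digits remain
    return int(digits[:n].ljust(n, "0"))

def count_leading_digits(raw_values, n=1):
    """Count leading digit combinations and report how many zero values were skipped."""
    counts = Counter()
    zero_records = 0
    for raw in raw_values:
        key = leading_digits(raw, n)
        if key is None:
            zero_records += 1
        else:
            counts[key] += 1
    return counts, zero_records

counts, zeros = count_leading_digits(["$1,234.50", "0.042", "0.00", "7"], n=2)
print(dict(counts), zeros)
```

In this example the value 7 is counted under 70 because a zero is added to the right, and 0.00 is skipped and reported as one zero-value record. The sign is discarded, so positive and negative values would need to be filtered into separate runs beforehand.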
Benford analysis output results
Benford analysis produces the following output results:
Leading Digits | Displays the leading digits that were tested. For example, if you specify one leading digit, the numbers 1 to 9 are displayed. If you specify two leading digits, the numbers 10 to 99 are displayed. |
---|---|
Actual Count | Displays the actual count of each leading digit or leading digit combination in the field. |
Expected Count | Displays the expected count of each leading digit or leading digit combination calculated by the Benford formula. |
Zstat Ratio | Displays the Z-Stat ratio for each digit or digit combination, which is a measurement in standard deviations of the distance between the actual count and the expected count. For example, a Z-statistic of 0.500 represents one-half of a standard deviation. |
Lower Bound, Upper Bound (optional) | Displays the computed lower and upper bound values for the count of each leading digit or digit combination. If the actual count of more than one digit or digit combination in the output results exceeds either of the bounds, the data may have been manipulated and should be investigated. Note: The Lower Bound and Upper Bound values are included only if the Include Upper and Lower Bounds checkbox is selected in the Benford dialog box. |
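To make the Zstat Ratio description more concrete, the Python sketch below measures the distance between an actual count and the expected count in standard deviations. It assumes each record independently falls on a given digit with the Benford probability (a binomial model), which is an assumption made for this example. The exact formula Analytics uses is not stated here, so results may differ from the Analytics output. The function name `z_stat` is hypothetical.

```python
import math

def z_stat(actual_count, record_count, digit_combination):
    """Distance between actual and expected counts, in standard deviations.

    Assumes a binomial model with p = log10(1 + 1/d); an illustrative choice,
    not necessarily the calculation Analytics performs.
    """
    p = math.log10(1 + 1 / digit_combination)
    expected = record_count * p
    std_dev = math.sqrt(record_count * p * (1 - p))
    return abs(actual_count - expected) / std_dev

# 330 of 1,000 records start with the digit 1 (about 301 expected)
print(round(z_stat(330, 1000, 1), 3))
```

Under this model, a larger value means the actual count lies further from the expected count, which matches the way the Zstat Ratio column is meant to be read.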
Steps
Perform Benford analysis on a field to discover if one or more leading digits or digit combinations deviate significantly from the Benford distribution.
- Open the table containing the field you want to analyze.
- Select .
- On the Main tab, do one of the following:
  - Select the field to analyze from the Benford On drop-down list.
  - Click Benford On to select the field, or to create an expression.
Note
Select a field that contains "naturally occurring numbers", such as transaction amounts. Benford analysis is not suitable for numeric data that is constrained in any way. For more information, see What data can I test using Benford analysis?
- Enter the Number of Leading Digits, from 1 to 6, that you want to analyze.
Note
If you are analyzing four or more leading digits, results output must be sent to a file. Results of analyzing four or more digits cannot be displayed on the screen, sent to the printer, or displayed in a graph.
- If there are records in the current view that you want to exclude from processing, enter a condition in the If text box, or click If to create an IF statement using the Expression Builder.
Note
The If condition is evaluated against only the records remaining in a table after any scope options have been applied (First, Next, While).
The IF statement considers all records in the view and filters out those that do not meet the specified condition.
- (Optional) Select Include Upper and Lower Bounds if you want to include computed boundary values in the output results for each digit or digit combination.
- Click the Output tab.
- Select the appropriate output option in the To panel:
- Screen – Select this option to display the results in the Analytics display area.
Tip
You can click any linked result value in the display area to drill down to the associated record or records in the source table.
If the output table contains a large number of records, it is faster and more useful to save the results to a file than to display the results on the screen.
- Print – Select this option to send the results to the default printer.
- Graph – Select this option to create a graph of the results and display it in the Analytics display area.
- File – Select this option to save or append the results to a text file. The file is saved outside Analytics.
Note
Output options that do not apply to a particular analytical operation are disabled.
- If you selected File as the output type, specify the following information in the As panel:
- File Type – ASCII Text File or Unicode Text file (depending on which edition of Analytics you are using) is the only option. Saves the results to a new text file, or appends the results to an existing text file.
- Name – Enter a file name in the Name text box. Or click Name and enter the file name, or select an existing file in the Save or Save File As dialog box to overwrite or append to the file. If Analytics prefills a file name, you can accept the prefilled name, or change it.
You can also specify an absolute or relative file path, or navigate to a different folder, to save or append the file in a location other than the project location. For example: C:\Results\Output.txt or Results\Output.txt.
- Local – Disabled and selected. Saving the file locally is the only option.
- Depending on the output type, you can optionally specify a Header and/or a Footer in the text box(es).
Headers and footers are centered by default. Type a left angle bracket (<) before the header or footer text to left align the text. Click Header or Footer to enter a header or footer of more than one line. Alternatively, you can enter a semi-colon (;) as a line-break character in the header or footer text box. Left aligning multiple lines requires a left angle bracket at the beginning of each line.
- Click the More tab.
- Select the appropriate option in the Scope panel:
  - All – This option is selected by default. Leave it selected to specify that all records in the view are processed.
  - First – Select this option and enter a number in the text box to start processing at the first record in the view and include only the specified number of records.
  - Next – Select this option and enter a number in the text box to start processing at the currently selected record in the view and include only the specified number of records. The actual record number in the leftmost column must be selected, not data in the row.
  - While – Select this option to use a WHILE statement to limit the processing of records in the view based on a particular criterion or set of criteria. You can enter a condition in the While text box, or click While to create a WHILE statement using the Expression Builder.
A WHILE statement allows records in the view to be processed only while the specified condition evaluates to true. As soon as the condition evaluates to false, the processing terminates, and no further records are considered. You can use the While option in conjunction with the All, First, or Next options. Record processing stops as soon as one limit is reached.
Note
The number of records specified in the First or Next options references either the physical or the indexed order of records in a table, and disregards any filtering or quick sorting applied to the view. However, results of analytical operations respect any filtering.
If a view is quick sorted, Next behaves like First.
- If you selected File as the output type, and want to append the output results to the end of an existing text file, select Append To Existing File.
- If you selected File as the output type, and want to append the output results to the end of an existing Analytics table, do one of the following:
  - Select Append To Existing File if you are certain the output results and the existing table are identical in structure.
  - Leave Append To Existing File deselected if you want Analytics to compare the record lengths of the output results and the existing table. If the record lengths are not identical, the data structure is not identical, and the append will not work correctly.
Note
Leaving Append To Existing File deselected is recommended if you are uncertain whether the output results and the existing table have an identical data structure. For more information about appending and data structure, see Appending output results to an existing table.
- Click OK.
- If the overwrite prompt appears, select the appropriate option.
If you are expecting the Append option to appear and it does not, click No to cancel the operation and see Appending output results to an existing table.
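The way the scope options and the If condition interact in the steps above can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions: the function `scoped_records` and its parameters are hypothetical, records are a plain list, and filtering or sorting of the view is not modeled.

```python
from itertools import islice, takewhile

def scoped_records(records, start=0, limit=None, while_cond=None, if_cond=None):
    """Model of record selection: scope first, then the If condition.

    start=0 with a limit models First; start=<index of the selected record>
    with a limit models Next; no limit models All.
    """
    stop = None if limit is None else start + limit
    scoped = islice(records, start, stop)
    if while_cond is not None:
        # While: processing stops at the first record that fails the condition
        scoped = takewhile(while_cond, scoped)
    if if_cond is not None:
        # If: evaluated only against the records that remain in scope
        scoped = filter(if_cond, scoped)
    return list(scoped)

amounts = [120, 45, 300, -10, 80, 950]

# First 5 records, While amount > 0, If amount >= 100
selected = scoped_records(amounts, limit=5,
                          while_cond=lambda a: a > 0,
                          if_cond=lambda a: a >= 100)
print(selected)
```

In this example, 950 is never considered because it lies outside the First 5 scope, and 80 is never considered because the While condition already failed at -10. The If condition then removes 45 from the records that remain.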