benford() method

Counts the number of times each leading digit (1–9), or leading digit combination, occurs in a numeric column, and compares the actual count to the expected count. The expected count is calculated using the Benford formula.
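
For reference, Benford's law gives the probability of a leading digit combination d as log10(1 + 1/d). The sketch below shows how expected counts could be derived from that formula. It is an illustration only, not the HCL implementation, and the function name benford_expected is hypothetical.

```python
import math

def benford_expected(total_count, leading=1):
    """Expected count of each leading-digit combination under Benford's law.

    Hypothetical helper for illustration; not part of the HCL library.
    """
    start = 10 ** (leading - 1)   # 1 for one leading digit, 10 for two
    stop = 10 ** leading          # exclusive upper limit: 10 or 100
    return {d: total_count * math.log10(1 + 1 / d) for d in range(start, stop)}

# Expected counts for 1000 values, analyzing one leading digit
expected = benford_expected(1000, leading=1)
```

The probabilities across all combinations sum to 1, so the expected counts sum to the number of values analyzed.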

Syntax

dataframe_name.benford(on = "numeric_column", leading = number_of_digits, addbounds = True|False)

Parameters

Name	Description
on = "numeric_column"	The numeric column to analyze. Note: Select a column that contains "naturally occurring numbers", such as transaction amounts. Benford analysis is not suitable for numeric data that is constrained in any way.
leading = number_of_digits (optional)	The number of leading digits to analyze. If you omit leading, the default value of 1 is used.
addbounds = True | False (optional)	True: include computed upper and lower bound values in the output results. False: do not include upper and lower bound values in the output results. If two or more counts in the output results fall outside the bounds (above the upper bound or below the lower bound), the data may have been manipulated and should be investigated. If you omit addbounds, upper and lower bound values are not included.
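
The "two or more counts" rule for the bounds can be sketched in plain Python. The rows and key names below (digits, actual_count, lower_bound, upper_bound) are made-up assumptions for illustration; they are not the documented column names of the benford() output, and the values are not real results.

```python
# Hypothetical output rows; key names and values are illustrative assumptions.
rows = [
    {"digits": 1, "actual_count": 310, "lower_bound": 280, "upper_bound": 322},
    {"digits": 2, "actual_count": 140, "lower_bound": 158, "upper_bound": 194},
    {"digits": 3, "actual_count": 160, "lower_bound": 108, "upper_bound": 142},
]

def out_of_bounds(rows):
    """Return the digit combinations whose actual count lies outside the bounds."""
    return [
        r["digits"]
        for r in rows
        if r["actual_count"] < r["lower_bound"] or r["actual_count"] > r["upper_bound"]
    ]

flagged = out_of_bounds(rows)
# Two or more counts outside the bounds suggests the data should be investigated
needs_review = len(flagged) >= 2
```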

Returns

An HCL dataframe containing the output results.

Examples

Test a numeric column for leading digit irregularities

Use the benford() method to test the two leading digits of the Amount column for deviation from the expected counts, and include upper and lower bound values in the output results:

accounts_receivable.benford(on = "Amount", leading = 2, addbounds = True)
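
To make the "leading two digits" idea concrete, the following sketch counts leading digit combinations in a plain Python list. It is not how HCL implements benford(). The helper leading_digits is hypothetical, the amounts are made-up sample values, and the handling of zeros and negative signs (zeros skipped, sign ignored) is an assumption rather than documented behavior.

```python
from collections import Counter

def leading_digits(value, leading=1):
    """Return the first `leading` significant digits of value, or None for zero.

    Hypothetical helper; the sign is ignored and zero has no leading digit.
    """
    if value == 0:
        return None
    mantissa = f"{abs(value):e}".split("e")[0]   # e.g. 1234.56 -> "1.234560"
    return int(mantissa.replace(".", "")[:leading])

amounts = [1234.56, -129.00, 45.10, 0.0, 4510.75]   # made-up sample values

# Observed count of each two-digit leading combination, skipping zeros
observed = Counter(
    d for d in (leading_digits(a, leading=2) for a in amounts) if d is not None
)
```

Here 1234.56 and -129.00 both fall under the combination 12, and 45.10 and 4510.75 both fall under 45, because magnitude and sign do not affect the leading digits.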