CVSPREPARE command

Stratifies a population, and calculates a statistically valid sample size for each stratum, for classical variables sampling.

Syntax

CVSPREPARE ON book_value_field NUMSTRATA number MINIMUM minimum_strata_sample_size PRECISION value CONFIDENCE confidence_level <CUTOFF value> <BCUTOFF value> NCELLS number PLIMIT {BOTH|UPPER|LOWER} ERRORLIMIT number <IF test> <MINSAMPSIZE minimum_sample_size> TO {SCREEN|filename}

Parameters

Note

Do not include thousands separators or percentage signs when you specify values.

Name Description
ON book_value_field The numeric book value field to use as the basis for preparing the classical variables sample.
NUMSTRATA number

The number of strata to use for numerically stratifying the book_value_field.

The minimum number of strata is 1, and the maximum is 256.

If you specify NUMSTRATA 1 and do not specify CUTOFF, the population is not stratified before the sample is drawn.

Note

The number of strata cannot exceed 50% of the number of cells specified for NCELLS.

MINIMUM minimum_strata_sample_size

The minimum number of records to sample from each stratum.

Leave the default of zero (0) if you do not have a specific reason for specifying a minimum number.

PRECISION value

The monetary amount that is the difference between the tolerable misstatement and the expected misstatement in the account.

  • Tolerable misstatement – the maximum total amount of misstatement that can occur in the sample field without being considered a material misstatement
  • Expected misstatement – the total amount of misstatement that you expect the sample field to contain

The precision establishes the range of acceptability for an account to be considered fairly stated.

Reducing the precision narrows the range of acceptability (the margin of error), which requires a larger sample size.
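
As a worked example of the PRECISION calculation, the Python sketch below subtracts the expected misstatement from the tolerable misstatement. The function name precision_value and the amounts are hypothetical. Decimal is used so that monetary amounts are not subject to binary floating-point rounding, and the result is printed without thousands separators, as the note on specifying values requires.

```python
from decimal import Decimal


def precision_value(tolerable: str, expected: str) -> Decimal:
    """Return PRECISION = tolerable misstatement - expected misstatement.

    Amounts are passed as strings and converted to Decimal so that
    monetary values are not affected by floating-point rounding.
    """
    tol = Decimal(tolerable)
    exp = Decimal(expected)
    if exp >= tol:
        # Assumed guard for this sketch: a zero or negative precision
        # would leave no range of acceptability at all.
        raise ValueError("expected misstatement must be less than tolerable misstatement")
    return tol - exp


# Hypothetical figures for illustration only.
p = precision_value("500000.00", "120000.00")
print(f"PRECISION {p}")  # PRECISION 380000.00
```

With the tolerable misstatement held constant, a higher expected misstatement lowers PRECISION and therefore increases the required sample size.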

CONFIDENCE confidence_level

The desired confidence level that the resulting sample is representative of the entire population.

For example, specifying 95 means that you want to be confident that 95% of the time the sample will in fact be representative. Confidence is the complement of "sampling risk". A 95% confidence level is the same as a 5% sampling risk.

  • If PLIMIT is BOTH, the minimum confidence level is 10%, and the maximum is 99.5%.
  • If PLIMIT is UPPER or LOWER, the minimum confidence level is 55%, and the maximum is 99.5%.
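
The relationship between confidence and sampling risk, and the documented CONFIDENCE limits for each PLIMIT option, can be expressed as a small Python check. This is a hypothetical pre-flight sketch that mirrors the limits stated above; it is not how Analytics validates the parameter.

```python
def sampling_risk(confidence: float) -> float:
    """Return the sampling risk (%) that corresponds to a confidence level (%)."""
    return 100.0 - confidence


# Documented CONFIDENCE limits, in percent, for each PLIMIT option.
CONFIDENCE_LIMITS = {
    "BOTH": (10.0, 99.5),
    "UPPER": (55.0, 99.5),
    "LOWER": (55.0, 99.5),
}


def confidence_in_range(confidence: float, plimit: str) -> bool:
    """Return True if the confidence level is within the limits for the PLIMIT option."""
    low, high = CONFIDENCE_LIMITS[plimit.upper()]
    return low <= confidence <= high


print(sampling_risk(95))                 # 5.0
print(confidence_in_range(50, "BOTH"))   # True
print(confidence_in_range(50, "UPPER"))  # False
```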

CUTOFF value

optional

A top certainty stratum cutoff value.

Amounts in the book_value_field greater than or equal to the cutoff value are automatically selected and included in the sample.

If you omit CUTOFF, a default cutoff value equal to the maximum amount in the book_value_field is used, and no records are included in the top certainty stratum.

BCUTOFF value

optional

A bottom certainty stratum cutoff value.

Amounts in the book_value_field less than or equal to the cutoff value are automatically selected and included in the sample.

If you omit BCUTOFF, a default cutoff value equal to the minimum amount in the book_value_field is used, and no records are included in the bottom certainty stratum.
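
The selection rules for the two certainty strata can be sketched as a partition of the book values. This is a hypothetical Python illustration, not the Analytics implementation. None stands for an omitted cutoff and simply means that no certainty stratum is used; the default cutoff values that Analytics substitutes are not modelled. The check that BCUTOFF is lower than CUTOFF is an assumption added for this sketch.

```python
def split_certainty_strata(amounts, cutoff=None, bcutoff=None):
    """Split book values into top certainty, bottom certainty, and remaining amounts.

    None means the corresponding certainty stratum is not used.
    """
    if cutoff is not None and bcutoff is not None and bcutoff >= cutoff:
        # Assumption for this sketch: the two certainty strata must not overlap.
        raise ValueError("BCUTOFF must be lower than CUTOFF")
    top, bottom, remaining = [], [], []
    for amount in amounts:
        if cutoff is not None and amount >= cutoff:
            top.append(amount)        # automatically included in the sample
        elif bcutoff is not None and amount <= bcutoff:
            bottom.append(amount)     # automatically included in the sample
        else:
            remaining.append(amount)  # stratified, then sampled
    return top, bottom, remaining


# Hypothetical invoice amounts.
amounts = [-1200.0, 15.5, 480.0, 9800.0, 35000.0, 51250.0]
top, bottom, rest = split_certainty_strata(amounts, cutoff=35000, bcutoff=0)
print(top)     # [35000.0, 51250.0]
print(bottom)  # [-1200.0]
print(rest)    # [15.5, 480.0, 9800.0]
```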

NCELLS number

The number of cells to use for pre-stratifying the book_value_field.

Cells are narrower numeric divisions than strata. Pre-stratification is part of an internal process that optimizes the position of strata boundaries. Cells are not retained in the final stratified output.

The minimum number of cells is 2, and the maximum is 999.

Note

The number of cells must be at least twice (2 x) the number of strata specified for NUMSTRATA.
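
Taken together, the documented limits on NUMSTRATA and NCELLS can be checked before the command is run. The function below is a hypothetical Python pre-flight check; Analytics performs its own validation, and the messages here are illustrative only.

```python
def check_strata_and_cells(numstrata: int, ncells: int) -> list[str]:
    """Return a list of problems with NUMSTRATA/NCELLS (empty if the values are valid)."""
    problems = []
    if not 1 <= numstrata <= 256:
        problems.append("NUMSTRATA must be between 1 and 256")
    if not 2 <= ncells <= 999:
        problems.append("NCELLS must be between 2 and 999")
    if ncells < 2 * numstrata:
        # Equivalent to: the number of strata cannot exceed 50% of the number of cells.
        problems.append("NCELLS must be at least twice NUMSTRATA")
    return problems


print(check_strata_and_cells(5, 50))   # []
print(check_strata_and_cells(30, 50))  # ['NCELLS must be at least twice NUMSTRATA']
```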

PLIMIT BOTH | UPPER | LOWER

The type of precision limit to use.

  • BOTH – specify this option if:
    • the account as a whole could be either overstated or understated
    • you are interested in estimating whether misstatement in either direction exceeds the specified PRECISION
  • UPPER – specify this option if:
    • the account as a whole is likely to be understated
    • you are only interested in estimating whether the total amount of understatement exceeds the specified PRECISION
  • LOWER – specify this option if:
    • the account as a whole is likely to be overstated
    • you are only interested in estimating whether the total amount of overstatement exceeds the specified PRECISION

Caution

Specify BOTH if you are not sure which option to specify.

ERRORLIMIT number

The minimum number of errors you expect in the sample.

Note

If the actual number of errors you find when you analyze the sample is less than the ERRORLIMIT number, the only evaluation method available is mean-per-unit.

IF test

optional

A conditional expression that must be true in order to process each record. The command is executed on only those records that satisfy the condition.

Caution

If you specify a conditional expression, an identical conditional expression must be used both when calculating the sample size and when drawing the sample.

If you use a condition at one stage and not the other, or if the two conditions are not identical, the sampling results will probably not be statistically valid.

MINSAMPSIZE minimum_sample_size

optional

The minimum number of records to sample from the entire population.

Leave the default of zero (0) if you do not have a specific reason for specifying a minimum number.

TO SCREEN | filename

The location to send the results of the command to:

  • SCREEN displays the results in the Analytics display area

    Tip

    You can click any linked result value in the display area to drill down to the associated record or records in the source table.

  • filename saves the results to a file

    Specify filename as a quoted string with the appropriate file extension. For example: TO "Output.TXT"

    By default, the file is saved to the folder containing the Analytics project.

    Use either an absolute or relative file path to save the file to a different, existing folder:

    • TO "C:\Output.TXT"
    • TO "Results\Output.TXT"
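
The default-folder and path rules above can be sketched with Python's pathlib. This assumes, as the text implies, that a bare filename or a relative path is resolved against the folder containing the Analytics project. The function output_location and the project folder are hypothetical, and the sketch only computes the path: it does not check that the target folder already exists, which the text requires.

```python
from pathlib import PureWindowsPath


def output_location(project_folder: str, filename: str) -> PureWindowsPath:
    """Return where an output file would be written, under the stated assumption."""
    target = PureWindowsPath(filename)
    if target.is_absolute():
        # An absolute path such as C:\Output.TXT is used as-is.
        return target
    # A bare filename or relative path is taken relative to the project folder.
    return PureWindowsPath(project_folder) / target


project = r"C:\Audit\Invoices"  # hypothetical project folder
print(output_location(project, "Output.TXT"))           # C:\Audit\Invoices\Output.TXT
print(output_location(project, r"Results\Output.TXT"))  # C:\Audit\Invoices\Results\Output.TXT
print(output_location(project, r"C:\Output.TXT"))       # C:\Output.TXT
```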

Analytics output variables

Name         Contains
CONFIDENCE   The confidence level specified by the user.
ERRLIMIT     The minimum number of errors specified by the user.
NSTRATA      The number of strata specified by the user.
PLIMIT       The type of precision limit specified by the user.
S_IF         The conditional expression specified by the user.
S_TOP        The top certainty stratum cutoff value specified by the user or, if none was specified, the upper boundary of the top stratum calculated by the command.
SAMPLEFIELD  The book value field specified by the user.
SBOTTOM      The bottom certainty stratum cutoff value specified by the user or, if none was specified, the lower boundary of the bottom stratum calculated by the command.
SBOUNDARY    The upper boundaries of all strata calculated by the command. Does not include the top or bottom certainty strata.
SPOPULATION  The number of records and the total monetary value for each stratum. Does not include the top or bottom certainty strata.
SSAMPLE      The sample size for each stratum calculated by the command. Does not include the top or bottom certainty strata.

Examples

Prepare a classical variables sample

You have decided to use classical variables sampling to estimate the total amount of monetary misstatement in an account containing invoices.

Before drawing the sample, you must first stratify the population, and calculate a statistically valid sample size for each stratum.

You want to be confident that 95% of the time the sample drawn by Analytics will be representative of the population as a whole.

Using your specified confidence level, the example below stratifies a table based on the invoice_amount field, and calculates the sample size for each stratum and the top certainty stratum:

CVSPREPARE ON invoice_amount NUMSTRATA 5 MINIMUM 0 PRECISION 928003.97 CONFIDENCE 95.00 CUTOFF 35000 NCELLS 50 PLIMIT BOTH ERRORLIMIT 6 MINSAMPSIZE 0 TO SCREEN

Remarks

For more information about how this command works, see Preparing a classical variables sample.

Numeric length limitation

Several internal calculations occur during the preparation stage of classical variables sampling. These calculations support numbers with a maximum length of 17 digits. If the result of any calculation exceeds 17 digits, the result is not included in the output, and you cannot continue with the sampling process.

Source data values with fewer than 17 digits can still produce internal calculation results that exceed 17 digits.
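
To see how values well under the limit can produce oversized intermediate results, the Python sketch below squares two book values and sums them, as a standard deviation calculation would. It assumes, for illustration only, that the internal calculations include sums of squared book values and that the 17-digit limit counts every digit, including those after the decimal point. The exact calculations Analytics performs are not described here, and the book values are hypothetical.

```python
from decimal import Decimal

MAX_DIGITS = 17  # documented maximum length of internal calculation results


def digit_count(value: Decimal) -> int:
    """Count the digits in a Decimal, ignoring the sign and the decimal point."""
    return len(value.as_tuple().digits)


def exceeds_limit(value: Decimal) -> bool:
    """Return True if the value has more digits than the documented limit."""
    return digit_count(value) > MAX_DIGITS


# Hypothetical book values, each well under 17 digits long.
book_values = [Decimal("123456789.12"), Decimal("98765432.10")]
print([digit_count(v) for v in book_values])  # [11, 10]

# Assumed intermediate step: the sum of the squared book values.
sum_of_squares = sum(v * v for v in book_values)
print(digit_count(sum_of_squares), exceeds_limit(sum_of_squares))  # 21 True
```

Squaring a value roughly doubles its digit count, so book values of only 9 or more digits can already push this kind of intermediate result past 17 digits.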