About sample selection methods

A sample selection method determines which records are selected for inclusion in a sample. ACL supports three sample selection methods: fixed interval, cell, and random. Each selection method can be used with either record sampling or monetary unit sampling.

Note

Every sample selection method requires you to enter some of the values generated when you calculate the sample size.

Fixed interval

In fixed interval sampling, you specify the selection interval that was generated when you calculated the sample size, and a random start number. The random start number must be greater than zero and less than or equal to the selection interval. For example, if you choose 723 as the random start number and 1100 as the interval, item number 723 is selected for inclusion in the sample, followed by 1823, 2923, 4023, and so on.
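The arithmetic in this example can be sketched in a few lines of Python. This is an illustration only, not ACL's implementation: the function name and the population size of 5000 are assumptions made for the example.

```python
def fixed_interval_selection(population_size, interval, random_start):
    """Return the 1-based item numbers picked by fixed interval selection."""
    # The random start must be greater than zero and no larger than the interval.
    if not 0 < random_start <= interval:
        raise ValueError("random start must be > 0 and <= the interval")
    # range() excludes its end value, so add 1 to allow the last item.
    return list(range(random_start, population_size + 1, interval))


# Hypothetical population of 5000 items, using the start and interval from the text.
selected = fixed_interval_selection(population_size=5000, interval=1100, random_start=723)
print(selected)  # [723, 1823, 2923, 4023]
```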

If you are using monetary unit sampling, any item greater than the top stratum cutoff is automatically selected.
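As a rough sketch of how fixed interval selection and the top stratum cutoff can interact in a monetary unit sample, the following Python walks a running total through a list of amounts. It is not ACL's algorithm: the function name, the sample amounts, the choice of a cutoff below the interval, and the decision to keep top stratum items in the running total are all assumptions made for illustration. Amounts are whole monetary units (for example, cents).

```python
def mus_fixed_interval(amounts, interval, random_start, top_stratum_cutoff):
    """Return 0-based indexes of the items selected (illustrative sketch only)."""
    selected = []
    next_hit = random_start  # next monetary unit to be selected
    running_total = 0
    for index, amount in enumerate(amounts):
        running_total += amount
        # The item is hit if the next selection point falls inside it.
        hit = next_hit <= running_total
        # Move past every selection point that this item covers.
        while next_hit <= running_total:
            next_hit += interval
        # Assumption: items above the cutoff are always taken, hit or not.
        if hit or amount > top_stratum_cutoff:
            selected.append(index)
    return selected


amounts = [300, 150, 1200, 400, 250, 700]  # hypothetical data
picked = mus_fixed_interval(amounts, interval=1000, random_start=200,
                            top_stratum_cutoff=600)
print(picked)  # [0, 2, 4, 5]
```

In this run, the last item (700) contains no selection point and is included only because it exceeds the hypothetical cutoff of 600.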

Note

If you plan to evaluate the effect of errors in a monetary unit sample, you must use fixed interval sampling to obtain accurate results.

When you use fixed interval sampling, watch for patterns in the data. Because the interval is fixed, the sample can be nonrepresentative if the data has a pattern that coincides with the interval you specify. For example, suppose you sample expenses with an interval of $100,000, and the monthly expenses that you test are also approximately $100,000. In this situation, the same expense category could be selected every time, because that category recurs at intervals of roughly $100,000 in the file. This scenario is uncommon, but you should be aware of the potential.

Cell

Cell sampling, also called random interval sampling, is an interval selection method. In cell sampling, you specify the selection interval that was generated when you calculated the sample size, and a random seed. The random seed is an arbitrary number that ACL uses to generate a series of random numbers, each greater than zero and less than or equal to the size of the interval. Within each interval, the item that corresponds to the random number is selected, and the process is repeated with a new random number for the group of items or records in the next interval. For example, if the interval is 1000 and the random seed is 254, item 429 might be selected from the first group of 1000 items, then item 1844 from the second group, and so on.

Note

Every unique seed results in a different random sequence, while repeating the same seed generates the same random sequence. Therefore, to replicate the sample selection, you must specify the same random seed.
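The idea of picking one random item per interval, and of reproducing the picks by reusing the seed, can be sketched in Python as follows. This uses Python's own random number generator, so it will not reproduce the item numbers ACL selects for a given seed. The function name, the population size of 3000, and the handling of a short final cell are assumptions.

```python
import random


def cell_selection(population_size, interval, seed):
    """Pick one random 1-based item number from each cell (interval)."""
    rng = random.Random(seed)  # the same seed gives the same sequence
    selected = []
    for cell_start in range(0, population_size, interval):
        # Assumption: a final cell may be shorter than the interval.
        cell_size = min(interval, population_size - cell_start)
        offset = rng.randint(1, cell_size)  # greater than 0, at most cell_size
        selected.append(cell_start + offset)
    return selected


first_run = cell_selection(population_size=3000, interval=1000, seed=254)
second_run = cell_selection(population_size=3000, interval=1000, seed=254)
print(first_run == second_run)  # True: reusing the seed replicates the selection
```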

The main advantage of cell sampling over fixed interval sampling is that it automatically avoids problems relating to patterns in the data. A disadvantage is that, for monetary unit sampling, the entries selected in cell sampling might not be as consistent as those selected in fixed interval sampling. This lack of consistency occurs because an item can span the dividing point between two groups, and therefore appear in two different groups for sampling purposes. One implication of this is that the same entry can be selected twice. Also, if you are using monetary unit sampling, high value items that are less than the top stratum cutoff have a slightly reduced chance of being selected.
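To see how one entry can be selected twice, consider a monetary unit sample in which an item straddles the boundary between two cells. The Python sketch below maps selected monetary units back to the items that contain them. The amounts and the two selected units are hypothetical values chosen to show the effect, and the helper name is an assumption.

```python
from bisect import bisect_left
from itertools import accumulate


def item_for_unit(cumulative_totals, unit):
    """Return the 0-based index of the item containing a 1-based monetary unit."""
    return bisect_left(cumulative_totals, unit)


amounts = [900, 200, 900]               # hypothetical item amounts
cumulative = list(accumulate(amounts))  # [900, 1100, 2000]

# With an interval of 1000, the first cell covers units 1 to 1000 and the
# second covers 1001 to 2000. The middle item spans units 901 to 1100, so it
# lies in both cells.
selected_units = [950, 1050]            # one hypothetical pick per cell
selected_items = [item_for_unit(cumulative, u) for u in selected_units]
print(selected_items)  # [1, 1]: the same entry is selected twice
```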

Random

In random sampling, you specify the number of items to select, a random seed, and the size of the data set, which is the total number of records from which the sample is selected. ACL uses the random seed to initialize a random number generator.

Note

Every unique seed results in a different random sequence, while repeating the same seed generates the same random sequence. Therefore, to replicate the sample selection, you must specify the same random seed.

ACL does not use the same random number twice. If a generated random number duplicates one that was already generated, it is discarded and replaced by a new one. Remember that for monetary unit samples, the item selected is actually a cent, not a dollar, so it is unlikely that any numbers in a monetary unit sample will be discarded. The implication is that in record sampling the same record will not be selected twice, but in monetary unit sampling the same record might be selected more than once.

When the list of selections has been established, ACL selects those specific items for inclusion in the sample. For example, if the data set is 1000, the sample size is 5, and the random seed is 983, ACL might generate the numbers 244, 261, 339, 874, and 985. These items would then be selected.
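The discard-and-replace behavior described above can be sketched in Python. This is an illustration under assumptions, not ACL's generator: it uses Python's random module, so the numbers it produces for seed 983 will differ from ACL's, and the function name is hypothetical.

```python
import random


def random_selection(data_set_size, sample_size, seed):
    """Draw distinct 1-based item numbers, discarding repeated values."""
    if sample_size > data_set_size:
        raise ValueError("sample size cannot exceed the data set size")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < sample_size:
        candidate = rng.randint(1, data_set_size)
        # A repeated value is discarded; the loop simply draws again.
        if candidate not in chosen:
            chosen.add(candidate)
    return sorted(chosen)


sample = random_selection(data_set_size=1000, sample_size=5, seed=983)
print(sample)  # five distinct item numbers between 1 and 1000
```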

If you use random sampling, be aware that while each item has an equal chance of selection, there is no guarantee that the selections will be evenly distributed. In the previous example, there is a gap between 339 and 874, meaning that no selections were made from a run of more than 500 items. An equivalent fixed interval sample would ensure that no gap exceeded 200. There is also no top stratum cutoff in random sampling. If the example were a monetary unit sample, one item representing over half of the file might not be selected at all if it fell in the gap noted above. Also, because there is no way to prevent the selection of numbers that are “close” as opposed to “the same” in monetary unit samples, the same entry might be selected more than once, or even many times.
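The gap comparison can be checked with a short Python sketch. The random selections are the ones from the example above. The fixed interval selections assume an interval of 200 (1000 items divided by a sample size of 5) and a hypothetical random start of 100.

```python
def largest_gap(selections):
    """Return the largest distance between consecutive selected item numbers."""
    ordered = sorted(selections)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


random_picks = [244, 261, 339, 874, 985]   # from the example in the text
fixed_picks = list(range(100, 1001, 200))  # 100, 300, 500, 700, 900

print(largest_gap(random_picks))  # 535
print(largest_gap(fixed_picks))   # 200
```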

Related concepts
Sampling data
About sampling types
About calculating sample sizes
Monetary unit sampling options
About evaluating sampling errors
Related tasks
Sampling transaction records
Sampling monetary unit records


(C) 2013 ACL Services Ltd. All Rights Reserved.