Testing for gaps

Concept Information

GAPS command

Gaps in sequentially ordered numeric or datetime fields could indicate a data file is incomplete. You can test for gaps in sequentially ordered values in a field, and identify one or more gaps or missing items, if they exist.

For results to be valid, the field being tested must be in sequential order prior to testing. You can sort a field in advance, or use the Presort option during the gaps test.

You can test numeric or datetime fields, or numbers in character fields. You can test only one field at a time.

List gap ranges or list missing items

You have two options when outputting the results of gaps testing:

  • List Gap Ranges
  • List Missing Items

List Gap Ranges

This option identifies the start and end points of a gap, and the total number of missing items in the gap.

The values that identify the start and end points of the gap are not missing items themselves. They are the sequential values that appear immediately before and immediately after the gap – labeled Gap Start (Exclusive) and Gap End (Exclusive). For example, check numbers 12345 and 12350 would identify a gap with four missing items between those two numbers.

List Missing Items

This option lists the individual missing items in a gap. For example, check numbers 12346, 12347, 12348, and 12349.

These missing items are calculated values and do not actually appear in the data being tested. When using this option, the Maximum Missing Items field allows you to specify the maximum number of missing items to list individually for each gap, which is useful if a gap is large. If the maximum is exceeded, Analytics uses the range method of identifying gaps instead, with the one difference that the values identifying the start and end points of the gap are the first and last missing items, sequentially – labeled Gap Start (Inclusive) and Gap End (Inclusive).

Note

The number you specify in Maximum Missing Items applies on a per-gap basis. It does not limit the total number of missing item results across a data set, whether listed individually or by range.

When using the missing items method, the results can contain a mix of individual missing items and ranges depending on the value in the Maximum Missing Items field and the size of the different gaps.

Testing numeric data for gaps

When you test numeric data for gaps, the number of decimal places in the data governs the allowable interval in the data:

  • numeric data contains only integers (no decimal portion) the allowable interval is 1

    An interval greater than 1 is a gap. For gaps reported as ranges, the number of missing items is the number of missing integers.

  • numeric data contains decimal places the allowable interval is equivalent to the smallest decimal interval

    For example, if a numeric field has two decimal places, the allowable interval is 0.01. An interval greater than the smallest decimal interval is a gap. For gaps reported as ranges, the number of missing items is the number of missing decimal intervals.

Examples of testing numeric data for gaps

In the first example the numeric data contains only integers. The allowable interval is 1.

Test values

Missing items

Number of missing items

  -2

  -1

   0

   1

   4

   5

15

2

3

6 (to) 14 (Inclusive)

1 (integer)

1 (integer)

9 (integers)

In the second example the numeric data contains two decimal places. The allowable interval is 0.01.

Test values

Missing items

Number of missing items

4.24

4.25

4.26

4.29

4.30

5.00

4.27

4.28

4.31 (to) 4.99 (Inclusive)

1 (0.01 interval)

1 (0.01 interval)

69 (0.01 intervals)

Testing datetime data for gaps

You can test date, datetime, or time data for gaps:

  • The allowable interval in date fields is one day

    An interval greater than one day is a gap. For gaps reported as ranges, the number of missing items is the number of missing days.

  • The allowable interval in datetime or time fields is one second

    An interval greater than one second is a gap. For gaps reported as ranges, the number of missing items is the number of missing seconds. So a gap of one hour would be reported as a range with 3,600 missing items, and a gap of one day would be reported as a range with 86,400 missing items.

Examples of testing dates and datetimes for gaps

In the first example the data contains only dates. The allowable interval is one day.

Test values

Missing items

Number of missing items

27 Dec 2014

28 Dec 2014

31 Dec 2014

01 Jan 2015

02 Jan 2015

12 Jan 2015

13 Jan 2015

29 Dec 2014

30 Dec 2014

03 Jan 2015 (to) 11 Jan 2015 (Inclusive)

1 (day)

1 (day)

9 (days)

In the second example the data contains datetimes. The allowable interval is one second.

Test values

Missing items

Number of missing items

31 Dec 2014 23:59:54

31 Dec 2014 23:59:55

31 Dec 2014 23:59:58

31 Dec 2014 23:59:59

01 Jan 2015 01:00:00

01 Jan 2015 01:00:01

02 Jan 2015 01:00:02

31 Dec 2014 23:59:56

31 Dec 2014 23:59:57

01 Jan 2015 00:00:00 (to) 01 Jan 2015 00:59:59 (Inclusive)

01 Jan 2015 01:00:02 (to) 02 Jan 2015 01:00:01 (Inclusive)

1 (second)

1 (second)

3,600 (seconds)

86,400 (seconds)

Testing numeric data in a character field for gaps

You can test for gaps in numeric data that appears in a character field – for example, check numbers, which are typically formatted as character data.

If letters and numbers appear together in a character field, only the numbers are tested and the letters are ignored.

Examples of testing numbers in a character field for gaps

Note how the alpha prefixes are ignored, and only the numbers are considered.

Test values

Missing items

Number of missing items

A123

C124

 

0 (character number)

A123

B125

124

1 (character number)

Sorting of character fields can affect gaps testing

Depending on the arrangement of letters and numbers in character field values, anomalies may exist among the results of gaps testing. For example, if some numbers are prefaced with a letter and some are not, or in the non-Unicode edition of Analytics if some preceding letters are lowercase and some are uppercase, results may not be accurate.

The reason for the inaccuracy is that the inconsistent presence of alpha characters, or the inconsistent case of alpha characters, prevents the numbers being fully sequentially ordered by the Presort option. In the table below, 126 and 127, and 124, are not actually missing items, but because of the way the alphanumeric strings are sorted they are returned as missing items.

If you suspect an anomaly exists, perform a separate sort operation on the field in question to reveal the sequence of character field values being tested for gaps. If sequential numeric order is being disrupted by the presence of letters, you can ensure valid results by using an Analytics function such as INCLUDE( ) to strip out the letters before testing for gaps.

Examples of incorrect gaps results

Note how the inconsistent presence of alpha characters, or the inconsistent case of alpha characters, is causing items to be incorrectly reported as missing.

Test values

Missing items

Number of missing items

123

124

125

128

129

A-126

A-127

126

127

1 (character number)

1 (character number)

A-123

a-124

A-125

A-128

A-129

A-126

A-127

124

1 (character number)

Steps

You can test a single field at a time in the active table to detect whether sequentially ordered numbers or datetime values contain any gaps.