Analyzing data

"Analyzing data" is a broad concept that encompasses a large range of different processes and techniques. There may be more than one way to achieve the same data analysis objective. The overall process is often iterative, requiring that you modify your initial approach based upon information you discover along the way.

Effective data analysis

At its most basic, analyzing data is the process of finding answers to questions about data. Analytics provides a number of commands and other tools that you can use to gain general insights about the data you are investigating, and to answer specific questions. However, you should resist the notion that you can click two or three buttons in Analytics and magically have all your data analysis answers.

Effective data analysis requires:

  • understanding the nature of the data
  • formulating specific analysis objectives
  • knowledgeably applying the tools

Analytics can significantly amplify your data analysis abilities but it does not replace them.

Data analysis commands and tools in Analytics

The table below categorizes Analytics commands and tools by data analysis area. The categories are not intended to be absolute, and you may find an effective use for a command outside its category. Some commands, such as Sort and Join, serve a primary purpose that is not analytical, but in some situations they can still provide analytical insight.

Note

Data analysis, beyond the most basic, typically requires using a series of commands to progressively work toward your analysis objective, rather than using a single command in isolation.

Each entry below lists the data analysis area, the associated commands or tools, and a description of their use.

General characteristics

Commands: Verify, Count, Total, Profile, Statistics, Outliers, Sort, Index

Use these commands to discover general characteristics of a data set, including:

  • data validity
  • record counts
  • total amounts
  • minimum, maximum, and average amounts
  • standard deviation, median, mode, and quartile values
  • outliers
  • ranges
  • distribution of negative and positive values
  • patterns
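These general measures are easy to picture in plain Python. The following is a conceptual sketch, not Analytics syntax, and the amounts are invented sample data:

```python
import statistics

# Hypothetical invoice amounts -- invented sample data
amounts = [120.0, -45.5, 300.0, 300.0, 87.25, 1200.0, 15.0]

profile = {
    "count": len(amounts),                             # record count
    "total": sum(amounts),                             # total amount
    "minimum": min(amounts),
    "maximum": max(amounts),
    "average": statistics.mean(amounts),
    "std_dev": statistics.stdev(amounts),
    "median": statistics.median(amounts),
    "mode": statistics.mode(amounts),
    "negatives": sum(1 for a in amounts if a < 0),     # value distribution
    "positives": sum(1 for a in amounts if a > 0),
}
```

A profile like this is usually the first pass; the results suggest which specific questions to pursue next.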
Reliability/accuracy
Tool: computed fields
Use computed fields to recalculate and test the accuracy of calculated amounts in a data set, such as total amounts including tax.
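As a conceptual sketch (not Analytics syntax), a computed-field test recalculates an amount from its components and flags disagreements; the records and the 10% tax rate here are invented:

```python
# Hypothetical records: each row stores a subtotal, a tax rate, and the
# total reported by the source system -- all values invented
records = [
    {"subtotal": 100.00, "tax_rate": 0.10, "reported_total": 110.00},
    {"subtotal": 250.00, "tax_rate": 0.10, "reported_total": 280.00},  # wrong
]

def recomputed_total(row):
    """Recalculate the total the way the source system should have."""
    return round(row["subtotal"] * (1 + row["tax_rate"]), 2)

# Flag rows where the stored total disagrees with the recalculation
mismatches = [r for r in records if recomputed_total(r) != r["reported_total"]]
```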
Isolation
Tools: filtering, searching
Use filtering to restrict a data set, or data processing, to a subset of records of interest. Use searching to locate specific values in a data set.
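Both techniques can be pictured as simple list comprehensions. This is a conceptual Python sketch over invented transaction records, not Analytics syntax:

```python
# Hypothetical transaction records -- invented data
transactions = [
    {"id": 1, "vendor": "Acme Corp", "amount": 5200.00},
    {"id": 2, "vendor": "Globex", "amount": 130.00},
    {"id": 3, "vendor": "Acme Corp", "amount": 75.00},
]

# Filtering: restrict processing to the subset of records of interest
large = [t for t in transactions if t["amount"] > 1000]

# Searching: locate records containing a specific value
acme = [t for t in transactions if "acme" in t["vendor"].lower()]
```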

Sequential order
Command: Sequence
Test whether data is sequentially ordered, and identify out-of-sequence items.
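The underlying test can be sketched in Python (conceptual only; the check numbers are invented):

```python
# Hypothetical check numbers as they appear in the table -- invented data
check_numbers = [1001, 1002, 1005, 1004, 1006]

# An item is out of sequence if it is not greater than its predecessor
out_of_sequence = [
    check_numbers[i]
    for i in range(1, len(check_numbers))
    if check_numbers[i] <= check_numbers[i - 1]
]
is_ordered = not out_of_sequence
```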
Completeness
Command: Gaps
Verify whether all records in a sequence, such as a sequentially ordered series of checks, are present, and identify the location of any gaps in the sequence.
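Gap detection amounts to scanning consecutive pairs for jumps. A conceptual Python sketch over invented check numbers:

```python
# Hypothetical sequentially numbered checks -- invented data
check_numbers = [2001, 2002, 2003, 2007, 2008]

# Report each missing run as a (first_missing, last_missing) pair
gaps = []
for prev, cur in zip(check_numbers, check_numbers[1:]):
    if cur - prev > 1:
        gaps.append((prev + 1, cur - 1))
```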
Uniqueness
Command: Duplicates
Identify duplicate values or items in a field, or entire duplicate records.
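Conceptually, duplicate testing is a frequency count. A Python sketch over invented invoice numbers:

```python
from collections import Counter

# Hypothetical vendor invoice numbers -- invented data
invoice_numbers = ["INV-100", "INV-101", "INV-100", "INV-102", "INV-101"]

# Any value that appears more than once is a duplicate
counts = Counter(invoice_numbers)
duplicates = sorted(v for v, n in counts.items() if n > 1)
```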
Inexactness
Command: Fuzzy Duplicates
Identify nearly identical values that may refer to the same real-world entity.
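One common way to score near-matches is string similarity. The sketch below uses Python's difflib as a stand-in rather than the Fuzzy Duplicates algorithm itself, and the vendor names and 0.8 threshold are invented:

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical vendor names -- invented data
vendors = ["Acme Corp", "Acme Corp.", "Globex Ltd", "Initech"]

def similarity(a, b):
    """Ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Pairs above a chosen threshold may refer to the same real-world entity
THRESHOLD = 0.8
fuzzy_pairs = [
    (a, b) for a, b in combinations(vendors, 2) if similarity(a, b) >= THRESHOLD
]
```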

Frequency distribution / Concentration of materiality
Commands: Stratify, Age, Classify, Summarize, Cross-Tabulate, Histogram, Cluster
Group records and determine how many records, and how much value, are concentrated by numeric range or cluster, by time period, or by record identifiers such as location codes, vendor/customer numbers, or product identifiers. Also useful for identifying outliers.
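The classify and stratify ideas can be sketched in Python (conceptual only; the location codes, amounts, and strata boundaries are invented):

```python
from collections import defaultdict

# Hypothetical transactions: (location_code, amount) -- invented data
transactions = [
    ("NY", 500.0), ("NY", 1500.0), ("TX", 200.0),
    ("TX", 300.0), ("NY", 250.0),
]

# Classify: concentration of record counts and value by identifier
by_location = defaultdict(lambda: {"count": 0, "total": 0.0})
for code, amount in transactions:
    by_location[code]["count"] += 1
    by_location[code]["total"] += amount

# Stratify: concentration of records by numeric range (invented strata)
strata = {"0-999": 0, "1000+": 0}
for _, amount in transactions:
    strata["0-999" if amount < 1000 else "1000+"] += 1
```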

Compare
Commands: Join, Fuzzy Join, Relate
Combine tables to discover whether records are matched or unmatched across tables, such as an invoice table and a purchase order (PO) table.
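A matched/unmatched comparison reduces to a key lookup across tables. A conceptual Python sketch with invented invoice and PO records:

```python
# Hypothetical invoice and purchase-order tables -- invented data
invoices = [
    {"po": "PO-1", "amount": 100},
    {"po": "PO-2", "amount": 50},
    {"po": "PO-9", "amount": 75},
]
purchase_orders = [{"po": "PO-1"}, {"po": "PO-2"}, {"po": "PO-3"}]

po_keys = {p["po"] for p in purchase_orders}

# Matched invoices exist in both tables; unmatched invoices have no PO
matched = [inv for inv in invoices if inv["po"] in po_keys]
unmatched = [inv for inv in invoices if inv["po"] not in po_keys]
```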
Numeric anomaly
Command: Benford
Discover anomalous numeric data by testing leading digits for variance from the expected Benford distribution.
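The test compares observed leading-digit frequencies against the Benford expectation log10(1 + 1/d). A conceptual Python sketch over invented amounts:

```python
import math
from collections import Counter

# Hypothetical payment amounts -- invented data
amounts = [123.45, 189.0, 205.0, 1110.0, 175.0, 310.0, 95.0, 145.0]

# Observed count of each leading digit (strip any leading zeros/points)
leading = [int(str(abs(a)).lstrip("0.")[0]) for a in amounts]
observed = Counter(leading)

# Expected Benford proportion for leading digit d: log10(1 + 1/d)
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Variance of the observed proportion from the expected one, per digit
variance = {
    d: observed.get(d, 0) / len(amounts) - expected[d] for d in range(1, 10)
}
```

In a genuine Benford test the sample would be much larger; with only a handful of records the observed proportions are too noisy to interpret.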