Analyzing data
"Analyzing data" is a broad concept that encompasses a large range of different processes and techniques. There may be more than one way to achieve the same data analysis objective. The overall process is often iterative, requiring that you modify your initial approach based upon information you discover along the way.
Effective data analysis
At its most basic, analyzing data is the process of finding answers to questions about data. Analytics provides a number of commands and other tools that you can use to gain general insights about the data you are investigating, and to answer specific questions. However, you should resist the notion that you can click two or three buttons in Analytics and magically have all your data analysis answers.
Effective data analysis requires:
- understanding the nature of the data
- formulating specific analysis objectives
- knowledgeably applying the tools
Analytics can significantly amplify your data analysis abilities but it does not replace them.
Data analysis commands and tools in Analytics
The table below categorizes Analytics commands and tools by data analysis area. The categories are not intended to be absolute. You may find an effective use for a command outside its category. Some commands, such as Sort and Join, serve a main purpose that is not primarily analytical, but in some situations they can provide analytical insight.
Note
Data analysis, beyond the most basic, typically requires using a series of commands to progressively work toward your analysis objective, rather than using a single command in isolation.
Data analysis area | Command or tool | Description |
---|---|---|
General characteristics |
Use these commands to discover general characteristics of a data set, including:
|
|
Reliability/Accuracy | computed fields | Use computed fields to recalculate and test the accuracy of calculated amounts in a data set, such as total amounts including tax |
Isolation |
Use filtering to restrict a data set or data processing to a subset of records of interest Use searching to locate specific values in a data set |
|
Sequential Order | Sequence | Test whether data is sequentially ordered, and identify out-of-sequence items |
Completeness | Gaps | Verify whether all records in a sequence, such as a sequentially ordered series of checks, are present, and identify the location of any gaps in the sequence |
Uniqueness | Duplicates | Identify duplicate values or items in a field, or entire duplicate records |
Inexactness | Fuzzy Duplicates | Identify nearly identical values that may refer to the same real-world entity |
Frequency Distribution Concentration of Materiality |
Group records and determine how many records and how much value are concentrated by numeric range or cluster, by time period, or by record identifiers such as location codes, vendor/customer numbers, or product identifiers Also useful for identifying outliers |
|
Compare | Combine tables to discover whether records are matched or unmatched across tables, such as an invoice table and a PO table | |
Numeric anomaly | Benford | Discover anomalous numeric data by testing leading digits for variance from expected Benford distribution |