Joining tables
Concept Information
Joining tables allows you to combine two Analytics tables with different record structures into a new third table. You can select any combination of fields from the two original tables to be included in the new table.
Record structures are different if they have one or more fields (data elements) that differ. Joining is a good choice for investigative work that requires a permanently joined set of data as a starting point for analysis.
Example
Scenario
You want to identify any vendors who are also employees as one way of analyzing data for possible improper payments.
Approach
You join the Vendor master table with the Employee table, using the common key field of Address.
The joined output table contains any vendors and employees with the same address.
Note
For information about joining tables in the Data Access window as part of the data import process, see Joining tables in the Data Access window.
This topic is about joining Analytics tables once they are in Analytics.
Join tables using a common key field
You join tables using a common key field – that is, a data element such as employee number, vendor ID, or address, that appears in both tables. When identical values exist in the two key fields, the result is a match that joins individual records from the separate tables.
In the example below, a vendor master table and an employee master table are joined using the address field in each table as a common key (Vendor_Street, and Emp_Address). The output table contains two joined records. In the example, the unjoined records from each table are also included in the output table, which is an option you can select.
Joining tables using similar or nearly identical key field values
An Analytics fuzzy join use fuzzy matching of key field values to combine two Analytics tables into a new third table. In most respects, a fuzzy join is like a regular Analytics join. The main difference is that in addition to joining records based on exact matching of key field values, a fuzzy join can join records based on approximate matching.
For more information, see Fuzzy join.
Key field requirements
Several requirements apply to the key fields in the tables you are joining.
Key field characteristic | Requirement |
---|---|
Data element | Must be the same. For example, both key fields are employee number fields. |
Data type |
Can be any data type, but key fields must be the same data type as each other. For example, two character fields. The exception is character-numeric and numeric-character data type joins, which Analytics automatically harmonizes. For more information, see Automatic harmonization when joining tables. Datetime subtypes (date, datetime, and time) can only be joined to the same subtype. |
Field type | Can be physical fields or computed fields. |
Field name | Can be different. |
Start position | Can be different. |
Field length |
|
Justification and case in character fields | Must be the same. |
Primary and secondary tables and key fields
The tables and key fields in the join operation are identified as primary and secondary based on the order in which you open the tables:
- primary table the first table you open
- primary key field the key field you choose from the primary table
- secondary table the second table you open
Opening a secondary table means associating it with a primary table and making it available for processing. Secondary tables are not opened in the View tab.
- secondary key field the key field you choose from the secondary table
You are free to choose whatever primary and secondary tables and key fields you want. However, the join will succeed only if the key fields meet the Key field requirements.
For more information, see About key fields.
Matched versus unmatched records
When you work with joins, you need to consider both the matched and the unmatched records:
- Matched records primary and secondary records are matched if
they have identical values in the primary and secondary key fields.
Note
Depending on the join type you select, duplicate occurrences of matching secondary key values may be left unjoined. For more information, see Why are some secondary table records missing from the joined output table?
- Unmatched records primary and secondary records are unmatched if they do not have identical values in the primary and secondary key fields.
Which records are included in the joined table?
Matched and unmatched key field values, and the type of join you perform, determine which records from the two original tables are included in the new, joined table.
Types of joins
Analytics supports six different types of joins, summarized below. For specific examples, see Examples of join types.
Join type | Records included in joined table | |||
---|---|---|---|---|
Matched primary records | Unmatched primary records | Matched secondary records | Unmatched secondary records | |
Matched primary and secondary (1st secondary match)
|
|
|
Not included: duplicate occurrences of matching secondary key values |
|
Matched primary and secondary (all secondary matches)
|
|
|
Included and joined: duplicate occurrences of matching secondary key values |
|
Unmatched primary
|
|
|
|
|
All primary and matched secondary
|
|
|
Not included: duplicate occurrences of matching secondary key values |
|
All secondary and matched primary
|
|
|
Included but not joined: duplicate occurrences of matching secondary key values |
Included: duplicate occurrences of unmatched secondary key values |
All primary and secondary
|
|
|
Included but not joined: duplicate occurrences of matching secondary key values |
Included: duplicate occurrences of unmatched secondary key values |
Why are some secondary table records missing from the joined output table?
Five of the six Analytics join types do not join duplicate occurrences of matching secondary key values. Duplicate occurrences of matching primary key values are joined, but they are all joined to the first occurrence of the matching secondary key value. These join types are broadly known as many-to-one joins.
To join all matching secondary key values do one of the following:
- Reverse the tables In the join, reverse the primary and secondary tables. This method is appropriate if the values in the key field in the original primary table are unique. If there are duplicate occurrences of primary key values in both tables, this method may not produce the results you want.
- Use many-to-many joining Use the Matched primary and secondary (all secondary matches) join type.
Many-to-one joins and the many-to-many join
You may see Analytics joins referred to as many-to-one joins, and one of the six join types referred to as the many-to-many join. These terms are useful as a way of broadly describing the behavior of Analytics joins. However, the terms are generalizations, and do not entirely represent join behavior.
Analytics many-to-one joins
With one exception, the join types available to you in Analytics are many-to-one joins. They also function as one-to-one joins if all values in the primary key field are unique.
In an Analytics many-to-one join:
- joined duplicate occurrences of a matching primary key value are all joined to the first occurrence of the matching secondary key value
The duplicate primary key matches, and the first secondary key match, are included in the joined table.
- not joined duplicate occurrences of a matching secondary key value are left unjoined
The duplicate secondary key matches are excluded from the joined output table, unless you select a join type that includes all secondary records. If you include all secondary records, duplicate secondary key matches appear in the joined output table as unjoined records.
Many-to-one join
In the example below, both occurrences of the primary key value 'C' are joined in the output table, but only the first occurrence of the secondary key value 'C' is joined.
Tip
If the key field values are unique in one of the tables you are joining, make that table the secondary table. For example, if you are joining a transactional table with a master table, make the master table the secondary table.
Structuring the join in this manner ensures that all the matching records are joined and included in the output table.
The Analytics many-to-many join
One Analytics join type – Matched primary and secondary (all secondary matches) – includes all matching primary and secondary records. This join type is also known as the many-to-many join.
The many-to-many join also functions as a one-to-many join if all values in the primary key field are unique.
In an Analytics many-to-many join:
- joined all occurrences of a matching primary key value are joined to all occurrences of the matching secondary key value
The duplicate primary key matches, and the duplicate secondary key matches, are all joined and included in the output table.
- not joined no duplicate occurrences of a matching secondary key value are left unjoined
Many-to-many join
In the example below, both occurrences of the primary key value 'C' are joined in the output table, and both occurrences of the secondary key value 'C' are also joined.
Tip
If you are uncertain whether duplicate matches exist in the secondary key, choose the many-to-many join type. It ensures that you do not exclude any records that should be joined.
If you intentionally want to exclude duplicate secondary key matches, then do not choose the many-to-many join type.
How Analytics joins and SQL joins differ
There is an important difference between the Analytics joins you perform using the Join dialog box, and the SQL joins available when importing data into Analytics using the Data Access window:
- Analytics joins duplicate secondary key matching values are left unjoined (with the exception of the many-to-many join)
- SQL joins duplicate secondary key matching values are all joined, regardless of the join type you choose
For more information about SQL joins in the Data Access window, see Joining tables in the Data Access window.
Note
Analytics uses the term "many-to-many join" in a manner unique to Analytics. It is not the same as a SQL many-to-many join.
Sorting of joined tables
The combined table resulting from a join is sorted in ascending order on the primary key field, assuming you Presort the primary key field while performing the join, or the primary table already uses this sort order.
If you do not sort or presort by the primary key field, the resulting joined table uses the existing sort order of the primary table.
Additional information about sorting and joining
- It is not mandatory that the primary table be sorted, but processing time increases significantly if the primary table is completely unsorted, or sorted in descending order.
- If you perform a join using a partially sorted primary table key field – for example, joining on account code when the table is sorted by month and then by account code – the increase in processing time is not as significant.
- When joining, the Presort option exists for both the primary and the secondary tables.
- The primary and secondary key fields can be indexed instead of sorted. The secondary key field must be indexed in ascending order. Applying an index to the secondary table can only be performed from the command line or in a script.
Additional information about joining
The table below provides additional information about joining.
Functional area |
Details |
---|---|
Unmatched records and missing field values |
If you include unmatched primary or unmatched secondary records in a join, for the missing field values, Analytics displays a blank in character and datetime fields, a zero in numeric fields, and “F” in logical fields. |
Duplicates or blanks in secondary table key field |
If duplicates or missing values in a secondary table key field render subsequent analysis invalid, pre-processing the secondary table to remove the duplicates and/or blanks may be a solution in some circumstances. |
Partial matching |
Partial matching of key field values is not supported. To be matched, values must be 100% identical. For example:
Note Partial matching is supported by the Analytics Fuzzy join. |
Identical key field length not enforced |
With the exception of character key fields, Analytics does not enforce identical lengths for the primary and secondary key fields when joining tables. It is recommended that you always use identical lengths for numeric key fields, manually harmonizing the lengths prior to performing the join, if necessary. Results derived from joining using numeric key fields of different lengths are not reliable. Datetime key fields can be different lengths because Analytics, when performing operations involving dates, datetimes, or times, uses an internal Analytics datetime format. |
Harmonizing justification and case |
When you join tables using character key fields, justification and case must be the same:
|
Count of records not included in a join |
Depending on the type of join you perform, records in the primary and/or secondary tables may not be included in the joined table. The command log displays the number of primary records not included (<n> records bypassed), but not the number of bypassed secondary records. |
Conditional expressions and scope options used in join operation |
In many-to-one joins, the If, While, First, and Next parameters that limit which records are processed apply only to the primary table. In many-to-many joins, If and While expressions can also reference the secondary table. |
Identical field names in tables being joined |
If the primary and secondary key fields, or any other included fields, have identical names, Analytics adds ‘2’ to the end of the secondary field name in the layout for the output table. For example, ‘vendor_ID’ becomes ‘vendor_ID2’ (or ‘vendor_ID3’, and so on, until Analytics finds a name that does not conflict with any other field names in the output table). The alternate column titles in the view for the output table continue to display the identical names unaltered. |
Table unavailable as secondary table |
A table is unavailable to select as the secondary table in a join if it is currently related to the primary/parent table as a child table. To avoid this restriction, you can create a copy of the primary/parent table layout, or the child table layout, and join using the copied layout, or you can delete the relation. |
Restrictions on location of tables being joined |
To be joined, tables must be in the same Analytics project. Server tables must be on the same server, and must be accessed using the same server profile. You cannot join a local table with a server table. |
Size of joined table |
Depending on the type of join performed, the number of records in the resulting combined table can be greater than, equal to, or less than the sum of the records in the two tables being joined. |
Joining UTC-based and non-UTC data |
A UTC-based and a non-UTC datetime key field can be used to join two tables. (UTC is Coordinated Universal Time, the time at zero degrees longitude.) When performing operations involving datetimes or times, Analytics uses an internal Analytics datetime format, so the following two datetimes are interpreted as identical, and constitute a match:
You should exercise caution if you mix UTC-based and non-UTC time data in an Analytics operation. Although Analytics will match the two time values above, it may not make logical sense to do so, because one value references a time zone, and the other value does not. For more information about UTC, see Date and Time options. |