Joining tables

Concept Information

JOIN command

Joining tables allows you to combine two Analytics tables with different record structures into a new third table. You can select any combination of fields from the two original tables to be included in the new table.

Record structures are different if they have one or more fields (data elements) that differ. Joining is a good choice for investigative work that requires a permanently joined set of data as a starting point for analysis.

Example

Scenario

You want to identify any vendors who are also employees as one way of analyzing data for possible improper payments.

Approach

You join the Vendor master table with the Employee table, using the common key field of Address.

The joined output table contains any vendors and employees with the same address.

Note

For information about joining tables in the Data Access window as part of the data import process, see Joining tables in the Data Access window.

This topic is about joining Analytics tables once they are in Analytics.

Join tables using a common key field

You join tables using a common key field – that is, a data element such as employee number, vendor ID, or address, that appears in both tables. When identical values exist in the two key fields, the result is a match that joins individual records from the separate tables.

In the example below, a vendor master table and an employee master table are joined using the address field in each table as a common key (Vendor_Street, and Emp_Address). The output table contains two joined records. In the example, the unjoined records from each table are also included in the output table, which is an option you can select.

Joining tables using similar or nearly identical key field values

An Analytics fuzzy join use fuzzy matching of key field values to combine two Analytics tables into a new third table. In most respects, a fuzzy join is like a regular Analytics join. The main difference is that in addition to joining records based on exact matching of key field values, a fuzzy join can join records based on approximate matching.

For more information, see Fuzzy join.

Key field requirements

Several requirements apply to the key fields in the tables you are joining.

Primary and secondary tables and key fields

The tables and key fields in the join operation are identified as primary and secondary based on the order in which you open the tables:

  • primary table the first table you open
  • primary key field the key field you choose from the primary table
  • secondary table the second table you open

    Opening a secondary table means associating it with a primary table and making it available for processing. Secondary tables are not opened in the View tab.

  • secondary key field the key field you choose from the secondary table

You are free to choose whatever primary and secondary tables and key fields you want. However, the join will succeed only if the key fields meet the Key field requirements.

For more information, see About key fields.

Matched versus unmatched records

When you work with joins, you need to consider both the matched and the unmatched records:

  • Matched records primary and secondary records are matched if they have identical values in the primary and secondary key fields.

    Note

    Depending on the join type you select, duplicate occurrences of matching secondary key values may be left unjoined. For more information, see Why are some secondary table records missing from the joined output table?

  • Unmatched records primary and secondary records are unmatched if they do not have identical values in the primary and secondary key fields.

Which records are included in the joined table?

Matched and unmatched key field values, and the type of join you perform, determine which records from the two original tables are included in the new, joined table.

Types of joins

Analytics supports six different types of joins, summarized below. For specific examples, see Examples of join types.

  Join type Records included in joined table
Matched primary records Unmatched primary records Matched secondary records Unmatched secondary records

Matched primary and secondary

(1st secondary match)

 

Not included: duplicate occurrences of matching secondary key values

 

Matched primary and secondary

(all secondary matches)

 

Included and joined: duplicate occurrences of matching secondary key values

 

Unmatched primary

 

 

 

All primary and matched secondary

Not included: duplicate occurrences of matching secondary key values

 

All secondary and matched primary

 

Included but not joined: duplicate occurrences of matching secondary key values

Included: duplicate occurrences of unmatched secondary key values

All primary and secondary

Included but not joined: duplicate occurrences of matching secondary key values

Included: duplicate occurrences of unmatched secondary key values

Why are some secondary table records missing from the joined output table?

Five of the six Analytics join types do not join duplicate occurrences of matching secondary key values. Duplicate occurrences of matching primary key values are joined, but they are all joined to the first occurrence of the matching secondary key value. These join types are broadly known as many-to-one joins.

To join all matching secondary key values do one of the following:

  • Reverse the tables In the join, reverse the primary and secondary tables. This method is appropriate if the values in the key field in the original primary table are unique. If there are duplicate occurrences of primary key values in both tables, this method may not produce the results you want.
  • Use many-to-many joining Use the Matched primary and secondary (all secondary matches) join type.

Many-to-one joins and the many-to-many join

You may see Analytics joins referred to as many-to-one joins, and one of the six join types referred to as the many-to-many join. These terms are useful as a way of broadly describing the behavior of Analytics joins. However, the terms are generalizations, and do not entirely represent join behavior.

Analytics many-to-one joins

With one exception, the join types available to you in Analytics are many-to-one joins. They also function as one-to-one joins if all values in the primary key field are unique.

The Analytics many-to-many join

One Analytics join type – Matched primary and secondary (all secondary matches) – includes all matching primary and secondary records. This join type is also known as the many-to-many join.

The many-to-many join also functions as a one-to-many join if all values in the primary key field are unique.

How Analytics joins and SQL joins differ

There is an important difference between the Analytics joins you perform using the Join dialog box, and the SQL joins available when importing data into Analytics using the Data Access window:

  • Analytics joins duplicate secondary key matching values are left unjoined (with the exception of the many-to-many join)
  • SQL joins duplicate secondary key matching values are all joined, regardless of the join type you choose

    For more information about SQL joins in the Data Access window, see Joining tables in the Data Access window.

Note

Analytics uses the term "many-to-many join" in a manner unique to Analytics. It is not the same as a SQL many-to-many join.

Sorting of joined tables

The combined table resulting from a join is sorted in ascending order on the primary key field, assuming you Presort the primary key field while performing the join, or the primary table already uses this sort order.

If you do not sort or presort by the primary key field, the resulting joined table uses the existing sort order of the primary table.

Additional information about sorting and joining

  • It is not mandatory that the primary table be sorted, but processing time increases significantly if the primary table is completely unsorted, or sorted in descending order.
  • If you perform a join using a partially sorted primary table key field – for example, joining on account code when the table is sorted by month and then by account code – the increase in processing time is not as significant.
  • When joining, the Presort option exists for both the primary and the secondary tables.
  • The primary and secondary key fields can be indexed instead of sorted. The secondary key field must be indexed in ascending order. Applying an index to the secondary table can only be performed from the command line or in a script.

Additional information about joining

The table below provides additional information about joining.

 

Functional area

Details

Unmatched records and missing field values

If you include unmatched primary or unmatched secondary records in a join, for the missing field values, Analytics displays a blank in character and datetime fields, a zero in numeric fields, and “F” in logical fields.

Duplicates or blanks in secondary table key field

If duplicates or missing values in a secondary table key field render subsequent analysis invalid, pre-processing the secondary table to remove the duplicates and/or blanks may be a solution in some circumstances.

Partial matching

Partial matching of key field values is not supported. To be matched, values must be 100% identical.

For example:

  • matched AB-123, AB-123
  • not matched AB-123, 123

Note

Partial matching is supported by the Analytics Fuzzy join.

Identical key field length not enforced

With the exception of character key fields, Analytics does not enforce identical lengths for the primary and secondary key fields when joining tables.

It is recommended that you always use identical lengths for numeric key fields, manually harmonizing the lengths prior to performing the join, if necessary. Results derived from joining using numeric key fields of different lengths are not reliable.

Datetime key fields can be different lengths because Analytics, when performing operations involving dates, datetimes, or times, uses an internal Analytics datetime format.

Harmonizing justification and case

When you join tables using character key fields, justification and case must be the same:

  • Both key fields must have the same justification. Use the LTRIM( ) function to remove leading blanks from the key fields.

  • Both key fields must be in the same case – UPPER, lower, or Proper. To harmonize the case, use the UPPER( ), LOWER( ), or PROPER( ) function.

Count of records not included in a join

Depending on the type of join you perform, records in the primary and/or secondary tables may not be included in the joined table. The command log displays the number of primary records not included (<n> records bypassed), but not the number of bypassed secondary records.

Conditional expressions and scope options used in join operation

In many-to-one joins, the If, While, First, and Next parameters that limit which records are processed apply only to the primary table. In many-to-many joins, If and While expressions can also reference the secondary table.

Identical field names in tables being joined

If the primary and secondary key fields, or any other included fields, have identical names, Analytics adds ‘2’ to the end of the secondary field name in the layout for the output table. For example, ‘vendor_ID’ becomes ‘vendor_ID2’ (or ‘vendor_ID3’, and so on, until Analytics finds a name that does not conflict with any other field names in the output table).

The alternate column titles in the view for the output table continue to display the identical names unaltered.

Table unavailable as secondary table

A table is unavailable to select as the secondary table in a join if it is currently related to the primary/parent table as a child table. To avoid this restriction, you can create a copy of the primary/parent table layout, or the child table layout, and join using the copied layout, or you can delete the relation.

Restrictions on location of tables being joined

To be joined, tables must be in the same Analytics project. Server tables must be on the same server, and must be accessed using the same server profile. You cannot join a local table with a server table.

Size of joined table

Depending on the type of join performed, the number of records in the resulting combined table can be greater than, equal to, or less than the sum of the records in the two tables being joined.

Joining UTC-based and non-UTC data

A UTC-based and a non-UTC datetime key field can be used to join two tables. (UTC is Coordinated Universal Time, the time at zero degrees longitude.) When performing operations involving datetimes or times, Analytics uses an internal Analytics datetime format, so the following two datetimes are interpreted as identical, and constitute a match:

  • UTC-based – 31/12/2014 10:30:15-05:00

  • non-UTC – 31/12/2014 15:30:15

You should exercise caution if you mix UTC-based and non-UTC time data in an Analytics operation. Although Analytics will match the two time values above, it may not make logical sense to do so, because one value references a time zone, and the other value does not. For more information about UTC, see Date and Time options.