join() method
Based on a common key, combines columns from two dataframes into a single dataframe.
Syntax
left_dataframe_name.join(other = right_dataframe_name, lkey = ["left_key_column", "...n"], lcolumns = ["left_column", "...n"], rkey = ["right_key_column", "...n"], rcolumns = ["right_column", "...n"], join_type = "inner"|"left"|"right"|"outer"|"unmatched")
Parameters
Name | Description | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
other = right_dataframe_name | The name of the right dataframe. | ||||||||||||||||||
lkey = ["left_key_column", "...n"] |
The key column or columns in the left dataframe. Columns are used in the order that you list them. If you are joining on more than one column, corresponding columns must be in the same order in lkey and rkey. Note Matching of left and right key values is case sensitive. Matching values are only considered identical if their casing is also identical. |
||||||||||||||||||
lcolumns = ["left_column", "...n"] optional |
The columns from the left dataframe to include in the joined output dataframe. Columns are included in the order that you list them. If you omit lcolumns, all columns from the left dataframe are included in the joined output dataframe. If you want to exclude all left columns, specify lcolumns = []. Note The lkey column or columns are automatically included in the output dataframe. You do not need to re-specify them in lcolumns. |
||||||||||||||||||
rkey = ["right_key_column", "...n"] |
The key column or columns in the right dataframe. Columns are used in the order that you list them. If you are joining on more than one column, corresponding columns must be in the same order in rkey and lkey. Note Matching of left and right key values is case sensitive. Matching values are only considered identical if their casing is also identical. |
||||||||||||||||||
rcolumns = ["right_column", "...n"] optional |
The columns from the right dataframe to include in the joined output dataframe. Columns are included in the order that you list them. If you omit rcolumns, all columns from the right dataframe are included in the joined output dataframe. If you want to exclude all right columns, specify rcolumns = []. rcolumns is ignored if you use the unmatched join type. Note The rkey column or columns are automatically included in the output dataframe. You do not need to re-specify them in rcolumns. |
||||||||||||||||||
join_type = "inner" | "left" | "right" | "outer" | "unmatched" optional |
The type of join to perform. If you omit join_type an inner join is performed by default.
|
Returns
HCL dataframe.
Examples
Join two dataframes as a way of discovering employees who may also be vendors
The example below joins the employees dataframe (left) and the vendors dataframe (right) using address as the common key column (the Address and Vendor_Street columns).
The join() method creates a new dataframe with matched left and right rows, which results in a list of any employees and vendors with the same address.
employee_vendor_match = employees.join(other = vendors, lkey = ["Address"], lcolumns = ["Empno", "First", "Last"], rkey = ["Vendor_Street"], rcolumns = ["Vendor_No", "Vendor_Name"], join_type = "inner")
Join two dataframes as a way of discovering accounts receivable rows with no matching customer
The example below joins the accounts_receivable dataframe (left) and the customers dataframe (right) using customer number (CustNo) as the common key column.
The join() method uses the unmatched join type to create a new dataframe with unmatched left rows, which results in a list of accounts_receivable rows that are not associated with any customers row.
customers_not_found = accounts_receivable.join(other = customers, lkey = ["CustNo"], lcolumns = ["Due", "Amount"], rkey = ["CustNo"], join_type = "unmatched")