read_csv() method

Reads a comma-separated values file (*.csv) or a text file (*.txt) to an HCL dataframe.

Syntax

hcl.read_csv(source file, nrows = number_of_rows, usecols = ["column name", "...n"], header = number_of_header_rows, names = ["column name", "...n"], delimiter = "separator character", index_col = ["column name", "...n"], multiple additional parameters)

Parameters

Name	Description
source file	The name, file path, or URL of the source file, including the file extension (.csv or .txt).
nrows = number_of_rows optional	The number of rows to use. Rows are counted from the beginning of the file, starting at 0. If omitted, all rows in the source file are used.
usecols = ["column name", "...n"] optional	The columns to use. If omitted, all columns in the source file are used. Specify the column names exactly as they appear in the source file unless you are specifying different names with names. If you are specifying different names, then use those names with usecols. The resulting order of the columns in the dataframe is the same as their order in the source file, regardless of the order in which you specify them.
header = number_of_header_rows optional	Excludes one or more lines of header text in the source file from the dataframe.
names = ["column name", "...n"] optional	Column names to use in the dataframe. Specifies column names if no names exist in the source file, or overrides the names in the source file. The names you specify are applied sequentially to the columns in the data, so make sure that the names and the columns are properly aligned. Note: Use the header parameter to prevent existing column names from being included in the dataframe. For example, specify header = 0 if the column names are on the first line of the source file.
delimiter = "separator character" optional	The delimiter character used between values in the source file if other than a comma. Enclose the delimiter character in quotation marks. Pipe delimiter character: delimiter = "|" Tab delimiter character: delimiter = "\t" If omitted, the comma delimiter ( , ) is used.
index_col = ["column name", "...n"] optional	Uses one of the data columns in the source file as the index column in the dataframe. Allows specifying more than one index column.
multiple additional parameters optional	hcl.read_csv() supports all parameters supported by the Pandas function pandas.read_csv(). For a full list of parameters, consult the Pandas documentation for pandas.read_csv().
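
The usecols ordering rule and the index_col parameter can be sketched as follows. This is a minimal illustration, not HCL code: it uses pandas.read_csv as a stand-in, on the assumption that hcl.read_csv behaves the same way for these parameters, and the inline data is hypothetical.

```python
import io
import pandas as pd  # stand-in; assumes hcl.read_csv mirrors pandas.read_csv here

# Hypothetical inline data standing in for a source file.
data = "VendorNumber,VendorName,Amount\nV001,Acme,10.50\nV002,Globex,20.00\n"

# Columns requested in a different order than they appear in the source.
subset = pd.read_csv(io.StringIO(data), usecols=["Amount", "VendorNumber"])
print(list(subset.columns))  # the order follows the source, not the usecols list

# Use a data column as the index instead of the default row numbers.
indexed = pd.read_csv(io.StringIO(data), index_col=["VendorNumber"])
print(list(indexed.index.names))
print(list(indexed.columns))
```

If a particular column order matters, reorder the columns after reading rather than relying on the order given to usecols.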

Returns

HCL dataframe.

Examples

Read a CSV file to an HCL dataframe

You want to read all the data – all rows and all columns – from the Pcard_Transactions.csv file to the pcard_transactions dataframe. The column names from the source CSV file are used in the dataframe. Because the source file uses a comma ( , ) as a separator between values, you are not required to specify the separator.

pcard_transactions = hcl.read_csv("https://help.highbond.com/analytics/Pcard_Transactions.csv")

Read a subset of rows and columns from a CSV file to an HCL dataframe

You want to read only a subset of the data from the Pcard_Transactions.csv file to the pcard_transactions dataframe. The example below reads only the first 100 rows and the specified columns.

pcard_transactions = hcl.read_csv("https://help.highbond.com/analytics/Pcard_Transactions.csv", nrows = 100, usecols = ["AccountNumber", "Amount", "Description", "Quantity", "TransDate", "UnitCost", "VendorLocation", "VendorName", "VendorNumber"])
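
To see the effect of nrows and usecols without downloading the file, the same call can be tried on a few inline rows. This is a hedged sketch that uses pandas.read_csv as a stand-in for hcl.read_csv (assumed to behave the same way); the data is hypothetical.

```python
import io
import pandas as pd  # stand-in; assumes hcl.read_csv mirrors pandas.read_csv here

# Hypothetical inline data: one header line followed by three data rows.
data = "AccountNumber,Amount,Quantity\nA1,10.0,1\nA2,20.0,2\nA3,30.0,3\n"

# Keep the first two data rows and two of the three columns.
first_two = pd.read_csv(
    io.StringIO(data), nrows=2, usecols=["AccountNumber", "Amount"]
)
print(first_two.shape)
print(list(first_two["AccountNumber"]))
```

In pandas, the header line is not counted toward nrows, so nrows = 2 yields two rows of data.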

Read a tab-delimited text file to an HCL dataframe

You need to read data from the tab-delimited Pcard_Transactions.txt file to the pcard_transactions dataframe. The column names from the source text file are used in the dataframe. Because the source file uses a tab as a separator between values, you are required to specify the separator ( "\t" ).

pcard_transactions = hcl.read_csv("https://help.highbond.com/analytics/Pcard_Transactions.txt", delimiter = "\t")
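
The sketch below shows why the delimiter must be specified for tab-delimited data. It uses pandas.read_csv as a stand-in for hcl.read_csv (assumed to behave the same way) and hypothetical inline data.

```python
import io
import pandas as pd  # stand-in; assumes hcl.read_csv mirrors pandas.read_csv here

# Hypothetical tab-delimited data.
data = "AccountNumber\tAmount\nA1\t10.0\nA2\t20.0\n"

# With the tab delimiter specified, each value lands in its own column.
tabbed = pd.read_csv(io.StringIO(data), delimiter="\t")
print(list(tabbed.columns))

# With the default comma delimiter, each line is read as a single value.
unsplit = pd.read_csv(io.StringIO(data))
print(unsplit.shape)
```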

Read a CSV file to an HCL dataframe and update the column names

You want to read the data from the Pcard_Transactions.csv file to the pcard_transactions dataframe, and specify your own column names rather than use the column names in the source file.

In addition to specifying different column names, you need to specify header = 0 to prevent the source column names on the first line of the source file from being included in the dataframe.

pcard_transactions = hcl.read_csv("https://help.highbond.com/analytics/Pcard_Transactions.csv", header = 0, names = ["Acct_Num", "Amount", "Currency", "Country", "Desc", "Merch_Code", "Qty", "Ref_Num", "Currency_Src", "Trans_Date", "Unit_Cost", "Vend_Loc", "Vend_Name", "Vend_Num"])
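
The interaction between header and names can be checked on a small sample. This is a sketch only: it uses pandas.read_csv as a stand-in for hcl.read_csv (assumed to behave the same way), and the data and replacement names are hypothetical.

```python
import io
import pandas as pd  # stand-in; assumes hcl.read_csv mirrors pandas.read_csv here

# Hypothetical inline data with column names on the first line.
data = "AccountNumber,Amount\nA1,10.0\nA2,20.0\n"

# header=0 marks the first line as column names, which names then replaces.
renamed = pd.read_csv(io.StringIO(data), header=0, names=["Acct_Num", "Amt"])
print(len(renamed))

# Without header=0, the original column names are read as a row of data.
names_only = pd.read_csv(io.StringIO(data), names=["Acct_Num", "Amt"])
print(len(names_only))
print(names_only.iloc[0]["Acct_Num"])
```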

Read a CSV file to an HCL dataframe and exclude header information

You want to read the data from the Pcard_Transactions.csv file to the pcard_transactions dataframe, and skip three lines of header information in the source file.

pcard_transactions = hcl.read_csv("https://help.highbond.com/analytics/Pcard_Transactions.csv", header = 3)
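
The following sketch shows what header = 3 does to a file that starts with three lines of header text. It uses pandas.read_csv as a stand-in for hcl.read_csv (assumed to behave the same way); the report lines and data are hypothetical.

```python
import io
import pandas as pd  # stand-in; assumes hcl.read_csv mirrors pandas.read_csv here

# Hypothetical file: three lines of header text, then column names, then data.
data = (
    "Report: P-card transactions\n"
    "Generated by finance\n"
    "Confidential\n"
    "AccountNumber,Amount\n"
    "A1,10.0\n"
    "A2,20.0\n"
)

# Lines 0 to 2 are skipped; line 3 supplies the column names.
df = pd.read_csv(io.StringIO(data), header=3)
print(list(df.columns))
print(len(df))
```

In pandas, the header value is a zero-based line number, so the line immediately after the skipped header text is treated as the column names.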
