read_csv() method
Reads a comma separated values file (*.csv) or a text file (*.txt) to an HCL dataframe.
Syntax
hcl.read_csv(source file, nrows = number_of_rows, usecols = ["column name", "...n"], header = number_of_header_rows, names = ["column name", "...n"], delimiter = "separator character", index_col = ["column name", "...n"], multiple additional parameters)
Parameters
Name | Description |
---|---|
source file | The name, file path, or URL of the source file, including the file extension (*.csv or *.txt). |
nrows = number_of_rows (optional) | The number of rows to use. Rows are counted from the beginning of the file, starting at 0. If omitted, all rows in the source file are used. |
usecols = ["column name", "...n"] (optional) | The columns to use. If omitted, all columns in the source file are used. Specify the column names exactly as they appear in the source file unless you are specifying different names with names. If you are specifying different names, then use those names with usecols. The resulting order of the columns in the dataframe is the same as their order in the source file, regardless of the order in which you specify them. |
header = number_of_header_rows (optional) | Excludes one or more lines of header text in the source file from the dataframe. |
names = ["column name", "...n"] (optional) | Column names to use in the dataframe. Specifies column names if no names exist in the source file, or overrides the names in the source file. The names you specify are applied sequentially to the columns in the data, so make sure that the names and the columns are properly aligned. Note: Use the header parameter to prevent existing column names from being included in the dataframe. For example, specify header = 0 if the column names are on the first line of the source file. |
delimiter = "separator character" (optional) | The delimiter character used between values in the source file, if other than a comma. Enclose the delimiter character in quotation marks. Pipe delimiter character: delimiter = "|". Tab delimiter character: delimiter = "\t". If omitted, the comma delimiter ( , ) is used. |
index_col = ["column name", "...n"] (optional) | Uses one of the data columns in the source file as the index column in the dataframe. Allows specifying more than one index column. |
multiple additional parameters (optional) | hcl.read_csv() supports all parameters supported by the Pandas function pandas.read_csv(). For a full list of parameters, consult the Pandas documentation for pandas.read_csv(). |
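The index_col parameter and the additional Pandas parameters have no example elsewhere on this page, so here is a minimal sketch. It uses pandas.read_csv() as a stand-in for hcl.read_csv(), on the assumption that the same arguments behave the same way in both, as the parameter table suggests. The inline data, the vendors name, and the choice of dtype as the additional parameter are hypothetical and only for illustration.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for a source file.
sample = io.StringIO(
    "VendorNumber,VendorName,Amount\n"
    "V001,Acme,10.50\n"
    "V002,Globex,20.00\n"
)

# index_col moves VendorNumber out of the data columns and into the index.
# dtype is one of the additional pandas.read_csv() parameters.
vendors = pd.read_csv(
    sample,
    index_col=["VendorNumber"],
    dtype={"Amount": "float64"},
)

print(list(vendors.columns))          # the remaining data columns
print(vendors.loc["V002", "Amount"])  # look up a row by its index value
```

After the read, VendorNumber is no longer a regular column, so rows are looked up by vendor number rather than by position.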
Returns
HCL dataframe.
Examples
Read a CSV file to an HCL dataframe
You want to read all the data – all rows and all columns – from the Pcard_Transactions.csv file to the pcard_transactions dataframe. The column names from the source CSV file are used in the dataframe. Because the source file uses a comma ( , ) as a separator between values, you are not required to specify the separator.
pcard_transactions = hcl.read_csv("https://help.highbond.com/analytics/Pcard_Transactions.csv")
Read a subset of rows and columns from a CSV file to an HCL dataframe
You want to read only a subset of the data from the Pcard_Transactions.csv file to the pcard_transactions dataframe. The example below reads only the first 100 rows and the specified columns.
pcard_transactions = hcl.read_csv("https://help.highbond.com/analytics/Pcard_Transactions.csv", nrows = 100, usecols = ["AccountNumber", "Amount", "Description", "Quantity", "TransDate", "UnitCost", "VendorLocation", "VendorName", "VendorNumber"])
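The usecols description says that columns keep their source-file order regardless of the order in which you list them. The sketch below illustrates that, together with nrows, using pandas.read_csv() as a stand-in for hcl.read_csv() (an assumption based on the stated parameter compatibility). The inline data is hypothetical and is not taken from Pcard_Transactions.csv.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for a source file.
sample = io.StringIO(
    "AccountNumber,Amount,Description,Quantity\n"
    "1001,25.00,Paper,5\n"
    "1002,40.00,Toner,1\n"
    "1003,12.50,Pens,10\n"
)

# Columns are requested in a different order than they appear in the data.
subset = pd.read_csv(sample, nrows=2, usecols=["Quantity", "AccountNumber"])

print(list(subset.columns))  # ['AccountNumber', 'Quantity']
print(len(subset))           # 2
```

If you need a different column order, reorder the columns after the read rather than relying on the order given to usecols.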
Read a tab-delimited text file to an HCL dataframe
You need to read data from the tab-delimited Pcard_Transactions.txt file to the pcard_transactions dataframe. The column names from the source text file are used in the dataframe. Because the source file uses a tab as a separator between values, you are required to specify the separator ( "\t" ).
pcard_transactions = hcl.read_csv("https://help.highbond.com/analytics/Pcard_Transactions.txt", delimiter = "\t")
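The delimiter parameter works the same way for other separator characters, such as the pipe mentioned in the parameter description. The sketch below uses pandas.read_csv() as a stand-in for hcl.read_csv(), assuming equivalent behavior, and hypothetical inline data in place of a real pipe-delimited file.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited data. Note the comma inside one of the values.
sample = io.StringIO(
    "AccountNumber|Amount|Description\n"
    "1001|25.00|Paper, A4\n"
    "1002|40.00|Toner\n"
)

# With delimiter = "|", commas are treated as ordinary characters.
piped = pd.read_csv(sample, delimiter="|")

print(list(piped.columns))
print(piped.loc[0, "Description"])
```

Because the separator is the pipe, the comma in "Paper, A4" stays inside the value and does not split it into two columns.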
Read a CSV file to an HCL dataframe and update the column names
You want to read the data from the Pcard_Transactions.csv file to the pcard_transactions dataframe, and specify your own column names rather than use the column names in the source file.
In addition to specifying different column names, you need to specify header = 0 to prevent the source column names on the first line of the source file from being included in the dataframe.
pcard_transactions = hcl.read_csv("https://help.highbond.com/analytics/Pcard_Transactions.csv", header = 0, names = ["Acct_Num", "Amount", "Currency", "Country", "Desc", "Merch_Code", "Qty", "Ref_Num", "Currency_Src", "Trans_Date", "Unit_Cost", "Vend_Loc", "Vend_Name", "Vend_Num"])
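To see why header = 0 matters when you supply names, the sketch below reads the same data with and without it, and then combines the new names with usecols. It uses pandas.read_csv() as a stand-in for hcl.read_csv(), assuming equivalent behavior. The inline data and the variable names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data with column names on the first line.
sample = (
    "AccountNumber,Amount,Description\n"
    "1001,25.00,Paper\n"
    "1002,40.00,Toner\n"
)
new_names = ["Acct_Num", "Amount", "Desc"]

# header = 0 marks the first line as column names, which names then replaces.
renamed = pd.read_csv(io.StringIO(sample), header=0, names=new_names)

# Without header = 0, the original names are read as a row of data.
no_header = pd.read_csv(io.StringIO(sample), names=new_names)

# When names is specified, usecols refers to the new names.
picked = pd.read_csv(
    io.StringIO(sample), header=0, names=new_names, usecols=["Acct_Num", "Desc"]
)

print(len(renamed), len(no_header))  # 2 3
print(no_header.iloc[0, 0])          # AccountNumber
print(list(picked.columns))          # ['Acct_Num', 'Desc']
```

The extra row in no_header is the original header line, which is the situation that header = 0 prevents.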
Read a CSV file to an HCL dataframe and exclude header information
You want to read the data from the Pcard_Transactions.csv file to the pcard_transactions dataframe, and skip three lines of header information in the source file.
pcard_transactions = hcl.read_csv("https://help.highbond.com/analytics/Pcard_Transactions.csv", header = 3)
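The sketch below shows what header = 3 does to a file that has three lines of preamble before the column names. It uses pandas.read_csv() as a stand-in for hcl.read_csv(), assuming equivalent behavior. In Pandas, header is the zero-based number of the line that holds the column names, and the lines above it are discarded. The inline data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical file: three preamble lines, then column names, then data.
sample = io.StringIO(
    "Pcard transactions export\n"
    "Prepared for internal audit\n"
    "Confidential\n"
    "AccountNumber,Amount\n"
    "1001,25.00\n"
    "1002,40.00\n"
)

# Line 3 (counting from 0) supplies the column names.
# Lines 0 to 2 are skipped.
skipped = pd.read_csv(sample, header=3)

print(list(skipped.columns))  # ['AccountNumber', 'Amount']
print(len(skipped))           # 2
```

None of the three preamble lines appears in the resulting dataframe, either as column names or as data.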