Connecting to Apache Cassandra
Concept Information
Apache Cassandra is a NoSQL database management system. Use the Apache Cassandra data connector to import your organization's Cassandra data.
Note
Analytics provides Cassandra as an optional connector and if it is not available in your Data Access window, it is likely that the connector was not selected during installation. For more information, see Install optional Analytics data connectors and Python engine.
Before you start
To connect to Cassandra, you must gather the following:
- the database server's host name or IP address
- the correct connection port
- your username and password if using authentication
For help gathering the connection prerequisites, contact the Cassandra administrator in your organization. If your administrator cannot help you, you or your administrator should contact Cassandra Support.
Create a Cassandra connection
- From the Analytics main menu, select Import > Database and application.
- From the New Connections tab, in the ACL Connectors section, select Cassandra.
Tip
You can filter the list of available connectors by entering a search string in the Filter connections box. Connectors are listed alphabetically.
- In the Data Connection Settings panel, enter the connection settings and at the bottom of the panel, click Save and Connect.
You can accept the default Connection Name, or enter a new one.
The connection for Cassandra is saved to the Existing Connections tab. In the future, you can reconnect to Cassandra from the saved connection.
Once the connection is established, the Data Access window opens to the Staging Area and you can begin importing data. For help importing data from Cassandra, see Working with the Data Access window.
Connection settings
Basic settings
Setting | Description | Example |
---|---|---|
Host |
The IP address or host name of the Cassandra server. |
|
Port | The TCP port for the Cassandra database. | 9042 |
Default keyspace | The default keyspace (schema) to connect to in Cassandra. | |
Authentication Mechanism |
The authentication mechanism to use to connect to Cassandra server. Available options are as follows:
|
No Authentication |
User name | The user name to use to access the Cassandra server. | |
Password | The password corresponding to the user name provided. |
Advanced settings
Setting | Description | Example |
---|---|---|
Query mode |
Specifies the query mode to use when sending queries to Cassandra. The options available are:
|
SQL with CQL Fallback |
Tunable consistency | The specific Cassandra replica or the number of Cassandra replicas that must process a query for the query to be successful. | ONE |
Load Balancing Policy | Specifies the load balancing policy to be used. | |
Binary column length | The default column length to report for BLOB columns. | 4000 |
String column length | The default column length to report for ASCII, TEXT, and VARCHAR columns. | 4000 |
Virtual table name separator | The separator for naming a virtual table built from a collection. The name of a virtual table consists of the name of the original table, then the separator, and then the name of the collection. |
_vt_ |
Enable Token Aware | Specifies whether to use a token-aware policy to improve load balancing and latency. | |
Enable Latency Aware | Specifies whether the driver must use a latency-awareness algorithm to distribute the load away from slower-performing nodes. | |
Enable Null Values Insertion | Specifies whether the driver must insert all NULL values as specified in INSERT statements. | |
Enable Case Sensitive |
Specifies whether the driver differentiates between capital and lower-case letters in schema, table, and column names. If this option is enabled, all schemas, tables, and columns must be double quotation marks ("). |
|
Use SQL_WVARCHAR for string data type | Specifies whether to use SQL_WVARCHAR for text and varchar types. | |
Enable paging | Specifies whether to split large result sets into pages. | |
Rows per page | When the Enable Paging option is enabled, use this option to specify the maximum number of rows to display on each page. | 10000 |
SSL Options |
Specifies how the driver uses SSL to connect to the Cassandra server. The options available are:
|
No SSL |
Enable Server Hostname Verification | Specifies whether the driver mandates the host name of the server to match the host name in the SSL certificate. | |
Ssltrustedcertspath | The full path to the .pem file containing the certificate for verifying the server. | |
Client-side Certificate | The full path to the .pem file containing the certificate for verifying the client. | |
Client-side Private Key | The full path to the file containing the private key used to verify the client. | |
Key File Password | The password for the private key file that is specified in the Client-side Private Key field. |
Querying Cassandra
One advantage of the Apache Cassandra design is the ability to store data that is denormalized into fewer tables. By taking advantage of nested data structures such as sets, lists, and maps, transactions can be simplified. However, Analytics does not support accessing this type of data. By re-normalizing the data contained within collections (sets, lists, and maps) into virtual tables, the connector allows users to directly interact with the data but leave the storage of the data in its denormalized form in Cassandra.
If a table contains any collection columns, when the table is queried for the first time, the connector creates the following virtual tables:
- A "base" table, which contains the same data as the real table except for the collection columns.
- A virtual table for each collection column, which expands the nested data.
Virtual tables refer to the data in the real table, enabling the connector to access the denormalized data. By querying the virtual tables, you can access the contents of Cassandra collections via ODBC.
The base table and virtual tables appear as additional tables in the list of tables that exist in the database. The base table uses the same name as the real table that it represents. The virtual tables that represent collections are named using the name of the real table, a separator (_vt_ by default), and the name of the column.
Example
ExampleTable is a Cassandra database table that contains an integer primary key column named pk_int, a list column, a map column, and a set column (named StringSet).
Source table with collections
pk_int | List | Map | StringSet |
---|---|---|---|
1 | ["1","2","3"] | {"S1" : "a", "S2" : "b"} | {"a", "b", "c"} |
3 | ["100","101","102","105"] | {"S1" : "t"} | {"a","e"} |
The connector generates multiple virtual tables to represent this single table. The first virtual table is the base table:
Base table
pk_int |
---|
1 |
3 |
The base table contains the same data as the original database table except for the collections, which are omitted from this table and expanded in other virtual tables.
The following tables show the virtual tables that re-normalize the data from the List, Map, and StringSet columns:
List
pk_int | List#index | List#value |
---|---|---|
1 | 0 | 1 |
1 | 1 | 2 |
1 | 2 | 3 |
3 | 0 | 100 |
3 | 1 | 101 |
3 | 2 | 102 |
3 | 3 | 105 |
Map
pk_int | Map#key | Map#value |
---|---|---|
1 | S1 | a |
1 | S2 | b |
3 | S1 | t |
StringSet
pk_int | StringSet#value |
---|---|
1 | a |
1 | b |
1 | c |
3 | a |
3 | e |
The foreign key columns in the virtual tables reference the primary key columns in the real table, and indicate which real table row the virtual table row corresponds to. The columns with names that end with #index or #key indicate the position of the data within the original list or map. The columns with names that end with #value contain the expanded data from the collection.
Data connector updates
When you upgrade Analytics, the Robots Agent, or AX Server, you should test any of your scripts that import data using one of the Analytics data connectors (ACCESSDATA command).
The possibility exists that changes made by third-party data sources or ODBC driver vendors required updates to one or more of the data connectors. Scripted data connections may need to be updated in order to continue working correctly.
- Re-run the import The easiest way to update a connection is to manually perform an import using the Data Access window in the upgraded version of Analytics. Copy the ACCESSDATA command from the log and use it to update your script.
Note
Before connecting to a data source and re-running the import, clear the connector cache to flush the existing set of table names.
In the Existing Connections tab in the Data Access window, beside the connector name, select > Clear cache.
- Update field specifications You may also need to update field specifications in the script body to align with table schema changes in the data source or ODBC driver. Possible changes include field names, field data types, and field and record lengths.
- Check the results of any filtering You should also check the results of any filtering that you apply as part of the data import. Confirm that the import filtering is including and excluding records correctly.
Apache Cassandra data connector changes
Specific changes made to the Apache Cassandra data connector are listed below.
Analytics version | Change |
---|---|
14.2 |
The connector no longer supports connecting to Apache Cassandra 2.0. Connections can be made to Apache Cassandra 2.1, 2.2, and 3.0. |