Connecting to Apache Cassandra

Concept Information

ACCESSDATA command

Apache Cassandra is a NoSQL database management system. Use the Apache Cassandra data connector to import your organization's Cassandra data.

Note

Analytics provides Cassandra as an optional connector and if it is not available in your Data Access window, it is likely that the connector was not selected during installation. For more information, see Install optional Analytics data connectors and Python engine.

Before you start

To connect to Cassandra, you must gather the following:

  • the database server's host name or IP address
  • the correct connection port
  • your username and password if using authentication

For help gathering the connection prerequisites, contact the Cassandra administrator in your organization. If your administrator cannot help you, you or your administrator should contact Cassandra Support.

Create a Cassandra connection

  1. From the Analytics main menu, select Import > Database and application.
  2. From the New Connections tab, in the ACL Connectors section, select Cassandra.

    Tip

    You can filter the list of available connectors by entering a search string in the Filter connections box. Connectors are listed alphabetically.

  3. In the Data Connection Settings panel, enter the connection settings and at the bottom of the panel, click Save and Connect.

    You can accept the default Connection Name, or enter a new one.

The connection for Cassandra is saved to the Existing Connections tab. In the future, you can reconnect to Cassandra from the saved connection.

Once the connection is established, the Data Access window opens to the Staging Area and you can begin importing data. For help importing data from Cassandra, see Working with the Data Access window.

Connection settings

Basic settings

Setting Description Example
Host

The IP address or host name of the Cassandra server.

 

Port The TCP port for the Cassandra database. 9042
Default keyspace The default keyspace (schema) to connect to in Cassandra.  
Authentication Mechanism

The authentication mechanism to use to connect to Cassandra server. Available options are as follows:

  • No Authentication
  • User Name and Password
No Authentication
User name The user name to use to access the Cassandra server.  
Password The password corresponding to the user name provided.  

Advanced settings

Setting Description Example
Query mode

Specifies the query mode to use when sending queries to Cassandra. The options available are:

  • SQL - Uses SQL_QUERY_MODE and executes all queries in SQL.
  • CQL - Uses CQL_QUERY_MODE and executes all queries in CQL.
  • SQL with CQL Fallback - Uses SQL_WITH_CQL_FALLBACK_QUERY_MODE and executes all queries in SQL by default. If a query fails, then the driver executes the query in CQL.
SQL with CQL Fallback
Tunable consistency The specific Cassandra replica or the number of Cassandra replicas that must process a query for the query to be successful. ONE
Load Balancing Policy Specifies the load balancing policy to be used.  
Binary column length The default column length to report for BLOB columns. 4000
String column length The default column length to report for ASCII, TEXT, and VARCHAR columns. 4000
Virtual table name separator The separator for naming a virtual table built from a collection.
The name of a virtual table consists of the name of the original table, then the separator, and then the name of the collection.
_vt_
Enable Token Aware Specifies whether to use a token-aware policy to improve load balancing and latency.  
Enable Latency Aware Specifies whether the driver must use a latency-awareness algorithm to distribute the load away from slower-performing nodes.  
Enable Null Values Insertion Specifies whether the driver must insert all NULL values as specified in INSERT statements.  
Enable Case Sensitive

Specifies whether the driver differentiates between capital and lower-case letters in schema, table, and column names.

If this option is enabled, all schemas, tables, and columns must be double quotation marks (").

 
Use SQL_WVARCHAR for string data type Specifies whether to use SQL_WVARCHAR for text and varchar types.  
Enable paging Specifies whether to split large result sets into pages.  
Rows per page When the Enable Paging option is enabled, use this option to specify the maximum number of rows to display on each page. 10000
SSL Options

Specifies how the driver uses SSL to connect to the Cassandra server. The options available are:

  • No SSL - The driver does not use SSL.
  • One-way Server Verification - If the Enable Server Hostname Verification option is enabled, the client verifies the Cassandra server using SSL. Otherwise, the driver connects to the Cassandra server using SSL but the client and server do not verify each other.

  • Two-way Server and Client Verification - If the Enable Server Hostname Verification option is enabled, the client and the Cassandra server verify each other using SSL. Otherwise, the driver connects to the Cassandra server using SSL but the client and server do not verify each other.

No SSL
Enable Server Hostname Verification Specifies whether the driver mandates the host name of the server to match the host name in the SSL certificate.  
Ssltrustedcertspath The full path to the .pem file containing the certificate for verifying the server.  
Client-side Certificate The full path to the .pem file containing the certificate for verifying the client.  
Client-side Private Key The full path to the file containing the private key used to verify the client.  
Key File Password The password for the private key file that is specified in the Client-side Private Key field.  

Querying Cassandra

One advantage of the Apache Cassandra design is the ability to store data that is denormalized into fewer tables. By taking advantage of nested data structures such as sets, lists, and maps, transactions can be simplified. However, Analytics does not support accessing this type of data. By re-normalizing the data contained within collections (sets, lists, and maps) into virtual tables, the connector allows users to directly interact with the data but leave the storage of the data in its denormalized form in Cassandra.

If a table contains any collection columns, when the table is queried for the first time, the connector creates the following virtual tables:

  • A "base" table, which contains the same data as the real table except for the collection columns.
  • A virtual table for each collection column, which expands the nested data.

Virtual tables refer to the data in the real table, enabling the connector to access the denormalized data. By querying the virtual tables, you can access the contents of Cassandra collections via ODBC.

The base table and virtual tables appear as additional tables in the list of tables that exist in the database. The base table uses the same name as the real table that it represents. The virtual tables that represent collections are named using the name of the real table, a separator (_vt_ by default), and the name of the column.

Example

ExampleTable is a Cassandra database table that contains an integer primary key column named pk_int, a list column, a map column, and a set column (named StringSet).

Source table with collections

pk_int List Map StringSet
1 ["1","2","3"] {"S1" : "a", "S2" : "b"} {"a", "b", "c"}
3 ["100","101","102","105"] {"S1" : "t"} {"a","e"}

The connector generates multiple virtual tables to represent this single table. The first virtual table is the base table:

Base table

pk_int
1
3

The base table contains the same data as the original database table except for the collections, which are omitted from this table and expanded in other virtual tables.

The following tables show the virtual tables that re-normalize the data from the List, Map, and StringSet columns:

List

pk_int List#index List#value
1 0 1
1 1 2
1 2 3
3 0 100
3 1 101
3 2 102
3 3 105

Map

pk_int Map#key Map#value
1 S1 a
1 S2 b
3 S1 t

StringSet

pk_int StringSet#value
1 a
1 b
1 c
3 a
3 e

The foreign key columns in the virtual tables reference the primary key columns in the real table, and indicate which real table row the virtual table row corresponds to. The columns with names that end with #index or #key indicate the position of the data within the original list or map. The columns with names that end with #value contain the expanded data from the collection.

Data connector updates

When you upgrade Analytics, the Robots Agent, or AX Server, you should test any of your scripts that import data using one of the Analytics data connectors (ACCESSDATA command).

The possibility exists that changes made by third-party data sources or ODBC driver vendors required updates to one or more of the data connectors. Scripted data connections may need to be updated in order to continue working correctly.

  • Re-run the import The easiest way to update a connection is to manually perform an import using the Data Access window in the upgraded version of Analytics. Copy the ACCESSDATA command from the log and use it to update your script.

    Note

    Before connecting to a data source and re-running the import, clear the connector cache to flush the existing set of table names.

    In the Existing Connections tab in the Data Access window, beside the connector name, select > Clear cache.

  • Update field specifications You may also need to update field specifications in the script body to align with table schema changes in the data source or ODBC driver. Possible changes include field names, field data types, and field and record lengths.
  • Check the results of any filtering You should also check the results of any filtering that you apply as part of the data import. Confirm that the import filtering is including and excluding records correctly.

Apache Cassandra data connector changes

Specific changes made to the Apache Cassandra data connector are listed below.

Analytics version Change

14.2

The connector no longer supports connecting to Apache Cassandra 2.0.

Connections can be made to Apache Cassandra 2.1, 2.2, and 3.0.