Connecting to Apache Cassandra

Apache Cassandra is a NoSQL database management system. Use the Cassandra data connector in Analytics to import your organization's Cassandra data.

Before you start

To connect to Cassandra, you must gather the following:

  • the database server's host name
  • the correct connection port
  • your username and password if using authentication

For help gathering the connection prerequisites, contact the Cassandra administrator in your organization. If your administrator cannot help you, you or your administrator should contact Cassandra Support.

Create a Cassandra connection

  1. From the Analytics main menu, select Import > Database and application.
  2. From the New Connections tab, in the ACL Connectors section, select Cassandra.

    Tip

    You can filter the list of available connectors by entering a search string in the Filter connections box. Connectors are listed alphabetically.

  3. In the Data Connection Settings panel, enter the connection settings and then, at the bottom of the panel, click Save and Connect.

    You can accept the default Connection Name, or enter a new one.

The connection for Cassandra is saved to the Existing Connections tab. In the future, you can reconnect to Cassandra from the saved connection.

Once the connection is established, the Data Access window opens to the Staging Area and you can begin importing data. For help importing data from Cassandra, see Import data using the Data Access window.

Querying Cassandra

One advantage of the Apache Cassandra design is the ability to store data that is denormalized into fewer tables. Nested data structures such as sets, lists, and maps can simplify transactions. However, Analytics does not support accessing this type of data directly. Instead, the connector re-normalizes the data contained within collections (sets, lists, and maps) into virtual tables. You can then work with the data directly, while the data itself remains stored in its denormalized form in Cassandra.

If a table contains any collection columns, when the table is queried for the first time, the connector creates the following virtual tables:

  • A "base" table, which contains the same data as the real table except for the collection columns.
  • A virtual table for each collection column, which expands the nested data.

Virtual tables refer to the data in the real table, enabling the connector to access the denormalized data. By querying the virtual tables, you can access the contents of Cassandra collections via ODBC.

The base table and virtual tables appear as additional tables in the list of tables that exist in the database. The base table uses the same name as the real table that it represents. The virtual tables that represent collections are named using the name of the real table, a separator (_vt_ by default), and the name of the column.
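
As a rough illustration of this naming rule, the following Python sketch builds the name of the virtual table for one collection column. The function virtual_table_name is hypothetical and is not part of Analytics or the connector. Only the default separator (_vt_) comes from the description above.

```python
def virtual_table_name(table: str, column: str, separator: str = "_vt_") -> str:
    """Return the virtual table name for one collection column.

    Hypothetical helper: it only mirrors the naming rule described in the text
    (real table name + separator + column name).
    """
    return f"{table}{separator}{column}"


# Example: a table named ExampleTable with a set column named StringSet.
name = virtual_table_name("ExampleTable", "StringSet")
print(name)
```

If the separator has been changed from its default in the connection settings, pass the configured value as the separator argument.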

Example

ExampleTable is a Cassandra database table that contains an integer primary key column (named pk_int), a list column (named List), a map column (named Map), and a set column (named StringSet).

Source table with collections

pk_int  List                       Map                       StringSet
1       ["1","2","3"]              {"S1" : "a", "S2" : "b"}  {"a", "b", "c"}
3       ["100","101","102","105"]  {"S1" : "t"}              {"a", "e"}

The connector generates multiple virtual tables to represent this single table. The first virtual table is the base table:

Base table

pk_int
1
3

The base table contains the same data as the original database table except for the collections, which are omitted from this table and expanded in other virtual tables.
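
The split between the base table and the collection tables can be sketched in Python as follows. This is only an illustration of the re-normalization idea, not the connector's actual implementation. The function names expand_collection and build_virtual_tables are hypothetical, and Python lists, dicts, and sets stand in for Cassandra lists, maps, and sets. The #index, #key, and #value column suffixes follow the connector's naming.

```python
from typing import Any

# Python stand-ins for Cassandra collection types (an assumption of this sketch).
COLLECTION_TYPES = (list, dict, set, frozenset)


def expand_collection(pk_name: str, pk_value: Any, column: str, value: Any) -> list[dict[str, Any]]:
    """Turn one collection value into one virtual-table row per element."""
    if isinstance(value, list):
        # Lists keep the element position in a #index column.
        return [
            {pk_name: pk_value, f"{column}#index": i, f"{column}#value": item}
            for i, item in enumerate(value)
        ]
    if isinstance(value, dict):
        # Maps keep the entry key in a #key column.
        return [
            {pk_name: pk_value, f"{column}#key": key, f"{column}#value": item}
            for key, item in value.items()
        ]
    # Sets have no position, so only the value is emitted (sorted for stable output).
    return [{pk_name: pk_value, f"{column}#value": item} for item in sorted(value)]


def build_virtual_tables(rows: list[dict[str, Any]], pk_name: str = "pk_int"):
    """Return (base_rows, {collection_column: virtual_rows})."""
    base, virtual = [], {}
    for row in rows:
        # The base table keeps every column that is not a collection.
        base.append({k: v for k, v in row.items() if not isinstance(v, COLLECTION_TYPES)})
        for column, value in row.items():
            if isinstance(value, COLLECTION_TYPES):
                virtual.setdefault(column, []).extend(
                    expand_collection(pk_name, row[pk_name], column, value)
                )
    return base, virtual


example_rows = [
    {"pk_int": 1, "List": ["1", "2", "3"], "Map": {"S1": "a", "S2": "b"}, "StringSet": {"a", "b", "c"}},
    {"pk_int": 3, "List": ["100", "101", "102", "105"], "Map": {"S1": "t"}, "StringSet": {"a", "e"}},
]

base_table, virtual_tables = build_virtual_tables(example_rows)
print(base_table)
for column, table_rows in virtual_tables.items():
    print(column, table_rows)
```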

The following tables show the virtual tables that re-normalize the data from the List, Map, and StringSet columns:

List

pk_int  List#index  List#value
1       0           1
1       1           2
1       2           3
3       0           100
3       1           101
3       2           102
3       3           105

Map

pk_int  Map#key  Map#value
1       S1       a
1       S2       b
3       S1       t

StringSet

pk_int  StringSet#value
1       a
1       b
1       c
3       a
3       e

The foreign key columns in the virtual tables reference the primary key columns in the real table, and indicate which real table row the virtual table row corresponds to. The columns with names that end with #index or #key indicate the position of the data within the original list or map. The columns with names that end with #value contain the expanded data from the collection.
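
Because the virtual tables carry the primary key of the real table, they can be joined back to the base table. The following Python sketch shows the shape of such a join. It uses an in-memory SQLite database as a stand-in for the Cassandra connection, so that it runs anywhere. The virtual table name ExampleTable_vt_Map is assumed from the default naming rule, and the identifier quoting shown here is SQLite's. The SQL accepted through the connector may differ.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the Cassandra data source.
conn = sqlite3.connect(":memory:")

# Base table and the Map virtual table, populated with the example data.
conn.execute('CREATE TABLE "ExampleTable" (pk_int INTEGER PRIMARY KEY)')
conn.execute(
    'CREATE TABLE "ExampleTable_vt_Map" (pk_int INTEGER, "Map#key" TEXT, "Map#value" TEXT)'
)
conn.executemany('INSERT INTO "ExampleTable" VALUES (?)', [(1,), (3,)])
conn.executemany(
    'INSERT INTO "ExampleTable_vt_Map" VALUES (?, ?, ?)',
    [(1, "S1", "a"), (1, "S2", "b"), (3, "S1", "t")],
)

# Join on the shared key column and keep only the map entries with key S1.
# Column names containing # must be quoted.
query = """
SELECT b.pk_int, m."Map#key", m."Map#value"
FROM "ExampleTable" AS b
JOIN "ExampleTable_vt_Map" AS m ON m.pk_int = b.pk_int
WHERE m."Map#key" = 'S1'
ORDER BY b.pk_int
"""
result = conn.execute(query).fetchall()
print(result)
```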

Analytics 14.1 Help