Connecting to Cloudera Impala

Cloudera Impala is a massively parallel processing (MPP) SQL query engine for data stored in Apache Hadoop clusters. You can use the Cloudera Impala data connector to import your organization's Impala data.

Note

Analytics provides Impala as an optional connector. If it is not available in your Data Access window, the connector was likely not selected during installation. For more information, see Install optional Analytics data connectors and Python engine.

Before you start

To connect to Impala, you must gather the following:

  • username
  • password
  • read access to the Impala data you want to import

For help gathering the connection prerequisites, contact the Impala administrator in your organization. If your administrator cannot help you, you or your administrator should contact Impala Support.

Create an Impala connection

  1. From the Analytics main menu, select Import > Database and application.
  2. From the New Connections tab, in the ACL Connectors section, select Impala.

    Tip

    You can filter the list of available connectors by entering a search string in the Filter connections box. Connectors are listed alphabetically.

  3. In the Data Connection Settings panel, enter the connection settings, and then click Save and Connect at the bottom of the panel.

    You can accept the default Connection Name, or enter a new one.

The connection for Impala is saved to the Existing Connections tab. In the future, you can reconnect to Impala from the saved connection.

Once the connection is established, the Data Access window opens to the Staging Area and you can begin importing data. For help importing data from Impala, see Working with the Data Access window.

Connection settings

Basic settings

| Setting | Description | Example |
| --- | --- | --- |
| Host | IP address or host name of the Impala server. | |
| Port | Port for the connection to the Impala server instance. | |
| Database | Name of the Impala database to use by default. | |
| Authentication Mechanism | The authentication mechanism to use. Options available are: No Authentication, Kerberos, SASL User Name, User Name and Password. | No Authentication |
| Realm | Realm of the Impala host. | |
| Host FQDN | Fully qualified domain name of the Impala host. | _HOST |
| Service Name | Kerberos service principal name of the Impala server. | impala |
| User Name | User name to access the Impala server. | |
| Password | Password to authenticate access to the Impala server. | |
| Transport Buffer Size | Number of bytes to reserve in memory for buffering unencrypted data from the network. | 1000 |
| Use Simple Authentication and Security Layer (SASL) | Specifies whether the driver uses SASL to handle authentication. | |
| Delegation UID | When a user ID is specified for this option, the Impala driver delegates all operations against Impala to the specified user, rather than to the authenticated user for the connection. | |
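Analytics assembles the connection from these fields for you, so you never need to build it by hand. To illustrate how the basic settings fit together, the following sketch renders them as an ODBC-style key=value string. The key names (Host, Port, Database, AuthMech, KrbRealm, KrbFQDN, KrbServiceName, UID, PWD) and the numeric AuthMech codes are assumptions modeled on the Cloudera ODBC Driver for Impala, not something this page documents. The function name basic_connection_string and all sample values are hypothetical. Verify the keys against your driver's documentation before relying on them.

```python
# Hypothetical sketch: build an ODBC connection string from the Basic settings.
# Key names and AuthMech codes are assumptions modeled on the Cloudera ODBC
# Driver for Impala; confirm them against the guide for your driver version.
# Values containing ";" or "=" are not escaped here.

AUTH_MECH = {
    "No Authentication": 0,
    "Kerberos": 1,
    "SASL User Name": 2,
    "User Name and Password": 3,
}


def basic_connection_string(host, port, database, auth="No Authentication",
                            user=None, password=None, realm=None,
                            host_fqdn="_HOST", service_name="impala"):
    """Return a semicolon-delimited key=value string for the Basic settings."""
    if auth not in AUTH_MECH:
        raise ValueError(f"unknown authentication mechanism: {auth!r}")
    parts = {
        "Host": host,
        "Port": str(port),
        "Database": database,
        "AuthMech": str(AUTH_MECH[auth]),
    }
    if auth == "Kerberos":
        if realm:  # Realm is treated as optional in this sketch
            parts["KrbRealm"] = realm
        parts["KrbFQDN"] = host_fqdn            # table example: _HOST
        parts["KrbServiceName"] = service_name  # table example: impala
    elif auth in ("SASL User Name", "User Name and Password"):
        if not user:
            raise ValueError(f"{auth} requires a user name")
        parts["UID"] = user
        if auth == "User Name and Password":
            if password is None:
                raise ValueError("User Name and Password requires a password")
            parts["PWD"] = password
    return ";".join(f"{key}={value}" for key, value in parts.items())


if __name__ == "__main__":
    # Host name, port, database, and credentials are placeholders.
    print(basic_connection_string("impala.example.com", 21050, "default",
                                  auth="User Name and Password",
                                  user="analyst", password="secret"))
```

Run as shown, the sketch prints Host=impala.example.com;Port=21050;Database=default;AuthMech=3;UID=analyst;PWD=secret. Kerberos-only settings (Realm, Host FQDN, Service Name) are emitted only when Kerberos is selected. This mirrors how those fields apply only to that mechanism.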

Advanced settings

| Setting | Description | Example |
| --- | --- | --- |
| Enable SSL | Specifies whether the client uses an SSL-encrypted connection to communicate with the Impala server. | |
| Allow Common Name Host Name Mismatch | Specifies whether a CA-issued SSL certificate name must match the host name of the Impala server. | |
| Allow Self-signed Server Certificate | Specifies whether the driver allows a connection to an Impala server that uses a self-signed certificate. | |
| Trusted Certificates | Full path of the .pem file containing the trusted CA certificates for verifying the server when using SSL. | |
| Use Native Query | Specifies whether the driver uses native Impala SQL queries. If this option is not selected, the driver converts the queries emitted by an application into an equivalent form in Impala SQL. If the application is Impala-aware and already emits Impala SQL, enable this option to avoid the extra overhead of query transformation. | |
| Enable Simulated Transactions | Specifies whether the driver simulates transactions. When disabled, the driver returns an error if it attempts to run a query that contains transaction statements. | |
| Use SQL Unicode Types | Specifies the SQL types to be returned for string data types. When enabled, the driver returns SQL_WVARCHAR for STRING and VARCHAR columns, and returns SQL_WCHAR for CHAR columns. | |
| Rows fetched per block | Maximum number of rows that a query returns at a time. | 10000 |
| Socket timeout | Number of seconds that the TCP socket waits for a response from the server before timing out the request and returning an error message. When set to 0, the TCP socket does not time out any requests. | 30 |
| String Column Length | Maximum number of characters that can be contained in STRING columns. | 32767 |
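As with the basic settings, Analytics handles these values for you. The sketch below shows one way the advanced settings could be rendered as key=value pairs: checkboxes become 1/0 flags, and the table's example values (10000 rows per block, a 30-second socket timeout, 32767-character STRING columns) serve as starting points. It also shows what Rows fetched per block means in practice, namely how many fetches a result set needs at a given block size. The key names (SSL, TrustedCerts, RowsFetchedPerBlock, SocketTimeout, StringColumnLength, and so on) are assumptions modeled on the Cloudera ODBC Driver for Impala. The functions advanced_pairs and fetch_count are hypothetical helpers.

```python
import math

# Example values from the Advanced settings table. Key names are assumptions
# modeled on the Cloudera ODBC Driver for Impala; verify them for your driver.
TABLE_EXAMPLES = {
    "RowsFetchedPerBlock": 10000,
    "SocketTimeout": 30,          # seconds; 0 disables the timeout
    "StringColumnLength": 32767,
}

# Settings shown as checkboxes in the Advanced settings panel (assumed key names).
BOOLEAN_KEYS = {
    "SSL", "CAIssuedCertNamesMismatch", "AllowSelfSignedServerCert",
    "UseNativeQuery", "EnableSimulatedTransactions", "UseSQLUnicodeTypes",
}


def advanced_pairs(**overrides):
    """Merge overrides onto the table's example values and render key=value pairs."""
    settings = dict(TABLE_EXAMPLES)
    settings.update(overrides)
    if settings["SocketTimeout"] < 0:
        raise ValueError("SocketTimeout must be 0 (no timeout) or a positive number of seconds")
    if settings["RowsFetchedPerBlock"] < 1:
        raise ValueError("RowsFetchedPerBlock must be at least 1")
    pairs = []
    for key, value in settings.items():
        if key in BOOLEAN_KEYS:
            value = 1 if value else 0  # checkbox state becomes a 1/0 flag
        pairs.append(f"{key}={value}")
    return ";".join(pairs)


def fetch_count(total_rows, rows_per_block=10000):
    """Number of fetches needed to return total_rows, one block at a time."""
    return math.ceil(total_rows / rows_per_block)


if __name__ == "__main__":
    # The certificate path is a placeholder.
    print(advanced_pairs(SSL=True, TrustedCerts="/tmp/ca.pem"))
    print(fetch_count(25000))  # 25,000 rows at 10,000 rows per block
```

With the default block size, a 25,000-row result needs three fetches: two full blocks and a final partial block. Raising Rows fetched per block reduces the number of fetches at the cost of more memory per fetch.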

Data connector updates

When you upgrade Analytics, the Robots Agent, or AX Server, you should test any of your scripts that import data using one of the Analytics data connectors (ACCESSDATA command).

Changes made by third-party data sources or ODBC driver vendors may have required updates to one or more of the data connectors. Scripted data connections may need to be updated to continue working correctly.

  • Re-run the import: The easiest way to update a connection is to manually perform an import using the Data Access window in the upgraded version of Analytics. Copy the ACCESSDATA command from the log and use it to update your script.

    Note

    Before connecting to a data source and re-running the import, clear the connector cache to flush the existing set of table names.

    In the Existing Connections tab in the Data Access window, beside the connector name, select > Clear cache.

  • Update field specifications: You may also need to update field specifications in the script body to align with table schema changes in the data source or ODBC driver. Possible changes include field names, field data types, and field and record lengths.
  • Check the results of any filtering: You should also check the results of any filtering that you apply as part of the data import. Confirm that the import filtering is including and excluding records correctly.
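When you re-run an import after an upgrade, comparing the old and new results side by side makes schema and filtering changes easier to spot. The sketch below assumes that you have recorded each field's name, data type, and length, and the key values of the imported records, from both runs. That layout is hypothetical, chosen for illustration; Analytics does not produce it in this form. The functions diff_fields and diff_records are likewise hypothetical.

```python
# Hypothetical sketch: compare field specifications and filtered records from
# an import run before and after an upgrade. The layouts used here
# (field name -> (type, length), and a list of record keys) are assumptions.


def diff_fields(before, after):
    """Return (added, removed, changed) field names between two field specs."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = sorted(name for name in before.keys() & after.keys()
                     if before[name] != after[name])
    return added, removed, changed


def diff_records(old_keys, new_keys):
    """Return (newly_included, newly_excluded) record keys after filtering."""
    old_set, new_set = set(old_keys), set(new_keys)
    return sorted(new_set - old_set), sorted(old_set - new_set)


if __name__ == "__main__":
    before = {"order_id": ("NUMERIC", 10), "region": ("ASCII", 20)}
    after = {"order_id": ("NUMERIC", 10), "region": ("UNICODE", 40),
             "created_at": ("DATETIME", 19)}
    print(diff_fields(before, after))
    print(diff_records([1, 2, 3], [2, 3, 4]))
```

In this example, the region field changed type and length, and a new created_at field appeared, so the field specifications in the script would need updating. Any record key that shows up as newly included or newly excluded is worth checking against the import filter.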