Connecting to Apache Hive

Concept Information

ACCESSDATA command

Apache Hive is a cloud data service. You can use the Apache Hive data connector to import your organization's Hive data.

Note

Analytics provides Hive as an optional connector and if it is not available in your Data Access window, it is likely that the connector was not selected during installation. For more information, see Install optional Analytics data connectors and Python engine.

Before you start

To connect to Hive, you must gather the following:

  • username and password
  • the correct connection port
  • the authentication scheme used
  • the server's host name or IP address
  • the transport mode to communicate with the server
  • Read access

For help gathering the connection prerequisites, contact the Hive administrator in your organization. If your administrator cannot help you, you or your administrator should contact Hive Support.

Create a Hive connection

  1. From the Analytics main menu, select Import > Database and application.
  2. From the New Connections tab, in the ACL Connectors section, select Hive.

    Tip

    You can filter the list of available connectors by entering a search string in the Filter connections box. Connectors are listed alphabetically.

  3. In the Data Connection Settings panel, enter the connection settings and at the bottom of the panel, click Save and Connect.

    You can accept the default Connection Name, or enter a new one.

The connection for Hive is saved to the Existing Connections tab. In the future, you can reconnect to Hive from the saved connection.

Once the connection is established, the Data Access window opens to the Staging Area and you can begin importing data. For help importing data from Hive, see Working with the Data Access window.

Connection settings

Basic settings

Setting Description Example
Hive Server Type Specifies the Hive Server instance to connect to. Hive Server 2
Service Discovery Mode

Specifies how the Hive Server services are discovered. The options available are:

  • No Service Discovery - The driver connects to Hive without using a discovery service.
  • ZooKeeper - The driver discovers Hive Server services through the ZooKeeper service.
No Service Discovery
Host(s) IP address or host name of the Hive server.  
Port The port for the connection to the Hive Server instance. 10000
Database Name of the database schema to use when a schema is not explicitly specified in a query.  
ZooKeeper Namespace The namespace configured on ZooKeeper for the Hive Server 2 znodes.  
Authentication Mechanism

Specifies the authentication mechanism to use. The options available are:

  • No Authentication
  • Kerberos
  • User Name
  • Username and Password
  • Windows Azure HDInsight Service
No Authentication
Realm The realm of the Hive Server 2 host.  
Host FQDN Fully qualified domain name of the Hive Server host. _HOST
Service Name The Kerberos service principal name of the Hive server.  
User Name User name to authenticate to the Hive Server.  
Password Password for the username to authenticate to the Hive Server.  
Delegation UID User ID of the delegated user to whom the driver must delegate all Hive operations, rather than to the authenticated user for the connection.  
Thrift Transport

Specifies the transport protocol to use in the Thrift layer. The options available are:

  • Binary
  • SASL
  • HTTP
Binary

Advanced settings

Setting Description Example
Enable SSL Specifies whether the client uses an SSL encrypted connection to communicate with the Hive server.  
Allow Common Name Host Name Mismatch Specifies whether a CA-issued SSL certificate name must match the host name of the Hive server.  
Allow Self-signed Server Certificate Specifies whether the driver allows a connection to the Hive server that uses a self-signed certificate, even if this certificate is not in the list of trusted certificates.  
Trusted Certificates The full path of the .pem file containing trusted CA certificates, for verifying the server when using SSL.  
Two-way SSL Specifies whether two-way SSL is enabled.  
Client Certificate File The full path to the .pem file containing the SSL certificate of the client.  
Client Private Key File The full path to the .pem file containing the SSL private key of the client.  
Client Private Key Password The password of the private key file specified in the Client Private Key File field.  
Use Native Query Specifies whether the driver uses native HiveQL queries. If this option is not selected the driver converts the queries emitted by an application into an equivalent form in HiveQL.  
Fast SQLPrepare Specifies whether the driver defers query execution to SQLExecute.  
Driver Config Take Precedence Specifies whether driver-wide configuration settings take precedence over connection and DSN settings.  
Use Async Exec Specifies whether to execute queries synchronously or asynchronously.  
Async Exec Poll Interval The time in milliseconds between each poll for the query execution status. 100
Get Tables with Query Specifies whether the driver uses the SHOW TABLES query to retrieve table names from the database. If disabled, the driver uses GetTables Thrift API call.  
Unicode SQL character types Specifies the SQL types to be returned for string data types. When enabled, the driver returns SQL_WVARCHAR for STRING and VARCHAR columns, and returns SQL_WCHAR for CHAR columns.  
Show System Table Specifies whether the driver returns the hive_system table for catalog function calls such as SQLTables and SQLColumns.  
Use only SSPI Specifies whether the driver handles Kerberos authentication with the SSPI plugin or with MIT Kerberos.  
Invalid Session Auto Recover Specifies whether the driver opens a new session automatically when the existing session is no longer valid.  
Rows fetched per block Maximum number of rows that a query returns at a time. 10000
Default string column length Maximum number of characters that can be contained in STRING columns. 255
Binary column length Maximum data length for BINARY columns. 32767
Decimal column scale Maximum number of digits to the right of the decimal point for numeric data types. 10
Socket Timeout Number of seconds that an operation can remain idle before it is closed. 60
HTTP Path The partial URL corresponding to the Hive server.  

Hive connection fields

Column unique names

Hive connections that are made through the Data Access window use a connection string parameter EnableUniqueColumnName that is set to 0 by default. This parameter must have a value of 0 to ensure that correct column names are retrieved when connecting.

If you create a Hive connection using a DSN rather than from the Data Access window, this value is set to 1 by default. You must change it to 0 in the Windows registry for your connection to work.

Note

Scripts that use DSN connections established in versions of ACL older than 13.1 continue to work after upgrading to version 13.1.