Using lakeFS with Hive

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

Table of contents

  1. Configuration
  2. Examples
    1. Example with schema
    2. Example with external table

Configuration

To configure Hive to work with lakeFS, set the lakeFS credentials and endpoint in the corresponding S3A configuration properties:

lakeFS endpoint: fs.s3a.endpoint

lakeFS access key: fs.s3a.access.key

lakeFS secret key: fs.s3a.secret.key

Note: In the following examples we set the lakeFS credentials at runtime, for clarity. In production, these properties should be set using one of Hadoop's standard ways of authenticating with S3.
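
For instance, the S3A properties can be set from within a Hive session (e.g. in Beeline) before running queries. The values below are the illustrative ones used throughout this page; note that some deployments restrict which properties may be overridden at runtime:

SET fs.s3a.endpoint=https://lakefs.example.com;
SET fs.s3a.access.key=AKIAIOSFODNN7EXAMPLE;
SET fs.s3a.secret.key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY;
SET fs.s3a.path.style.access=true;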

For a persistent setup, the configuration can instead be added to the file hdfs-site.xml:

<configuration>
    ...
    <property>
        <name>fs.s3a.secret.key</name>
        <value>wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY</value>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <value>AKIAIOSFODNN7EXAMPLE</value>
    </property>
    <property>
        <name>fs.s3a.endpoint</name>
        <value>https://lakefs.example.com</value>
    </property>
    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>
</configuration>
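
Setting fs.s3a.path.style.access to true makes the S3A client address the lakeFS endpoint by path (e.g. https://lakefs.example.com/bucket/key) rather than by bucket subdomain, which is typically required unless wildcard DNS is configured for the lakeFS endpoint.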

Examples

Example with schema

CREATE SCHEMA example LOCATION 's3a://example/main/';
CREATE TABLE example.request_logs (
    request_time timestamp,
    url string,
    ip string,
    user_agent string
);
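
Once created, the table can be written to and queried like any other Hive table, with its data stored on the main branch of the example repository. A minimal sanity check, using illustrative values:

-- The timestamp string is implicitly converted to the timestamp column type.
INSERT INTO example.request_logs
VALUES ('2021-01-01 00:00:00', '/index.html', '10.0.0.1', 'curl/7.68.0');

SELECT * FROM example.request_logs;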

Example with external table

CREATE EXTERNAL TABLE request_logs (
    request_time timestamp,
    url string,
    ip string,
    user_agent string
) LOCATION 's3a://example/main/request_logs';
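
With lakeFS, the first path component after the repository name selects the branch, so the same table definition can point at a different branch to work on an isolated copy of the data. A sketch, using a hypothetical request_logs_dev table and assuming a branch named my-branch exists in the example repository:

CREATE EXTERNAL TABLE request_logs_dev (
    request_time timestamp,
    url string,
    ip string,
    user_agent string
) LOCATION 's3a://example/my-branch/request_logs';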