Using lakeFS with AWS Glue & Amazon Athena

Info

Available in lakeFS Enterprise

Tip

This integration requires the lakeFS Iceberg REST Catalog to be enabled. Contact us to get started!

Overview

Amazon Athena can query lakeFS-managed Apache Iceberg tables directly through AWS Glue Catalog Federation -- no data copying or metadata syncing required.

Athena discovers table metadata in real time through lakeFS and reads the underlying data files directly from S3.

Setup

Before querying from Athena, you need to create a federated Glue catalog that connects to your lakeFS Iceberg REST Catalog. Follow the Glue Data Catalog integration guide for step-by-step instructions on:

  1. Installing the lakefs-glue CLI tool.
  2. Creating a federated catalog pointing to a lakeFS repository and branch.
  3. Granting access to the appropriate IAM principals.

Querying from Athena

Once the federated catalog is created, query your lakeFS tables directly from Athena using the catalog name as a prefix:

SELECT * FROM "lakefs-catalog"."default"."my_table" LIMIT 10;

Run aggregations and joins across lakeFS-managed tables:

SELECT 
    category, 
    COUNT(*) AS total, 
    SUM(amount) AS total_amount
FROM "lakefs-catalog"."default"."transactions"
GROUP BY category
ORDER BY total_amount DESC;
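
The query above shows an aggregation only. As a rough sketch of a join across two lakeFS-managed tables in the same federated catalog, something like the following should work. The "customers" table and the customer_id and region columns are hypothetical names used for illustration, so substitute tables and columns that exist in your repository:

```sql
-- Hypothetical example: "customers", customer_id, and region are
-- illustrative names, not objects defined in this guide.
SELECT
    c.region,
    COUNT(*) AS total,
    SUM(t.amount) AS total_amount
FROM "lakefs-catalog"."default"."transactions" AS t
JOIN "lakefs-catalog"."default"."customers" AS c
    ON t.customer_id = c.customer_id
GROUP BY c.region
ORDER BY total_amount DESC;
```

Both tables are read from the same lakeFS ref, because a federated catalog points to a single ref.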

Comparing Data Across Branches

By creating separate catalogs for different refs, you can compare data across branches, tags, or commits directly from Athena:

-- Compare row counts between the main (production) and dev branches
SELECT 'main' AS branch, COUNT(*) AS row_count 
FROM "my-repo-main"."default"."my_table"
UNION ALL
SELECT 'dev' AS branch, COUNT(*) AS row_count 
FROM "my-repo-dev"."default"."my_table";
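
Row counts only reveal a difference in size. To see which rows differ, one possible approach is a set difference between the two catalogs. This sketch assumes the table has the same schema on both branches and reuses the catalog names from the example above:

```sql
-- Rows present on dev but not on main.
-- Assumes identical column lists on both branches.
SELECT * FROM "my-repo-dev"."default"."my_table"
EXCEPT
SELECT * FROM "my-repo-main"."default"."my_table";
```

EXCEPT uses set semantics, so duplicate rows are collapsed in the result. Swap the two SELECT statements to list rows that exist on main but not on dev.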

Limitations

  • Read-only: AWS Glue Catalog Federation only supports read queries. INSERT, CREATE TABLE, and other write operations are not supported through Athena.
  • Single ref per catalog: Each federated catalog points to one lakeFS ref. Create multiple catalogs to query multiple branches or tags.
  • Flat namespaces only: AWS Glue Catalog Federation supports only flat catalog.namespace.table structures -- nested namespaces are not supported.