Using lakeFS with AWS Glue & Amazon Athena
Info
Available in lakeFS Enterprise
Tip
This integration requires the lakeFS Iceberg REST Catalog to be enabled. Contact us to get started!
Overview
Amazon Athena can query lakeFS-managed Apache Iceberg tables directly through AWS Glue Catalog Federation, with no data copying or metadata syncing required.
Athena discovers table metadata in real time through lakeFS and reads the underlying data files directly from S3.
Setup
Before querying from Athena, you need to create a federated Glue catalog that connects to your lakeFS Iceberg REST Catalog. Follow the Glue Data Catalog integration guide for step-by-step instructions on:
- Installing the lakefs-glue CLI tool.
- Creating a federated catalog pointing to a lakeFS repository and branch.
- Granting access to the appropriate IAM principals.
Querying from Athena
Once the federated catalog is created, you can query your lakeFS tables directly from Athena. Prefix each table with the catalog name, using the fully qualified form "catalog"."namespace"."table".
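For example, a minimal query that previews the transactions table from the aggregation example below might look like this. The catalog name lakefs-catalog and the default namespace are taken from that example; substitute the name you gave your own federated catalog and your own namespace.

```sql
-- Preview a few rows from a lakeFS-managed Iceberg table.
-- "lakefs-catalog" is the federated Glue catalog, "default" the namespace,
-- and "transactions" the table (names as used elsewhere on this page).
SELECT *
FROM "lakefs-catalog"."default"."transactions"
LIMIT 10;
```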
Run aggregations and joins across lakeFS-managed tables:
SELECT
category,
COUNT(*) AS total,
SUM(amount) AS total_amount
FROM "lakefs-catalog"."default"."transactions"
GROUP BY category
ORDER BY total_amount DESC;
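Joins work the same way, because every table is addressed by its fully qualified catalog.namespace.table name. The sketch below assumes a hypothetical customers table in the same namespace and a hypothetical customer_id column in both tables. Neither appears elsewhere on this page, so adapt the names to your own schema.

```sql
-- Hypothetical schema: assumes a "customers" table and a customer_id
-- column in both tables; adapt to your own tables and columns.
SELECT
  c.region,
  COUNT(*) AS total,
  SUM(t.amount) AS total_amount
FROM "lakefs-catalog"."default"."transactions" AS t
JOIN "lakefs-catalog"."default"."customers" AS c
  ON t.customer_id = c.customer_id
GROUP BY c.region
ORDER BY total_amount DESC;
```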
Comparing Data Across Branches
By creating separate catalogs for different refs, you can compare data across branches, tags, or commits directly from Athena:
-- Compare row counts between production and dev
SELECT 'main' AS branch, COUNT(*) AS row_count
FROM "my-repo-main"."default"."my_table"
UNION ALL
SELECT 'dev' AS branch, COUNT(*) AS row_count
FROM "my-repo-dev"."default"."my_table";
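Row counts only show that two refs differ. To see which rows differ, you can apply set operations across the two catalogs. The following sketch uses the standard SQL EXCEPT operator, which Athena supports, to list rows present on dev but not on main. It assumes both catalogs expose my_table with the same column list.

```sql
-- Rows that exist on the dev branch but not on main.
-- Assumes both catalogs expose "my_table" with identical columns.
SELECT * FROM "my-repo-dev"."default"."my_table"
EXCEPT
SELECT * FROM "my-repo-main"."default"."my_table";
```

EXCEPT compares entire rows and removes duplicates from the result. To find rows that exist on main but were removed or changed on dev, swap the two SELECT statements.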
Limitations
- Read-only: AWS Glue Catalog Federation supports only read queries. INSERT, CREATE TABLE, and other write operations are not supported through Athena.
- Single ref per catalog: Each federated catalog points to one lakeFS ref. Create multiple catalogs to query multiple branches or tags.
- Flat namespaces only: AWS Glue Catalog Federation supports only flat catalog.namespace.table structures; nested namespaces are not supported.