Using lakeFS with Starburst Galaxy
Starburst Galaxy is a fully managed, cloud-native analytics platform built on Trino. You can connect Starburst Galaxy directly to the lakeFS Iceberg REST Catalog and query your lakeFS-versioned Iceberg tables, with every lakeFS branch and tag exposed as a queryable schema.
Iceberg REST Catalog
Starburst Galaxy connects to lakeFS through the lakeFS Iceberg REST Catalog, using Galaxy's Apache Polaris catalog connector. lakeFS manages table metadata and versioning, while Starburst Galaxy reads and writes the data files directly in the underlying object store, so lakeFS stays outside the data path.
Info
The lakeFS Iceberg REST Catalog is a lakeFS Enterprise feature. Contact us to get started.
Scope the catalog to a repository
A Starburst Galaxy catalog points at a single Iceberg REST endpoint. Use a relative-namespace endpoint scoped to your repository, of the form https://lakefs.example.com/iceberg/relative_to/<repository>/api, so that every branch and tag in the repository is surfaced as a schema.
With a repository-scoped endpoint, namespaces are returned as <ref>.<namespace> (for example, main.inventory),
so a single catalog lets you query — and compare — multiple branches and tags side by side.
Tip
Scope to the repository (relative_to/<repository>), not to a single branch
(relative_to/<repository>.<branch>). The repository scope exposes every ref as its own schema, which is what
makes branch-to-branch and tag-to-branch comparison possible from one catalog.
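The difference between the two scopes can be sketched in a few lines of Python. This is only an illustration: `relative_endpoint` and `schema_name` are hypothetical helper names, the host `https://lakefs.example.com` and the repository `my-repo` are placeholders, and the branch-scoped URL shape is inferred from the `relative_to/<repository>.<branch>` form mentioned in the tip.

```python
from typing import Optional


def relative_endpoint(base_url: str, repository: str, branch: Optional[str] = None) -> str:
    """Build a relative-namespace Iceberg REST endpoint (hypothetical helper)."""
    # Repository scope: relative_to/<repository>
    # Branch scope:     relative_to/<repository>.<branch>
    scope = repository if branch is None else f"{repository}.{branch}"
    return f"{base_url.rstrip('/')}/iceberg/relative_to/{scope}/api"


def schema_name(ref: str, namespace: str) -> str:
    """Schema name a repository-scoped catalog reports for a ref and namespace."""
    return f"{ref}.{namespace}"


repo_scoped = relative_endpoint("https://lakefs.example.com", "my-repo")
branch_scoped = relative_endpoint("https://lakefs.example.com", "my-repo", "main")

print(repo_scoped)    # recommended: every ref becomes its own schema
print(branch_scoped)  # single branch only, no cross-ref comparison
print(schema_name("main", "inventory"))
```

Only the repository-scoped value belongs in the Galaxy catalog form if you want to compare refs from one catalog.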
Configuration
In Starburst Galaxy, go to Catalogs → Create catalog → Apache Polaris, and configure the following:
| Field | Value | Notes |
|---|---|---|
| Catalog name | e.g. lakefs | The name you use in SQL (<catalog>.<schema>.<table>). |
| Authentication to S3 | AWS access key or Cross-account IAM role | Grants Galaxy direct read/write access to your data bucket (see below). |
| Polaris server endpoint | https://lakefs.example.com/iceberg/relative_to/<repository>/api | The repository-scoped REST endpoint described above. |
| Polaris catalog | any value (e.g. lakefs) | Required by the form but ignored by lakeFS; the repository comes from the endpoint. |
| Client Id | <lakefs_access_key_id> | Your lakeFS access key. |
| Client secret | <lakefs_secret_access_key> | Your lakeFS secret key. |
Click Test connection, then save the catalog and add it to a cluster in the same region as your data bucket.
Data bucket permissions
The lakeFS Iceberg REST Catalog manages table metadata, while Starburst Galaxy reads and writes data files directly from your underlying storage (for example, Amazon S3). You must ensure the IAM role or access key that Galaxy uses has read/write access to your data bucket. The following AWS IAM policy provides the required permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "StarburstGalaxyIcebergAccess",
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::<lakefs_repo_storage_namespace>/_managed/",
        "arn:aws:s3:::<lakefs_repo_storage_namespace>/_managed/*"
      ]
    },
    {
      "Sid": "BucketLevelRequiredForStarburst",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::<lakefs_repo_storage_namespace_bucket_name>"
    }
  ]
}
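If you manage several repositories, it may help to render this policy from the storage namespace instead of editing the placeholders by hand. The sketch below is one possible way to do that. It assumes the storage namespace has the form `s3://<bucket>/<optional prefix>` and that the ARN placeholder stands for the bucket plus prefix without the `s3://` scheme. `galaxy_bucket_policy` is a hypothetical helper name, and `example-bucket/repos/my-repo` is a placeholder.

```python
import json
from urllib.parse import urlparse


def galaxy_bucket_policy(storage_namespace: str) -> dict:
    """Render the policy above for one repository storage namespace.

    Assumption: the namespace looks like s3://<bucket>/<optional prefix>.
    """
    parsed = urlparse(storage_namespace)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3:// storage namespace, got {storage_namespace!r}")
    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    # ARN of the storage namespace itself: bucket plus optional key prefix.
    namespace_arn = f"arn:aws:s3:::{bucket}" + (f"/{prefix}" if prefix else "")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StarburstGalaxyIcebergAccess",
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    f"{namespace_arn}/_managed/",
                    f"{namespace_arn}/_managed/*",
                ],
            },
            {
                "Sid": "BucketLevelRequiredForStarburst",
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


policy = galaxy_bucket_policy("s3://example-bucket/repos/my-repo")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or attached to the role or user that Galaxy authenticates as.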
Querying
Each lakeFS ref is exposed as a schema named <ref>.<namespace>. Because that schema name contains a dot, quote it
as a single identifier:
-- approved data on the main branch
SELECT count(*) FROM "lakefs"."main.inventory"."books";
-- compare a candidate branch and a tag against main, all from the same catalog
SELECT count(*) FROM "lakefs"."staging.inventory"."books";
SELECT count(*) FROM "lakefs"."v1.inventory"."books";
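When the same check runs against many refs, the quoting is easy to get wrong by hand. The following Python sketch generates the queries above as a single statement. The helper names (`quote_ident`, `table_ref`, `count_by_ref_sql`) are hypothetical, and the catalog, refs, namespace, and table are the example values from this page. It relies only on standard Trino quoting: identifiers go in double quotes, with embedded double quotes doubled.

```python
from typing import Iterable


def quote_ident(name: str) -> str:
    """Quote a Trino identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def table_ref(catalog: str, ref: str, namespace: str, table: str) -> str:
    """Fully qualified table name; <ref>.<namespace> is one quoted schema."""
    return ".".join(quote_ident(p) for p in (catalog, f"{ref}.{namespace}", table))


def count_by_ref_sql(catalog: str, refs: Iterable[str], namespace: str, table: str) -> str:
    """One row count per ref, combined with UNION ALL."""
    selects = []
    for ref in refs:
        label = ref.replace("'", "''")  # escape for a SQL string literal
        selects.append(
            f"SELECT '{label}' AS ref, count(*) AS row_count "
            f"FROM {table_ref(catalog, ref, namespace, table)}"
        )
    return "\nUNION ALL\n".join(selects)


sql = count_by_ref_sql("lakefs", ["main", "staging", "v1"], "inventory", "books")
print(sql)
```

Running the generated statement in Galaxy returns one row per branch or tag, which makes differences in row counts visible at a glance.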
Tip
To learn more about the Iceberg REST Catalog and relative namespaces, see the Iceberg REST Catalog documentation.