Using lakeFS with Starburst Galaxy

Starburst Galaxy is a fully managed, cloud-native analytics platform built on Trino. You can connect Starburst Galaxy directly to the lakeFS Iceberg REST Catalog and query your lakeFS-versioned Iceberg tables, with every lakeFS branch and tag exposed as a queryable schema.

Iceberg REST Catalog

Starburst Galaxy connects to lakeFS through the lakeFS Iceberg REST Catalog, using Galaxy's Apache Polaris catalog connector. lakeFS manages table metadata and versioning, while Starburst Galaxy reads and writes the data files directly in the underlying object store, so lakeFS stays outside the data path.

Info

The lakeFS Iceberg REST Catalog is a lakeFS Enterprise feature. Contact us to get started.

Scope the catalog to a repository

A Starburst Galaxy catalog points at a single Iceberg REST endpoint. Use a relative-namespace endpoint scoped to your repository so that every branch and tag in the repository is surfaced as a schema:

https://lakefs.example.com/iceberg/relative_to/<repository>/api

With a repository-scoped endpoint, namespaces are returned as <ref>.<namespace> (for example, main.inventory), so a single catalog lets you query and compare multiple branches and tags side by side.

Tip

Scope to the repository (relative_to/<repository>), not to a single branch (relative_to/<repository>.<branch>). The repository scope exposes every ref as its own schema, which is what makes branch-to-branch and tag-to-branch comparison possible from one catalog.
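
As a minimal sketch, the repository-scoped endpoint can be assembled from the lakeFS server URL and the repository name. The function name repository_endpoint and the example values are hypothetical, and the dot check assumes repository names never contain a dot, so a dotted value is treated as an accidental <repository>.<branch> scope:

```python
from urllib.parse import quote


def repository_endpoint(lakefs_base_url: str, repository: str) -> str:
    """Return the repository-scoped Iceberg REST endpoint for a lakeFS server.

    Hypothetical helper for illustration; it only formats the URL pattern
    shown above and does not contact lakeFS.
    """
    if not repository or "." in repository:
        # Assumption: a dot means the value is <repository>.<branch>,
        # which would scope the catalog to a single branch.
        raise ValueError("expected a bare repository name without a branch suffix")
    base = lakefs_base_url.rstrip("/")
    return f"{base}/iceberg/relative_to/{quote(repository, safe='')}/api"


# Example values are placeholders.
print(repository_endpoint("https://lakefs.example.com", "my-repo"))
```

The resulting string is the value to paste into the Polaris server endpoint field.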

Configuration

In Starburst Galaxy, go to Catalogs → Create catalog → Apache Polaris, and configure the following:

| Field | Value | Notes |
| --- | --- | --- |
| Catalog name | e.g. lakefs | The name you use in SQL (<catalog>.<schema>.<table>). |
| Authentication to S3 | AWS access key or Cross-account IAM role | Grants Galaxy direct read/write access to your data bucket (see Data bucket permissions). |
| Polaris server endpoint | https://lakefs.example.com/iceberg/relative_to/<repository>/api | The repository-scoped REST endpoint described above. |
| Polaris catalog | any value (e.g. lakefs) | Required by the form, but ignored by lakeFS; the repository comes from the endpoint. |
| Client Id | <lakefs_access_key_id> | Your lakeFS access key. |
| Client secret | <lakefs_secret_access_key> | Your lakeFS secret key. |

Click Test connection, then save the catalog and add it to a cluster in the same region as your data bucket.

Data bucket permissions

Because Starburst Galaxy reads and writes data files directly in your underlying storage (for example, Amazon S3), the IAM role or access key that Galaxy uses must have read/write access to your data bucket. The following AWS IAM policy provides the required permissions. In the resource ARNs, write <lakefs_repo_storage_namespace> as the bucket and prefix without the s3:// scheme, and <lakefs_repo_storage_namespace_bucket_name> as the bucket name alone:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "StarburstGalaxyIcebergAccess",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::<lakefs_repo_storage_namespace>/_managed/",
                "arn:aws:s3:::<lakefs_repo_storage_namespace>/_managed/*"
            ]
        },
        {
            "Sid": "BucketLevelRequiredForStarburst",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::<lakefs_repo_storage_namespace_bucket_name>"
        }
    ]
}
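
The placeholders in the policy above can be filled in programmatically. The following is a sketch only: the function name galaxy_bucket_policy and the example storage namespace are hypothetical, and it assumes the storage namespace has the form s3://<bucket>/<prefix>, from which the ARN path and the bucket name are derived.

```python
import json


def galaxy_bucket_policy(storage_namespace: str) -> dict:
    """Fill in the IAM policy shown above for one lakeFS storage namespace.

    Hypothetical helper; assumes a namespace like s3://bucket/prefix.
    """
    path = storage_namespace
    if path.startswith("s3://"):
        path = path[len("s3://"):]  # ARNs use bucket/prefix, not a URL scheme
    path = path.strip("/")
    bucket = path.split("/", 1)[0]
    managed = f"arn:aws:s3:::{path}/_managed/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StarburstGalaxyIcebergAccess",
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [managed, managed + "*"],
            },
            {
                "Sid": "BucketLevelRequiredForStarburst",
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


# Placeholder namespace; substitute your repository's storage namespace.
print(json.dumps(galaxy_bucket_policy("s3://my-bucket/repos/my-repo"), indent=4))
```

Generating the document this way keeps the object-level statement scoped to the _managed/ prefix while the bucket-level statement targets the bucket itself, matching the two resources in the policy above.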

Querying

Each namespace under a lakeFS ref is exposed as a schema named <ref>.<namespace>. Because that schema name contains a dot, wrap it in double quotes so that it is parsed as a single identifier:

-- approved data on the main branch
SELECT count(*) FROM "lakefs"."main.inventory"."books";

-- compare a candidate branch and a tag against main, all from the same catalog
SELECT count(*) FROM "lakefs"."staging.inventory"."books";
SELECT count(*) FROM "lakefs"."v1.inventory"."books";
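
When queries are generated from code, the quoting rule is easy to get wrong. The sketch below builds the fully qualified table name and a side-by-side row-count query for several refs. The helper names (quote_ident, table_ref, compare_counts_sql) are hypothetical, and the catalog, ref, namespace, and table values are the example names used above; it only produces SQL text and does not connect to Starburst Galaxy.

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def table_ref(catalog: str, ref: str, namespace: str, table: str) -> str:
    """Return "<catalog>"."<ref>.<namespace>"."<table>" with the schema as one identifier."""
    schema = f"{ref}.{namespace}"  # the dot is part of the schema name
    return ".".join(quote_ident(part) for part in (catalog, schema, table))


def compare_counts_sql(catalog: str, refs: list, namespace: str, table: str) -> str:
    """Build one query that returns a row count per ref (branch or tag)."""
    selects = []
    for ref in refs:
        label = ref.replace("'", "''")  # escape for use as a SQL string literal
        selects.append(
            f"SELECT '{label}' AS ref, count(*) AS row_count "
            f"FROM {table_ref(catalog, ref, namespace, table)}"
        )
    return "\nUNION ALL\n".join(selects)


print(table_ref("lakefs", "main", "inventory", "books"))
print(compare_counts_sql("lakefs", ["main", "staging", "v1"], "inventory", "books"))
```

Quoting every part, not just the schema, keeps the helper safe for catalog or table names that also need quoting.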

Tip

To learn more about the Iceberg REST Catalog and relative namespaces, see the Iceberg REST Catalog documentation.