Using lakeFS with Dremio
Dremio is a next-generation data lake engine that liberates your data with live, interactive queries directly on cloud data lake storage, including S3 and lakeFS.
Iceberg REST Catalog
The lakeFS Iceberg REST Catalog lets you use lakeFS as a spec-compliant Apache Iceberg REST catalog, so Dremio can manage and access tables through a standard REST API.
This is the recommended way to use lakeFS with Dremio, as it allows lakeFS to stay completely outside the data path: data itself is read and written by Dremio executors, directly to the underlying object store. Metadata is managed by Iceberg at the table level, while lakeFS keeps track of new snapshots to provide versioning and isolation.
Read more about using the Iceberg REST Catalog.
Configuration
To use the Iceberg REST Catalog from Dremio, add lakeFS as an Iceberg REST Catalog source in Dremio:
- On the Datasets page, to the right of Sources in the left panel, click +.
- In the Add Data Source dialog, under Lakehouse Catalogs, select Iceberg REST Catalog Source. The New Iceberg REST Catalog Source dialog box appears, which contains the following tabs:
    - In General →
        - Enter a name for your Iceberg REST Catalog source and specify the endpoint URI (e.g. https://lakefs.example.com/iceberg/api).
        - Uncheck "Use vended credentials".
    - In Advanced Options → Catalog Properties, add the following key-value pairs (left = key, right = value):

      | Key | Value | Notes |
      |-----|-------|-------|
      | oauth2-server-uri | https://lakefs.example.com/iceberg/api/v1/oauth/tokens | Your lakeFS OAuth2 token endpoint (not the catalog URL). |
      | credential | <lakefs_access_key>:<lakefs_secret_key> | Your lakeFS credentials. |
      | fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider | Use static AWS credentials. |
      | fs.s3a.access.key | <aws_access_key_id> | AWS key with read/write access to your data bucket. |
      | fs.s3a.secret.key | <aws_secret_access_key> | AWS secret key. |
      | dremio.s3.list.all.buckets | false | Avoid listing all buckets during initialization. |

- Click Save to create the Iceberg REST Catalog source (a quick way to verify the connection is sketched below).
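If you want to verify the endpoint, token URI, and credentials independently of Dremio, the following is a minimal sketch using pyiceberg with the same catalog properties. It is an illustration only; the host name, credentials, and catalog name are placeholders, not part of the Dremio setup.

```python
# Minimal sketch: load the same lakeFS Iceberg REST Catalog properties with pyiceberg
# to confirm the endpoint and credentials work. All values below are placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lakefs",
    **{
        "type": "rest",
        "uri": "https://lakefs.example.com/iceberg/api",
        "oauth2-server-uri": "https://lakefs.example.com/iceberg/api/v1/oauth/tokens",
        "credential": "<lakefs_access_key>:<lakefs_secret_key>",
    },
)

# If authentication succeeds, listing namespaces (scoped to repositories and branches)
# should return without errors.
print(catalog.list_namespaces())
```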
Data Bucket Permissions
The lakeFS Iceberg Catalog manages table metadata, while Dremio reads and writes data files directly from your underlying storage (for example, Amazon S3).
You must ensure that the IAM role or user Dremio uses has read/write access to your data bucket. The following AWS IAM policy provides the required permissions for direct access:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DremioIcebergAccess",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::<lakefs_repo_storage_namespace>/_managed/",
                "arn:aws:s3:::<lakefs_repo_storage_namespace>/_managed/*"
            ]
        },
        {
            "Sid": "BucketLevelRequiredForDremio",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::<lakefs_repo_storage_namespace_bucket_name>"
        }
    ]
}
```
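To confirm that the key pair you plan to give Dremio actually has this access, the sketch below is one way to check it with boto3. It is an illustration only: the bucket name and storage-namespace prefix are placeholders, and it exercises only the list/read side (Dremio itself performs the writes under _managed/).

```python
# Rough access check (illustration only) for the AWS credentials Dremio will use.
# The bucket and prefix stand in for your repository's storage namespace;
# adjust them to your environment.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="<aws_access_key_id>",
    aws_secret_access_key="<aws_secret_access_key>",
)

bucket = "<lakefs_repo_storage_namespace_bucket_name>"
prefix = "<storage_namespace_prefix>/_managed/"

# Covered by the bucket-level statement (GetBucketLocation, ListBucket).
print(s3.get_bucket_location(Bucket=bucket))
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=5)
print("objects under _managed/:", resp.get("KeyCount", 0))
```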
Tip: To learn more about the Iceberg REST Catalog, see the Iceberg REST Catalog documentation.
Using Dremio with the S3 Gateway
Alternatively, you can use the S3 Gateway to read and write data to lakeFS from Dremio.
While flexible, this approach puts lakeFS in the data path: every data operation is proxied through the lakeFS server, which can be less efficient than the Iceberg REST Catalog approach. This is particularly noticeable for large datasets, where the extra network hop adds overhead.
Configuration
Starting with version 3.2.3, Dremio supports MinIO as an experimental S3-compatible plugin, and lakeFS can be connected in the same way.
Suppose you already have both lakeFS and Dremio deployed and want to use Dremio to query data in your lakeFS repositories. Follow the steps below in the Dremio UI:
- Click Add Data Lake.
- Under File Stores, choose Amazon S3.
- Under Advanced Options, check Enable compatibility mode (experimental).
- Under Advanced Options > Connection Properties, add fs.s3a.path.style.access and set the value to true.
- Under Advanced Options > Connection Properties, add fs.s3a.endpoint and set the value to the lakeFS S3 endpoint.
- Under the General tab, specify the access_key_id and secret_access_key provided by the lakeFS server.
- Click Save; you should now be able to browse lakeFS repositories in Dremio (the sketch below shows the same settings from code).
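For reference, the sketch below exercises the same connection settings from code using boto3 against the S3 Gateway: the lakeFS endpoint, lakeFS credentials, and path-style addressing mirror the fs.s3a.* properties above. The endpoint, repository, and branch names are assumptions for illustration.

```python
# Sketch of the same S3 Gateway settings Dremio uses, via boto3.
# Endpoint, credentials, repository, and branch below are placeholders.
import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",            # fs.s3a.endpoint
    aws_access_key_id="<lakefs_access_key_id>",           # lakeFS access key
    aws_secret_access_key="<lakefs_secret_access_key>",   # lakeFS secret key
    config=Config(s3={"addressing_style": "path"}),       # fs.s3a.path.style.access = true
)

# Through the gateway, repositories appear as buckets and branches as top-level prefixes.
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
resp = s3.list_objects_v2(Bucket="example-repo", Prefix="main/", Delimiter="/")
print("objects on main:", resp.get("KeyCount", 0))
```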
