lakeFS Tables
lakeFS Enterprise
lakeFS Tables are currently in private preview for lakeFS Enterprise customers. Contact us to get started!
What are lakeFS Tables?
lakeFS Tables let you use lakeFS as a spec-compliant Apache Iceberg REST catalog, so Iceberg clients can manage Iceberg tables through a standard REST API.
With lakeFS Tables, lakeFS can serve as a drop-in replacement for other Iceberg catalogs such as AWS Glue, Nessie, or Hive Metastore.
With lakeFS Tables, you can:
- Manage Iceberg tables with full version control capabilities.
- Use standard Iceberg clients and tools without modification.
- Leverage lakeFS’s branching and merging features to manage table lifecycles.
- Maintain data consistency across different environments.
Use Cases
- Version-Controlled Data Development:
- Create feature branches for table schema changes or data migrations
- Test modifications in isolation, across multiple tables
- Merge changes safely with conflict detection
- Multi-Environment Management:
- Use branches to represent different environments (dev, staging, prod)
- Promote changes between environments through merges, with automated testing
- Maintain consistent table schemas across environments
- Collaborative Data Development:
- Multiple teams can work on different table features simultaneously
- Maintain data quality through pre-merge validations
- Collaborate using pull requests on changes to data and schema
- Manage and Govern Access to data:
- Use the detailed built-in commit log, which captures who changed the data, what was changed, and how
- Manage access with fine-grained RBAC policies applied to users and groups
- Roll back changes atomically and safely to reduce time to recover and increase system stability
Configuration
The Iceberg REST catalog API is exposed at /iceberg/api on your lakeFS server.
To use it:
- Enable the feature (contact us for details).
- Configure your Iceberg clients to use the lakeFS REST catalog endpoint.
- Use your lakeFS access key and secret for authentication.
Catalog Initialization Example (using pyiceberg)
from pyiceberg.catalog import load_catalog

# load_catalog builds a REST catalog from the 'type' property and the lakeFS endpoint
catalog = load_catalog('lakefs', **{
    'type': 'rest',
    'uri': f'{lakefs_endpoint}/iceberg/api',
    'oauth2-server-uri': f'{lakefs_endpoint}/iceberg/api/v1/oauth/tokens',
    'credential': f'{lakefs_client_key}:{lakefs_client_secret}',
})
Example Client code
import lakefs
from pyiceberg.catalog import load_catalog

# Initialize the catalog
catalog = load_catalog('lakefs', **{
    'type': 'rest',
    'uri': 'https://lakefs.example.com/iceberg/api',
    'oauth2-server-uri': 'https://lakefs.example.com/iceberg/api/v1/oauth/tokens',
    'credential': 'AKIAlakefs12345EXAMPLE:abc/lakefs/1234567bPxRfiCYEXAMPLEKEY',
})
# List namespaces in a branch
catalog.list_namespaces(('repo', 'main'))
# List tables in a namespace
catalog.list_tables('repo.main.inventory')

# Query a table
table = catalog.load_table('repo.main.inventory.books')
arrow_df = table.scan().to_arrow()
-- List tables in the iceberg catalog
USE "repo.main.inventory"; -- <repository>.<branch or reference>.<namespace>
SHOW TABLES;
-- Query a table
SELECT * FROM books LIMIT 100;
-- Switch to a different branch
USE "repo.new_branch.inventory";
SELECT * FROM books;
// Spark, with an Iceberg REST catalog configured against lakeFS (see the configuration sketch below)
spark.sql("USE my_repo.main.namespace")
// List available tables
spark.sql("SHOW TABLES").show()
// Query data with branch isolation
spark.sql("SELECT * FROM my_table").show()
// Switch to a feature branch
spark.sql("USE my_repo.new_branch.namespace")
spark.sql("SELECT * FROM my_table").show()
CREATE EXTERNAL CATALOG lakefs
COMMENT "lakeFS REST Catalog"
PROPERTIES (
"type" = "iceberg",
"iceberg.catalog.type" = "rest",
"iceberg.catalog.uri" = "https://lakefs.example.com/catalog/iceberg/v1",
"client.factory" = "com.starrocks.connector.iceberg.IcebergAwsClientFactory"
);
-- Use a specific repository and branch
SELECT * FROM lakefs.`repo_name.main.namespace`.table_name;
Namespaces and Tables
Namespace Operations
The Iceberg Catalog supports Iceberg namespace operations, illustrated in the sketch after this list:
- Create namespaces
- List namespaces
- Drop namespaces
- List tables within namespaces
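A minimal pyiceberg sketch of these operations against the catalog initialized earlier (the repo repository and main branch are assumed to already exist):
# Create a nested namespace on the main branch of the "repo" repository
catalog.create_namespace('repo.main.inventory')

# List namespaces directly under the branch
catalog.list_namespaces(('repo', 'main'))

# List tables within a namespace
catalog.list_tables('repo.main.inventory')

# Drop the namespace (it typically must be empty first)
catalog.drop_namespace('repo.main.inventory')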
Namespace Usage
Namespaces in the Iceberg Catalog follow the pattern "<repository>.<branch>.<namespace>(.<namespace>...)", where:
- <repository> must be a valid lakeFS repository name.
- <branch> must be a valid lakeFS branch name.
- <namespace> components can be nested using dots as separators (e.g., inventory.books).
Examples:
my-repo.main.inventory
my-repo.feature-branch.inventory.books
The repository and branch components must already exist in lakeFS before using them in the Iceberg catalog.
Namespace Restrictions
- Repository and branch names must follow lakeFS naming conventions.
- Namespace components cannot contain special characters except dots (.) for nesting.
- The total namespace path length must be less than 255 characters.
- Namespaces are case-sensitive.
- Empty namespace components are not allowed.
Table Operations
The Iceberg Catalog supports all standard Iceberg table operations, illustrated in the sketch after this list:
- Create tables with schemas and partitioning.
- Update table schemas and partitioning.
- Commit changes to tables.
- Delete tables.
- List tables in namespaces.
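A hedged pyiceberg sketch of these operations; the schema, field names, and data are illustrative only:
import pyarrow as pa
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType, LongType

# Create a table with an explicit schema
schema = Schema(
    NestedField(field_id=1, name='isbn', field_type=StringType()),
    NestedField(field_id=2, name='title', field_type=StringType()),
    NestedField(field_id=3, name='copies', field_type=LongType()),
)
table = catalog.create_table('repo.main.inventory.books', schema=schema)

# Update the schema: add a column in a single catalog commit
with table.update_schema() as update:
    update.add_column('author', StringType())

# Append data; this is an Iceberg table commit, recorded as a lakeFS commit
table.append(pa.table({
    'isbn': ['978-3-16-148410-0'],
    'title': ['Example Book'],
    'copies': [3],
    'author': ['Jane Doe'],
}))

# Drop the table
catalog.drop_table('repo.main.inventory.books')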
Version Control Features
The Iceberg Catalog integrates with lakeFS’s version control system, treating each table change as a commit. This provides a complete history of table modifications and enables branching and merging workflows.
Catalog Changes as Commits
Each modification to a table (schema changes, data updates, etc.) creates a new commit in lakeFS. Creating or deleting a namespace or a table results in a lakeFS commit on the relevant branch, and so does updating table data (an “Iceberg table commit”).
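For example, after a table change you can read the resulting commits back with the lakeFS Python SDK (a brief sketch; repository and branch names are placeholders):
import lakefs

# Each catalog operation above appears as a commit on the branch
branch = lakefs.repository('repo').branch('main')
for commit in branch.log(max_amount=5):
    print(commit.id, commit.message)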
Branching
Create a new branch to work on table changes:
# Create a lakeFS branch using lakeFS Python SDK
branch = lakefs.repository('repo').branch('new_branch').create(source_reference='main')
# The table is now accessible in the new branch
new_table = catalog.load_table(f'repo.{branch.id}.inventory.books')
Merging
Merge changes between branches:
# Merge the branch using lakeFS Python SDK
branch.merge_into('main')
# Changes are now visible in main
main_table = catalog.load_table('repo.main.inventory.books')
Currently, lakeFS handles table changes as file operations during merges: table metadata files are treated as regular files, and no special merge logic is applied to conflicting table changes. If the same table was changed on both branches, the merge fails with a conflict that must be resolved manually.
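When a merge does hit such a conflict, a hedged sketch of handling it with the lakeFS Python SDK might look like the following (the exception type and diff call come from the SDK; the resolution strategy itself is up to you):
import lakefs
from lakefs.exceptions import ConflictException

branch = lakefs.repository('repo').branch('new_branch')
try:
    branch.merge_into('main')
except ConflictException:
    # Both branches changed the same table: inspect the differing paths and
    # decide which side wins, e.g. by re-applying the change on a fresh
    # branch created from main and merging again.
    for change in lakefs.repository('repo').branch('main').diff('new_branch'):
        print(change.path, change.type)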
Authentication
lakeFS provides an OAuth2 token endpoint at /iceberg/api/v1/oauth/tokens that clients need to configure.
To authenticate, clients must provide their lakeFS access key and secret in the format access_key:secret as the credential.
The authorization requirements are managed at the lakeFS level, meaning:
- Users need appropriate lakeFS permissions to access repositories and branches
- Table operations require lakeFS permissions on the underlying objects
- The same lakeFS RBAC policies apply to Iceberg catalog operations
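Under the hood, clients such as pyiceberg exchange that credential for a token via the Iceberg REST OAuth2 client-credentials flow. Roughly, and only as a hedged sketch (endpoint and keys are the same placeholders used above):
import requests

# The access key/secret from the 'credential' property are sent as
# client_id/client_secret in a client_credentials grant
resp = requests.post(
    'https://lakefs.example.com/iceberg/api/v1/oauth/tokens',
    data={
        'grant_type': 'client_credentials',
        'client_id': 'AKIAlakefs12345EXAMPLE',
        'client_secret': 'abc/lakefs/1234567bPxRfiCYEXAMPLEKEY',
    },
)
token = resp.json()['access_token']  # presented as a Bearer token on subsequent catalog requests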
Limitations
Current Limitations
The following features are not yet supported or implemented:
- Table Maintenance:
- See Table Maintenance section for details
- Catalog Sync:
- Push/pull operations with other catalogs
- Advanced Features:
- Views (all view operations are unsupported)
- Transactional changes (stage-create)
- Multi-table transactions
- Server-side query planning
- Table renaming
- Updating a table’s location (via a table commit)
- Table statistics (set-statistics and remove-statistics operations)
- Advanced Merging:
- Merging tables with conflicting changes
- Specialized merge strategies for different table operations
- Table Registration:
- Registering existing Iceberg tables from other catalogs
- Importing tables from external sources
In addition, only the Iceberg v2 table format is currently supported.
Table Maintenance
The following table maintenance operations are not supported in the current version:
- Drop table with purge
- Compact data files
- Rewrite manifests
- Expire snapshots
- Remove old metadata files
- Delete orphan files
To prevent data loss, clients should disable their own cleanup operations by:
- Disabling orphan file deletion.
- Setting remove-dangling-deletes to false when rewriting.
- Disabling snapshot expiration.
- Setting a very high value for the min-snapshots-to-keep parameter (see the sketch below).
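As an example of the last two points, with pyiceberg you might pin snapshot-retention table properties so engines keep history around (property names follow Iceberg’s table-property conventions; treat the exact values as placeholders):
# Keep snapshots instead of letting engines expire them
table = catalog.load_table('repo.main.inventory.books')
with table.transaction() as txn:
    txn.set_properties({
        'history.expire.min-snapshots-to-keep': '100000',
        'history.expire.max-snapshot-age-ms': str(100 * 365 * 24 * 60 * 60 * 1000),  # ~100 years
    })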
Storage Compatibility
lakeFS Tables have been tested with Amazon S3 and Google Cloud Storage. Other storage backends, such as Azure Blob Storage or local storage, are not currently supported but will be in future releases.
Future Releases
The following features are planned for future releases:
- Catalog Sync:
- Support for pushing/pulling tables to/from other catalogs.
- Integration with AWS Glue and other Iceberg-compatible catalogs.
- Table Import:
- Support for importing existing Iceberg tables from other catalogs.
- Bulk import capabilities for large-scale migrations.
- Advanced Features:
- Views API support.
- Table transactions.
- Azure Storage Support