lakeFS requires a PostgreSQL database to synchronize actions on your repositories. This section assumes you already have a PostgreSQL database that is reachable from the host where you intend to install lakeFS. Instructions for creating the database can be found in the deployment instructions for AWS, Azure, and GCP.
A production-suitable lakeFS installation requires three DNS records pointing at your lakeFS server. Assuming you already own the domain example.com, a good convention for them is:
- lakefs.example.com
- s3.lakefs.example.com (this is the S3 Gateway Domain)
- *.s3.lakefs.example.com
The second record, the S3 Gateway Domain, needs to be specified in the lakeFS configuration (see the S3_GATEWAY_DOMAIN placeholder below). This allows lakeFS to route requests to the S3-compatible API. For more information, see "Why do I need these three DNS records?"
To deploy using Docker, create a YAML configuration file. The examples here are minimal; see the configuration reference for the full list of options.
For AWS S3:

    database:
      connection_string: "[DATABASE_CONNECTION_STRING]"
    auth:
      encrypt:
        secret_key: "[ENCRYPTION_SECRET_KEY]"
    blockstore:
      type: s3
    gateways:
      s3:
        domain_name: "[S3_GATEWAY_DOMAIN]"

For Google Cloud Storage:

    database:
      connection_string: "[DATABASE_CONNECTION_STRING]"
    auth:
      encrypt:
        secret_key: "[ENCRYPTION_SECRET_KEY]"
    blockstore:
      type: gs
      # Uncomment the following lines to give lakeFS access to your buckets using a service account:
      # gs:
      #   credentials_json: [YOUR SERVICE ACCOUNT JSON STRING]
    gateways:
      s3:
        domain_name: "[S3_GATEWAY_DOMAIN]"

For Azure Blob Storage:

    database:
      connection_string: "postgres://user:pass@<AZURE_POSTGRES_SERVER_NAME>..."
    auth:
      encrypt:
        secret_key: "<RANDOM_GENERATED_STRING>"
    blockstore:
      type: azure
      azure:
        auth_method: msi # msi for active directory, access-key for access key
        # If you chose to authenticate via access key, uncomment the following rows and insert the values from the previous step
        # storage_account: <your storage account>
        # storage_access_key: <your access key>
    gateways:
      s3:
        domain_name: s3.lakefs.example.com
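The `secret_key` value is meant to be a random string (the Azure example marks it as `<RANDOM_GENERATED_STRING>`). The following is a minimal Python sketch that generates such a key and fills in the placeholders of the S3 variant of the configuration. The helper name `render_config`, the 32-byte key length, and the sample connection string are illustrative assumptions, not lakeFS requirements.

```python
import secrets

# Template mirroring the minimal S3 configuration (blockstore type "s3").
CONFIG_TEMPLATE = """\
database:
  connection_string: "{connection_string}"
auth:
  encrypt:
    secret_key: "{secret_key}"
blockstore:
  type: s3
gateways:
  s3:
    domain_name: "{s3_gateway_domain}"
"""


def render_config(connection_string: str, s3_gateway_domain: str) -> str:
    """Fill in the placeholders, generating a fresh random secret key."""
    # Assumption: 32 random bytes (64 hex characters), chosen for illustration.
    secret_key = secrets.token_hex(32)
    return CONFIG_TEMPLATE.format(
        connection_string=connection_string,
        secret_key=secret_key,
        s3_gateway_domain=s3_gateway_domain,
    )


if __name__ == "__main__":
    # Hypothetical values; substitute your own connection string and domain.
    print(render_config(
        "postgres://user:pass@db.example.com:5432/lakefs",
        "s3.lakefs.example.com",
    ))
```

Redirecting the printed output to a file gives you a starting point for the configuration file; review it before use, since each call generates a different key.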
Save the configuration file locally as lakefs-config.yaml and run the following command:
    docker run \
      --name lakefs \
      -p 8000:8000 \
      -v $(pwd)/lakefs-config.yaml:/home/lakefs/.lakefs.yaml \
      treeverse/lakefs:latest run
You should have a load balancer direct requests to the lakeFS server. By default, lakeFS listens on port 8000 and exposes a /_health endpoint, which you can use for health checks.
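Load balancers usually have built-in health checks, but the same probe is easy to reproduce by hand when debugging. The sketch below is one way to do it in Python; it assumes that a healthy server answers `GET /_health` with a 2xx status, and the function name `is_healthy` is hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/_health answers with a 2xx status."""
    url = base_url.rstrip("/") + "/_health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTP error.
        return False
```

For a container started with `-p 8000:8000`, calling `is_healthy("http://localhost:8000")` probes the default port.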
As mentioned above, you should create three DNS records for lakeFS:
- One record for the lakeFS API: lakefs.example.com
- Two records for the S3-compatible API: s3.lakefs.example.com and *.s3.lakefs.example.com
All records should point to your Load Balancer, preferably with a short TTL value.
Multiple DNS records are needed to access the two different lakeFS APIs (covered in more detail in the Architecture section):
- The lakeFS OpenAPI: used by the lakectl CLI tool. Exposes Git-like operations (branching, diffing, merging, etc.).
- An S3-compatible API: read and write your data in any tool that can communicate with S3. Examples include: AWS CLI, Boto, Presto and Spark.
lakeFS actually exposes only one API endpoint. For every request, lakeFS checks the Host header. If the header is under the S3 gateway domain, the request is directed to the S3-compatible API.
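The Host-header check can be illustrated with a few lines of Python. This is a sketch of the idea only, not lakeFS's actual implementation; the function name `is_s3_gateway_host` is hypothetical, and it assumes that "under the S3 gateway domain" means the domain itself or any of its subdomains.

```python
def is_s3_gateway_host(host_header: str, s3_gateway_domain: str) -> bool:
    """Return True if the Host header falls under the S3 gateway domain."""
    # Drop an optional port and normalise case; hostnames are case-insensitive.
    # Assumes a hostname, not a bracketed IPv6 literal.
    host = host_header.split(":", 1)[0].strip().lower().rstrip(".")
    domain = s3_gateway_domain.lower().rstrip(".")
    # Match either the gateway domain itself or any subdomain of it.
    return host == domain or host.endswith("." + domain)
```

With `s3.lakefs.example.com` as the gateway domain, a request carrying `Host: s3.lakefs.example.com` would go to the S3-compatible API, while `Host: lakefs.example.com` would not.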
The third DNS record (*.s3.lakefs.example.com) allows for virtual-host style access. This is a way for AWS clients to specify the bucket name in the Host subdomain.
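To make virtual-host style addressing concrete, the following Python sketch extracts the bucket name from a Host header. It is an illustration under stated assumptions rather than lakeFS code: the function name `bucket_from_host` is hypothetical, and everything to the left of the gateway domain is treated as the bucket name.

```python
from typing import Optional


def bucket_from_host(host_header: str, s3_gateway_domain: str) -> Optional[str]:
    """Return the bucket named in the Host subdomain, or None if absent."""
    # Drop an optional port and normalise case before comparing.
    host = host_header.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + s3_gateway_domain.lower().rstrip(".")
    # Virtual-host style: <bucket>.<gateway domain>
    if host.endswith(suffix) and len(host) > len(suffix):
        return host[: -len(suffix)]
    # Path style (bucket in the URL path) or a host outside the gateway domain.
    return None
```

A client using virtual-host style sends `Host: my-bucket.s3.lakefs.example.com`, which only resolves if the wildcard record exists. A client using path style sends `Host: s3.lakefs.example.com` and places the bucket name in the URL path instead.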