On-Premises Deployment
The instructions given here are for a self-managed deployment of lakeFS.
For a hosted lakeFS service with guaranteed SLAs, try lakeFS Cloud.
⏰ Expected deployment time: 25 min
Prerequisites
To use lakeFS on-premises, you can either use the local blockstore adapter or have access to an S3-compatible object store such as MinIO.
For more information on how to set up MinIO, see the official deployment guide.
Setting up a database
lakeFS requires a PostgreSQL database to synchronize actions on your repositories. This section assumes that you already have a PostgreSQL >= 11.0 database accessible.
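The configurations in this guide use a [DATABASE_CONNECTION_STRING] placeholder. As a minimal sketch, assuming the standard PostgreSQL URI form, the string could be assembled as follows. The function name `build_pg_conn_string` and all values passed to it (user, password, host, database name, `sslmode`) are hypothetical and only illustrate the shape of the string.

```shell
#!/usr/bin/env bash
# Sketch: assemble a PostgreSQL connection URI for [DATABASE_CONNECTION_STRING].
# Every value used here is a made-up placeholder, not a lakeFS default.

build_pg_conn_string() {
  user="$1"
  password="$2"
  host="$3"
  port="$4"
  dbname="$5"
  sslmode="${6:-require}"   # assumption: default to an encrypted connection
  printf 'postgres://%s:%s@%s:%s/%s?sslmode=%s\n' \
    "$user" "$password" "$host" "$port" "$dbname" "$sslmode"
}

# Example with hypothetical values. Special characters in the password
# would need percent-encoding, which this sketch does not do.
build_pg_conn_string lakefs_user s3cret db.example.internal 5432 lakefs
```

The resulting string is what you would paste in place of [DATABASE_CONNECTION_STRING] in the configurations that follow.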
Setting up a lakeFS Server
Connect to your host using SSH:
- Create a config.yaml on your VM, with the following parameters:

    ---
    database:
      type: "postgres"
      postgres:
        connection_string: "[DATABASE_CONNECTION_STRING]"

    auth:
      encrypt:
        # replace this with a randomly-generated string. Make sure to keep it safe!
        secret_key: "[ENCRYPTION_SECRET_KEY]"

    blockstore:
      type: s3
      s3:
        force_path_style: true
        endpoint: http://<minio_endpoint>
        discover_bucket_region: false
        credentials:
          access_key_id: <minio_access_key>
          secret_access_key: <minio_secret_key>

  ⚠️ Notice that the lakeFS Blockstore type is set to s3. This configuration works with S3-compatible storage engines such as MinIO.
- Download the binary to the server.
- Run the lakefs binary:

    lakefs --config config.yaml run
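The configuration asks for a randomly-generated string for `auth.encrypt.secret_key`. One possible way to produce such a value is sketched below. The 32-byte length and the helper name `generate_secret_key` are assumptions made for illustration, not lakeFS requirements.

```shell
#!/usr/bin/env bash
# Sketch: generate a random hex string to use as [ENCRYPTION_SECRET_KEY].
# The 32-byte length is an assumption, not a documented lakeFS requirement.

generate_secret_key() {
  # Read 32 random bytes from the kernel CSPRNG and print them as lowercase hex.
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

secret_key="$(generate_secret_key)"

# Print only the length so the secret does not end up in terminal logs.
echo "secret_key length: ${#secret_key}"
```

Store the generated value somewhere safe, such as a secrets manager, before placing it in the configuration.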
Note: It’s preferable to run the binary as a service using systemd or your operating system’s facilities.
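As a rough sketch of the systemd approach, the script below writes a unit file for lakeFS. The binary location `/usr/local/bin/lakefs`, the config path `/etc/lakefs/config.yaml`, the dedicated `lakefs` user, and the `UNIT_FILE` variable are all assumptions chosen for illustration; adjust them to your host.

```shell
#!/usr/bin/env bash
# Sketch: write a systemd unit file for lakeFS.
# Assumptions (not from the lakeFS docs): binary at /usr/local/bin/lakefs,
# config at /etc/lakefs/config.yaml, and a dedicated "lakefs" system user.

# Written to the current directory by default; copy it to
# /etc/systemd/system/lakefs.service once you have reviewed it.
UNIT_FILE="${UNIT_FILE:-./lakefs.service}"

cat > "$UNIT_FILE" <<'EOF'
[Unit]
Description=lakeFS server
After=network-online.target
Wants=network-online.target

[Service]
User=lakefs
ExecStart=/usr/local/bin/lakefs --config /etc/lakefs/config.yaml run
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $UNIT_FILE"
```

After copying the file into place, `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now lakefs` would register and start the service.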
To support container-based environments, you can configure lakeFS using environment variables. Here is a docker run command to demonstrate starting lakeFS using Docker:
docker run \
--name lakefs \
-p 8000:8000 \
-e LAKEFS_DATABASE_TYPE="postgres" \
-e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING="[DATABASE_CONNECTION_STRING]" \
-e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="[ENCRYPTION_SECRET_KEY]" \
-e LAKEFS_BLOCKSTORE_TYPE="s3" \
-e LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE="true" \
-e LAKEFS_BLOCKSTORE_S3_ENDPOINT="http://<minio_endpoint>" \
-e LAKEFS_BLOCKSTORE_S3_DISCOVER_BUCKET_REGION="false" \
-e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID="<minio_access_key>" \
-e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY="<minio_secret_key>" \
treeverse/lakefs:latest run
⚠️ Notice that the lakeFS Blockstore type is set to s3. This configuration works with S3-compatible storage engines such as MinIO.
See the reference for a complete list of environment variables.
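The variable names in the docker run command follow a visible pattern: the configuration key is upper-cased, dots become underscores, and the result is prefixed with `LAKEFS_`. The helper below is a small sketch of that mapping, inferred from the examples in this guide. The function name `to_env_name` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive the environment variable name for a lakeFS config key,
# following the pattern visible in the docker run command
# (upper-case the key, replace dots with underscores, add the LAKEFS_ prefix).

to_env_name() {
  printf 'LAKEFS_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.' '_')"
}

to_env_name database.postgres.connection_string
to_env_name blockstore.s3.force_path_style
```

The two calls print `LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING` and `LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE`, matching the variables used in the command.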
You can install lakeFS on Kubernetes using a Helm chart.
To install lakeFS with Helm:
- Copy the Helm values file relevant for S3-Compatible storage (MinIO in this example):

    secrets:
      # replace this with the connection string of the database you created in a previous step:
      databaseConnectionString: [DATABASE_CONNECTION_STRING]
      # replace this with a randomly-generated string
      authEncryptSecretKey: [ENCRYPTION_SECRET_KEY]
    lakefsConfig: |
      blockstore:
        type: s3
        s3:
          force_path_style: true
          endpoint: http://<minio_endpoint>
          discover_bucket_region: false
          credentials:
            access_key_id: <minio_access_key>
            secret_access_key: <minio_secret_key>

  ⚠️ Notice that the lakeFS Blockstore type is set to s3. This configuration works with S3-compatible storage engines such as MinIO.
- Fill in the missing values and save the file as conf-values.yaml. For more configuration options, see our Helm chart README.

  The lakefsConfig parameter is the lakeFS configuration documented here but without sensitive information. Sensitive information like databaseConnectionString is given through separate parameters, and the chart will inject it into Kubernetes secrets.
- In the directory where you created conf-values.yaml, run the following commands:

    # Add the lakeFS repository
    helm repo add lakefs https://charts.lakefs.io
    # Deploy lakeFS
    helm install my-lakefs lakefs/lakefs -f conf-values.yaml
my-lakefs is the Helm Release name.
Load balancing
To configure a load balancer to direct requests to the lakeFS servers, you can use the LoadBalancer Service type or a Kubernetes Ingress. By default, lakeFS operates on port 8000 and exposes a /_health endpoint that you can use for health checks.

💡 The NGINX Ingress Controller by default limits the client body size to 1 MiB. Some clients use bigger chunks to upload objects, for example a multipart upload to lakeFS using the S3-compatible Gateway or a simple PUT request using the OpenAPI Server. Check out the NGINX documentation for increasing the limit, or an example of NGINX configuration with MinIO.
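To check the /_health endpoint by hand, for example while a load balancer is still being configured, a polling loop along these lines may help. This is a sketch only: the function name `wait_for_lakefs`, the retry count, and the delay are illustrative assumptions, and it relies on `curl` being installed.

```shell
#!/usr/bin/env bash
# Sketch: poll the lakeFS /_health endpoint until it answers successfully.
# The retry count and delay defaults are illustrative assumptions.

wait_for_lakefs() {
  base_url="$1"          # e.g. http://<lakefs-host>:8000
  attempts="${2:-30}"    # how many times to try
  delay="${3:-2}"        # seconds to wait between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors (>= 400); -s hides progress output.
    if curl -fs -o /dev/null "$base_url/_health"; then
      echo "lakeFS is healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "lakeFS did not become healthy after $attempts attempts" >&2
  return 1
}

# Usage (placeholder host, not a real address):
#   wait_for_lakefs "http://<lakefs-host>:8000"
```

The function returns a non-zero status when the endpoint never responds, so it can gate later steps in a deployment script.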
Secure connection
Using a load balancer or cluster manager for TLS/SSL termination is recommended. It speeds up decryption and reduces the processing burden on lakeFS.
If lakeFS itself needs to listen and serve over HTTPS, for example for development purposes, add the following section to its YAML configuration:
tls:
  enabled: true
  cert_file: server.crt   # provide path to your certificate file
  key_file: server.key    # provide path to your server private key
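For development setups, the certificate and key referenced in the tls section can be self-signed. The sketch below uses `openssl` to create such a pair. The helper name `make_dev_cert`, the `localhost` common name, the RSA key size, and the 365-day validity are assumptions; production deployments should use certificates issued by a trusted CA.

```shell
#!/usr/bin/env bash
# Sketch: create a self-signed certificate and key for local development only.
# Assumptions: CN=localhost, 2048-bit RSA key, 365-day validity.

make_dev_cert() {
  out_dir="${1:-.}"   # directory that will receive server.crt and server.key
  # -x509 emits a self-signed certificate instead of a signing request;
  # -nodes leaves the private key unencrypted so the server can read it unattended.
  openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$out_dir/server.key" \
    -out "$out_dir/server.crt" \
    -days 365 \
    -subj "/CN=localhost"
}

# Usage (hypothetical directory):
#   make_dev_cert /etc/lakefs/tls
```

Point `cert_file` and `key_file` at the generated files, and restrict read access to the key file.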
Local Blockstore
You can configure the local block adapter to use a POSIX-compatible storage location shared by all lakeFS instances. Both data and metadata are then stored in that shared location.
With the local blockstore, lakeFS can also import files from a shared location, provided it is allowed access to a specific prefix.
Import is not enabled by default: lakeFS doesn't assume the local path is shared, and accessing a path outside the one specified in the blockstore configuration is a security concern.
To enable it, set blockstore.local.import_enabled and blockstore.local.allowed_external_prefixes as described in the configuration reference.
Sample configuration using local blockstore
database:
  type: "postgres"
  postgres:
    connection_string: "[DATABASE_CONNECTION_STRING]"

auth:
  encrypt:
    # replace this with a randomly-generated string. Make sure to keep it safe!
    secret_key: "[ENCRYPTION_SECRET_KEY]"

blockstore:
  type: local
  local:
    path: /shared/location/lakefs_data    # location where lakeFS keeps data and metadata
    import_enabled: true                  # must be true to enable importing files
                                          # from `allowed_external_prefixes` locations
    allowed_external_prefixes:
      - /shared/location/files_to_import  # location with files to import into lakeFS; lakeFS must have access to it
Limitations
- Using a local adapter on a shared location is relatively new and not battle-tested yet.
- lakeFS doesn’t control the way a shared location is managed across machines.
- When using lakectl or the lakeFS UI, you can currently import only directories. If you need to import a single file, use the HTTP API or API Clients with type=object in the request body and destination=<full-path-to-file>.
- The garbage collector (for committed and uncommitted data) and the lakeFS Hadoop FileSystem are currently unsupported.
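The single-file import mentioned in the limitations can be sketched as a JSON request body with `type` set to `object`. The helper name `build_import_body`, the `local://` source path, the destination, the commit message, and the endpoint shown in the comment are assumptions for illustration; confirm the exact request shape against the API reference for your lakeFS version.

```shell
#!/usr/bin/env bash
# Sketch: build a request body that imports a single file (type=object).
# Paths and commit message are hypothetical. Values are inserted as-is,
# so they must not contain characters that need JSON escaping.

build_import_body() {
  src="$1"    # source object, e.g. a file under an allowed external prefix
  dest="$2"   # full destination path of the file inside the branch
  printf '{"paths":[{"type":"object","path":"%s","destination":"%s"}],"commit":{"message":"import %s"}}\n' \
    "$src" "$dest" "$dest"
}

build_import_body "local:///shared/location/files_to_import/data.csv" "datasets/data.csv"

# Sending it (assumed endpoint and placeholder credentials):
#   build_import_body "$SRC" "$DEST" | curl -fsS \
#     -u "<access_key_id>:<secret_access_key>" \
#     -H "Content-Type: application/json" -d @- \
#     "http://<lakefs-host>:8000/api/v1/repositories/<repo>/branches/<branch>/import"
```

Only the `type` and `destination` fields come from this guide; everything else in the body is a best-effort illustration.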
Create the admin user
When you first open the lakeFS UI, you will be asked to create an initial admin user.
- Open http://<lakefs-host>/ in your browser. If you haven’t set up a load balancer, this will likely be http://<instance ip address>:8000/
- On first use, you’ll be redirected to the setup page.
- Follow the steps to create an initial administrator user. Save the credentials you’ve received somewhere safe; you won’t be able to see them again!
- Follow the link and go to the login screen. Use the credentials from the previous step to log in.
Create your first repository
- Use the credentials from the previous step to log in
- Click Create Repository and choose Blank Repository.
- Under Storage Namespace, enter a path to your desired location on the object store. This is where data written to this repository will be stored.
- Click Create Repository.
- You should now have a configured repository, ready to use!
Congratulations! Your environment is now ready 🤩