Installing lakeFS
You are now ready to install the lakeFS server. Following are some options for doing that.
Table of contents
Kubernetes with Helm
lakeFS can be easily installed on Kubernetes using a Helm chart. To install lakeFS with Helm:
-
Create a
conf-values.yaml
file, replacing values as described in the comments:secrets: # replace this with the connection string of the database you created in a previous step: databaseConnectionString: postgres://postgres:myPassword@my-lakefs-db.rds.amazonaws.com:5432/lakefs?search_path=lakefs # replace this with a randomly-generated string authEncryptSecretKey: <some random string> lakefsConfig: | blockstore: type: s3 s3: region: us-east-1 gateways: s3: # replace this with the host you will use for the lakeFS S3-compatible endpoint: domain_name: s3.lakefs.example.com
See below for more configuration options. The
lakefsConfig
parameter is the lakeFS configuration documented here, but without sensitive information. Sensitive information likedatabaseConnectionString
is given through separate parameters, and the chart will inject them into Kubernetes secrets. -
In the directory where you created
conf-values.yaml
, run the following commands:# Add the lakeFS repository helm repo add lakefs https://charts.lakefs.io # Deploy lakeFS helm install example-lakefs lakefs/lakefs -f conf-values.yaml
example-lakefs
is the Helm Release name.
You should give your Kubernetes nodes access to all S3 buckets you intend to use lakeFS with. If you can’t provide such access, lakeFS can be configured to use an AWS key-pair to authenticate (part of the lakefsConfig
YAML below).
Configurations
Parameter | Description | Default |
---|---|---|
secrets.databaseConnectionString | PostgreSQL connection string to be used by lakeFS | |
secrets.authEncryptSecretKey | A random (cryptographically safe) generated string that is used for encryption and HMAC signing | |
lakefsConfig | lakeFS config YAML stringified, as shown above. See reference for available configurations. | |
replicaCount | Number of lakeFS pods | 1 |
resources | Pod resource requests & limits | {} |
service.type | Kubernetes service type | ClusterIP |
service.port | Kubernetes service external port | 80 |
extraEnvVarsSecret | Name of a Kubernetes secret containing extra environment variables |
Docker
To deploy using Docker, create a yaml configuration file. Here is a minimal example, but you can see the reference for the full list of configurations.
database:
connection_string: "postgres://user:pass@<RDS_ENDPOINT>:5432/postgres"
auth:
encrypt:
secret_key: "<RANDOM_GENERATED_STRING>"
blockstore:
type: s3
gateways:
s3:
domain_name: s3.lakefs.example.com
Depending on your runtime environment, running lakeFS using docker would look like this:
docker run \
--name lakefs \
-p 8000:8000 \
-v <PATH_TO_CONFIG_FILE>:/home/lakefs/.lakefs.yaml \
treeverse/lakefs:latest run
Fargate and other container-based environments
Some environments make it harder to use a configuration file, and are best configured using environment variables.
Here is an example of running lakeFS using environment variables. See the reference for the full list of configurations.
docker run \
--name lakefs \
-p 8000:8000 \
-e LAKEFS_DATABASE_CONNECTION_STRING="postgres://user:pass@<RDS ENDPOINT>..." \
-e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="<RANDOM_GENERATED_STRING>" \
-e LAKEFS_BLOCKSTORE_TYPE="s3" \
-e LAKEFS_GATEWAYS_S3_DOMAIN_NAME="s3.lakefs.example.com" \
treeverse/lakefs:latest run
AWS EC2
Alternatively, you can run lakeFS directly on an EC2 instance:
- Download the binary for your operating system
-
lakefs
is a single binary, you can run it directly, but preferably run it as a service using systemd or your operating system’s facilities.lakefs --config <PATH_TO_CONFIG_FILE> run