Deploy lakeFS on AWS
Expected deployment time: 25 min
Prerequisites
Users who need S3 access using virtual-host-style addressing should configure an S3 Gateway domain.
Creating the Database on AWS RDS
lakeFS requires a PostgreSQL database to synchronize actions on your repositories. We will show you how to create a database on AWS RDS, but you can use any PostgreSQL database as long as it’s accessible by your lakeFS installation.
If you already have a database, take note of its connection string and skip to the next step.
- Follow the official AWS documentation on how to create a PostgreSQL instance and connect to it. You may use the default PostgreSQL engine, or Aurora PostgreSQL. Make sure you’re using PostgreSQL version >= 11.
- Once your RDS instance is set up and the server is in the Available state, take note of the endpoint and port.
- Make sure your security group rules allow you to connect to the database instance.
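With the endpoint and port in hand, you can assemble the connection string lakeFS will use. The sketch below assumes the standard PostgreSQL URI form (postgres://user:password@host:port/dbname); the endpoint, user, password, and database name are hypothetical placeholders, and a password containing reserved characters such as @, : or / would need to be percent-encoded first. It also checks the major version against the >= 11 requirement, given a version string like the one returned by SHOW server_version.

```shell
# Hypothetical values: substitute the endpoint and port you noted for your RDS instance.
# They can also be looked up with:
#   aws rds describe-db-instances --db-instance-identifier <your-instance> \
#     --query 'DBInstances[0].Endpoint.[Address,Port]' --output text
DB_HOST="lakefs-db.abc123xyz.us-east-1.rds.amazonaws.com"
DB_PORT="5432"
DB_USER="lakefs"
DB_PASSWORD="change-me"   # percent-encode reserved characters such as @ : /
DB_NAME="postgres"

# Standard PostgreSQL URI: postgres://user:password@host:port/dbname
DATABASE_CONNECTION_STRING="postgres://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "$DATABASE_CONNECTION_STRING"

# Hypothetical version string, e.g. the output of:
#   psql "$DATABASE_CONNECTION_STRING" -tAc 'SHOW server_version'
PG_VERSION="13.7"
PG_MAJOR="${PG_VERSION%%.*}"   # keep everything before the first dot
if [ "$PG_MAJOR" -ge 11 ]; then
  echo "PostgreSQL $PG_VERSION meets the >= 11 requirement"
else
  echo "PostgreSQL $PG_VERSION is too old; version 11 or later is required" >&2
fi
```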
Installation Options
On EC2
- Save the following configuration file as config.yaml:
  ---
  database:
    connection_string: "[DATABASE_CONNECTION_STRING]"
  auth:
    encrypt:
      # replace this with a randomly-generated string:
      secret_key: "[ENCRYPTION_SECRET_KEY]"
  blockstore:
    type: s3
    s3:
      region: us-east-1
- Download the lakeFS binary to the EC2 instance.
- Run the lakefs binary on the EC2 instance:
  lakefs --config config.yaml run
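The two placeholders in config.yaml can be filled in with a short script. This sketch produces the secret key by hex-encoding 32 random bytes from /dev/urandom (so openssl is not required) and writes the file with a heredoc. The connection string is a hypothetical placeholder; use the one you noted for your database, and adjust the region as needed.

```shell
# Generate a 64-character hex secret from 32 random bytes.
ENCRYPTION_SECRET_KEY="$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"

# Hypothetical connection string: use the one you noted for your database.
DATABASE_CONNECTION_STRING="postgres://lakefs:change-me@lakefs-db.abc123xyz.us-east-1.rds.amazonaws.com:5432/postgres"

# Write config.yaml with both placeholders filled in.
cat > config.yaml <<EOF
---
database:
  connection_string: "${DATABASE_CONNECTION_STRING}"
auth:
  encrypt:
    secret_key: "${ENCRYPTION_SECRET_KEY}"
blockstore:
  type: s3
  s3:
    region: us-east-1
EOF

# Restrict permissions, since the file contains secrets.
chmod 600 config.yaml
```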
Note: it is preferable to run the binary as a service using systemd or your operating system’s facilities.
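As a sketch of the systemd option, the snippet below writes a minimal unit file to the current directory so you can review it before installing it. The binary path (/usr/local/bin/lakefs), the config path (/etc/lakefs/config.yaml), and the dedicated lakefs user are hypothetical choices, not lakeFS defaults; adjust them to wherever you placed the files.

```shell
# Hypothetical paths and user: adjust to your layout.
LAKEFS_BIN="/usr/local/bin/lakefs"
LAKEFS_CONFIG="/etc/lakefs/config.yaml"
LAKEFS_USER="lakefs"

# Write the unit file locally for review.
cat > lakefs.service <<EOF
[Unit]
Description=lakeFS server
Wants=network-online.target
After=network-online.target

[Service]
User=${LAKEFS_USER}
ExecStart=${LAKEFS_BIN} --config ${LAKEFS_CONFIG} run
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Then, with root privileges, install and start it:
#   sudo install -m 644 lakefs.service /etc/systemd/system/lakefs.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now lakefs
```

systemctl enable --now starts the service immediately and on every boot, and journalctl -u lakefs shows its logs.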
On ECS
To support container-based environments like AWS ECS, lakeFS can be configured using environment variables. Here is a docker run command to demonstrate starting lakeFS using Docker:
docker run \
--name lakefs \
-p 8000:8000 \
-e LAKEFS_DATABASE_CONNECTION_STRING="[DATABASE_CONNECTION_STRING]" \
-e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="[ENCRYPTION_SECRET_KEY]" \
-e LAKEFS_BLOCKSTORE_TYPE="s3" \
treeverse/lakefs:latest run
See the lakeFS configuration reference for a complete list of environment variables.
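Judging from the docker run example above, each environment variable appears to be the key path from config.yaml, prefixed with LAKEFS_, with dots replaced by underscores and letters upper-cased (auth.encrypt.secret_key becomes LAKEFS_AUTH_ENCRYPT_SECRET_KEY). The helper below is a hypothetical convenience that applies this inferred rule; confirm any name against the reference before relying on it.

```shell
# Hypothetical helper: derive the lakeFS environment variable name for a
# config key, using the pattern inferred from the docker run example.
lakefs_env_name() {
  printf 'LAKEFS_%s\n' "$(printf '%s' "$1" | tr '.' '_' | tr '[:lower:]' '[:upper:]')"
}

lakefs_env_name database.connection_string   # LAKEFS_DATABASE_CONNECTION_STRING
lakefs_env_name auth.encrypt.secret_key      # LAKEFS_AUTH_ENCRYPT_SECRET_KEY
lakefs_env_name blockstore.s3.region         # LAKEFS_BLOCKSTORE_S3_REGION
```

Following this pattern, the region set in config.yaml would presumably be passed to the container as -e LAKEFS_BLOCKSTORE_S3_REGION="us-east-1".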
On EKS
Load balancing
Depending on how you chose to install lakeFS, you should have a load balancer direct requests to the lakeFS server.
By default, lakeFS listens on port 8000 and exposes a /_health endpoint that you can use for health checks.
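For scripted checks, such as waiting for a freshly started server before routing traffic to it, you can poll that endpoint. The function below is a hypothetical helper built on curl; the default URL assumes lakeFS is running locally on port 8000, and the default attempt count is arbitrary.

```shell
# Hypothetical helper: poll the lakeFS /_health endpoint until curl reports
# success (any non-error HTTP status) or the attempts run out.
wait_for_lakefs() {
  url="${1:-http://localhost:8000/_health}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: fail on HTTP status >= 400; -sS: quiet, but still print errors
    if curl -fsS -o /dev/null --max-time 2 "$url"; then
      echo "lakeFS is healthy at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "lakeFS did not become healthy at $url" >&2
  return 1
}
```

For example, wait_for_lakefs http://localhost:8000/_health 60 waits up to about a minute, and its exit status can gate later steps in a provisioning script.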
Notes for using an AWS Application Load Balancer
- Your security groups should allow the load balancer to access the lakeFS server.
- Create a target group for port 8000 and a listener that forwards requests to it.
- Set up TLS termination using the domain names you wish to use (e.g., lakefs.example.com and, if using virtual-host addressing, s3.lakefs.example.com and *.s3.lakefs.example.com).
- Configure the health check to use the exposed /_health URL.
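If you manage the load balancer from the AWS CLI, the target group and TLS certificate from the notes above might be created along these lines. This is a sketch: the VPC ID, target group name, and domain are hypothetical placeholders, --target-type instance assumes lakeFS runs on EC2 (ECS tasks in awsvpc mode would use ip), and the listener and security group rules are not shown. By default the run helper only prints each command.

```shell
# Hypothetical placeholders: replace with your own values.
VPC_ID="vpc-0123456789abcdef0"
LAKEFS_DOMAIN="lakefs.example.com"

# Print each command instead of running it, unless DRY_RUN is set to a
# value other than 1 (e.g. DRY_RUN=0), in which case the command runs.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Target group forwarding to lakeFS on port 8000, health-checked via /_health;
# the matcher accepts any 2xx status from the health check.
run aws elbv2 create-target-group \
  --name lakefs-tg \
  --protocol HTTP \
  --port 8000 \
  --vpc-id "$VPC_ID" \
  --target-type instance \
  --health-check-path /_health \
  --matcher HttpCode=200-299

# Certificate for TLS termination; the s3 names are only needed for
# virtual-host-style access through the S3 gateway.
run aws acm request-certificate \
  --domain-name "$LAKEFS_DOMAIN" \
  --subject-alternative-names "s3.$LAKEFS_DOMAIN" "*.s3.$LAKEFS_DOMAIN" \
  --validation-method DNS
```

Running the script with DRY_RUN=0 executes the commands instead of printing them.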
Next Steps
Your next step is to prepare your storage. If you already have a storage bucket/container, you are ready to create your first lakeFS repository.