Deploy lakeFS on Azure
⏰ Expected deployment time: 25 min
Create a database
lakeFS requires a PostgreSQL database to synchronize actions in your repositories. We will show you how to create a database on Azure Database for PostgreSQL, but you can use any PostgreSQL database as long as it’s accessible by your lakeFS installation.
If you already have a database, take note of the connection string and skip to the next step.
- Follow the official Azure documentation on how to create a PostgreSQL instance and connect to it. Make sure that you’re using PostgreSQL version >= 11.
- Once your Azure Database for PostgreSQL server is set up and the server is in the Available state, take note of the endpoint and username.
- Make sure your Access control roles allow you to connect to the database instance.
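The endpoint and username noted above end up inside the connection string that lakeFS reads. A minimal sketch of assembling one is shown here; the helper name build_connection_string, the database name postgres, port 5432, and sslmode=require are assumptions for illustration, so adjust them to match your server. The user and password are percent-encoded because characters such as @ or / would otherwise break the URL.

```python
from urllib.parse import quote


def build_connection_string(host, user, password,
                            database="postgres", port=5432, sslmode="require"):
    """Assemble a PostgreSQL connection URL (hypothetical helper).

    The defaults are assumptions for illustration, not values from this guide.
    """
    # Percent-encode credentials so reserved characters such as "@" or "/"
    # cannot be mistaken for URL delimiters.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return (
        f"postgres://{safe_user}:{safe_password}"
        f"@{host}:{port}/{database}?sslmode={sslmode}"
    )
```

The returned string is what replaces [DATABASE_CONNECTION_STRING] in the later steps.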
Run the lakeFS server
Connect to your VM instance using SSH:
- Create a config.yaml on your VM, with the following parameters:
  ---
  database:
    type: "postgres"
    postgres:
      connection_string: "[DATABASE_CONNECTION_STRING]"
  auth:
    encrypt:
      # replace this with a randomly-generated string. Make sure to keep it safe!
      secret_key: "[ENCRYPTION_SECRET_KEY]"
  blockstore:
    type: azure
    azure:
      auth_method: msi # msi for active directory, access-key for access key
      # If you chose to authenticate via access key, uncomment the following rows and insert the values from the previous step
      # storage_account: [your storage account]
      # storage_access_key: [your access key]
- Download the binary to the VM.
- Run the lakefs binary:
  lakefs --config config.yaml run
Note: It’s preferable to run the binary as a service using systemd or your operating system’s facilities.
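The [ENCRYPTION_SECRET_KEY] placeholder in config.yaml needs a randomly generated value. One way to produce one is sketched here with Python's standard secrets module; the function name generate_secret_key and the 32-byte length are illustrative choices, not lakeFS requirements.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string drawn from the OS cryptographic RNG.

    Hypothetical helper: 32 bytes (64 hex characters) is an assumed length.
    """
    return secrets.token_hex(num_bytes)
```

Call generate_secret_key() once, store the result somewhere safe, and reuse the same value every time the server starts.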
To support container-based environments, you can configure lakeFS using environment variables. Here is a docker run command to demonstrate starting lakeFS using Docker:
docker run \
--name lakefs \
-p 8000:8000 \
-e LAKEFS_DATABASE_TYPE="postgres" \
-e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING="[DATABASE_CONNECTION_STRING]" \
-e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="[ENCRYPTION_SECRET_KEY]" \
-e LAKEFS_BLOCKSTORE_TYPE="azure" \
-e LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCOUNT="[YOUR_STORAGE_ACCOUNT]" \
-e LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCESS_KEY="[YOUR_ACCESS_KEY]" \
treeverse/lakefs:latest run
See the reference for a complete list of environment variables.
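The variables in the docker run command appear to follow a simple pattern: the configuration key path is upper-cased, dots become underscores, and a LAKEFS_ prefix is added. The sketch below encodes that mapping as inferred from the example, not as taken from the reference; the helper name to_env_var is hypothetical.

```python
def to_env_var(config_key: str, prefix: str = "LAKEFS_") -> str:
    """Map a dotted config key to the environment variable naming pattern
    seen in the docker run example (inferred, not an official API)."""
    return prefix + config_key.replace(".", "_").upper()


# The key path from config.yaml and the matching variable from the example.
print(to_env_var("database.postgres.connection_string"))
```

Check any variable name derived this way against the reference before relying on it.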
You can install lakeFS on Kubernetes using a Helm chart.
To install lakeFS with Helm:
- Copy the Helm values file relevant for Azure Blob:
  secrets:
    # replace this with the connection string of the database you created in a previous step:
    databaseConnectionString: [DATABASE_CONNECTION_STRING]
    # replace this with a randomly-generated string
    authEncryptSecretKey: [ENCRYPTION_SECRET_KEY]
  lakefsConfig: |
    blockstore:
      type: azure
      azure:
        auth_method: msi # msi for active directory, access-key for access key
        # If you chose to authenticate via access key, uncomment the following rows and insert the values from the previous step
        # storage_account: [your storage account]
        # storage_access_key: [your access key]
- Fill in the missing values and save the file as conf-values.yaml. For more configuration options, see our Helm chart README.
  The lakefsConfig parameter is the lakeFS configuration documented here but without sensitive information. Sensitive information like databaseConnectionString is given through separate parameters, and the chart will inject it into Kubernetes secrets.
- In the directory where you created conf-values.yaml, run the following commands:
  # Add the lakeFS repository
  helm repo add lakefs https://charts.lakefs.io
  # Deploy lakeFS
  helm install my-lakefs lakefs/lakefs -f conf-values.yaml
my-lakefs is the Helm Release name.
Load balancing
To configure a load balancer to direct requests to the lakeFS servers, you can use the LoadBalancer Service type or a Kubernetes Ingress.
By default, lakeFS operates on port 8000 and exposes a /_health endpoint that you can use for health checks.
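To confirm that a server is reachable before wiring it into a load balancer, you can probe the /_health endpoint yourself. The sketch below uses only Python's standard library; the function names and the 5-second timeout are assumptions, and it treats any 2xx response as healthy, which may be stricter or looser than what your load balancer expects.

```python
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL for a lakeFS base URL (hypothetical helper)."""
    return base_url.rstrip("/") + "/_health"


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /_health answers with a 2xx status.

    Connection failures, timeouts, and non-2xx responses all count as unhealthy.
    """
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError are OSError subclasses, as are socket timeouts.
        return False
```

For example, is_healthy("http://localhost:8000") checks a server running on the default port.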
💡 The NGINX Ingress Controller by default limits the client body size to 1 MiB. Some clients use bigger chunks to upload objects, for example, multipart upload to lakeFS using the S3-compatible Gateway or a simple PUT request using the OpenAPI Server. Check out the NGINX documentation for increasing the limit, or an example of NGINX configuration with MinIO.
Create the admin user
When you first open the lakeFS UI, you will be asked to create an initial admin user.
- Open http://<lakefs-host>/ in your browser. If you haven’t set up a load balancer, this will likely be http://<instance ip address>:8000/
- On first use, you’ll be redirected to the setup page.
- Follow the steps to create an initial administrator user. Save the credentials you’ve received somewhere safe, because you won’t be able to see them again!
- Follow the link and go to the login screen. Use the credentials from the previous step to log in.
Create your first repository
- Use the credentials from the previous step to log in.
- Click Create Repository and choose Blank Repository.
- Under Storage Namespace, enter a path to your desired location on the object store. This is where data written to this repository will be stored.
- Click Create Repository.
- You should now have a configured repository, ready to use!
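The Storage Namespace entered in the steps above is a URI pointing at a location in your object store. For Azure Blob Storage, this sketch assumes the form https://<account>.blob.core.windows.net/<container>/<path>; that format, the helper name, and the sample values in the usage note are assumptions for illustration, so verify them against your storage account before creating the repository.

```python
def azure_storage_namespace(account: str, container: str, path: str = "") -> str:
    """Compose an Azure Blob Storage URI for use as a storage namespace.

    Hypothetical helper; the URI layout is an assumption, not taken from this guide.
    """
    base = f"https://{account}.blob.core.windows.net/{container}"
    # Strip stray slashes so the result never contains "//" after the host.
    cleaned = path.strip("/")
    return f"{base}/{cleaned}" if cleaned else base
```

For instance, azure_storage_namespace("myaccount", "mycontainer", "repos/example") yields a URI under the mycontainer container of the myaccount storage account.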
Congratulations! Your environment is now ready 🤩