Deploy lakeFS on Azure
Expected deployment time: 25 min
Prerequisites
Users that require S3 access using virtual host addressing should configure an S3 Gateway domain.
Creating the Database on Azure Database
lakeFS requires a PostgreSQL database to synchronize actions in your repositories. We will show you how to create a database on Azure Database, but you can use any PostgreSQL database as long as it’s accessible by your lakeFS installation.
If you already have a database, take note of the connection string and skip to the next step.
- Follow the official Azure documentation on how to create a PostgreSQL instance and connect to it. Make sure that you’re using PostgreSQL version >= 11.
- Once your Azure Database for PostgreSQL server is set up and the server is in the Available state, take note of the endpoint and username.
- Make sure your Access control roles allow you to connect to the database instance.
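The endpoint and username noted above are the pieces lakeFS needs in its connection string. As a rough sketch, they could be assembled into a PostgreSQL connection URI as follows. The helper name `pg_connection_string` and the server, user, and database values are hypothetical placeholders; `sslmode=require` assumes your server enforces TLS, and the password is assumed to contain no characters that need URL-encoding.

```shell
#!/bin/sh
# Hypothetical helper: build a PostgreSQL connection URI from its parts.
# Arguments: user, password, host (server endpoint), database name.
# Assumes the default PostgreSQL port 5432 and a password that needs no URL-encoding.
pg_connection_string() {
  printf 'postgres://%s:%s@%s:5432/%s?sslmode=require\n' "$1" "$2" "$3" "$4"
}

# Placeholder values; substitute the endpoint and username you noted.
pg_connection_string "lakefs_admin" "example-password" \
  "my-lakefs-db.postgres.database.azure.com" "postgres"
```

The printed value is what would replace [DATABASE_CONNECTION_STRING] in the installation options that follow.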
Installation Options
On Azure VM
- Save the following configuration file as config.yaml:
  ---
  database:
    type: "postgres"
    postgres:
      connection_string: "[DATABASE_CONNECTION_STRING]"
  auth:
    encrypt:
      # replace this with a randomly-generated string:
      secret_key: "[ENCRYPTION_SECRET_KEY]"
  blockstore:
    type: azure
    azure:
      auth_method: msi # msi for active directory, access-key for access key
      # In case you chose to authenticate via access key, unmark the following rows and insert the values from the previous step
      # storage_account: [your storage account]
      # storage_access_key: [your access key]
- Download the binary to the Azure Virtual Machine.
- Run the lakefs binary on the machine:
  lakefs --config config.yaml run
Note: It is preferable to run the binary as a service using systemd or your operating system’s facilities.
- To support Azure AD authentication, go to the Identity tab and switch the Status toggle to On, then add the "Storage Blob Data Contributor" role on the container you created.
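The secret_key placeholder in config.yaml is meant to be replaced with a randomly generated string. One possible way to produce such a value on the VM is sketched below; the function name `generate_secret_key` and the 32-byte length are illustrative assumptions, not lakeFS requirements.

```shell
#!/bin/sh
# Hypothetical helper: read 32 random bytes from the kernel's random source
# and print them as a 64-character hexadecimal string (no trailing newline).
generate_secret_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

key="$(generate_secret_key)"
# Print the value so it can be pasted into config.yaml in place of [ENCRYPTION_SECRET_KEY].
printf 'secret_key: "%s"\n' "$key"
```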
On Azure Container Instances
To support container-based environments like Azure Container Instances, you can configure lakeFS using environment variables. Here is a docker run command to demonstrate starting lakeFS using Docker:
docker run \
--name lakefs \
-p 8000:8000 \
-e LAKEFS_DATABASE_TYPE="postgres" \
-e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING="[DATABASE_CONNECTION_STRING]" \
-e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="[ENCRYPTION_SECRET_KEY]" \
-e LAKEFS_BLOCKSTORE_TYPE="azure" \
-e LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCOUNT="[YOUR_STORAGE_ACCOUNT]" \
-e LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCESS_KEY="[YOUR_ACCESS_KEY]" \
treeverse/lakefs:latest run
See the reference for a complete list of environment variables.
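The variable names in the command above appear to follow a simple pattern: take the configuration key's path in config.yaml, replace dots with underscores, uppercase it, and prefix it with LAKEFS_. The sketch below illustrates that mapping as inferred from the examples here; `to_env_name` is a hypothetical helper, and the reference remains the authoritative list.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted config key to the matching
# environment variable name (dots become underscores, letters are uppercased,
# and the LAKEFS_ prefix is added).
to_env_name() {
  printf 'LAKEFS_%s\n' "$(printf '%s' "$1" | tr '.' '_' | tr '[:lower:]' '[:upper:]')"
}

to_env_name "database.postgres.connection_string"
to_env_name "blockstore.azure.storage_account"
```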
On AKS
Load balancing
Depending on how you chose to install lakeFS, you should have a load balancer direct requests to the lakeFS server.
By default, lakeFS operates on port 8000 and exposes a /_health endpoint, which you can use for health checks.
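A load balancer probe or a startup script can poll this endpoint. The following is a minimal sketch that assumes curl is installed and that lakeFS listens on http://localhost:8000; `LAKEFS_URL`, `health_url`, and `check_health` are hypothetical names, and treating any successful HTTP status as healthy is an assumption.

```shell
#!/bin/sh
# Base URL of the lakeFS server; override by exporting LAKEFS_URL (assumed name).
LAKEFS_URL="${LAKEFS_URL:-http://localhost:8000}"

# Build the health-check URL for a given base URL (a trailing slash is dropped).
health_url() {
  printf '%s/_health\n' "${1%/}"
}

# Succeeds when the endpoint answers with a successful HTTP status within 5 seconds.
check_health() {
  curl --silent --fail --output /dev/null --max-time 5 "$(health_url "$1")"
}

if check_health "$LAKEFS_URL"; then
  echo "lakeFS is healthy"
else
  echo "lakeFS is not reachable or unhealthy"
fi
```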
Next Steps
Your next step is to prepare your storage. If you already have a storage bucket/container, you are ready to create your first lakeFS repository.