Deploy lakeFS on GCP
Expected deployment time: 25 min
Prerequisites
Users who require S3 access using virtual-host-style addressing should configure an S3 Gateway domain.
Creating the Database on Google Cloud SQL
lakeFS requires a PostgreSQL database to synchronize actions on your repositories. We will show you how to create a database on Google Cloud SQL, but you can use any PostgreSQL database as long as it’s accessible by your lakeFS installation.
If you already have a database, take note of the connection string and skip to the next step.
- Follow the official Google documentation on how to create a PostgreSQL instance. Make sure you’re using PostgreSQL version >= 11.
- On the Users tab in the console, create a user. The lakeFS installation will use it to connect to your database.
- Choose the method by which lakeFS will connect to your database. Google recommends using the SQL Auth Proxy.
Depending on the chosen lakeFS installation method, you will need to make sure lakeFS can access your database. For example, if you install lakeFS on GKE, you need to deploy the SQL Auth Proxy from this Helm chart, or as a sidecar container in your lakeFS pod.
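Whichever method you choose, lakeFS expects the database location as a PostgreSQL connection URI. The sketch below assembles one from its parts. It assumes the SQL Auth Proxy (or sidecar) listens locally on 127.0.0.1:5432 and that SSL can be disabled on that local hop because the proxy itself encrypts the connection to Cloud SQL. The function name `build_conn_string`, its defaults, and the sample values are hypothetical, and a password containing characters such as `@`, `:` or `/` must be percent-encoded before it is passed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble a PostgreSQL connection URI for lakeFS.
# Arguments: user, password (already percent-encoded), host, port, database name.
# Defaults assume a local SQL Auth Proxy on 127.0.0.1:5432 and the "postgres" database.
build_conn_string() {
  local user="$1" password="$2" host="${3:-127.0.0.1}" port="${4:-5432}" dbname="${5:-postgres}"
  # sslmode=disable assumes the local proxy handles encryption to Cloud SQL
  printf 'postgres://%s:%s@%s:%s/%s?sslmode=disable\n' \
    "$user" "$password" "$host" "$port" "$dbname"
}

# Example: the user created on the Users tab, connecting through a local proxy
build_conn_string lakefs 'example-password'
```

If lakeFS connects to the database directly rather than through the proxy, keep SSL enabled and adjust the `sslmode` parameter accordingly.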
Installation Options
On Google Compute Engine
- Save the following configuration file as config.yaml:
---
database:
  connection_string: "[DATABASE_CONNECTION_STRING]"
auth:
  encrypt:
    # replace this with a randomly-generated string:
    secret_key: "[ENCRYPTION_SECRET_KEY]"
blockstore:
  type: gs
  # Uncomment the following lines to give lakeFS access to your buckets using a service account:
  # gs:
  #   credentials_json: [YOUR SERVICE ACCOUNT JSON STRING]
- Download the binary to the GCE instance.
- Run the lakefs binary on the GCE machine:
lakefs --config config.yaml run
Note: it is preferable to run the binary as a service using systemd or your operating system’s facilities.
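A minimal sketch of running lakeFS under systemd follows. It writes a unit file that starts the same `lakefs --config config.yaml run` command and restarts it on failure. The binary path (/usr/local/bin/lakefs), the config path (/etc/lakefs/config.yaml), the `lakefs` service user and the function name `write_lakefs_unit` are all assumptions; adjust them to match your instance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal systemd unit that runs lakeFS as a service.
# Assumed defaults: binary at /usr/local/bin/lakefs, config at /etc/lakefs/config.yaml,
# and an existing "lakefs" system user. Arguments: unit directory, binary path, config path.
write_lakefs_unit() {
  local unit_dir="$1"
  local binary="${2:-/usr/local/bin/lakefs}"
  local config="${3:-/etc/lakefs/config.yaml}"
  mkdir -p "$unit_dir"
  cat > "$unit_dir/lakefs.service" <<EOF
[Unit]
Description=lakeFS server
After=network-online.target
Wants=network-online.target

[Service]
User=lakefs
ExecStart=$binary --config $config run
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# On the GCE instance (as root), write the unit, then enable and start it:
# write_lakefs_unit /etc/systemd/system
# systemctl daemon-reload && systemctl enable --now lakefs
```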
On Google Cloud Run
To support container-based environments like Google Cloud Run, lakeFS can be configured using environment variables. Here is a docker run command to demonstrate starting lakeFS using Docker:
docker run \
--name lakefs \
-p 8000:8000 \
-e LAKEFS_DATABASE_CONNECTION_STRING="[DATABASE_CONNECTION_STRING]" \
-e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="[ENCRYPTION_SECRET_KEY]" \
-e LAKEFS_BLOCKSTORE_TYPE="gs" \
treeverse/lakefs:latest run
See the reference for a complete list of environment variables.
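The variable names in the docker run example follow a visible pattern relative to the config.yaml keys: a LAKEFS_ prefix, dots replaced with underscores, and everything uppercased (database.connection_string becomes LAKEFS_DATABASE_CONNECTION_STRING). The sketch below applies that pattern, which can help when translating a config file into Cloud Run environment settings. The function name `config_key_to_env` is hypothetical, and the reference remains the authoritative list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a dotted config.yaml key to its lakeFS environment variable name,
# following the pattern in the docker run example (LAKEFS_ prefix, dots -> underscores, uppercase).
config_key_to_env() {
  local key="$1"
  key="${key//./_}"   # e.g. blockstore.type -> blockstore_type
  printf 'LAKEFS_%s\n' "$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')"
}

# The three keys used in the docker run example above
for key in database.connection_string auth.encrypt.secret_key blockstore.type; do
  config_key_to_env "$key"
done
```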
On GKE
Load balancing
Depending on how you chose to install lakeFS, you should have a load balancer direct requests to the lakeFS server.
By default, lakeFS operates on port 8000 and exposes a /_health endpoint, which you can use for health checks.
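Before putting the load balancer in front of lakeFS, it can help to confirm that the server answers on /_health. The sketch below polls the endpoint once per second until it returns a successful HTTP status or a timeout expires. It assumes curl is installed on the machine running it; the function name `wait_for_lakefs` and its default URL (localhost on port 8000) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll lakeFS's /_health endpoint until it succeeds or a timeout expires.
# Arguments: health URL (default assumes lakeFS on localhost:8000), timeout in seconds.
wait_for_lakefs() {
  local url="${1:-http://localhost:8000/_health}"
  local timeout="${2:-60}"
  local waited=0
  # curl -f makes HTTP error statuses (4xx/5xx) produce a non-zero exit code
  until curl -fsS -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "lakeFS not healthy after ${timeout}s: $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "lakeFS is healthy: $url"
}

# Example (replace the host with your lakeFS server):
# wait_for_lakefs http://LAKEFS_HOST:8000/_health 120
```

A load balancer health check would typically target the same path and port.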
Next Steps
Your next step is to prepare your storage. If you already have a storage bucket/container, you are ready to create your first lakeFS repository.