Configuration Reference
lakeFS is configured using a YAML configuration file and/or environment variables. The configuration file location can be set with the `--config` flag. If not specified, the first file found in the following order will be used:
- ./config.yaml
- $HOME/lakefs/config.yaml
- /etc/lakefs/config.yaml
- $HOME/.lakefs.yaml
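For example, to skip the default lookup order and point at a specific file (assuming the server is started with the `lakefs run` command):

```sh
# Use an explicit configuration file instead of the default lookup order
lakefs run --config /etc/lakefs/config.yaml
```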
Configuration items can each be controlled by an environment variable. The variable name will have a prefix of `LAKEFS_`, followed by the name of the configuration item, replacing every `.` with a `_`.
Example: `LAKEFS_LOGGING_LEVEL` controls `logging.level`.
This reference uses `.` to denote the nesting of values.
Reference
- `logging.format` `(one of ["json", "text"] : "text")` - Format to output log messages in
- `logging.level` `(one of ["DEBUG", "INFO", "WARN", "ERROR", "NONE"] : "DEBUG")` - Logging level to output
- `logging.output` `(string : "-")` - Path name to write logs to. `"-"` means Standard Output
- `database.connection_string` `(string : "postgres://localhost:5432/postgres?sslmode=disable")` - PostgreSQL connection string to use
- `database.max_open_connections` `(int : 25)` - Maximum number of open connections to the database
- `database.max_idle_connections` `(int : 25)` - Sets the maximum number of connections in the idle connection pool
- `database.connection_max_lifetime` `(duration : 5m)` - Sets the maximum amount of time a connection may be reused
- `listen_address` `(string : "0.0.0.0:8000")` - A `<host>:<port>` structured string representing the address to listen on
- `auth.cache.enabled` `(bool : true)` - Whether to cache access credentials and user policies in-memory. Can greatly improve throughput when enabled
- `auth.cache.size` `(int : 1024)` - How many items to store in the auth cache. Systems with a very high user count should use a larger value, at the expense of ~1KB of memory per cached user
- `auth.cache.ttl` `(time duration : "20s")` - How long to store an item in the auth cache. Using a higher value reduces load on the database, but changes will take longer to take effect for cached users
- `auth.cache.jitter` `(time duration : "3s")` - A random amount of time between 0 and this value is added to each item's TTL. This is done to avoid a large bulk of keys expiring at once and overwhelming the database
- `auth.encrypt.secret_key` `(string : required)` - A randomly generated (cryptographically safe) string that is used for encryption and HMAC signing. Note: it is best to keep this somewhere safe, such as KMS or HashiCorp Vault, and provide it to the system at run time
- `blockstore.type` `(one of ["local", "s3", "gs", "azure", "mem"] : required)` - Block adapter to use. This controls where the underlying data will be stored
- `blockstore.local.path` `(string : "~/lakefs/data")` - When using the local block adapter, the directory to store files in
- `blockstore.gs.credentials_file` `(string : )` - If specified, will be used as the path to a JSON file containing your Google service account key
- `blockstore.gs.credentials_json` `(string : )` - If specified, will be used as a JSON string containing your Google service account key (when `credentials_file` is not set)
- `blockstore.azure.storage_account` `(string : )` - If specified, will be used as the Azure storage account
- `blockstore.azure.storage_access_key` `(string : )` - If specified, will be used as the Azure storage access key
- `blockstore.azure.auth_method` `(one of ["msi", "access-key"] : "access-key")` - Authentication method to use (`msi` is used for Azure AD authentication)
- `blockstore.s3.region` `(string : "us-east-1")` - Default region for lakeFS to use when interacting with S3
- `blockstore.s3.profile` `(string : )` - If specified, will be used as a named credentials profile
- `blockstore.s3.credentials_file` `(string : )` - If specified, will be used as a credentials file
- `blockstore.s3.credentials.access_key_id` `(string : )` - If specified, will be used as a static set of credentials
- `blockstore.s3.credentials.secret_access_key` `(string : )` - If specified, will be used as a static set of credentials
- `blockstore.s3.credentials.session_token` `(string : )` - If specified, will be used as a static session token
- `blockstore.s3.endpoint` `(string : )` - If specified, a custom endpoint for the AWS S3 API (e.g. `https://s3_compatible_service_endpoint:port`)
- `blockstore.s3.force_path_style` `(boolean : false)` - When true, use path-style S3 URLs (`https://<host>/<bucket>` instead of `https://<bucket>.<host>`)
- `blockstore.s3.streaming_chunk_size` `(int : 1048576)` - Object chunk size to buffer before streaming to blockstore (use a lower value for less reliable networks). Minimum is 8192
- `blockstore.s3.streaming_chunk_timeout` `(time duration : "60s")` - Per-object chunk timeout for blockstore streaming operations (use a larger value for less reliable networks)
- `committed.local_cache` - An object describing the local (on-disk) cache of metadata from permanent storage (see the YAML sketch after this list):
  - `committed.local_cache.size_bytes` `(int : 1073741824)` - Bytes for the local cache to use on disk. The cache may use more storage for short periods of time
  - `committed.local_cache.dir` `(string : "~/lakefs/local_tier")` - Directory in which to store the local cache
  - `committed.local_cache.range_proportion` `(float : 0.9)` - Proportion of the local cache to use for storing ranges (leaves of committed metadata storage)
  - `committed.local_cache.range.open_readers` `(int : 500)` - Maximal number of unused open SSTable readers to keep for ranges
  - `committed.local_cache.range.num_shards` `(int : 30)` - Sharding factor for open SSTable readers for ranges. Should be at least `sqrt(committed.local_cache.range.open_readers)`
  - `committed.local_cache.metarange_proportion` `(float : 0.1)` - Proportion of the local cache to use for storing metaranges (roots of committed metadata storage)
  - `committed.local_cache.metarange.open_readers` `(int : 50)` - Maximal number of unused open SSTable readers to keep for metaranges
  - `committed.local_cache.metarange.num_shards` `(int : 10)` - Sharding factor for open SSTable readers for metaranges. Should be at least `sqrt(committed.local_cache.metarange.open_readers)`
- `committed.block_storage_prefix` `(string : "_lakefs")` - Prefix for metadata file storage in each repository's storage namespace
- `committed.permanent.min_range_size_bytes` `(int : 0)` - Smallest allowable range in metadata. Increase to somewhat reduce random access time on committed metadata, at the cost of increased committed metadata storage
- `committed.permanent.max_range_size_bytes` `(int : 20971520)` - Largest allowable range in metadata. Should be close to the size at which fetching from remote storage becomes linear
- `committed.permanent.range_raggedness_entries` `(int : 50_000)` - Average number of object pointers to store in each range (subject to `min_range_size_bytes` and `max_range_size_bytes`)
- `committed.sstable.memory.cache_size_bytes` `(int : 200_000_000)` - Maximal size of the in-memory cache used for each SSTable reader
- `gateways.s3.domain_name` `(string : "s3.local.lakefs.io")` - A FQDN representing the S3 endpoint used by S3 clients to call this server (`*.s3.local.lakefs.io` always resolves to 127.0.0.1, which is useful for local development when using virtual-host addressing)
- `gateways.s3.region` `(string : "us-east-1")` - AWS region we're pretending to be. Should match the region configuration used in AWS SDK clients
- `gateways.s3.fallback_url` `(string)` - If specified, requests for non-existing repositories will be forwarded to this URL. This can be useful for running lakeFS side-by-side with S3, with the URL pointing at an S3Proxy instance
- `stats.enabled` `(boolean : true)` - Whether or not to periodically collect anonymous usage statistics
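The cache and committed-metadata settings above are ordinary nested YAML keys. A minimal sketch showing their shape, with illustrative values only (these are not tuning recommendations):

```yaml
auth:
  cache:
    enabled: true
    size: 4096        # roughly 1KB of memory per cached user
    ttl: 20s
    jitter: 3s

committed:
  local_cache:
    size_bytes: 1073741824    # 1 GiB on-disk metadata cache
    dir: "~/lakefs/local_tier"
    range_proportion: 0.9
    metarange_proportion: 0.1
```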
Using Environment Variables
All configuration variables can be set or overridden using environment variables.
To set an environment variable, prepend `LAKEFS_` to its name, convert it to upper case, and replace `.` with `_`:
For example, `logging.format` becomes `LAKEFS_LOGGING_FORMAT`, `blockstore.s3.region` becomes `LAKEFS_BLOCKSTORE_S3_REGION`, etc.
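For example, the following shell session sets the same values as parts of the YAML examples below, entirely through environment variables (the values are illustrative):

```sh
# Equivalent to logging.format, logging.level, blockstore.type
# and blockstore.s3.region in a YAML configuration file
export LAKEFS_LOGGING_FORMAT=json
export LAKEFS_LOGGING_LEVEL=WARN
export LAKEFS_BLOCKSTORE_TYPE=s3
export LAKEFS_BLOCKSTORE_S3_REGION=us-east-1
```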
Example: Local Development
```yaml
---
listen_address: "0.0.0.0:8000"

database:
  connection_string: "postgres://localhost:5432/postgres?sslmode=disable"

logging:
  format: text
  level: DEBUG
  output: "-"

auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc09e90b6641"

blockstore:
  type: local
  local:
    path: "~/lakefs/dev/data"

gateways:
  s3:
    region: us-east-1
```
Example: AWS Deployment
```yaml
---
logging:
  format: json
  level: WARN
  output: "-"

database:
  connection_string: "postgres://user:pass@lakefs.rds.amazonaws.com:5432/postgres"

auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc"

blockstore:
  type: s3
  s3:
    region: us-east-1
    credentials_file: /secrets/aws/credentials
    profile: default
```
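The `credentials_file` above refers to a standard AWS shared credentials file. A minimal sketch of what `/secrets/aws/credentials` might contain, using AWS's documented placeholder keys:

```ini
; The "default" profile matches the profile setting above
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```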
Example: Google Storage
```yaml
---
logging:
  format: json
  level: WARN
  output: "-"

database:
  connection_string: "postgres://user:pass@lakefs.rds.amazonaws.com:5432/postgres"

auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc"

blockstore:
  type: gs
  gs:
    credentials_file: /secrets/lakefs-service-account.json
```
Example: MinIO
```yaml
---
logging:
  format: json
  level: WARN
  output: "-"

database:
  connection_string: "postgres://user:pass@lakefs.rds.amazonaws.com:5432/postgres"

auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc"

blockstore:
  type: s3
  s3:
    region: us-east-1
    force_path_style: true
    endpoint: http://localhost:9000
    credentials:
      access_key_id: minioadmin
      secret_access_key: minioadmin
```
Example: Azure Blob Storage
```yaml
---
logging:
  format: json
  level: WARN
  output: "-"

database:
  connection_string: "postgres://user:pass@lakefs.rds.amazonaws.com:5432/postgres"

auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc"

blockstore:
  type: azure
  azure:
    auth_method: access-key
    storage_account: exampleStorageAccount
    storage_access_key: ExampleAccessKeyMD7nkPOWgV7d4BUjzLw==
```