Configuration Reference
Configuring lakeFS is done using a YAML configuration file and/or environment variables. The configuration file location can be set with the `--config` flag. If not specified, the first file found in the following order will be used:
- ./config.yaml
- $HOME/lakefs/config.yaml
- /etc/lakefs/config.yaml
- $HOME/.lakefs.yaml
Configuration items can each be controlled by an environment variable. The variable name will have a prefix of `LAKEFS_`, followed by the name of the configuration, replacing every `.` with a `_`.

Example: `LAKEFS_LOGGING_LEVEL` controls `logging.level`.

This reference uses `.` to denote the nesting of values.
Reference
- `logging.format` `(one of ["json", "text"] : "text")` - Format to output log messages in
- `logging.level` `(one of ["DEBUG", "INFO", "WARN", "ERROR", "NONE"] : "DEBUG")` - Logging level to output
- `logging.output` `(string : "-")` - Path name to write logs to. `"-"` means Standard Output
- `database.connection_string` `(string : "postgres://localhost:5432/postgres?sslmode=disable")` - PostgreSQL connection string to use
- `database.max_open_connections` `(int : 25)` - Maximum number of open connections to the database
- `database.max_idle_connections` `(int : 25)` - Maximum number of connections in the idle connection pool
- `database.connection_max_lifetime` `(duration : 5m)` - Maximum amount of time a connection may be reused
- `listen_address` `(string : "0.0.0.0:8000")` - A `<host>:<port>` structured string representing the address to listen on
- `auth.cache.enabled` `(bool : true)` - Whether to cache access credentials and user policies in-memory. Can greatly improve throughput when enabled.
- `auth.cache.size` `(int : 1024)` - How many items to store in the auth cache. Systems with a very high user count should use a larger value at the expense of ~1KB of memory per cached user.
- `auth.cache.ttl` `(time duration : "20s")` - How long to store an item in the auth cache. Using a higher value reduces load on the database, but changes will take longer to take effect for cached users.
- `auth.cache.jitter` `(time duration : "3s")` - A random amount of time between 0 and this value is added to each item's TTL. This is done to avoid a large bulk of keys expiring at once and overwhelming the database.
- `auth.encrypt.secret_key` `(string : required)` - A random (cryptographically safe) generated string that is used for encryption and HMAC signing. Note: it is best to keep this somewhere safe, such as KMS or Hashicorp Vault, and provide it to the system at run time.
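One way to produce a suitable value for `auth.encrypt.secret_key` (a sketch; any sufficiently long, cryptographically safe random string works):

```shell
# Generate 32 random bytes, hex-encoded (64 characters), using openssl's
# cryptographically secure random source
openssl rand -hex 32
```

Per the note above, prefer injecting the result at run time (e.g. from KMS or Vault) rather than committing it to the configuration file.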
- `blockstore.type` `(one of ["local", "s3", "gs", "azure", "mem"] : required)` - Block adapter to use. This controls where the underlying data will be stored
- `blockstore.local.path` `(string : "~/lakefs/data")` - When using the local Block Adapter, which directory to store files in
- `blockstore.gs.credentials_file` `(string : )` - If specified, will be used as the file path of the JSON file that contains your Google service account key
- `blockstore.gs.credentials_json` `(string : )` - If specified, will be used as a JSON string that contains your Google service account key (when `credentials_file` is not set)
- `blockstore.azure.storage_account` `(string : )` - If specified, will be used as the Azure storage account
- `blockstore.azure.storage_access_key` `(string : )` - If specified, will be used as the Azure storage access key
- `blockstore.azure.auth_method` `(one of ["msi", "access-key"] : "access-key")` - Authentication method to use (`msi` is used for Azure AD authentication)
- `blockstore.s3.region` `(string : "us-east-1")` - Default region for lakeFS to use when interacting with S3
- `blockstore.s3.profile` `(string : )` - If specified, will be used as a named credentials profile
- `blockstore.s3.credentials_file` `(string : )` - If specified, will be used as a credentials file
- `blockstore.s3.credentials.access_key_id` `(string : )` - If specified, will be used as a static set of credentials
- `blockstore.s3.credentials.secret_access_key` `(string : )` - If specified, will be used as a static set of credentials
- `blockstore.s3.credentials.session_token` `(string : )` - If specified, will be used as a static session token
- `blockstore.s3.endpoint` `(string : )` - If specified, custom endpoint for the AWS S3 API (`https://s3_compatible_service_endpoint:port`)
- `blockstore.s3.force_path_style` `(boolean : false)` - When true, use path-style S3 URLs (`https://<endpoint>/<bucket>` instead of `https://<bucket>.<endpoint>`)
- `blockstore.s3.streaming_chunk_size` `(int : 1048576)` - Object chunk size to buffer before streaming to blockstore (use a lower value for less reliable networks). Minimum is 8192
- `blockstore.s3.streaming_chunk_timeout` `(time duration : "60s")` - Per object chunk timeout for blockstore streaming operations (use a larger value for less reliable networks)
- `committed.local_cache` - an object describing the local (on-disk) cache of metadata from permanent storage:
  - `committed.local_cache.size_bytes` `(int : 1073741824)` - bytes for local cache to use on disk. The cache may use more storage for short periods of time
  - `committed.local_cache.dir` `(string : ~/lakefs/local_tier)` - directory to store local cache
  - `committed.local_cache.range_proportion` `(float : 0.9)` - proportion of local cache to use for storing ranges (leaves of committed metadata storage)
  - `committed.local_cache.range.open_readers` `(int : 500)` - maximal number of unused open SSTable readers to keep for ranges
  - `committed.local_cache.range.num_shards` `(int : 30)` - sharding factor for open SSTable readers for ranges. Should be at least `sqrt(committed.local_cache.range.open_readers)`
  - `committed.local_cache.metarange_proportion` `(float : 0.1)` - proportion of local cache to use for storing metaranges (roots of committed metadata storage)
  - `committed.local_cache.metarange.open_readers` `(int : 50)` - maximal number of unused open SSTable readers to keep for metaranges
  - `committed.local_cache.metarange.num_shards` `(int : 10)` - sharding factor for open SSTable readers for metaranges. Should be at least `sqrt(committed.local_cache.metarange.open_readers)`
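Putting the cache settings above together, a hypothetical `committed` section that doubles the on-disk cache and shifts more of it toward ranges might look like this (the values are illustrative, not recommendations):

```yaml
committed:
  local_cache:
    size_bytes: 2147483648      # 2 GiB on disk (default is 1 GiB)
    dir: "~/lakefs/local_tier"
    range_proportion: 0.95      # more room for ranges...
    metarange_proportion: 0.05  # ...less for metaranges
    range:
      open_readers: 900
      num_shards: 30            # at least sqrt(900)
```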
- `committed.block_storage_prefix` `(string : _lakefs)` - Prefix for metadata file storage in each repository's storage namespace
- `committed.permanent.min_range_size_bytes` `(int : 0)` - Smallest allowable range in metadata. Increase to somewhat reduce random access time on committed metadata, at the cost of increased committed metadata storage cost
- `committed.permanent.max_range_size_bytes` `(int : 20971520)` - Largest allowable range in metadata. Should be close to the size at which fetching from remote storage becomes linear
- `committed.permanent.range_raggedness_entries` `(int : 50_000)` - Average number of object pointers to store in each range (subject to `min_range_size_bytes` and `max_range_size_bytes`)
- `committed.sstable.memory.cache_size_bytes` `(int : 200_000_000)` - Maximal size of in-memory cache used for each SSTable reader
- `gateways.s3.domain_name` `(string : "s3.local.lakefs.io")` - a FQDN representing the S3 endpoint used by S3 clients to call this server (`*.s3.local.lakefs.io` always resolves to 127.0.0.1, useful for local development when using virtual-host addressing)
- `gateways.s3.region` `(string : "us-east-1")` - AWS region we're pretending to be. Should match the region configuration used in AWS SDK clients
- `gateways.s3.fallback_url` `(string)` - If specified, requests with a non-existing repository will be forwarded to this URL. This can be useful for using lakeFS side-by-side with S3, with the URL pointing at an S3Proxy instance
- `stats.enabled` `(boolean : true)` - Whether or not to periodically collect anonymous usage statistics
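As a sketch, the S3 gateway settings can be combined to forward requests for unknown repositories to an S3Proxy instance (the host names below are hypothetical):

```yaml
gateways:
  s3:
    domain_name: s3.example.com                      # hypothetical FQDN served by this lakeFS
    region: us-east-1
    fallback_url: "http://s3proxy.example.com:8080"  # hypothetical S3Proxy endpoint
```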
Using Environment Variables
All configuration variables can be set or overridden using environment variables. To set an environment variable, prepend `LAKEFS_` to its name, convert it to upper case, and replace `.` with `_`.

For example, `logging.format` becomes `LAKEFS_LOGGING_FORMAT`, `blockstore.s3.region` becomes `LAKEFS_BLOCKSTORE_S3_REGION`, etc.
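The mapping can be expressed as a small shell function (a sketch for illustration; `to_env` is not part of lakeFS):

```shell
# Convert a configuration key to its environment variable name:
# prefix with LAKEFS_, upper-case, and replace '.' with '_'
to_env() {
  printf 'LAKEFS_%s\n' "$1" | tr 'a-z.' 'A-Z_'
}

to_env logging.format        # prints LAKEFS_LOGGING_FORMAT
to_env blockstore.s3.region  # prints LAKEFS_BLOCKSTORE_S3_REGION
```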
Example: Local Development
```yaml
---
listen_address: "0.0.0.0:8000"

database:
  connection_string: "postgres://localhost:5432/postgres?sslmode=disable"

logging:
  format: text
  level: DEBUG
  output: "-"

auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc09e90b6641"

blockstore:
  type: local
  local:
    path: "~/lakefs/dev/data"

gateways:
  s3:
    region: us-east-1
```
Example: AWS Deployment
```yaml
---
logging:
  format: json
  level: WARN
  output: "-"

database:
  connection_string: "postgres://user:pass@lakefs.rds.amazonaws.com:5432/postgres"

auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc"

blockstore:
  type: s3
  s3:
    region: us-east-1
    credentials_file: /secrets/aws/credentials
    profile: default
```
Example: Google Storage
```yaml
---
logging:
  format: json
  level: WARN
  output: "-"

database:
  connection_string: "postgres://user:pass@lakefs.rds.amazonaws.com:5432/postgres"

auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc"

blockstore:
  type: gs
  gs:
    credentials_file: /secrets/lakefs-service-account.json
```
Example: MinIO
```yaml
---
logging:
  format: json
  level: WARN
  output: "-"

database:
  connection_string: "postgres://user:pass@lakefs.rds.amazonaws.com:5432/postgres"

auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc"

blockstore:
  type: s3
  s3:
    region: us-east-1
    force_path_style: true
    endpoint: http://localhost:9000
    credentials:
      access_key_id: minioadmin
      secret_access_key: minioadmin
```
Example: Azure Blob Storage
```yaml
---
logging:
  format: json
  level: WARN
  output: "-"

database:
  connection_string: "postgres://user:pass@lakefs.rds.amazonaws.com:5432/postgres"

auth:
  encrypt:
    secret_key: "10a718b3f285d89c36e9864494cdd1507f3bc85b342df24736ea81f9a1134bcc"

blockstore:
  type: azure
  azure:
    auth_method: access-key
    storage_account: exampleStorageAccount
    storage_access_key: ExampleAccessKeyMD7nkPOWgV7d4BUjzLw==
```