JWT Login (Machine-to-Machine)
Info
Available in lakeFS Cloud and lakeFS Enterprise.
For interactive (browser) logins, use Single Sign-On (SSO).
For AWS-resident workloads, use AWS IAM Roles.
JWT login lets a workload that already holds a JWT from an external identity provider — a service account, a CI runner, an application backend — exchange that JWT for a lakeFS session bearer token, without a browser flow and without provisioning a per-workload lakeFS access key.
How it works
```text
┌──────────┐   client_credentials    ┌──────────┐
│  Caller  │ ──────────────────────► │   IdP    │
└──────────┘                         │ (Entra/  │
     │          external JWT         │  Auth0)  │
     │ ◄──────────────────────────── └──────────┘
     │
     │  POST /api/v1/auth/jwt/login
     │  { "token": "<external JWT>" }
     ▼
┌──────────┐
│  lakeFS  │
└──────────┘
     │  { "token": "<lakeFS bearer>", "token_expiration": ... }
     │
     ▼
Authorization: Bearer <lakeFS bearer>
  → any subsequent lakeFS API call
```
- The caller obtains a JWT from its IdP. For service-account workloads this is typically the OAuth 2.0 `client_credentials` grant.
- The caller posts the JWT to `POST /api/v1/auth/jwt/login`. lakeFS verifies the signature against the IdP's JWKS, checks the standard `iss`/`aud`/`exp`/`iat`/`nbf` claims, and resolves the JWT's group/role claim to lakeFS groups.
- lakeFS persists a short-lived session containing the resolved policy IDs and returns a bearer token whose subject is the session ID.
- The caller sends the bearer on subsequent lakeFS API calls in the `Authorization: Bearer …` header.
No lakeFS user row is created on JWT login: the session is the authorization record. Deleting the session at the server immediately invalidates every bearer minted from it.
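When a login returns 401 and the cause looks time-related, it can help to replay the time checks locally. The sketch below is a minimal POSIX shell helper assuming the usual RFC 7519 semantics for `exp`, `iat`, and `nbf`, with the configured `leeway` (default 60s) applied in each direction. `time_claims_ok` is a hypothetical name, and lakeFS's exact boundary handling (strict vs. inclusive comparisons) may differ.

```shell
# Hypothetical helper (not part of lakeFS): replay the exp / iat / nbf checks
# with a clock-skew leeway. All arguments are integers in unix seconds.
# Usage: time_claims_ok NOW EXP IAT NBF LEEWAY
# The body runs in a subshell so its variables do not leak into the caller.
time_claims_ok() (
  now=$1 exp=$2 iat=$3 nbf=$4 leeway=$5
  # exp: the token counts as expired once NOW reaches EXP + LEEWAY.
  [ "$now" -lt $((exp + leeway)) ] || exit 1
  # iat: reject tokens claiming to be issued more than LEEWAY in the future.
  [ "$iat" -le $((now + leeway)) ] || exit 1
  # nbf: reject tokens presented more than LEEWAY before they become valid.
  [ "$now" -ge $((nbf - leeway)) ] || exit 1
  exit 0
)
```

For example, `time_claims_ok "$(date +%s)" "$EXP" "$IAT" "$NBF" 60` succeeds only if the token would pass all three checks right now.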
When to use this
| Scenario | Use |
|---|---|
| Interactive (human) login | SSO |
| AWS-resident workload with an IAM role | AWS IAM Roles |
| Workload that already holds an IdP-issued JWT | JWT login |
| Workload with no IdP — wants a static credential | lakeFS access key |
Configuration
JWT login is opt-in. The endpoint returns 501 Not Implemented unless
auth.providers.jwt.jwks_url is set.
```yaml
auth:
  providers:
    jwt:
      # Required.
      jwks_url: https://<idp>/.well-known/jwks.json
      issuer: https://<idp>/
      # At least one configured audience must match the token's `aud`
      # claim. An empty list skips the audience check.
      audiences:
        - https://lakefs/api
      # RFC 6901 JSON Pointer to the principal identity claim.
      # Defaults to "/oid" (Entra). For Auth0 use "/sub".
      identity_claim_ref: /oid
      # JSON Pointer to the group / role claim. Each value must match a
      # lakeFS group identifier (group name on Enterprise, generated
      # group ID on Cloud; see "Mapping IdP groups to lakeFS
      # permissions"). Default "/roles".
      groups_claim_ref: /roles
      # Caps the lakeFS session's lifetime. Effective expiry is
      # min(now + session_max_ttl, jwt.exp). Default 1h.
      session_max_ttl: 1h
      # Clock-skew tolerance for exp / iat / nbf. Default 60s.
      leeway: 60s
      # How often the background sweep deletes expired sessions.
      # Default 5m.
      cleanup_interval: 5m
      # Pin additional claims to exact string values. See
      # "Per-tenant isolation" below.
      required_claims:
        # https://lakefscloud.io/org_id: acme
```
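The effective-expiry rule in the `session_max_ttl` comment is a plain minimum. A minimal POSIX shell sketch, assuming `session_max_ttl` has already been converted to seconds (`1h` = 3600); `effective_expiry` is a hypothetical helper, handy for predicting the lifetime a login should yield (presumably what `token_expiration` reports).

```shell
# Hypothetical helper: effective session expiry = min(now + session_max_ttl, jwt.exp).
# Usage: effective_expiry NOW TTL_SECONDS JWT_EXP   (all unix seconds / seconds)
effective_expiry() (
  cap=$(($1 + $2))   # the lakeFS-side cap: now + session_max_ttl
  if [ "$cap" -lt "$3" ]; then
    echo "$cap"      # session_max_ttl is the tighter bound
  else
    echo "$3"        # the external JWT expires first
  fi
)
```

With the defaults, a JWT valid for another 24 hours still yields a one-hour lakeFS session, while a JWT with 30 minutes left yields a 30-minute session.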
Field reference
| Key | Required | Default | Description |
|---|---|---|---|
| `jwks_url` | yes | n/a | URL of the IdP's JWKS document. |
| `issuer` | yes | n/a | Exact value the token's `iss` claim must equal. |
| `audiences` | no | `[]` | Accepted `aud` values. Empty disables the check. |
| `identity_claim_ref` | no | `/oid` | JSON Pointer to the principal identifier. |
| `groups_claim_ref` | no | `/roles` | JSON Pointer to the group/role list. |
| `session_max_ttl` | no | `1h` | Upper bound on the minted session's lifetime. |
| `leeway` | no | `60s` | Clock-skew tolerance for `exp` / `iat` / `nbf`. |
| `cleanup_interval` | no | `5m` | Period of the expired-session sweep. |
| `required_claims` | no | none | Map of claim name → exact string value. |
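`identity_claim_ref` and `groups_claim_ref` are RFC 6901 JSON Pointers, so a claim name that contains `/` or `~` (for example a URL-namespaced custom claim) must be escaped before it is used as a pointer segment: `~` becomes `~0` and `/` becomes `~1`, in that order. A minimal POSIX shell sketch; `json_pointer_token` is a hypothetical helper, and `https://lakefscloud.io/roles` is an illustrative claim name only.

```shell
# Hypothetical helper: escape one claim name as an RFC 6901 reference token.
# '~' is escaped first so the '~' introduced by '~1' is not escaped again.
json_pointer_token() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's|/|~1|g'
}

# Build a pointer for an illustrative namespaced claim.
echo "groups_claim_ref: /$(json_pointer_token 'https://lakefscloud.io/roles')"
# prints: groups_claim_ref: /https:~1~1lakefscloud.io~1roles
```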
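Endpoint
`POST /api/v1/auth/jwt/login` takes a JSON body of the form `{ "token": "<external JWT>" }` and responds with: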
| Status | Meaning |
|---|---|
| 200 | Success. Body: { "token": "<lakeFS bearer>", "token_expiration": <unix-seconds> }. |
| 401 | Verification failed: signature, expiry, audience, issuer, missing identity claim, etc. |
| 501 | JWT login is not configured (no jwks_url). |
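Scripts that call the endpoint can branch on the status codes in the table above. A minimal sketch, assuming `curl` and `jq` are installed and that a hypothetical `LAKEFS_URL` variable holds the server's base URL (no trailing slash); `explain_login_status` and `lakefs_jwt_login` are illustrative names, not part of lakeFS or `lakectl`.

```shell
# Hypothetical helper: turn a documented status code into a short hint.
explain_login_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "verification failed (signature, iss, aud, exp/iat/nbf, or identity claim)" ;;
    501) echo "JWT login not configured (auth.providers.jwt.jwks_url unset)" ;;
    000) echo "no HTTP response (network or TLS error)" ;;  # curl reports 000
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical wrapper: exchange an external JWT ($1) for a lakeFS bearer.
# Prints the bearer on success; prints a hint on stderr and fails otherwise.
lakefs_jwt_login() (
  body=$(mktemp) || exit 1
  status=$(curl -s -o "$body" -w '%{http_code}' -X POST \
    "$LAKEFS_URL/api/v1/auth/jwt/login" \
    -H 'content-type: application/json' \
    -d "{\"token\": \"$1\"}")
  if [ "$status" = 200 ]; then
    jq -r .token "$body"
    rc=$?
  else
    echo "lakeFS JWT login failed: HTTP $status, $(explain_login_status "$status")" >&2
    rc=1
  fi
  rm -f "$body"
  exit "$rc"
)
```

Typical use: `BEARER=$(lakefs_jwt_login "$TOKEN") || exit 1`.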
Mapping IdP groups to lakeFS permissions
For every value in the JWT's groups claim, lakeFS looks up a matching group. The union of policies attached to those groups is recorded on the session and consulted by every authorization check.
The claim value must match the lakeFS group identifier. What that identifier is depends on the RBAC backend:
| Deployment | Group identifier the claim must carry |
|---|---|
| lakeFS Enterprise (built-in RBAC) | the group's name, e.g. data-engineers |
| lakeFS Cloud (external RBAC) | the group's generated ID, e.g. LGIDAfIbmpkx-711slxX-BGKt — not its display name |
On lakeFS Cloud, groups have generated IDs (LGID…) and the friendly
string is only a display name — the same reason the OIDC configuration
references groups by ID (default_initial_groups: [LGID…]). A claim
carrying the display name will not resolve to any policies. Find a
group's ID with lakectl auth groups list or in the UI.
Warning
Unmatched group values are silently skipped. A token whose
groups claim matches no group identifier yields a session with no
permissions, and every authorized call returns
401 insufficient permissions (the token still authenticates and the
session is minted; it just can't do anything). Create the groups,
attach policies, and make sure the IdP emits the correct identifier
before issuing JWT-driven calls.
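Because unmatched values are skipped silently, a pre-flight comparison can catch typos, display names used in place of Cloud `LGID…` IDs, and stray trailing characters before they turn into permission-less sessions. A minimal POSIX shell sketch, assuming exact, case-sensitive matching and that you have already collected the lakeFS group identifiers one per line (for example from `lakectl auth groups list` or the UI) and the claim values from a decoded token; `unmatched_groups` is a hypothetical helper.

```shell
# Hypothetical helper: print every claim value that matches no lakeFS group.
# $1: known group identifiers, one per line; remaining args: claim values.
# grep -F (fixed string) with -x (whole line) gives an exact comparison.
unmatched_groups() (
  known=$1
  shift
  for value in "$@"; do
    if ! printf '%s\n' "$known" | grep -Fxq -- "$value"; then
      printf '%s\n' "$value"
    fi
  done
)

known=$(printf '%s\n' data-engineers analysts)
# Flags 'Data Engineers' and 'analysts ' (trailing space); data-engineers matches.
unmatched_groups "$known" data-engineers 'Data Engineers' 'analysts '
```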
Microsoft Entra ID
Entra setup (one-time)
- App Registration "lakeFS" (the resource)
  - Expose an API → Application ID URI: e.g. `api://lakefs`. This value is your audience.
  - App roles → Create app role:
    - Allowed member types: Applications
    - Value: `lakefs-data-engineers` (this is the value emitted in the `roles` claim; it must match the lakeFS group identifier, see Mapping IdP groups to lakeFS permissions).
- App Registration "lakeFS-client" (the caller)
  - Certificates & secrets → New client secret. Record the value.
  - API permissions → Add a permission → My APIs → lakeFS → tick the `lakefs-data-engineers` application permission → Grant admin consent.
- (Optional, recommended in multi-tenant deployments) Add a claims-mapping policy that injects an `org_id` (or similar) claim from the service principal's app metadata, then pin it via `required_claims` in the lakeFS config. See Per-tenant isolation below.
lakeFS config
```yaml
auth:
  providers:
    jwt:
      jwks_url: https://login.microsoftonline.com/<tenant-id>/discovery/v2.0/keys
      issuer: https://login.microsoftonline.com/<tenant-id>/v2.0
      audiences: ["api://lakefs"]
      identity_claim_ref: /oid
      groups_claim_ref: /roles
      session_max_ttl: 1h
```
Pre-create the matching lakeFS group and attach a policy:
```shell
lakectl auth groups create --id lakefs-data-engineers
# Read-only access to all repositories (fs:Read* and fs:List*).
lakectl auth policies create --id ReadAll --statement-document - <<'EOF'
{ "statement": [
    { "effect": "allow", "action": ["fs:Read*", "fs:List*"], "resource": "*" }
]}
EOF
lakectl auth groups policies attach --id lakefs-data-engineers --policy ReadAll
```
Exchange flow
```shell
# 1. Acquire an Entra access token via client_credentials.
TOKEN=$(curl -s -X POST \
  "https://login.microsoftonline.com/$TENANT_ID/oauth2/v2.0/token" \
  -H 'content-type: application/x-www-form-urlencoded' \
  --data-urlencode "client_id=$ENTRA_CLIENT_ID" \
  --data-urlencode "client_secret=$ENTRA_CLIENT_SECRET" \
  --data-urlencode "scope=api://lakefs/.default" \
  --data-urlencode "grant_type=client_credentials" | jq -r .access_token)

# 2. Exchange for a lakeFS bearer.
BEARER=$(curl -s -X POST "https://lakefs.example.com/api/v1/auth/jwt/login" \
  -H 'content-type: application/json' \
  -d "{\"token\": \"$TOKEN\"}" | jq -r .token)

# 3. Drive authenticated API calls.
curl -H "Authorization: Bearer $BEARER" \
  "https://lakefs.example.com/api/v1/repositories"
```
Verifying the token shape
Decode the payload to confirm the right claims are present:
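The payload is the second dot-separated segment of the JWT, base64url-encoded without padding. A minimal sketch, assuming a POSIX shell with a `base64` that supports `-d` (GNU coreutils or compatible); `jwt_payload` is a hypothetical helper. It does not verify the signature, so use it only for inspection.

```shell
# Hypothetical helper: print a JWT's decoded payload (no signature check).
jwt_payload() (
  # Take the middle segment and map the base64url alphabet back to base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it so base64 -d accepts the input.
  case $((${#seg} % 4)) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
)

# Usage, with $TOKEN from step 1 of the exchange flow:
# jwt_payload "$TOKEN" | jq .
```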
Expected fields:
```json
{
  "iss": "https://login.microsoftonline.com/<tenant-id>/v2.0",
  "oid": "<service-principal-object-id>",
  "aud": "api://lakefs",
  "roles": ["lakefs-data-engineers"],
  "iat": 1700000000,
  "exp": 1700003600
}
```
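If `roles` is missing, confirm the App Role was created with Allowed
member types = Applications and that admin consent was granted to the
client application for that role. If `iss` or `aud` differs from the
configured `issuer` or `audiences`, check which access-token version
Entra issues: the resource app's `accessTokenAcceptedVersion` manifest
setting selects v1.0 or v2.0 tokens, which differ in both claims, and
the lakeFS config must match what the token actually carries.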
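The login response also carries `token_expiration` (unix seconds, per the Endpoint table), so a long-running job can repeat the exchange flow (steps 1 and 2) shortly before the bearer lapses instead of waiting for a 401. This applies to either IdP. A minimal POSIX shell sketch; `bearer_needs_refresh` is a hypothetical helper and the 60-second default margin is an arbitrary choice.

```shell
# Hypothetical helper: succeed when the bearer should be renewed, i.e. when
# NOW is within MARGIN seconds of TOKEN_EXPIRATION or already past it.
# Usage: bearer_needs_refresh TOKEN_EXPIRATION NOW [MARGIN_SECONDS]
bearer_needs_refresh() {
  [ "$2" -ge $(($1 - ${3:-60})) ]
}

# Sketch of use (commented out because it needs a live server):
# LOGIN=$(curl -s -X POST "https://lakefs.example.com/api/v1/auth/jwt/login" \
#   -H 'content-type: application/json' -d "{\"token\": \"$TOKEN\"}")
# BEARER=$(printf '%s' "$LOGIN" | jq -r .token)
# EXPIRES=$(printf '%s' "$LOGIN" | jq -r .token_expiration)
# bearer_needs_refresh "$EXPIRES" "$(date +%s)" && echo "repeat steps 1 and 2"
```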
Auth0
Auth0 setup (one-time)
- APIs → Create API
  - Identifier (this becomes the `audience`): e.g. `https://lakefs/api`
  - Signing Algorithm: `RS256` (default)
  - Enable RBAC: on
  - Add Permissions in the Access Token: on
  - Permissions → Add a Permission: define one permission per lakeFS group you intend to use. The permission's name must equal the lakeFS group identifier; on lakeFS Cloud that's the generated group ID (e.g. `LGIDAfIbmpkx-711slxX-BGKt`), not the display name. Watch for a stray trailing character when pasting the ID.
- Applications → Create Application → Machine to Machine
  - Authorize the new application for the API above.
  - Expand the application's row and tick the permission(s) you defined on the API in the previous step. The M2M client then receives these in the `permissions` claim of every access token issued to it.
  - Record the `client_id` and `client_secret`.
lakeFS config
```yaml
auth:
  providers:
    jwt:
      jwks_url: https://YOUR_TENANT.us.auth0.com/.well-known/jwks.json
      issuer: https://YOUR_TENANT.us.auth0.com/
      audiences: ["https://lakefs/api"]
      identity_claim_ref: /sub
      groups_claim_ref: /permissions
      session_max_ttl: 1h
```
Create the lakeFS group whose ID matches the Auth0 permission name and
attach a policy (same lakectl auth … commands as the Entra example).
Exchange flow
```shell
# 1. Acquire an Auth0 access token via client_credentials.
TOKEN=$(curl -s -X POST "https://YOUR_TENANT.us.auth0.com/oauth/token" \
  -H 'content-type: application/x-www-form-urlencoded' \
  --data-urlencode "grant_type=client_credentials" \
  --data-urlencode "client_id=$AUTH0_CLIENT_ID" \
  --data-urlencode "client_secret=$AUTH0_CLIENT_SECRET" \
  --data-urlencode "audience=https://lakefs/api" | jq -r .access_token)

# 2. Exchange for a lakeFS bearer.
BEARER=$(curl -s -X POST "https://lakefs.example.com/api/v1/auth/jwt/login" \
  -H 'content-type: application/json' \
  -d "{\"token\": \"$TOKEN\"}" | jq -r .token)

# 3. Drive authenticated API calls.
curl -H "Authorization: Bearer $BEARER" \
  "https://lakefs.example.com/api/v1/repositories"
```
Verifying the token shape
Expected fields:
```json
{
  "iss": "https://YOUR_TENANT.us.auth0.com/",
  "sub": "<m2m-client-id>@clients",
  "aud": "https://lakefs/api",
  "permissions": ["lakefs-data-engineers"],
  "iat": 1700000000,
  "exp": 1700003600
}
```
If permissions is missing or empty, re-check that Enable RBAC
and Add Permissions in the Access Token are both on for the API,
and that the M2M application has the permission ticked.
Security
Asymmetric signing only
The verifier accepts RS256/384/512, ES256/384/512, and PS256/384/512.
HMAC variants (HS*) and none are rejected at startup — symmetric
signing is incompatible with a JWKS trust model.
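When a login is rejected and the IdP might be signing with an unexpected algorithm, compare the `alg` field of the token's header (its first dot-separated segment, base64url-encoded JSON) against the allow-list above. A minimal POSIX shell sketch; `alg_allowed` is a hypothetical helper that mirrors the documented list, not lakeFS's own code.

```shell
# Hypothetical helper: succeed only for the asymmetric algorithms listed above.
alg_allowed() {
  case "$1" in
    RS256|RS384|RS512|ES256|ES384|ES512|PS256|PS384|PS512) return 0 ;;
    *) return 1 ;;  # HS256/HS384/HS512, "none", and anything unrecognised
  esac
}

for alg in RS256 ES384 HS256 none; do
  if alg_allowed "$alg"; then echo "$alg: accepted"; else echo "$alg: rejected"; fi
done
```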
Per-tenant isolation
In multi-tenant deployments where a single IdP issues tokens for many
lakeFS instances (for example, one Entra tenant serving multiple
lakeFS Cloud organisations), a token minted for tenant A would
otherwise verify cleanly against tenant B's lakeFS — same iss, valid
signature, audience matches if both use the same Application ID URI.
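Mitigate by pinning a per-tenant claim with `required_claims`, for example `https://lakefscloud.io/org_id: acme` under `auth.providers.jwt.required_claims`, with `acme` replaced by the value your IdP emits for this tenant.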
Combined with an IdP-side claims-mapping policy that injects the same claim from the service principal's app metadata, this guarantees a token issued for one tenant cannot be used against another.
If your IdP can't emit a per-tenant claim for M2M tokens (for example,
Auth0 tenants without Organizations for Client Credentials, where
the native `org_id` claim isn't available to the `client_credentials`
grant), pin a dedicated client instead by matching its `azp` claim
(authorized party, i.e. the M2M `client_id`) in `required_claims`, for example `azp: <m2m-client-id>`.
Each tenant then uses its own M2M application, and only that client's tokens are accepted. This keeps a real isolation guarantee without depending on IdP Organization features.
Warning
Skipping per-tenant isolation in a shared-IdP deployment is a
cross-tenant escalation risk. Always configure required_claims
(an org/tenant claim, or at minimum azp) when more than one
lakeFS instance trusts the same IdP.
Logged data
Per-request logs and audit records identify session-authenticated
callers via principal_type=session, the session's subject (e.g.
jwt:<iss>:<oid>), and a unique session_id. The raw external JWT is
never logged. Verifier errors surface claim names and timestamps but
never raw token bytes.
Revocation
Sessions are stored server-side, so revocation is immediate: deleting
a session — explicitly or by letting it expire — makes every bearer
minted from that session return 401 on the next request, even though
the bearer's signature is still valid. The cleanup_interval setting
controls how often expired sessions are purged from KV; expired
sessions are also reaped lazily on read.
Audit logs
Every action driven by a JWT-login bearer carries the session principal in the audit record:
| Column | Value |
|---|---|
| `principal_type` | `session` |
| `subject` | `jwt:<iss>:<oid>` (the session's subject) |
| `session_id` | the live session entity's ID |
| `user` | mirrors `subject` (back-compat column) |
Operators can filter audit rows by principal_type=session to see every
M2M call across the install, or by session_id to correlate every
action driven by a single login.
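To gather every action by one workload across all of its logins, filter on the session subject rather than `session_id`. A minimal POSIX shell sketch, assuming the subject is formed literally as `jwt:<iss>:<identity>` as shown in the table (presumably with the identity taken from `identity_claim_ref`) and that audit records have been exported as plain text, one record per line; the file name `audit.log` and both helper names are hypothetical.

```shell
# Hypothetical helper: build the session subject for an issuer and principal ID.
session_subject() {
  printf 'jwt:%s:%s\n' "$1" "$2"
}

# Hypothetical helper: print the records in FILE ($1) that mention the subject
# for issuer $2 and principal $3. grep -F treats the subject as a fixed string,
# so '.' and '/' in the issuer URL are not regex syntax. This is a substring
# match, so an ID that is a prefix of another ID also matches.
audit_rows_for() {
  grep -F -- "$(session_subject "$2" "$3")" "$1"
}

# Usage (hypothetical export file and Entra principal):
# audit_rows_for audit.log "https://login.microsoftonline.com/<tenant-id>/v2.0" "<oid>"
```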