JWT Login (Machine-to-Machine)

Info

This feature is available with lakeFS Enterprise. Start a free trial.

For interactive (browser) logins use Single Sign On (SSO). For AWS-resident workloads use AWS IAM Roles.

JWT login lets a workload that already holds a JWT from an external identity provider — a service account, a CI runner, an application backend — exchange that JWT for a lakeFS session bearer token, without a browser flow and without provisioning a per-workload lakeFS access key.

How it works

   ┌──────────┐  client_credentials   ┌──────────┐
   │  Caller  │ ────────────────────► │   IdP    │
   └──────────┘                       │  (Entra/ │
        │   external JWT              │   Auth0) │
        │ ◄────────────────────────── └──────────┘
        │
        │   POST /api/v1/auth/jwt/login
        │   { "token": "<external JWT>" }
        ▼
   ┌──────────┐
   │  lakeFS  │
   └──────────┘
        │   { "token": "<lakeFS bearer>", "token_expiration": ... }
        │
        ▼
   Authorization: Bearer <lakeFS bearer>
   → any subsequent lakeFS API call

1. The caller obtains a JWT from its IdP. For service-account workloads this is typically the OAuth 2.0 client_credentials grant.
2. The caller sends the JWT in a POST request to /api/v1/auth/jwt/login. lakeFS verifies the signature against the IdP's JWKS, checks the standard iss / aud / exp / iat / nbf claims, and resolves the JWT's group/role claim to lakeFS groups.
3. lakeFS persists a short-lived session containing the resolved policy IDs and returns a bearer token whose subject is the session ID.
4. The caller presents the bearer on subsequent lakeFS API calls in the Authorization: Bearer … header.

No lakeFS user row is created on JWT login: the session is the authorization record. Deleting the session at the server immediately invalidates every bearer minted from it.
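
The exchange request and the bearer header described above can be sketched with Python's standard library. This is a minimal illustration, not lakeFS client code: the helper names `build_login_request` and `bearer_header` are hypothetical, and nothing is sent until the request object is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_login_request(base_url: str, external_jwt: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/auth/jwt/login request.

    Hypothetical helper: only the endpoint path and the {"token": ...}
    body shape come from this page.
    """
    body = json.dumps({"token": external_jwt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/auth/jwt/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_header(lakefs_token: str) -> dict:
    """Header to attach to every subsequent lakeFS API call."""
    return {"Authorization": "Bearer " + lakefs_token}
```

To perform the exchange, open the request with `urllib.request.urlopen`, parse the JSON response, and pass its `token` field to `bearer_header`.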

When to use this

Scenario	Use
Interactive (human) login	SSO
AWS-resident workload with an IAM role	AWS IAM Roles
Workload that already holds an IdP-issued JWT	JWT login
Workload with no IdP — wants a static credential	lakeFS access key

Configuration

JWT login is opt-in. The endpoint returns 501 Not Implemented unless auth.providers.jwt.jwks_url is set.

auth:
  providers:
    jwt:
      # Required.
      jwks_url: https://<idp>/.well-known/jwks.json
      issuer:   https://<idp>/

      # At least one configured audience must match the token's `aud`
      # claim. Empty list skips the audience check.
      audiences:
        - https://lakefs/api

      # RFC 6901 JSON Pointer to the principal identity claim.
      # Defaults to "/oid" (Entra). For Auth0 use "/sub".
      identity_claim_ref: /oid

      # JSON Pointer to the group / role claim. Each value must match a
      # lakeFS group identifier (group name on Enterprise; generated
      # group ID on Cloud — see "Mapping IdP groups to lakeFS
      # permissions"). Default "/roles".
      groups_claim_ref: /roles

      # Caps the lakeFS session's lifetime. Effective expiry is
      # min(now + session_max_ttl, jwt.exp). Default 1h.
      session_max_ttl: 1h

      # Clock-skew tolerance for exp / iat / nbf. Default 60s.
      leeway: 60s

      # How often the background sweep deletes expired sessions.
      # Default 5m.
      cleanup_interval: 5m

      # Pin additional claims to exact string values. See
      # "Per-tenant isolation" below.
      required_claims:
        # https://lakefscloud.io/org_id: acme
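
To make the two `*_claim_ref` settings concrete, here is a small sketch of how an RFC 6901 JSON Pointer such as `/oid` or `/roles` selects a value from the decoded token claims. `resolve_pointer` is a hypothetical helper written for illustration and is not the lakeFS implementation.

```python
def resolve_pointer(claims: dict, pointer: str):
    """Resolve an RFC 6901 JSON Pointer against decoded JWT claims.

    Returns None when the path does not exist. Hypothetical helper,
    written for illustration only.
    """
    if pointer == "":
        return claims  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    current = claims
    for raw in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" becomes "/", then "~0" becomes "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict) and token in current:
            current = current[token]
        elif isinstance(current, list) and token.isdigit() and int(token) < len(current):
            current = current[int(token)]
        else:
            return None
    return current
```

With claims like `{"oid": "sp-123", "roles": ["data-engineers"]}`, the pointer `/oid` selects the principal and `/roles` selects the group list. A claim name that itself contains slashes has to be escaped with `~1` when written as a pointer.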

Field reference

Key	Required	Default	Description
`jwks_url`	yes	—	URL of the IdP's JWKS document.
`issuer`	yes	—	Exact value the token's `iss` claim must equal.
`audiences`	no	`[]`	Accepted `aud` values. Empty disables the check.
`identity_claim_ref`	no	`/oid`	JSON Pointer to the principal identifier.
`groups_claim_ref`	no	`/roles`	JSON Pointer to the group/role list.
`session_max_ttl`	no	`1h`	Upper bound on the minted session's lifetime.
`leeway`	no	`60s`	Clock-skew tolerance for `exp` / `iat` / `nbf`.
`cleanup_interval`	no	`5m`	Period of the expired-session sweep.
`required_claims`	no	none	Map of claim name → exact string value.
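
The `session_max_ttl` rule, effective expiry = min(now + session_max_ttl, jwt.exp), is easy to check by hand. The function below is only a sketch of that arithmetic in Unix seconds; the name `effective_expiry` is an assumption made for this example.

```python
def effective_expiry(now: int, session_max_ttl_s: int, jwt_exp: int) -> int:
    """Session expiry in Unix seconds: capped by both the TTL and the JWT's exp."""
    return min(now + session_max_ttl_s, jwt_exp)
```

For example, with the default 1h TTL and a token that expires 15 minutes from now, the session also expires in 15 minutes. With a token that is valid for two more hours, the session is capped at one hour.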

Endpoint

POST /api/v1/auth/jwt/login
Content-Type: application/json

{ "token": "<external JWT>" }

Status	Meaning
200	Success. Body: `{ "token": "<lakeFS bearer>", "token_expiration": <unix-seconds> }`.
401	Verification failed: signature, expiry, audience, issuer, missing identity claim, etc.
501	JWT login is not configured (no `jwks_url`).
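
A caller will usually want to branch on these three status codes. The following sketch shows one way to do that. `interpret_login_response` and `JWTLoginError` are hypothetical names, and any status not listed in the table is simply treated as unexpected.

```python
import json
from datetime import datetime, timezone


class JWTLoginError(Exception):
    """Raised when the login endpoint does not return 200 (hypothetical type)."""


def interpret_login_response(status: int, body: str) -> tuple:
    """Return (lakeFS bearer, expiry as an aware datetime) or raise JWTLoginError."""
    if status == 200:
        payload = json.loads(body)
        # token_expiration is documented as Unix seconds.
        expires = datetime.fromtimestamp(payload["token_expiration"], tz=timezone.utc)
        return payload["token"], expires
    if status == 401:
        raise JWTLoginError("external JWT failed verification; check iss, aud, exp and the identity claim")
    if status == 501:
        raise JWTLoginError("JWT login is not configured on this server (no jwks_url)")
    raise JWTLoginError(f"unexpected status {status}")
```

Keeping the expiry alongside the token lets the caller request a fresh session shortly before the current one lapses instead of waiting for a 401.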

Mapping IdP groups to lakeFS permissions

For every value in the JWT's groups claim, lakeFS looks up a matching group. The union of policies attached to those groups is recorded on the session and consulted by every authorization check.

The claim value must match the lakeFS group identifier. What that identifier is depends on the RBAC backend:

Deployment	Group identifier the claim must carry
lakeFS Enterprise (built-in RBAC)	the group's name, e.g. `data-engineers`
lakeFS Cloud (external RBAC)	the group's generated ID, e.g. `LGIDAfIbmpkx-711slxX-BGKt` — not its display name

On lakeFS Cloud, groups have generated IDs (LGID…) and the friendly string is only a display name — the same reason the OIDC configuration references groups by ID (default_initial_groups: [LGID…]). A claim carrying the display name will not resolve to any policies. Find a group's ID with lakectl auth groups list or in the UI.

Warning

Unmatched group values are silently skipped. A token whose groups claim matches no group identifier yields a session with no permissions, and every call that requires authorization returns 401 insufficient permissions. The token still authenticates and the session is minted, but the session cannot do anything. Create the groups, attach policies, and make sure the IdP emits the correct identifier before issuing JWT-driven calls.
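
The resolution rule (union of policies over matching groups, unmatched values skipped) can be sketched as follows. The function name and the in-memory dictionary standing in for the RBAC backend are assumptions for illustration only.

```python
def resolve_policies(claim_groups, group_policies) -> set:
    """Union of policy IDs for every claim value that names a known group.

    group_policies is a hypothetical mapping of group identifier -> policy IDs.
    Claim values with no matching group contribute nothing and raise no error.
    """
    policies = set()
    for group in claim_groups:
        policies.update(group_policies.get(group, ()))
    return policies
```

An empty result corresponds to the situation in the warning above: the login succeeds, but the session carries no policies.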

Microsoft Entra ID

Entra setup (one-time)

App Registration — "lakeFS" (the resource)
1. Expose an API → Application ID URI: e.g. api://lakefs. This value is your audience.
2. App roles → Create app role:
  - Allowed member types: Applications
  - Value: lakefs-data-engineers (this is the value emitted in the roles claim and must match the lakeFS group ID).
App Registration — "lakeFS-client" (the caller)
1. Certificates & secrets → New client secret — record the value.
2. API permissions → Add a permission → My APIs → lakeFS → tick the lakefs-data-engineers application permission → Grant admin consent.
(Optional, recommended in multi-tenant deployments) Add a claims-mapping policy that injects an org_id (or similar) claim from the service principal's app metadata, then pin it via required_claims in the lakeFS config. See Per-tenant isolation below.

lakeFS config

auth:
  providers:
    jwt:
      jwks_url: https://login.microsoftonline.com/<tenant-id>/discovery/v2.0/keys
      issuer:   https://login.microsoftonline.com/<tenant-id>/v2.0
      audiences: ["api://lakefs"]
      identity_claim_ref: /oid
      groups_claim_ref:   /roles
      session_max_ttl:    1h

Pre-create the matching lakeFS group and attach a policy:

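# Create the group whose ID equals the app role value emitted in the roles claim.
lakectl auth groups create --id lakefs-data-engineers
# Define a read-only policy: list and read actions on every resource.
lakectl auth policies create --id ReadAll --statement-document - <<'EOF'
{ "statement": [
    { "effect": "allow", "action": ["fs:List*", "fs:Read*"], "resource": "*" }
]}
EOF
# Attach the policy to the group.
lakectl auth groups policies attach --id lakefs-data-engineers --policy ReadAll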

Exchange flow

# 1. Acquire an Entra access token via client_credentials.
TOKEN=$(curl -s -X POST \
  "https://login.microsoftonline.com/$TENANT_ID/oauth2/v2.0/token" \
  -H 'content-type: application/x-www-form-urlencoded' \
  -d "client_id=$ENTRA_CLIENT_ID\
&client_secret=$ENTRA_CLIENT_SECRET\
&scope=api%3A%2F%2Flakefs%2F.default\
&grant_type=client_credentials" | jq -r .access_token)

# 2. Exchange for a lakeFS bearer.
BEARER=$(curl -s -X POST "https://lakefs.example.com/api/v1/auth/jwt/login" \
  -H 'content-type: application/json' \
  -d "{\"token\": \"$TOKEN\"}" | jq -r .token)

# 3. Drive authenticated API calls.
curl -H "Authorization: Bearer $BEARER" \
  "https://lakefs.example.com/api/v1/repositories"

Verifying the token shape

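Decode the payload (the second, base64url-encoded segment of the token) to confirm the right claims are present. The tr step maps the URL-safe alphabet back to standard base64 before decoding:

echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq .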

Expected fields:

{
  "iss": "https://login.microsoftonline.com/<tenant-id>/v2.0",
  "oid": "<service-principal-object-id>",
  "aud": "api://lakefs",
  "roles": ["lakefs-data-engineers"],
  "iat": 1700000000,
  "exp": 1700003600
}

If roles is missing, confirm the App Role was created with Allowed member types = Applications and that admin consent was granted to the client application for that role.
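
The same check can be scripted. The sketch below decodes the payload segment, restoring the base64url padding that JWTs omit, and reports which of the expected claims are absent. It does not verify the signature, so it is suitable for debugging only. Both helper names are hypothetical.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment WITHOUT verifying the signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    segment = parts[1]
    segment += "=" * (-len(segment) % 4)  # JWTs strip base64url padding
    return json.loads(base64.urlsafe_b64decode(segment))


def missing_claims(payload: dict, expected=("iss", "oid", "aud", "roles", "exp")) -> list:
    """Names from `expected` that are absent from the payload, in order."""
    return [name for name in expected if name not in payload]
```

For an Auth0 token, pass `expected=("iss", "sub", "aud", "permissions", "exp")` instead of the Entra defaults.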

Auth0

Auth0 setup (one-time)

1. APIs → Create API
  - Identifier (this becomes the audience): e.g. https://lakefs/api
  - Signing Algorithm: RS256 (default)
  - Enable RBAC: on
  - Add Permissions in the Access Token: on
  - Permissions → Add a Permission: define one permission per lakeFS group you intend to use. The permission's name must equal the lakeFS group identifier. On lakeFS Cloud that is the generated group ID (e.g. LGIDAfIbmpkx-711slxX-BGKt), not the display name. Watch for a stray trailing character when pasting the ID.
2. Applications → Create Application → Machine to Machine
  - Authorize the new application for the API above.
  - Expand the application's row and tick the permission(s) you defined in step 1. The M2M client will then receive these in the permissions claim of every minted access token.
  - Record the client_id and client_secret.

lakeFS config

auth:
  providers:
    jwt:
      jwks_url: https://YOUR_TENANT.us.auth0.com/.well-known/jwks.json
      issuer:   https://YOUR_TENANT.us.auth0.com/
      audiences: ["https://lakefs/api"]
      identity_claim_ref: /sub
      groups_claim_ref:   /permissions
      session_max_ttl:    1h

Create the lakeFS group whose ID matches the Auth0 permission name and attach a policy (same lakectl auth … commands as the Entra example).

Exchange flow

# 1. Acquire an Auth0 access token via client_credentials.
TOKEN=$(curl -s -X POST "https://YOUR_TENANT.us.auth0.com/oauth/token" \
  -H 'content-type: application/x-www-form-urlencoded' \
  -d "grant_type=client_credentials\
&client_id=$AUTH0_CLIENT_ID\
&client_secret=$AUTH0_CLIENT_SECRET\
&audience=https://lakefs/api" | jq -r .access_token)

# 2. Exchange for a lakeFS bearer.
BEARER=$(curl -s -X POST "https://lakefs.example.com/api/v1/auth/jwt/login" \
  -H 'content-type: application/json' \
  -d "{\"token\": \"$TOKEN\"}" | jq -r .token)

# 3. Drive authenticated API calls.
curl -H "Authorization: Bearer $BEARER" \
  "https://lakefs.example.com/api/v1/repositories"

Verifying the token shape

echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq .

Expected fields:

{
  "iss": "https://YOUR_TENANT.us.auth0.com/",
  "sub": "<m2m-client-id>@clients",
  "aud": "https://lakefs/api",
  "permissions": ["lakefs-data-engineers"],
  "iat": 1700000000,
  "exp": 1700003600
}

If permissions is missing or empty, re-check that Enable RBAC and Add Permissions in the Access Token are both on for the API, and that the M2M application has the permission ticked.

Security

Asymmetric signing only

The verifier accepts RS256/384/512, ES256/384/512, and PS256/384/512. HMAC variants (HS*) and none are rejected at startup — symmetric signing is incompatible with a JWKS trust model.
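
As an illustration of the allowlist described above, the accepted algorithm names can be enumerated and checked against a token's `alg` header. This is a sketch of the rule as stated on this page, not the verifier's code, and `ACCEPTED_ALGS` and `is_accepted_alg` are hypothetical names.

```python
# The nine asymmetric algorithms listed above: RS*, ES* and PS* at 256/384/512.
ACCEPTED_ALGS = frozenset(
    f"{family}{bits}" for family in ("RS", "ES", "PS") for bits in (256, 384, 512)
)


def is_accepted_alg(alg: str) -> bool:
    """True only for an exact, case-sensitive match against the allowlist."""
    return alg in ACCEPTED_ALGS
```

Using an explicit allowlist, rather than a denylist of `HS*` and `none`, means any algorithm name not listed is rejected by default.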

Per-tenant isolation

In multi-tenant deployments where a single IdP issues tokens for many lakeFS instances (for example, one Entra tenant serving multiple lakeFS Cloud organisations), a token minted for tenant A would otherwise verify cleanly against tenant B's lakeFS — same iss, valid signature, audience matches if both use the same Application ID URI.

Mitigate by pinning a per-tenant claim with required_claims:

auth:
  providers:
    jwt:
      ...
      required_claims:
        https://lakefscloud.io/org_id: <this-tenant>

Combined with an IdP-side claims-mapping policy that injects the same claim from the service principal's app metadata, this guarantees a token issued for one tenant cannot be used against another.

If your IdP can't emit a per-tenant claim for M2M tokens — for example Auth0 tenants without Organizations for Client Credentials, where the native org_id claim isn't available to the client_credentials grant — pin a dedicated client instead by matching its azp (authorized party = the M2M client_id):

auth:
  providers:
    jwt:
      ...
      required_claims:
        azp: <m2m-client-id>

Each tenant then uses its own M2M application, and only that client's tokens are accepted. This keeps a real isolation guarantee without depending on IdP Organization features.
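
The pinning rule amounts to an exact string comparison for every configured claim. The sketch below assumes that a missing claim or a non-string value counts as a mismatch. That detail, and the function name, are assumptions and not documented behavior.

```python
def satisfies_required_claims(claims: dict, required: dict) -> bool:
    """True when every pinned claim is present as a string with the exact value.

    Assumption for this sketch: missing or non-string claims fail the check.
    """
    return all(
        isinstance(claims.get(name), str) and claims[name] == value
        for name, value in required.items()
    )
```

With `required = {"azp": "<m2m-client-id>"}`, a token issued to any other M2M client fails the check even though its issuer, audience and signature are all valid.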

Warning

Skipping per-tenant isolation in a shared-IdP deployment is a cross-tenant escalation risk. Always configure required_claims (an org/tenant claim, or at minimum azp) when more than one lakeFS instance trusts the same IdP.

Logged data

Per-request logs and audit records identify session-authenticated callers via principal_type=session, the session's subject (e.g. jwt:<iss>:<oid>), and a unique session_id. The raw external JWT is never logged. Verifier errors surface claim names and timestamps but never raw token bytes.

Revocation

Sessions are stored server-side, so revocation is immediate: deleting a session — explicitly or by letting it expire — makes every bearer minted from that session return 401 on the next request, even though the bearer's signature is still valid. The cleanup_interval setting controls how often expired sessions are purged from KV; expired sessions are also reaped lazily on read.
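
The toy model below shows why server-side sessions make revocation immediate: a bearer is only as good as the session lookup behind it. The `SessionStore` class is a hypothetical in-memory stand-in for the KV store, with a lazy reap on read and a periodic sweep as described above.

```python
import time


class SessionStore:
    """In-memory stand-in for the server-side session store (illustrative only)."""

    def __init__(self, clock=time.time):
        self._sessions = {}  # session_id -> (policies, expires_at)
        self._clock = clock

    def put(self, session_id, policies, expires_at):
        self._sessions[session_id] = (tuple(policies), expires_at)

    def delete(self, session_id):
        """Explicit revocation: later lookups for this session fail."""
        self._sessions.pop(session_id, None)

    def get(self, session_id):
        """Return the session's policies, or None (the caller would answer 401)."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        policies, expires_at = entry
        if self._clock() >= expires_at:
            del self._sessions[session_id]  # lazy reap on read
            return None
        return policies

    def sweep(self):
        """Background cleanup, run every cleanup_interval; returns rows removed."""
        now = self._clock()
        expired = [sid for sid, (_, exp) in self._sessions.items() if now >= exp]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)
```

Because an expired session is already rejected on read, `cleanup_interval` affects only how long stale rows occupy storage, not how long a bearer remains usable.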

Audit logs

Every action driven by a JWT-login bearer carries the session principal in the audit record:

Column	Value
`principal_type`	`session`
`subject`	`jwt:<iss>:<oid>` (the session's subject)
`session_id`	the live session entity's id
`user`	mirrors `subject` (back-compat column)

Operators can filter audit rows by principal_type=session to see every M2M call across the install, or by session_id to correlate every action driven by a single login.
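
As a sketch of those two filters, assume the audit rows have been exported as dictionaries keyed by the column names in the table above. The export format and the function name are assumptions made for this example.

```python
def filter_audit_rows(rows, principal_type=None, session_id=None) -> list:
    """Keep rows matching every filter that is not None."""
    return [
        row
        for row in rows
        if (principal_type is None or row.get("principal_type") == principal_type)
        and (session_id is None or row.get("session_id") == session_id)
    ]
```

Calling it with `principal_type="session"` lists every M2M call in the export, and adding `session_id` narrows the result to the actions driven by a single login.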