Skip to content

Merges in lakeFS

The merge operation in lakeFS is similar to Git. It incorporates changes from a merge source (a commit/reference) into a merge destination (a branch).

How does it work?

lakeFS first finds the merge base: the nearest common ancestor of the two commits. It can now perform a three-way merge, by examining the presence and identity of files in each commit. In the table below, "A", "B" and "C" are possible file contents, "X" is a missing file, and "conflict" (which only appears as a result) is a merge failure.

In base In source In destination Result Comment
A A A A Unchanged file
A B B B Files changed on both sides in same way
A B C conflict Files changed on both sides differently
A A B B File changed only on one branch
A B A B File changed only on one branch
A X X X Files deleted on both sides
A B X conflict File changed on one side, deleted on the other
A X B conflict File changed on one side, deleted on the other
A A X X File deleted on one side
A X A X File deleted on one side

Merge Strategies

The API and lakectl allow passing an optional strategy flag with the following values:

source-wins

In case of a conflict, merge will pick the source objects.

Example

lakectl merge lakefs://example-repo/validated-data lakefs://example-repo/production --strategy source-wins

When a merge conflict arises, the conflicting objects in the validated-data branch will be chosen to end up in production.

dest-wins

In case of a conflict, merge will pick the destination objects.

Example

lakectl merge lakefs://example-repo/validated-data lakefs://example-repo/production --strategy dest-wins

When a merge conflict arises, the conflicting objects in the production branch will be chosen to end up in validated-data. The production branch will not be affected by object changes from validated-data conflicting objects.

The strategy will affect all conflicting objects in the merge if it is set. Currently it is not possible to treat conflicts individually.

As a format-agnostic system, lakeFS currently merges by complete files. Format-specific and other user-defined merge strategies for handling conflicts are on the roadmap.

Async Merge (Enterprise)

Info

This feature is available with lakeFS Enterprise. Start a free trial.

lakeFS Enterprise supports asynchronous merge operations for improved scalability.

Overview

In lakeFS Enterprise, merge operations execute asynchronously:

  1. The API returns immediately with a task ID
  2. The merge executes in the background
  3. Clients poll for completion status
  4. On success, the status response includes the merge commit information (same as a synchronous merge response)
  5. On failure, the status response includes the error with its status code (same as a synchronous merge error)

Client Support

Async merge is supported by:

  • lakectl - Uses async merge automatically when connected to lakeFS Enterprise
  • lakeFS UI - Uses async merge automatically
  • Python (lakefs-sdk) - Supports async operations via the API. Support in the high-level Python SDK will be added soon.

Backwards compatibility: Older clients that don't support async operations will continue to work, as both sync and async APIs are supported.

lakectl and UI behavior

When using lakeFS Enterprise:

  • lakectl merge uses async merge by default
  • The lakeFS UI uses async merge by default
  • Both handle polling automatically, so the experience is seamless

API usage

For direct API access, use the experimental async endpoints:

  • POST /repositories/{repo}/refs/{source}/merge/{dest}/async - Start async merge
  • GET /repositories/{repo}/refs/{source}/merge/{dest}/async/{id}/status - Poll status