Merges in lakeFS¶

The merge operation in lakeFS is similar to Git. It incorporates changes from a merge source (a commit/reference) into a merge destination (a branch).

How does it work?¶

lakeFS first finds the merge base: the nearest common ancestor of the two commits. It then performs a three-way merge by comparing the presence and content of each file in the base, source, and destination commits. In the table below, "A", "B" and "C" are possible file contents, "X" is a missing file, and "conflict" (which appears only as a result) is a merge failure.

In base	In source	In destination	Result	Comment
A	A	A	A	Unchanged file
A	B	B	B	File changed on both sides in the same way
A	B	C	conflict	File changed on both sides differently
A	A	B	B	File changed only in the destination
A	B	A	B	File changed only in the source
A	X	X	X	File deleted on both sides
A	B	X	conflict	File changed in the source, deleted in the destination
A	X	B	conflict	File deleted in the source, changed in the destination
A	A	X	X	File deleted only in the destination
A	X	A	X	File deleted only in the source
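
The table above can be condensed into a few comparisons. The following is a minimal Python sketch of that per-file decision, not the lakeFS implementation: the function name `three_way_merge` and the `CONFLICT` sentinel are hypothetical, file contents are modeled as plain strings, and `None` stands for a missing file ("X").

```python
from typing import Optional

# Hypothetical sentinel marking a merge failure for one file.
CONFLICT = object()


def three_way_merge(base: Optional[str], source: Optional[str], dest: Optional[str]) -> object:
    """Return the merged state of one file, or CONFLICT.

    None represents a missing file ("X" in the table).
    """
    if source == dest:
        # Both sides agree: unchanged, changed identically, or deleted on both.
        return dest
    if source == base:
        # Only the destination changed or deleted the file: keep its state.
        return dest
    if dest == base:
        # Only the source changed or deleted the file: take its state.
        return source
    # Both sides differ from the base and from each other.
    return CONFLICT
```

For example, `three_way_merge("A", "B", "A")` returns `"B"` (changed only in the source), while `three_way_merge("A", "B", None)` returns `CONFLICT` (changed in the source, deleted in the destination).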

Merge Strategies¶

The API and lakectl allow passing an optional strategy flag with the following values:

`source-wins`¶

In case of a conflict, merge will pick the source objects.

Example

lakectl merge lakefs://example-repo/validated-data lakefs://example-repo/production --strategy source-wins

When a merge conflict arises, the conflicting objects in the validated-data branch will be chosen to end up in production.

`dest-wins`¶

In case of a conflict, merge will pick the destination objects.

Example

lakectl merge lakefs://example-repo/validated-data lakefs://example-repo/production --strategy dest-wins

When a merge conflict arises, the conflicting objects in the production branch will be kept in production. Conflicting changes from validated-data are not applied to production; all non-conflicting changes are merged as usual.

When set, the strategy applies to every conflicting object in the merge. It is currently not possible to resolve conflicts individually.

As a format-agnostic system, lakeFS currently merges by complete files. Format-specific and other user-defined merge strategies for handling conflicts are on the roadmap.
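
To illustrate how the two strategy values above apply to all conflicts at once, here is a small Python sketch. It is an illustration under stated assumptions rather than lakeFS code: `resolve_conflicts` and `MergeConflictError` are hypothetical names, and each conflict is modeled as a `(source_content, dest_content)` pair keyed by object path, with `None` for a deleted object.

```python
from typing import Dict, Optional, Tuple

VALID_STRATEGIES = (None, "source-wins", "dest-wins")


class MergeConflictError(Exception):
    """Hypothetical error raised when conflicts exist and no strategy is set."""


def resolve_conflicts(
    conflicts: Dict[str, Tuple[Optional[str], Optional[str]]],
    strategy: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Pick one side for every conflicting path, or fail the whole merge."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if not conflicts:
        return {}
    if strategy is None:
        # Without a strategy, any conflict fails the merge.
        raise MergeConflictError(sorted(conflicts))
    # The same side is chosen for all conflicts; there is no per-object choice.
    side = 0 if strategy == "source-wins" else 1
    return {path: pair[side] for path, pair in conflicts.items()}
```

With `strategy="dest-wins"`, every conflicting path keeps its destination content, which mirrors the production example above.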

Async Merge (Enterprise)¶

Info: Available in lakeFS Cloud and lakeFS Enterprise from v1.76.0.

lakeFS Enterprise supports asynchronous merge operations for improved scalability.

Overview¶

In lakeFS Enterprise, merge operations execute asynchronously:

The API returns immediately with a task ID
The merge executes in the background
Clients poll for completion status
On success, the status response includes the merge commit information (same as a synchronous merge response)
On failure, the status response includes the error with its status code (same as a synchronous merge error)
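
The flow above (start, poll, then read the result or the error) might look like the following Python sketch. Everything specific here is an assumption for illustration: the `start` and `get_status` callables stand in for whatever client performs the HTTP calls, and the status fields `completed`, `result`, and `error` are hypothetical names, so check the API reference for the actual response schema.

```python
import time
from typing import Any, Callable, Dict


class MergeFailed(Exception):
    """Hypothetical error raised when the status response reports a failure."""


def run_async_merge(
    start: Callable[[], str],
    get_status: Callable[[str], Dict[str, Any]],
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Start a merge, then poll until it completes, fails, or times out."""
    task_id = start()  # the API returns immediately with a task ID
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task_id)
        if status.get("completed"):  # assumed field name
            if status.get("error"):  # assumed field name
                raise MergeFailed(status["error"])
            return status.get("result")  # assumed field name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"merge task {task_id} did not finish in {timeout}s")
        sleep(interval)
```

Passing the transport in as callables keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.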

Client Support¶

Async merge is supported by:

lakectl - Uses async merge automatically when connected to lakeFS Enterprise
lakeFS UI - Uses async merge automatically
Python (lakefs-sdk) - Supports async operations via the API. Support in the high-level Python SDK will be added soon.

Backwards compatibility: Older clients that don't support async operations will continue to work, as both sync and async APIs are supported.

lakectl and UI behavior¶

When using lakeFS Enterprise:

lakectl merge uses async merge by default
The lakeFS UI uses async merge by default
Both handle polling automatically, so no manual polling is required

API usage¶

For direct API access, use the experimental async endpoints:

POST /repositories/{repo}/refs/{source}/merge/{dest}/async - Start async merge
GET /repositories/{repo}/refs/{source}/merge/{dest}/async/{id}/status - Poll status
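
As a small aid for building these two paths, the Python sketch below fills in the placeholders and percent-encodes each path segment. The helper names are hypothetical, and the server's base URL and API prefix are deliberately left to the caller because this section does not state them.

```python
from urllib.parse import quote


def _segment(value: str) -> str:
    # Percent-encode one path segment, including any "/" it contains.
    return quote(value, safe="")


def async_merge_start_path(repo: str, source: str, dest: str) -> str:
    """Path for POST: start an async merge of source into dest."""
    return (
        f"/repositories/{_segment(repo)}"
        f"/refs/{_segment(source)}/merge/{_segment(dest)}/async"
    )


def async_merge_status_path(repo: str, source: str, dest: str, task_id: str) -> str:
    """Path for GET: poll the status of a previously started async merge."""
    return f"{async_merge_start_path(repo, source, dest)}/{_segment(task_id)}/status"
```

For instance, `async_merge_start_path("example-repo", "validated-data", "production")` yields `/repositories/example-repo/refs/validated-data/merge/production/async`, to be appended to the server's API base URL.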