Merges in lakeFS
The merge operation in lakeFS is similar to Git's: it incorporates changes from a merge source (a commit or other reference) into a merge destination (a branch).
How does it work?
lakeFS first finds the merge base: the nearest common ancestor of the two commits. It then performs a three-way merge by comparing the presence and identity of each file in the base, source, and destination commits. In the table below, "A", "B" and "C" are possible file contents, "X" is a missing file, and "conflict" (which only appears as a result) is a merge failure.
| In base | In source | In destination | Result | Comment |
|---|---|---|---|---|
| A | A | A | A | Unchanged file |
| A | B | B | B | Files changed on both sides in the same way |
| A | B | C | conflict | Files changed on both sides differently |
| A | A | B | B | File changed only on one branch |
| A | B | A | B | File changed only on one branch |
| A | X | X | X | Files deleted on both sides |
| A | B | X | conflict | File changed on one side, deleted on the other |
| A | X | B | conflict | File changed on one side, deleted on the other |
| A | A | X | X | File deleted on one side |
| A | X | A | X | File deleted on one side |
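The decision table above can be expressed as a small function. This is a minimal sketch, not lakeFS's actual implementation: the names `three_way_merge` and `MergeConflict` are hypothetical, each file is represented by an identity string (for example a checksum), and `None` stands for a missing file ("X").

```python
from typing import Optional


class MergeConflict(Exception):
    """Raised when source and destination changed a file incompatibly."""


def three_way_merge(base: Optional[str], source: Optional[str],
                    dest: Optional[str]) -> Optional[str]:
    """Return the merged identity of one file, or None if it should be absent."""
    if source == dest:
        # Both sides agree: unchanged, changed the same way, or both deleted.
        return source
    if source == base:
        # Only the destination changed or deleted the file.
        return dest
    if dest == base:
        # Only the source changed or deleted the file.
        return source
    # Both sides differ from the base and from each other.
    raise MergeConflict(f"base={base!r} source={source!r} dest={dest!r}")
```

Calling `three_way_merge("A", "B", "A")` follows the "changed only on one branch" row and keeps the source version, while `three_way_merge("A", "B", None)` raises `MergeConflict`, matching the "changed on one side, deleted on the other" row.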
Merge Strategies
The API and lakectl allow passing an optional strategy flag with the following values:
source-wins
In case of a conflict, merge will pick the source objects.
Example
When merging validated-data into production, a conflict is resolved by taking the conflicting objects from the validated-data branch (the source), so those versions end up in production.
dest-wins
In case of a conflict, merge will pick the destination objects.
Example
When merging validated-data into production, a conflict is resolved by keeping the conflicting objects as they are in the production branch (the destination). The conflicting changes from validated-data are not applied to production.
If a strategy is set, it applies to every conflicting object in the merge. Currently it is not possible to resolve conflicts individually.
As a format-agnostic system, lakeFS currently merges by complete files. Format-specific and other user-defined merge strategies for handling conflicts are on the roadmap.
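To illustrate how a single strategy applies to every conflict in a merge, here is a minimal sketch that merges whole commits modeled as dictionaries from path to file identity. The function `merge_commits` and the dictionary model are hypothetical and only mirror the rules described above. Treating a deletion on the winning side as the outcome of a conflict is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, Optional

VALID_STRATEGIES = (None, "source-wins", "dest-wins")


def merge_commits(base: Dict[str, str], source: Dict[str, str],
                  dest: Dict[str, str],
                  strategy: Optional[str] = None) -> Dict[str, str]:
    """Merge three path->identity maps, file by file (whole files only)."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    result: Dict[str, str] = {}
    conflicts = []
    for path in sorted(set(base) | set(source) | set(dest)):
        b, s, d = base.get(path), source.get(path), dest.get(path)
        if s == d or s == b:
            chosen = d  # sides agree, or only the destination changed
        elif d == b:
            chosen = s  # only the source changed
        elif strategy == "source-wins":
            chosen = s  # conflict: the same rule is used for every path
        elif strategy == "dest-wins":
            chosen = d
        else:
            conflicts.append(path)
            continue
        if chosen is not None:  # None means the file is absent (assumption)
            result[path] = chosen
    if conflicts:
        # Without a strategy, any conflict fails the whole merge.
        raise ValueError(f"conflicting paths: {conflicts}")
    return result
```

Because the strategy is a single argument to the whole merge, there is no way in this model to pick the source for one path and the destination for another, which matches the limitation described above.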
Async Merge (Enterprise)
Info
This feature is available with lakeFS Enterprise.
lakeFS Enterprise supports asynchronous merge operations for improved scalability.
Overview
In lakeFS Enterprise, merge operations execute asynchronously:
- The API returns immediately with a task ID
- The merge executes in the background
- Clients poll for completion status
- On success, the status response includes the merge commit information (same as a synchronous merge response)
- On failure, the status response includes the error with its status code (same as a synchronous merge error)
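The flow above (start, poll, then read the result or error) can be sketched as a generic polling loop. This is an illustrative sketch only: `run_async_merge`, `MergeFailed`, and the status fields `completed`, `result`, `error`, and `status_code` are hypothetical names chosen for the example, not the documented response schema. The `start` and `get_status` callables stand in for whatever client code actually calls the API.

```python
import time
from typing import Any, Callable, Dict


class MergeFailed(Exception):
    """Carries the error and status code reported by the status response."""

    def __init__(self, status_code: Any, message: str):
        super().__init__(f"merge failed ({status_code}): {message}")
        self.status_code = status_code


def run_async_merge(start: Callable[[], str],
                    get_status: Callable[[str], Dict[str, Any]],
                    interval: float = 0.5,
                    timeout: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Start a merge, poll until it completes, and return its result."""
    task_id = start()  # the API returns immediately with a task ID
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task_id)  # hypothetical status shape
        if status.get("completed"):
            if status.get("error"):
                raise MergeFailed(status.get("status_code"), status["error"])
            return status["result"]  # merge commit information
        if time.monotonic() >= deadline:
            raise TimeoutError(f"merge task {task_id} did not complete in time")
        sleep(interval)
```

Injecting `start`, `get_status`, and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise with fakes.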
Client Support
Async merge is supported by:
- lakectl - Uses async merge automatically when connected to lakeFS Enterprise
- lakeFS UI - Uses async merge automatically
- Python (lakefs-sdk) - Supports async operations via the API. Support in the high-level Python SDK will be added soon.
Backwards compatibility: Older clients that don't support async operations will continue to work, as both sync and async APIs are supported.
lakectl and UI behavior
When using lakeFS Enterprise:
- lakectl merge uses async merge by default
- The lakeFS UI uses async merge by default
- Both handle polling automatically, so the experience is seamless
API usage
For direct API access, use the experimental async endpoints:
- POST /repositories/{repo}/refs/{source}/merge/{dest}/async - Start async merge
- GET /repositories/{repo}/refs/{source}/merge/{dest}/async/{id}/status - Poll status
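As a small illustration of how the two endpoints relate, the sketch below builds both paths from the same repository, source, and destination, with the task ID from the first call feeding the second. The helper names are hypothetical, the repository and branch names in the usage note are placeholders, and percent-encoding each path segment is an assumption of this sketch. The server's base URL and API prefix are deliberately left out.

```python
from urllib.parse import quote


def _seg(value: str) -> str:
    """Percent-encode one path segment (assumption: segments are encoded)."""
    return quote(value, safe="")


def async_merge_path(repo: str, source: str, dest: str) -> str:
    """Path for POST: start an async merge of source into dest."""
    return f"/repositories/{_seg(repo)}/refs/{_seg(source)}/merge/{_seg(dest)}/async"


def async_merge_status_path(repo: str, source: str, dest: str,
                            task_id: str) -> str:
    """Path for GET: poll the status of a previously started merge."""
    return f"{async_merge_path(repo, source, dest)}/{_seg(task_id)}/status"
```

For example, `async_merge_path("example-repo", "validated-data", "production")` gives the path to POST to, and passing the returned task ID to `async_merge_status_path` gives the path to poll.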