Airbyte is an open-source platform for syncing data from applications, APIs, and databases to warehouses, lakes, and other destinations. You can use Airbyte's connectors to consolidate many input sources into a single data pipeline.
Integrating the two open-source projects brings resilience and manageability to your syncs: when Airbyte connectors write data to your S3 buckets through lakeFS, you can leverage lakeFS branches and atomic commits and merges.
You can take advantage of lakeFS consistency guarantees and CI/CD capabilities when ingesting data to S3 using lakeFS:
- Consolidate many data sources into a single branch and expose them to consumers simultaneously by merging to the main branch.
- Test incoming data for breaking schema changes using lakeFS hooks.
- Prevent consumers from reading partial data from connectors that failed halfway through a sync.
- Experiment with ingested data on a branch before exposing it.
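For instance, the schema-validation use case above is implemented with lakeFS hooks, which are declared as YAML action files committed under the repository's `_lakefs_actions/` directory. The sketch below is illustrative only: the action name, branch name, and webhook URL are placeholders you would replace with your own.

```yaml
# _lakefs_actions/pre-merge-schema-check.yaml
# Runs a webhook before any merge into main; if the webhook returns a
# non-2xx status, lakeFS blocks the merge and the partial data stays
# isolated on the ingestion branch.
name: pre merge schema check
on:
  pre-merge:
    branches:
      - main
hooks:
  - id: schema_check
    type: webhook
    properties:
      # Placeholder endpoint: your own service that validates the
      # incoming files' schema and returns 200 on success.
      url: http://example.com/webhooks/schema-check
```

With a hook like this in place, data synced by Airbyte to an ingestion branch is only exposed to consumers after the validation passes and the merge completes.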
Set the following parameters when creating a new Destination of type S3:
| Parameter | Value |
|---|---|
| Endpoint | The lakeFS S3 gateway URL |
| S3 Bucket Name | The lakeFS repository to which the data will be written |
| S3 Bucket Path | The branch, and the path under it, where the data will be written |
| S3 Bucket Region | Not applicable to lakeFS; use any valid region value |
| S3 Key ID | The lakeFS access key ID used to authenticate to lakeFS |
| S3 Access Key | The lakeFS secret access key used to authenticate to lakeFS |
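As a concrete illustration, a destination configured this way might look like the following JSON fragment. All values here are placeholders (the repository name, branch, endpoint, and credentials are assumptions for the example), and the exact field names follow the Airbyte S3 destination spec at the time of writing, so check them against your Airbyte version.

```json
{
  "s3_endpoint": "https://lakefs.example.com",
  "s3_bucket_name": "example-repo",
  "s3_bucket_path": "ingest-branch/airbyte-sync",
  "s3_bucket_region": "us-east-1",
  "access_key_id": "AKIAIOSFODNN7EXAMPLE",
  "secret_access_key": "<lakeFS secret access key>"
}
```

Note that `s3_bucket_name` is the lakeFS repository, and the first path segment of `s3_bucket_path` is the branch the connector writes to.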
The S3 Destination connector supports custom S3 endpoints starting with Airbyte version v0.26.0-alpha, released on June 17th, 2021.
The UI configuration will look as follows: