Airbyte is an open-source platform for syncing data from applications, APIs, and databases to warehouses, lakes, and other destinations. With Airbyte’s connectors, you can set up data pipelines that consolidate many input sources.
Using lakeFS with Airbyte
The integration between the two open-source projects brings resilience and manageability when using Airbyte connectors to sync data to your S3 buckets by leveraging lakeFS branches and atomic commits and merges.
Use-cases
You can leverage lakeFS ACID guarantees and CI/CD capabilities when ingesting data to S3 using lakeFS:
- Consolidate many data sources into a single branch and expose them to consumers simultaneously when merging to the main branch.
- Test the incoming data for breaking schema changes, using lakeFS hooks.
- Prevent consumers from reading partial data from connectors that failed halfway through the sync operation.
- Experiment with ingested data before exposing it.
S3 Connector
lakeFS exposes an S3 Gateway that enables applications to communicate with lakeFS the same way they would with Amazon S3. You can use Airbyte’s S3 Destination for uploading the data to lakeFS.
Configuring lakeFS using the connector
Set the following parameters when creating a new Destination of type S3:
Name | Value | Example |
---|---|---|
Endpoint | The lakeFS S3 gateway URL | http://lakefs.example.com |
S3 Bucket Name | The lakeFS repository where the data will be written | example-repo |
S3 Bucket Path | The branch and the path where the data will be written | main/data/from/airbyte, where main is the branch name and data/from/airbyte is the path under the branch. |
S3 Bucket Region | Not applicable to lakeFS, use us-east-1 | us-east-1 |
S3 Key ID | The lakeFS access key id used to authenticate to lakeFS. | AKIAlakefs12345EXAMPLE |
S3 Access Key | The lakeFS secret access key used to authenticate to lakeFS. | abc/lakefs/1234567bPxRfiCYEXAMPLEKEY |
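
As an illustration of how the values in the table fit together, here is a minimal Python sketch that assembles them from a lakeFS repository, branch, and path. The function name `lakefs_destination_settings` is hypothetical, and the dictionary keys are simply the UI labels from the table, not Airbyte's internal configuration field names. The sketch also assumes that the branch name is a single path segment with no slashes.

```python
from urllib.parse import urlparse


def lakefs_destination_settings(endpoint, repository, branch, path, key_id, secret_key):
    """Return the S3 Destination values from the table, keyed by UI label.

    Hypothetical helper for illustration only; it does not call Airbyte or lakeFS.
    """
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"endpoint must be an http(s) URL, got {endpoint!r}")
    # Assumption of this sketch: a branch name is one path segment.
    if not branch or "/" in branch:
        raise ValueError(f"branch must be a single path segment, got {branch!r}")

    # The bucket path is the branch name followed by the path under that branch.
    sub_path = path.strip("/")
    bucket_path = f"{branch}/{sub_path}" if sub_path else branch

    return {
        "Endpoint": endpoint,
        "S3 Bucket Name": repository,     # the lakeFS repository
        "S3 Bucket Path": bucket_path,    # <branch>/<path under the branch>
        "S3 Bucket Region": "us-east-1",  # not applicable to lakeFS
        "S3 Key ID": key_id,
        "S3 Access Key": secret_key,
    }


settings = lakefs_destination_settings(
    endpoint="http://lakefs.example.com",
    repository="example-repo",
    branch="main",
    path="data/from/airbyte",
    key_id="AKIAlakefs12345EXAMPLE",
    secret_key="abc/lakefs/1234567bPxRfiCYEXAMPLEKEY",
)
print(settings["S3 Bucket Path"])  # prints: main/data/from/airbyte
```

Pointing the sync at a branch other than main only requires changing the `branch` argument, which is what lets ingested data be tested before it is merged.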
Note
The S3 Destination connector supports custom S3 endpoints starting from Airbyte version v0.26.0-alpha, released on June 17th, 2021.
The UI configuration will look like: