There are two aspects we need to handle in order to run Airflow with lakeFS: accessing data through lakeFS's S3-compatible API, and running lakeFS commands (such as commits) from within our DAGs.
Since lakeFS supports the AWS S3 API, it works seamlessly with all operators that work on top of S3 (such as SparkSubmitOperator, S3FileTransformOperator, etc.).
All we need to do is set lakeFS as the S3 endpoint URL and use our lakeFS credentials instead of our S3 credentials.
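For example, here is a minimal sketch of such a connection. The connection id `lakefs_s3`, the endpoint URL, and the credential placeholders are assumptions; substitute the values of your own lakeFS installation:

```python
import json

from airflow.models.connection import Connection

# Hypothetical Airflow connection pointing the S3 API at lakeFS.
# All values below are placeholders for your own lakeFS installation.
lakefs_s3 = Connection(
    conn_id="lakefs_s3",
    conn_type="aws",
    login="<lakefs-access-key-id>",         # lakeFS key, not an AWS key
    password="<lakefs-secret-access-key>",  # lakeFS secret, not an AWS secret
    extra=json.dumps({"endpoint_url": "https://lakefs.example.com"}),
)
```

In practice you would register this connection through the Airflow UI, the CLI, or an AIRFLOW_CONN_* environment variable rather than constructing it inside a DAG file.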
We can then run tasks on lakeFS using the lakeFS path convention.
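Through the S3 gateway, that convention maps the repository to the bucket and the branch to the first path segment, i.e. s3://&lt;repo&gt;/&lt;branch&gt;/&lt;object&gt;. A minimal sketch of reading an object this way, assuming the `lakefs_s3` connection above and hypothetical repository, branch, and key names:

```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Hypothetical read through the lakeFS S3 gateway: bucket_name is the
# repository, and the key is prefixed with the branch name.
hook = S3Hook(aws_conn_id="lakefs_s3")
data = hook.read_key(
    key="example_dag_branch/raw/events.json",  # <branch>/<object path>
    bucket_name="example_repo",
)
```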
For example, a commit task using the BashOperator:
```python
from airflow.operators.bash import BashOperator

# Commit the results of the extract step to the DAG's dedicated branch
commit_extract = BashOperator(
    task_id='commit_extract',
    bash_command='lakectl commit lakefs://example_repo@example_dag_branch -m "extract data"',
    dag=dag,
)
```
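Assuming a preceding task (say, `extract`) produced the data, the commit task would typically be chained right after it with `extract >> commit_extract`, so that every successful run is recorded as a commit on the DAG's branch.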