Calling the lakeFS API from Python
The lakeFS API is OpenAPI 3.0-compliant, so you can generate clients for many languages or call it directly with any HTTP client.
For Python, this example uses the lakeFS Python package. The lakefs-client package was created by OpenAPI Generator using the OpenAPI definition served by a lakeFS server.
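Because the API can also be reached with a plain HTTP client, the following minimal sketch builds such a request using only the standard library. It rests on two assumptions not stated on this page: that the API is served under the /api/v1 prefix, and that it accepts HTTP Basic authentication with the access key ID as the username and the secret access key as the password. The helper name build_list_repositories_request is hypothetical.

```python
import base64
import urllib.request


def build_list_repositories_request(host, access_key_id, secret_access_key):
    """Build (but do not send) a GET request that lists repositories."""
    # Assumption: the REST API lives under the /api/v1 prefix.
    url = host.rstrip('/') + '/api/v1/repositories'
    # Assumption: HTTP Basic auth, key ID as username and secret as password.
    credentials = f'{access_key_id}:{secret_access_key}'.encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    headers = {'Authorization': 'Basic ' + token, 'Accept': 'application/json'}
    return urllib.request.Request(url, headers=headers, method='GET')


request = build_list_repositories_request(
    'http://localhost:8000',
    'AKIAIOSFODNN7EXAMPLE',
    'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)
print(request.get_method(), request.full_url)

# To send it against a running lakeFS server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode('utf-8'))
```

The request is only constructed here, so the sketch runs without a server. The generated Python client described below handles these details for you.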
Install lakeFS Python Client API
Install the Python client using pip:
pip install 'lakefs_client==<lakeFS version>'
The package is available from version 0.34.0 onward.
Working with the Client API
Here’s how to instantiate a client:
import lakefs_client
from lakefs_client import models
from lakefs_client.client import LakeFSClient
# lakeFS credentials and endpoint
configuration = lakefs_client.Configuration()
configuration.username = 'AKIAIOSFODNN7EXAMPLE'
configuration.password = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
configuration.host = 'http://localhost:8000'
client = LakeFSClient(configuration)
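Hardcoding credentials is convenient for a demo, but you may prefer to read them from the environment. The sketch below copies three settings onto a configuration object. The environment variable names and the helper apply_env_settings are hypothetical, not something the lakeFS client defines, and a SimpleNamespace stands in for lakefs_client.Configuration() so the example runs without the package installed.

```python
import os
from types import SimpleNamespace

# Hypothetical environment variable names; choose whatever fits your deployment.
ENV_TO_FIELD = {
    'LAKEFS_ACCESS_KEY_ID': 'username',
    'LAKEFS_SECRET_ACCESS_KEY': 'password',
    'LAKEFS_ENDPOINT': 'host',
}


def apply_env_settings(configuration, environ=None):
    """Copy lakeFS settings from environment variables onto a configuration object."""
    environ = os.environ if environ is None else environ
    missing = sorted(name for name in ENV_TO_FIELD if not environ.get(name))
    if missing:
        raise KeyError('missing environment variables: ' + ', '.join(missing))
    for name, field in ENV_TO_FIELD.items():
        setattr(configuration, field, environ[name])
    return configuration


# Stand-in for lakefs_client.Configuration(), with an explicit mapping
# instead of the real process environment.
demo_configuration = apply_env_settings(SimpleNamespace(), {
    'LAKEFS_ACCESS_KEY_ID': 'AKIAIOSFODNN7EXAMPLE',
    'LAKEFS_SECRET_ACCESS_KEY': 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    'LAKEFS_ENDPOINT': 'http://localhost:8000',
})
print(demo_configuration.host)
```

With the real package, you would pass lakefs_client.Configuration() as the first argument and hand the result to LakeFSClient.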
Using the generated client
Now that you have a client object, you can use it to interact with the API.
Creating a repository
repo = models.RepositoryCreation(name='example-repo', storage_namespace='s3://storage-bucket/repos/example-repo', default_branch='main')
client.repositories.create_repository(repo)
# output:
# {'creation_date': 1617532175,
# 'default_branch': 'main',
# 'id': 'example-repo',
# 'storage_namespace': 's3://storage-bucket/repos/example-repo'}
Creating a branch, uploading files, committing changes
List the repository branches:
client.branches.list_branches('example-repo').results
# output:
# [{'commit_id': 'cdd673a4c5f42d33acdf3505ecce08e4d839775485990d231507f586ebe97656', 'id': 'main'}]
Create a new branch:
client.branches.create_branch(repository='example-repo', branch_creation=models.BranchCreation(name='experiment-aggregations1', source='main'))
# output:
# 'cdd673a4c5f42d33acdf3505ecce08e4d839775485990d231507f586ebe97656'
List again to see your newly created branch:
client.branches.list_branches('example-repo').results
# output:
# [{'commit_id': 'cdd673a4c5f42d33acdf3505ecce08e4d839775485990d231507f586ebe97656', 'id': 'experiment-aggregations1'}, {'commit_id': 'cdd673a4c5f42d33acdf3505ecce08e4d839775485990d231507f586ebe97656', 'id': 'main'}]
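The listing calls above return one page of results. For repositories with many branches, a small generator can walk every page. This is a sketch under assumptions: it expects each page to expose .results plus a .pagination object with has_more and next_offset fields, and the helper iter_results is hypothetical. Fake pages are used so it runs without a server.

```python
from types import SimpleNamespace


def iter_results(fetch_page):
    """Yield every item from a paginated listing.

    fetch_page(after) must return an object with .results and
    .pagination (has_more, next_offset). These field names are assumptions.
    """
    after = ''
    while True:
        page = fetch_page(after)
        yield from page.results
        if not page.pagination.has_more:
            break
        after = page.pagination.next_offset


# Two fake pages keyed by the 'after' offset that requests them.
_pages = {
    '': SimpleNamespace(
        results=['experiment-aggregations1'],
        pagination=SimpleNamespace(has_more=True, next_offset='experiment-aggregations1'),
    ),
    'experiment-aggregations1': SimpleNamespace(
        results=['main'],
        pagination=SimpleNamespace(has_more=False, next_offset=''),
    ),
}

branch_ids = list(iter_results(lambda after: _pages[after]))
print(branch_ids)
```

Against a real server, the callable would likely look like lambda after: client.branches.list_branches('example-repo', after=after), assuming the generated method accepts an after keyword.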
Great. Now, let’s upload a file into your new branch:
with open('file.csv', 'rb') as f:
    client.objects.upload_object(repository='example-repo', branch='experiment-aggregations1', path='path/to/file.csv', content=f)
# output:
# {'checksum': '0d3b39380e2500a0f60fb3c09796fdba',
# 'mtime': 1617534834,
# 'path': 'path/to/file.csv',
# 'path_type': 'object',
# 'physical_address': 'local://example-repo/1865650a296c42e28183ad08e9b068a3',
# 'size_bytes': 18}
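To upload more than one file, you can map a local directory tree to object paths and call upload_object for each entry. The sketch below is illustrative only: plan_uploads and upload_directory are hypothetical helpers, and the demo exercises just the path mapping on a temporary directory, so it runs without a lakeFS server.

```python
import os
import tempfile


def plan_uploads(local_dir, prefix=''):
    """Return (local_path, object_path) pairs for every file under local_dir."""
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            relative = os.path.relpath(local_path, local_dir)
            # Object paths always use forward slashes, whatever the local OS uses.
            pairs.append((local_path, prefix + relative.replace(os.sep, '/')))
    return sorted(pairs, key=lambda pair: pair[1])


def upload_directory(client, repository, branch, local_dir, prefix=''):
    """Upload each planned file with the same upload_object call shown above."""
    for local_path, object_path in plan_uploads(local_dir, prefix):
        with open(local_path, 'rb') as f:
            client.objects.upload_object(
                repository=repository, branch=branch, path=object_path, content=f)


# Demo of the path mapping only, on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, 'nested'))
    for rel in ('a.csv', os.path.join('nested', 'b.csv')):
        with open(os.path.join(demo_dir, rel), 'w') as f:
            f.write('x,y\n')
    planned_paths = [obj for _local, obj in plan_uploads(demo_dir, prefix='path/to/')]
print(planned_paths)
```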
Diffing a single branch will show all the uncommitted changes on that branch:
client.branches.diff_branch(repository='example-repo', branch='experiment-aggregations1').results
# output:
# [{'path': 'path/to/file.csv', 'path_type': 'object', 'type': 'added'}]
As expected, your change appears here. Let’s commit it and attach some arbitrary metadata:
client.commits.commit(
    repository='example-repo',
    branch='experiment-aggregations1',
    commit_creation=models.CommitCreation(message='Added a CSV file!', metadata={'using': 'python_api'}))
# output:
# {'committer': 'barak',
# 'creation_date': 1617535120,
# 'id': 'e80899a5709509c2daf797c69a6118be14733099f5928c14d6b65c9ac2ac841b',
# 'message': 'Added a CSV file!',
# 'meta_range_id': '',
# 'metadata': {'using': 'python_api'},
# 'parents': ['cdd673a4c5f42d33acdf3505ecce08e4d839775485990d231507f586ebe97656']}
Diff again. This time there should be no uncommitted changes:
client.branches.diff_branch(repository='example-repo', branch='experiment-aggregations1').results
# output:
# []
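The diff-then-commit sequence above can be wrapped in a helper that commits only when the branch has uncommitted changes. This is a minimal sketch: commit_if_changed is a hypothetical name, and the fake client below mimics only the two calls used, so the example runs without lakeFS. With the real client you would pass a models.CommitCreation object as commit_creation.

```python
from types import SimpleNamespace


def commit_if_changed(client, repository, branch, commit_creation):
    """Commit only if the branch has uncommitted changes; return the commit or None."""
    changes = client.branches.diff_branch(repository=repository, branch=branch).results
    if not changes:
        return None
    return client.commits.commit(
        repository=repository, branch=branch, commit_creation=commit_creation)


class _FakeClient:
    """Stand-in that mimics only diff_branch and commit for the demo."""

    def __init__(self, pending):
        self._pending = list(pending)
        self.branches = SimpleNamespace(diff_branch=self._diff)
        self.commits = SimpleNamespace(commit=self._commit)

    def _diff(self, repository, branch):
        return SimpleNamespace(results=list(self._pending))

    def _commit(self, repository, branch, commit_creation):
        self._pending.clear()  # committing leaves nothing uncommitted
        return {'message': commit_creation['message']}


fake = _FakeClient(pending=[{'path': 'path/to/file.csv', 'type': 'added'}])
first = commit_if_changed(fake, 'example-repo', 'experiment-aggregations1',
                          {'message': 'Added a CSV file!'})
second = commit_if_changed(fake, 'example-repo', 'experiment-aggregations1',
                           {'message': 'Nothing new'})
print(first, second)
```

The second call returns None because the first commit left no uncommitted changes.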
Merging changes from a branch into main
Let’s diff between your branch and the main branch:
client.refs.diff_refs(repository='example-repo', left_ref='main', right_ref='experiment-aggregations1').results
# output:
# [{'path': 'path/to/file.csv', 'path_type': 'object', 'type': 'added'}]
Looks like you have a change. Let’s merge it:
client.refs.merge_into_branch(repository='example-repo', source_ref='experiment-aggregations1', destination_branch='main')
# output:
# {'reference': 'd0414a3311a8c1cef1ef355d6aca40db72abe545e216648fe853e25db788fa2e',
# 'summary': {'added': 1, 'changed': 0, 'conflict': 0, 'removed': 0}}
Let’s diff again. There should be no changes, since all of them are already on the main branch:
client.refs.diff_refs(repository='example-repo', left_ref='main', right_ref='experiment-aggregations1').results
# output:
# []
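Before merging, it can help to condense the diff between two refs into counts per change type, similar to the summary the merge call returns. The sketch below assumes each diff entry exposes a type attribute, as the printed output suggests. summarize_diff is a hypothetical helper, and SimpleNamespace objects stand in for real diff entries.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_diff(entries):
    """Count diff entries by change type, for example 'added' or 'removed'."""
    return dict(Counter(entry.type for entry in entries))


# Stand-ins for the entries returned by client.refs.diff_refs(...).results.
demo_entries = [
    SimpleNamespace(path='path/to/file.csv', path_type='object', type='added'),
    SimpleNamespace(path='path/to/other.csv', path_type='object', type='added'),
    SimpleNamespace(path='path/to/old.csv', path_type='object', type='removed'),
]

demo_summary = summarize_diff(demo_entries)
print(demo_summary)
```

An empty summary means the two refs do not differ, so there is nothing to merge.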
Python Client documentation
For the documentation of lakeFS’s Python package, see https://pydocs.lakefs.io
Full API reference
For a full reference of the lakeFS API, see lakeFS API