
Calling the lakeFS API from Python

The lakeFS API is OpenAPI 3.0-compliant, so clients can be generated for many languages, and the API can also be called directly from any HTTP client.
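Because the API is plain HTTP, you can call it without a generated client. The following is a minimal standard-library sketch. It assumes that the API is served under `/api/v1`, that it accepts HTTP Basic authentication with the access key ID and secret access key, and that repositories are listed at `/repositories`. Check these paths against the API reference. The helper `build_list_repositories_request` is hypothetical. It only builds the request, so nothing is sent until you pass the request to `urlopen`.

```python
import base64
import urllib.request


def build_list_repositories_request(host, access_key_id, secret_access_key):
    """Build (but do not send) a GET request for the repository listing.

    Assumes the API lives under /api/v1 and accepts HTTP Basic auth."""
    token = base64.b64encode(f'{access_key_id}:{secret_access_key}'.encode()).decode()
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/v1/repositories",
        headers={'Authorization': f'Basic {token}', 'Accept': 'application/json'},
        method='GET',
    )


req = build_list_repositories_request(
    'http://localhost:8000', 'AKIAIOSFODNN7EXAMPLE', 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY')
print(req.full_url)  # http://localhost:8000/api/v1/repositories

# Sending it requires a running lakeFS server:
# import json
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```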

For Python, this example uses lakeFS’s Python package, lakefs-client. The package was generated by OpenAPI Generator from the OpenAPI definition served by a lakeFS server.

Installing the lakeFS Python client

Install the Python client using pip:

pip install 'lakefs_client==<lakeFS version>'

The package is available for lakeFS version 0.34.0 and later.

Working with the Client API

Here’s how to instantiate a client:

import lakefs_client
from lakefs_client import models
from lakefs_client.client import LakeFSClient

# lakeFS credentials and endpoint
configuration = lakefs_client.Configuration()
configuration.username = 'AKIAIOSFODNN7EXAMPLE'
configuration.password = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
configuration.host = 'http://localhost:8000'

client = LakeFSClient(configuration)
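Hardcoded credentials work for a quick test, but you may prefer to read them from the environment. The sketch below uses a hypothetical helper and hypothetical variable names (`LAKEFS_ACCESS_KEY_ID`, `LAKEFS_SECRET_ACCESS_KEY`, `LAKEFS_ENDPOINT`), which this page does not define. The helper only sets the `username`, `password` and `host` attributes used above. It therefore works on `lakefs_client.Configuration()`, and it works equally well on the stand-in object used here to keep the demo offline.

```python
import os
from types import SimpleNamespace


def apply_env_credentials(configuration, env=None):
    """Set username, password and host on a client configuration from
    environment variables (hypothetical names chosen for this sketch)."""
    env = os.environ if env is None else env
    configuration.username = env['LAKEFS_ACCESS_KEY_ID']      # hypothetical name
    configuration.password = env['LAKEFS_SECRET_ACCESS_KEY']  # hypothetical name
    configuration.host = env.get('LAKEFS_ENDPOINT', 'http://localhost:8000')
    return configuration


# Offline demo with a stand-in object; with the real package you would pass
# lakefs_client.Configuration() instead of SimpleNamespace().
demo = apply_env_credentials(SimpleNamespace(), {
    'LAKEFS_ACCESS_KEY_ID': 'AKIAIOSFODNN7EXAMPLE',
    'LAKEFS_SECRET_ACCESS_KEY': 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
})
print(demo.host)  # http://localhost:8000
```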

Using the generated client

Now that you have a client object, you can use it to interact with the API.

Creating a repository

repo = models.RepositoryCreation(
    name='example-repo',
    storage_namespace='s3://storage-bucket/repos/example-repo',
    default_branch='main')
client.repositories.create_repository(repo)
# output:
# {'creation_date': 1617532175,
#  'default_branch': 'main',
#  'id': 'example-repo',
#  'storage_namespace': 's3://storage-bucket/repos/example-repo'}
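Judging by its magnitude, the `creation_date` field in the output appears to be a Unix timestamp in seconds. This reading is an assumption and is not stated by the API output. A small sketch that converts the value shown above to a readable UTC datetime:

```python
from datetime import datetime, timezone

creation_date = 1617532175  # value from the output above (assumed to be Unix seconds)
created = datetime.fromtimestamp(creation_date, tz=timezone.utc)
print(created.isoformat())  # 2021-04-04T10:29:35+00:00
```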

Creating a branch, uploading files, committing changes

List the repository branches:

client.branches.list_branches('example-repo').results
# output:
# [{'commit_id': 'cdd673a4c5f42d33acdf3505ecce08e4d839775485990d231507f586ebe97656', 'id': 'main'}]

Create a new branch:

client.branches.create_branch(
    repository='example-repo',
    branch_creation=models.BranchCreation(name='experiment-aggregations1', source='main'))
# output:
# 'cdd673a4c5f42d33acdf3505ecce08e4d839775485990d231507f586ebe97656'

List again to see your newly created branch:

client.branches.list_branches('example-repo').results
# output:
# [{'commit_id': 'cdd673a4c5f42d33acdf3505ecce08e4d839775485990d231507f586ebe97656', 'id': 'experiment-aggregations1'}, {'commit_id': 'cdd673a4c5f42d33acdf3505ecce08e4d839775485990d231507f586ebe97656', 'id': 'main'}]
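`list_branches` returns a page whose `.results` holds the branches. Repositories with many branches may need several pages. The following is a minimal sketch of a generic pagination loop. It assumes each page also exposes `.pagination.has_more` and `.pagination.next_offset`, and that the listing call accepts `after` and `amount` arguments. Verify these names against your client version. The helper `iterate_all` and the fake source are hypothetical and keep the demo offline.

```python
from types import SimpleNamespace


def iterate_all(fetch_page, page_size=100):
    """Yield every item from a paginated listing.

    Assumes fetch_page(after=..., amount=...) returns an object with a
    .results list and a .pagination object exposing has_more and next_offset.
    """
    after = ''  # empty cursor: start from the beginning
    while True:
        page = fetch_page(after=after, amount=page_size)
        yield from page.results
        if not page.pagination.has_more:
            break
        after = page.pagination.next_offset  # resume after the last item seen


# Offline demo: a fake paginated source standing in for the API
_names = ['a', 'b', 'c', 'd', 'e']


def fake_fetch(after, amount):
    start = 0 if after == '' else _names.index(after) + 1
    chunk = _names[start:start + amount]
    return SimpleNamespace(
        results=chunk,
        pagination=SimpleNamespace(
            has_more=start + amount < len(_names),
            next_offset=chunk[-1] if chunk else ''))


print(list(iterate_all(fake_fetch, page_size=2)))  # ['a', 'b', 'c', 'd', 'e']
```

Against a real server, the call might look like `list(iterate_all(lambda after, amount: client.branches.list_branches('example-repo', after=after, amount=amount)))`, again assuming those parameter names.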

Great. Now, let’s upload a file into your new branch:

with open('file.csv', 'rb') as f:
    client.objects.upload_object(repository='example-repo', branch='experiment-aggregations1', path='path/to/file.csv', content=f)
# output:
# {'checksum': '0d3b39380e2500a0f60fb3c09796fdba',
#  'mtime': 1617534834,
#  'path': 'path/to/file.csv',
#  'path_type': 'object',
#  'physical_address': 'local://example-repo/1865650a296c42e28183ad08e9b068a3',
#  'size_bytes': 18}
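In this output the `checksum` is 32 hex characters, which looks like an MD5 digest. That reading is an assumption. Whether the checksum equals the plain MD5 of the file may depend on the storage backend; for example, S3 multipart uploads do not produce a plain MD5 ETag. A minimal sketch for computing a local MD5 to compare against:

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        digest.update(chunk)
    return digest.hexdigest()


print(md5_of_stream(io.BytesIO(b'abc')))  # 900150983cd24fb0d6963f7d28e17f72
```

To compare with the upload response, open the file in binary mode and check `md5_of_stream(f) == stats.checksum`. In that check, `stats` is a hypothetical name for the value returned by `upload_object`.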

Diffing a single branch will show all the uncommitted changes on that branch:

client.branches.diff_branch(repository='example-repo', branch='experiment-aggregations1').results
# output:
# [{'path': 'path/to/file.csv', 'path_type': 'object', 'type': 'added'}]
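For larger diffs, a count per change type can be easier to read than the full list. The following is a minimal sketch with hypothetical helpers. It accepts either the dictionary form printed above or a generated model object with a `.type` attribute; the latter is an assumption about the generated `Diff` model.

```python
from collections import Counter


def change_type(entry):
    # Accept the dict form printed above, or a generated model with a .type attribute
    return entry['type'] if isinstance(entry, dict) else entry.type


def summarize_diff(entries):
    """Count diff entries by change type, e.g. Counter({'added': 1})."""
    return Counter(change_type(entry) for entry in entries)


entries = [{'path': 'path/to/file.csv', 'path_type': 'object', 'type': 'added'}]
print(summarize_diff(entries))  # Counter({'added': 1})
```

An empty summary means the branch has no uncommitted changes, which is a simple check to run before committing.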

As expected, your change appears here. Let’s commit it and attach some arbitrary metadata:

client.commits.commit(
    repository='example-repo',
    branch='experiment-aggregations1',
    commit_creation=models.CommitCreation(message='Added a CSV file!', metadata={'using': 'python_api'}))
# output:
# {'committer': 'barak',
#  'creation_date': 1617535120,
#  'id': 'e80899a5709509c2daf797c69a6118be14733099f5928c14d6b65c9ac2ac841b',
#  'message': 'Added a CSV file!',
#  'meta_range_id': '',
#  'metadata': {'using': 'python_api'},
#  'parents': ['cdd673a4c5f42d33acdf3505ecce08e4d839775485990d231507f586ebe97656']}
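The metadata in this example is a dictionary of strings. If you want to record values such as counts or flags, one option is to convert them to strings first. This sketch assumes that commit metadata is a string-to-string map, so verify that against the API reference. The helper `to_commit_metadata` is hypothetical.

```python
def to_commit_metadata(values):
    """Convert keys and values to strings, assuming commit metadata is a
    string-to-string map."""
    return {str(key): str(value) for key, value in values.items()}


meta = to_commit_metadata({'using': 'python_api', 'rows': 18, 'validated': True})
print(meta)  # {'using': 'python_api', 'rows': '18', 'validated': 'True'}
```

The result can then be passed as `metadata=meta` to `models.CommitCreation`.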

Diff the branch again. This time there should be no uncommitted changes:

client.branches.diff_branch(repository='example-repo', branch='experiment-aggregations1').results
# output:
# []

Merging changes from a branch into main

Let’s diff between your branch and the main branch:

client.refs.diff_refs(repository='example-repo', left_ref='main', right_ref='experiment-aggregations1').results
# output:
# [{'path': 'path/to/file.csv', 'path_type': 'object', 'type': 'added'}]

Looks like you have a change. Let’s merge it:

client.refs.merge_into_branch(repository='example-repo', source_ref='experiment-aggregations1', destination_branch='main')
# output:
# {'reference': 'd0414a3311a8c1cef1ef355d6aca40db72abe545e216648fe853e25db788fa2e',
#  'summary': {'added': 1, 'changed': 0, 'conflict': 0, 'removed': 0}}
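The merge result includes a summary with `added`, `changed`, `conflict` and `removed` counts. The following is a minimal sketch of checking that summary before continuing. The helper `check_merge_summary` is hypothetical and operates on the dictionary form shown in the output above; with the generated model object, you could call `.to_dict()` first. Whether conflicts are reported in this summary or as an API error may depend on your lakeFS version.

```python
def check_merge_summary(result):
    """Raise if the merge summary reports conflicts; otherwise return the
    number of added, changed and removed entries."""
    summary = result['summary']
    conflicts = summary.get('conflict', 0)
    if conflicts:
        raise RuntimeError(f'merge reported {conflicts} conflict(s)')
    return sum(summary.get(key, 0) for key in ('added', 'changed', 'removed'))


result = {'summary': {'added': 1, 'changed': 0, 'conflict': 0, 'removed': 0}}
print(check_merge_summary(result))  # 1
```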

Let’s diff again. This time there should be no differences, since all the changes are already on the main branch:

client.refs.diff_refs(repository='example-repo', left_ref='main', right_ref='experiment-aggregations1').results
# output:
# []

Python Client documentation

For the documentation of the lakeFS Python package, see https://pydocs.lakefs.io.

Full API reference

For a full reference of the lakeFS API, see the lakeFS API reference.