
Import data into lakeFS

The simplest way to bring data into lakeFS is by copying it, but this approach may not be suitable when a lot of data is involved. To avoid copying the data, lakeFS offers Zero-copy import. With this approach, lakeFS only creates pointers to your existing objects in your new repository.

Zero-copy import


User Permissions

To run an import you need the following permissions: fs:WriteObject, fs:CreateMetaRange, fs:CreateCommit, fs:ImportFromStorage and fs:ImportCancel. The first three permissions are available by default to users in the default Developers group (RBAC) or the Writers group (ACL). The Import* permissions let the user import data from any location in the storage provider that lakeFS has access to, and cancel the operation if needed. They are therefore available only to users in the Supers group (ACL) or the SuperUsers group (RBAC). RBAC installations can modify policies to add these permissions to any group, such as Developers.

lakeFS Permissions

lakeFS must have permissions to list the objects in the source object store, and the source bucket must be in the same region as your destination bucket.
In addition, see the following storage-provider-specific instructions:

AWS S3: Importing from public buckets

lakeFS needs access to the imported location, first to list the files to import and later to read the files upon user request.

In some use cases, the user would like to import from a location that isn’t owned by the account running lakeFS. For example, importing public datasets to experiment with lakeFS and Spark.

lakeFS will require additional permissions to read from public buckets. For example, for S3 public buckets, the following policy needs to be attached to the lakeFS S3 service-account to allow access to public buckets, while blocking access to other owned buckets:

   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "PubliclyAccessibleBuckets",
         "Effect": "Allow",
         "Action": [ ... ],
         "Resource": ["*"],
         "Condition": {
           "StringNotEquals": {
             "s3:ResourceAccount": "<YourAccountID>"
           }
         }
       }
     ]
   }

Azure Storage: See Azure deployment for limitations when using account credentials.

Azure Data Lake Gen2

lakeFS requires a hint in the import source URL to understand that the provided storage account is ADLS Gen2. To provide it, add the adls subdomain to the source account URL.

Google Cloud Storage: No specific prerequisites.

Using the lakeFS UI

To import using the UI, lakeFS must have permissions to list the objects in the source object store.

  1. In your repository’s main page, click the Import button to open the import dialog.
  2. Under Import from, fill in the location on your object store you would like to import from.
  3. Fill in the import destination in lakeFS.
  4. Add a commit message, and optionally metadata.
  5. Press Import.

Once the import is complete, the changes are merged into the destination branch.


  • Import uses the src-wins merge strategy. Importing objects or prefixes that already exist in the destination therefore overrides them.
  • The import duration depends on the number of imported objects; throughput is roughly a few thousand objects per second.
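
The throughput note above allows a back-of-the-envelope duration estimate. The sketch below is illustrative only: the function name is hypothetical, and the default of 3000 objects per second is an assumed stand-in for "a few thousand", so substitute a rate measured on your own installation.

```shell
#!/bin/sh
# Rough import-duration estimate: number of objects divided by throughput.
# ASSUMPTION: 3000 objects/second is a placeholder for "a few thousand".
estimate_import_seconds() {
  objects=$1
  rate=${2:-3000}
  # Integer ceiling division, so a partial second still counts as one.
  echo $(( (objects + rate - 1) / rate ))
}

# 6,000,000 objects at the assumed rate: prints 2000 (seconds).
estimate_import_seconds 6000000
```

Under these assumptions, six million objects take about 2000 seconds, or roughly half an hour.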

lakectl import

Prerequisite: have lakectl installed.

The lakectl import command acts the same as the UI import wizard. It commits the changes to a dedicated branch, with an optional flag to merge the changes to <branch_name>.

# AWS S3
lakectl import \
  --from s3://bucket/optional/prefix/ \
  --to lakefs://my-repo/my-branch/optional/path/

# Azure
lakectl import \
  --from <azure-source-url> \
  --to lakefs://my-repo/my-branch/optional/path/

# Google Cloud Storage
lakectl import \
  --from gs://bucket/optional/prefix/ \
  --to lakefs://my-repo/my-branch/optional/path/


  1. Importing is only possible from the object storage service in which your installation stores its data. For example, if lakeFS is configured to use S3, you cannot import data from Azure.
  2. Import is available for S3, GCP and Azure.
  3. For security reasons, if you are running lakeFS on top of your local disk, you need to enable the import feature explicitly. To do so, set blockstore.local.import_enabled to true and specify the allowed import paths in blockstore.local.allowed_external_prefixes (see configuration reference). Since there are some differences between object stores and file systems in the way directories and prefixes are treated, local import is allowed only for directories.
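
Because of the first limitation above, it can help to check the source URI against the storage service of the installation before starting an import. The following is a minimal sketch, not part of lakectl: the function name is hypothetical, and it covers only the s3:// and gs:// schemes that appear in the examples on this page, assuming the blockstore type is known to the caller as s3 or gs.

```shell
#!/bin/sh
# Check that an import source URI uses the scheme of the configured blockstore.
# ASSUMPTION: only "s3" and "gs" are handled; other types return 2 (not checked).
source_matches_blockstore() {
  blockstore_type=$1
  source_uri=$2
  case "$blockstore_type" in
    s3) expected="s3://" ;;
    gs) expected="gs://" ;;
    *)  return 2 ;;
  esac
  case "$source_uri" in
    "$expected"*) return 0 ;;  # scheme matches the installation's storage
    *)            return 1 ;;  # cross-provider import is not possible
  esac
}

if source_matches_blockstore s3 "s3://bucket/optional/prefix/"; then
  echo "source is importable; run lakectl import with this --from value"
fi
```

The check is purely syntactic. It does not verify bucket permissions or region, which lakeFS still requires as described under lakeFS Permissions.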

Working with imported data

Note that lakeFS cannot manage your metadata if you make changes to data in the original bucket. The following table describes the results of making changes in the original bucket without importing them into lakeFS:

Object action in the original bucket | ListObjects result in lakeFS          | GetObject result in lakeFS
Create                               | Object not visible                    | Object not accessible
Overwrite                            | Object visible with outdated metadata | Updated object accessible
Delete                               | Object visible                        | Object not accessible
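
For scripts that reason about out-of-band changes, the table above can be encoded as a small lookup. This is only a restatement of the table in shell form; the function name and the lowercase action keywords are hypothetical.

```shell
#!/bin/sh
# Report what lakeFS shows after a change made directly in the original bucket.
expected_lakefs_view() {
  case "$1" in
    create)    echo "list: object not visible; get: object not accessible" ;;
    overwrite) echo "list: object visible with outdated metadata; get: updated object accessible" ;;
    delete)    echo "list: object visible; get: object not accessible" ;;
    *)         echo "unknown action: $1" >&2; return 1 ;;
  esac
}

expected_lakefs_view overwrite
```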

Copying data into a lakeFS repository

Another way of getting existing data into a lakeFS repository is by copying it. This has the advantage of having the objects along with their metadata managed by the lakeFS installation, along with lifecycle rules, immutability guarantees and consistent listing. However, do make sure to account for storage cost and time.

To copy data into lakeFS you can use the following tools:

  1. The lakectl command-line tool: see the reference to learn more about using it to copy local data into lakeFS. Using lakectl fs upload --recursive, you can upload multiple objects together from a given directory.
  2. Using rclone.
  3. Using Hadoop’s DistCp.
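
As an illustration of the first option, the sketch below wraps a recursive lakectl fs upload. The wrapper name, the DRY_RUN variable, and the paths are hypothetical. By default it only prints the command, so it can be reviewed before anything is copied; setting DRY_RUN=0 runs it and requires lakectl to be installed and configured.

```shell
#!/bin/sh
# Print (dry run, the default) or execute a recursive upload of a local directory.
# ASSUMPTION: upload_dir and DRY_RUN are illustrative names, not lakectl features.
upload_dir() {
  src=$1
  dest=$2
  # Build the command once so the printed and executed forms cannot diverge.
  set -- lakectl fs upload --recursive --source "$src" "$dest"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upload_dir ./local-data/ lakefs://my-repo/my-branch/optional/path/
```

Unlike zero-copy import, this copies every object into the repository's storage, so the storage cost and transfer time mentioned above apply.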