Importing data into lakeFS
This section describes how to import existing data into a lakeFS repository without copying it. If you are interested in copying data into lakeFS, see Copying data to/from lakeFS.
Prerequisites
- Importing is permitted for users in the Supers (open-source) group or the SuperUsers (Cloud/Enterprise) group. To learn how lakeFS Cloud and lakeFS Enterprise users can fine-tune import permissions, see Fine-grained permissions below.
- The lakeFS server must have permissions to list the objects in the source bucket.
- The source bucket must be on the same cloud provider and in the same region as your repository.
Using the lakeFS UI
- On your repository’s main page, click the Import button to open the import dialog.
- Under Import from, fill in the object store location you would like to import from.
- Fill in the import destination in lakeFS. This should be a path under the current branch.
- Add a commit message and, optionally, commit metadata.
- Click Import.
Once the import is complete, a new commit containing the imported objects will be created in the destination branch.
Using the CLI: lakectl import
The lakectl import command behaves like the UI import wizard: it imports the objects and commits the changes to the selected branch.
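A minimal sketch of such an invocation, assuming an S3 source bucket and a repository named my-repo (the bucket, repository, branch, and prefixes are placeholders):

```shell
# Import everything under the source prefix into the main branch,
# committing the result with the given message.
lakectl import \
  --from s3://source-bucket/raw-data/ \
  --to lakefs://my-repo/main/raw-data/ \
  --message "Import raw data from S3"
```

If your lakectl version exposes different flags, run lakectl import --help to confirm them.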
Notes
- Any previously existing objects under the destination prefix will be deleted.
- Import duration depends on the number of imported objects; as a rough estimate, lakeFS imports a few thousand objects per second.
- For security reasons, if you are running lakeFS on top of your local disk (`blockstore.type=local`), you must enable the import feature explicitly. To do so, set `blockstore.local.import_enabled` to `true` and specify the allowed import paths in `blockstore.local.allowed_external_prefixes` (see the configuration reference, and the configuration sketch after this list). When using lakectl or the lakeFS UI, you can currently import only directories locally. If you need to import a single file, use the HTTP API or API clients with `type=object` in the request body and `destination=<full-path-to-file>`.
- Making changes to data in the original bucket will not be reflected in lakeFS, and may cause inconsistencies.
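For the local-blockstore note above, here is a minimal configuration sketch; the allowed prefix is a placeholder, and the key names follow the configuration reference:

```yaml
# lakeFS server configuration (excerpt): enable import for the local blockstore
blockstore:
  type: local
  local:
    import_enabled: true
    allowed_external_prefixes:
      - /mnt/shared-data   # only paths under these prefixes can be imported
```

And a hedged sketch of importing a single file through the HTTP API; the endpoint and request body reflect our reading of the lakeFS API reference, so verify them against your server version before use:

```shell
# Asynchronously import one object into branch main of my-repo.
# Host and credentials are placeholders; the response includes an import id to poll.
curl -s -X POST "https://lakefs.example.com/api/v1/repositories/my-repo/branches/main/import" \
  -u "$LAKEFS_ACCESS_KEY_ID:$LAKEFS_SECRET_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "paths": [
          {
            "type": "object",
            "path": "s3://source-bucket/path/to/file.parquet",
            "destination": "datasets/file.parquet"
          }
        ],
        "commit": {"message": "Import a single object"}
      }'
```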
Examples
To explore practical examples and real-world use cases of importing data into lakeFS, we recommend checking out our comprehensive blog post on the subject.
Fine-grained permissions
Available in lakeFS Cloud and lakeFS Enterprise.
With RBAC support, the lakeFS user running the import command should have the following permissions in lakeFS: `fs:WriteObject`, `fs:CreateCommit`, `fs:ImportFromStorage`, and `fs:ImportCancel`.
As mentioned above, all of these permissions are granted by default to the Supers (open-source) and SuperUsers (Cloud/Enterprise) groups. A minimal policy example is sketched below.
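A minimal policy sketch granting exactly these permissions (the policy id is a placeholder, and the statement layout follows the lakeFS RBAC documentation as we understand it):

```json
{
  "id": "ImporterPolicy",
  "statement": [
    {
      "action": [
        "fs:WriteObject",
        "fs:CreateCommit",
        "fs:ImportFromStorage",
        "fs:ImportCancel"
      ],
      "effect": "allow",
      "resource": "*"
    }
  ]
}
```

In practice you may want a resource narrower than *, scoped to the relevant repository; see the RBAC reference for the resource format.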
Provider-specific permissions
In addition, the following provider-specific permissions may be required: