Prepare Your AWS S3 Bucket
- From the S3 Administration console, choose Create Bucket.
- Make sure that you block public access and disable object lock.
- Use the following as your bucket policy, filling in the placeholders:
```json
{
  "Id": "lakeFSPolicy",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "lakeFSObjects",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::<BUCKET_NAME_AND_PREFIX>/*"],
      "Principal": {
        "AWS": ["arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE>"]
      }
    },
    {
      "Sid": "lakeFSBucket",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads"
      ],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::<BUCKET>"],
      "Principal": {
        "AWS": ["arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE>"]
      }
    }
  ]
}
```
- Replace `<BUCKET_NAME_AND_PREFIX>`, `<BUCKET>`, `<ACCOUNT_ID>`, and `<IAM_ROLE>` with values relevant to your environment. `<BUCKET_NAME_AND_PREFIX>` can be the bucket name. If you want to minimize the bucket policy permissions, use the bucket name together with a prefix (e.g. `example-bucket/a/b/c`). This way, lakeFS will be able to create repositories only under this specific path (see: Storage Namespace).
- lakeFS will try to assume the role `<IAM_ROLE>`.
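As a sanity check before pasting the policy into AWS, the placeholder substitution can be sketched in Python. The template below mirrors the policy above; the bucket name, prefix, account ID, and role name passed in at the end are hypothetical example values, not part of the original instructions:

```python
import json

# Policy template mirroring the bucket policy above; the <...> markers are
# the placeholders to be replaced with values from your environment.
TEMPLATE = """
{
  "Id": "lakeFSPolicy",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "lakeFSObjects",
      "Action": ["s3:GetObject", "s3:PutObject",
                 "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::<BUCKET_NAME_AND_PREFIX>/*"],
      "Principal": {"AWS": ["arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE>"]}
    },
    {
      "Sid": "lakeFSBucket",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation",
                 "s3:ListBucketMultipartUploads"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::<BUCKET>"],
      "Principal": {"AWS": ["arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE>"]}
    }
  ]
}
"""

def render_policy(bucket, prefix, account_id, role):
    """Fill in the placeholders and return the policy as a JSON string."""
    # With a prefix, object permissions are scoped under bucket/prefix only.
    name_and_prefix = f"{bucket}/{prefix}" if prefix else bucket
    policy = (TEMPLATE
              .replace("<BUCKET_NAME_AND_PREFIX>", name_and_prefix)
              .replace("<BUCKET>", bucket)
              .replace("<ACCOUNT_ID>", account_id)
              .replace("<IAM_ROLE>", role))
    # json.loads doubles as a syntax check before pasting into the console.
    return json.dumps(json.loads(policy), indent=2)

# Hypothetical example values:
print(render_policy("example-bucket", "a/b/c", "123456789012", "lakefs-role"))
```

Note that the bucket-level statement (`lakeFSBucket`) always targets the bare bucket ARN; only the object-level statement carries the optional prefix.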
You’re now ready to create your first lakeFS repository.
Alternative: use an AWS user
lakeFS can authenticate with your AWS account as an IAM user with an access key and secret. To allow this, change the policy's Principal accordingly:
```json
"Principal": {
  "AWS": ["arn:aws:iam::<ACCOUNT_ID>:user/<IAM_USER>"]
}
```
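The only change from the role-based policy is the Principal ARN (`:user/` instead of `:role/`). A small Python sketch of that swap; the account ID, role, and user names here are hypothetical:

```python
# A role-based principal as it appears in the policy above
# (hypothetical example ARN).
role_principal = {"AWS": ["arn:aws:iam::123456789012:role/lakefs-role"]}

def to_user_principal(principal, user):
    """Return a copy of the principal pointing at an IAM user instead of a role."""
    # ARN format: arn:aws:iam::<ACCOUNT_ID>:role/<NAME>; field 4 is the account.
    account_id = principal["AWS"][0].split(":")[4]
    return {"AWS": [f"arn:aws:iam::{account_id}:user/{user}"]}

print(to_user_principal(role_principal, "lakefs-user"))
```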
Advanced: minimal permissions
lakeFS requires permissions to access the `_lakefs` prefix under your storage namespace, in which the metadata objects are stored (learn more).
By setting this policy you'll be able to perform only metadata operations through lakeFS, meaning that you won't be able to use lakeFS to upload or download objects. Specifically, you won't be able to:
- Upload objects using the lakeFS GUI
- Upload objects through Spark using the S3 gateway
- Run `lakectl fs` commands (unless using the `--direct` flag)
This permission is useful if you upload/download objects to/from your bucket using external tools. For example, you can use the lakeFS Hadoop FileSystem Spark integration to directly access your S3 bucket while performing metadata operations through lakeFS on the objects in that bucket.
```json
{
  "Id": "<POLICY_ID>",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "lakeFSObjects",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<STORAGE_NAMESPACE>/_lakefs/*"
      ],
      "Principal": {
        "AWS": ["arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE>"]
      }
    },
    {
      "Sid": "lakeFSBucket",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::<BUCKET>"],
      "Principal": {
        "AWS": ["arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE>"]
      }
    }
  ]
}
```
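To double-check that a rendered minimal policy really grants object access only under the metadata prefix, a small Python sketch; the storage namespace and ARNs below are hypothetical example values standing in for the placeholders above:

```python
import json

# Minimal policy rendered with hypothetical example values, following the
# template above: object-level access only under the _lakefs metadata prefix.
minimal_policy = json.loads("""
{
  "Id": "lakeFSMinimalPolicy",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "lakeFSObjects",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::example-bucket/my-repo/_lakefs/*"],
      "Principal": {"AWS": ["arn:aws:iam::123456789012:role/lakefs-role"]}
    },
    {
      "Sid": "lakeFSBucket",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::example-bucket"],
      "Principal": {"AWS": ["arn:aws:iam::123456789012:role/lakefs-role"]}
    }
  ]
}
""")

def object_access_limited_to_metadata(policy):
    """True if every object-level grant is scoped under a _lakefs/ prefix."""
    for stmt in policy["Statement"]:
        if any(a in ("s3:GetObject", "s3:PutObject") for a in stmt["Action"]):
            # Every resource granting object access must end in .../_lakefs/*
            if not all(r.rstrip("*").endswith("/_lakefs/")
                       for r in stmt["Resource"]):
                return False
    return True

print(object_access_limited_to_metadata(minimal_policy))  # expect True
```

A policy whose object statement targets the whole bucket (e.g. `arn:aws:s3:::example-bucket/*`) would fail this check, which is exactly the broader policy from the first section.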
You’re now ready to create your first lakeFS repository.