
Prepare Your AWS S3 Bucket

  1. From the S3 Administration console, choose Create Bucket.
  2. Make sure that you block public access and disable object lock.
  3. Use the following as your bucket policy, filling in the placeholders:
{
   "Id": "lakeFSPolicy",
   "Version": "2012-10-17",
   "Statement": [
      {
         "Sid": "lakeFSObjects",
         "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:AbortMultipartUpload",
            "s3:ListMultipartUploadParts"
         ],
         "Effect": "Allow",
         "Resource": ["arn:aws:s3:::<BUCKET_NAME_AND_PREFIX>/*"],
         "Principal": {
            "AWS": ["arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE>"]
         }
      },
      {
         "Sid": "lakeFSBucket",
         "Action": [
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:ListBucketMultipartUploads"
         ],
         "Effect": "Allow",
         "Resource": ["arn:aws:s3:::<BUCKET>"],
         "Principal": {
            "AWS": ["arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE>"]
         }
      }
   ]
}
  • Replace <BUCKET>, <ACCOUNT_ID> and <IAM_ROLE> with values relevant to your environment.
  • <BUCKET_NAME_AND_PREFIX> can be the bucket name. If you want to minimize the bucket policy permissions, use the bucket name together with a prefix (e.g. example-bucket/a/b/c). This way, lakeFS will be able to create repositories only under this specific path (see: Storage Namespace).
  • lakeFS will try to assume the role <IAM_ROLE>.
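Filling in the placeholders by hand is error-prone, so the substitution described above can be sketched in Python. This is a minimal illustration using only the standard library, not part of lakeFS: the function name `build_bucket_policy` and the example values (`example-bucket`, `123456789012`, `lakefs-role`) are hypothetical.

```python
import json


def build_bucket_policy(bucket, account_id, iam_role, prefix=""):
    """Return the bucket policy above as a dict, with placeholders filled in.

    Hypothetical helper for illustration only.
    """
    principal = {"AWS": [f"arn:aws:iam::{account_id}:role/{iam_role}"]}
    # <BUCKET_NAME_AND_PREFIX>: the bucket name, optionally narrowed by a prefix.
    clean_prefix = prefix.strip("/")
    bucket_and_prefix = f"{bucket}/{clean_prefix}" if clean_prefix else bucket
    return {
        "Id": "lakeFSPolicy",
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "lakeFSObjects",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Effect": "Allow",
                # Object-level actions apply to keys under the bucket or prefix.
                "Resource": [f"arn:aws:s3:::{bucket_and_prefix}/*"],
                "Principal": principal,
            },
            {
                "Sid": "lakeFSBucket",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                    "s3:ListBucketMultipartUploads",
                ],
                "Effect": "Allow",
                # Bucket-level actions apply to the bucket itself, without a prefix.
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Principal": principal,
            },
        ],
    }


# Example values are placeholders, not real resources.
policy = build_bucket_policy(
    "example-bucket", "123456789012", "lakefs-role", prefix="a/b/c"
)
policy_json = json.dumps(policy, indent=3)
print(policy_json)
```

The printed JSON can be pasted into the bucket policy editor. If you use boto3, the same string could be passed as the `Policy` argument of the S3 client's `put_bucket_policy` call; that step is not shown here.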

You’re now ready to create your first lakeFS repository.

Alternative: use an AWS user

lakeFS can also authenticate with your AWS account as an AWS user, using an access key and secret. To allow this, change the policy’s Principal in each statement accordingly:

 "Principal": {
   "AWS": ["arn:aws:iam::<ACCOUNT_ID>:user/<IAM_USER>"]
 }
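Because every statement in the policy carries its own Principal, the change has to be made in each one. A minimal Python sketch of that substitution follows; the helper name `use_iam_user`, the shortened example policy, and the account and user values are all hypothetical.

```python
import copy


def use_iam_user(policy, account_id, iam_user):
    """Return a copy of `policy` whose statements name an IAM user as Principal.

    Hypothetical helper for illustration only; the input policy is not modified.
    """
    updated = copy.deepcopy(policy)
    for statement in updated["Statement"]:
        statement["Principal"] = {
            "AWS": [f"arn:aws:iam::{account_id}:user/{iam_user}"]
        }
    return updated


# A shortened role-based policy, used here only as example input.
role_policy = {
    "Id": "lakeFSPolicy",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "lakeFSObjects",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Effect": "Allow",
            "Resource": ["arn:aws:s3:::example-bucket/*"],
            "Principal": {"AWS": ["arn:aws:iam::123456789012:role/lakefs-role"]},
        },
        {
            "Sid": "lakeFSBucket",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Effect": "Allow",
            "Resource": ["arn:aws:s3:::example-bucket"],
            "Principal": {"AWS": ["arn:aws:iam::123456789012:role/lakefs-role"]},
        },
    ],
}

user_policy = use_iam_user(role_policy, "123456789012", "lakefs-user")
for statement in user_policy["Statement"]:
    print(statement["Sid"], statement["Principal"]["AWS"][0])
```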

Advanced: minimal permissions

lakeFS requires permissions to access the _lakefs prefix under your storage namespace, in which the metadata objects are stored (learn more).
With this policy in place, you can perform only metadata operations through lakeFS; you cannot use lakeFS to upload or download objects. Specifically, you won’t be able to:

  • Upload objects using the lakeFS GUI
  • Upload objects through Spark using the S3 gateway
  • Run lakectl fs commands (unless using the --direct flag)

This policy is useful if you upload and download objects to and from your bucket using external tools. For example, you can use the lakeFS Hadoop FileSystem Spark integration to access your S3 bucket directly while performing metadata operations through lakeFS on the objects in that bucket.

{
  "Id": "<POLICY_ID>",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "lakeFSObjects",
      "Action": [
         "s3:GetObject",
         "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
         "arn:aws:s3:::<STORAGE_NAMESPACE>/_lakefs/*"
      ],
      "Principal": {
        "AWS": ["arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE>"]
      }
    },
    {
      "Sid": "lakeFSBucket",
      "Action": [
         "s3:ListBucket",
         "s3:GetBucketLocation"
      ],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::<BUCKET>"],
      "Principal": {
        "AWS": ["arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE>"]
      }
    }
  ]
}
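The resource in the first statement is derived from the storage namespace, so it can help to compute it rather than type it. The sketch below assumes the storage namespace is written as an `s3://bucket/path` URI and that the ARN uses the bucket and path without the scheme; the function name `minimal_policy`, the default policy ID, and the example values are hypothetical.

```python
import json
from urllib.parse import urlparse


def minimal_policy(storage_namespace, account_id, iam_role,
                   policy_id="lakeFSMinimalPolicy"):
    """Build the minimal-permissions policy for one storage namespace.

    Hypothetical helper; assumes `storage_namespace` looks like s3://bucket/path.
    """
    parsed = urlparse(storage_namespace)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3:// URI, got {storage_namespace!r}")
    bucket = parsed.netloc
    path = parsed.path.strip("/")
    namespace = f"{bucket}/{path}" if path else bucket
    principal = {"AWS": [f"arn:aws:iam::{account_id}:role/{iam_role}"]}
    return {
        "Id": policy_id,
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "lakeFSObjects",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Effect": "Allow",
                # Only the _lakefs metadata prefix is readable and writable.
                "Resource": [f"arn:aws:s3:::{namespace}/_lakefs/*"],
                "Principal": principal,
            },
            {
                "Sid": "lakeFSBucket",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Effect": "Allow",
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Principal": principal,
            },
        ],
    }


# Example values are placeholders, not real resources.
minimal = minimal_policy(
    "s3://example-bucket/repos/my-repo", "123456789012", "lakefs-role"
)
print(json.dumps(minimal, indent=2))
```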

You’re now ready to create your first lakeFS repository.