
Object Retention

lakeFS allows configuring object retention policies through a lifecycle configuration. The concepts are similar to S3 lifecycle configuration, but not all S3 properties exist. Most notably, only object expiration is currently supported; storage class transition is not.

Note: Currently, only the S3 block adapter supports retention.

Retention

System configuration

lakeFS requires some global system configuration to perform S3 lifecycle actions. It is set through the configuration variables under blockstore.s3.retention.

Per-repo configuration

The retention policy of a repository is given as a JSON document. For example:

{
  "rules": [
    {
      "filter": {
        "prefix": "master/logs/"
      },
      "status": "enabled",
      "expiration": {
        "all": {
          "weeks": 2,
          "days": 1
        },
        "noncurrent": {
          "days": 7
        }
      }
    },
    {
      "filter": {
        "prefix": "users/"
      },
      "expiration": {
        "all": {
          "days": 3
        }
      }
    }
  ]
}

To view the retention policy for a repository, use:

lakectl repo retention get lakefs://repo/

To load a new retention policy for a repository, use:

lakectl repo retention set lakefs://repo/ --policy-file /path/to/policy.yml
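
As a minimal sketch of preparing the policy file, the Python snippet below writes a policy dictionary to disk as JSON. The helper name write_policy and the file name used in the note after it are illustrative assumptions, not part of lakeFS.

```python
import json
from pathlib import Path


def write_policy(policy: dict, path: str) -> Path:
    """Serialize a retention policy to a JSON file and return its path."""
    target = Path(path)
    target.write_text(json.dumps(policy, indent=2) + "\n", encoding="utf-8")
    return target


# Example policy with a single rule, mirroring the structure shown above.
policy = {
    "rules": [
        {
            "filter": {"prefix": "master/logs/"},
            "status": "enabled",
            "expiration": {"all": {"weeks": 2, "days": 1}},
        }
    ]
}
```

Calling write_policy(policy, "retention-policy.json") would produce a file whose path can then be passed to --policy-file.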

Format

The exact format is given by swagger.yml, in the definition of retention_policy.

A configuration is a single JSON document. It applies to a single repository and holds similar fields to AWS Lifecycle Policy documents. It is an object with a single field rules, which holds an array of rules.

Every rule is an object with these fields:

  • a filter: an object with a field prefix. A prefix has the form branch/path and matches all objects on branch starting with the prefix path. If no prefix is present, the filter matches all objects.
  • a status: enabled or disabled.
  • an expiration: must specify expiration time periods for at least one of these types of files:
    • all: expire all objects after this time period
    • uncommitted: expire all uncommitted objects after this time period
    • noncurrent: expire all committed objects that are not at HEAD after this time period

A time period is an object with integer-valued properties days and weeks.
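
The rule format above can be sketched in Python as follows. This is an illustration under stated assumptions, not the lakeFS implementation: the function names are hypothetical, and treating a missing days or weeks property as zero is an assumption.

```python
from datetime import timedelta

# Expiration types named in the format description.
EXPIRATION_TYPES = {"all", "uncommitted", "noncurrent"}


def period_to_timedelta(period: dict) -> timedelta:
    """Convert a {"weeks": ..., "days": ...} time period into a timedelta."""
    weeks = period.get("weeks", 0)  # assumption: a missing property means 0
    days = period.get("days", 0)
    for value in (weeks, days):
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"time period values must be integers: {period!r}")
    return timedelta(weeks=weeks, days=days)


def check_rule(rule: dict) -> None:
    """Raise ValueError unless the rule sets at least one expiration type."""
    expiration = rule.get("expiration") or {}
    present = EXPIRATION_TYPES.intersection(expiration)
    if not present:
        raise ValueError(
            "expiration must set at least one of: all, uncommitted, noncurrent"
        )
    for name in present:
        period_to_timedelta(expiration[name])
```

For instance, the period {"weeks": 2, "days": 1} from the example policy converts to a timedelta of 15 days.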

Operation

The command lakefs expire checks and expires any matching objects. Make sure it runs occasionally (usually once per day). Any expired objects are removed from underlying storage.

Canonical object names

An object can be seen from multiple branches. However, every visible object was committed to a single branch. The path to the object through that branch is its canonical object name. Retention rules apply only according to that canonical name.

For example, if branch staging is rooted in branch master then objects are visible in staging if they were committed to that branch or to master and were not deleted from staging. Objects expire according to the branch to which they were committed:

  • Objects committed to staging expire according to rules for prefix staging/. After expiring, an object with the same name in master will become visible, or there will be no object visible with that name.
  • Objects committed to master expire according to rules for prefix master/, but not for prefix staging/. After expiring, no object will be visible with that name.

    Attempting to open an expired object fails with HTTP status code 410 Gone. This can happen, for example, when a client uses a path it already knew before the object expired.
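
The staging and master example can be sketched as a rule lookup keyed on the canonical name. This is an illustration only: applicable_rules, the sample rules, and the object path are hypothetical, and rule status is ignored for brevity.

```python
def applicable_rules(canonical_name: str, rules: list) -> list:
    """Return the rules whose prefix matches the object's canonical name."""
    matched = []
    for rule in rules:
        # A rule without a prefix matches all objects.
        prefix = rule.get("filter", {}).get("prefix", "")
        if canonical_name.startswith(prefix):
            matched.append(rule)
    return matched


rules = [
    {"filter": {"prefix": "master/"}, "expiration": {"all": {"weeks": 4}}},
    {"filter": {"prefix": "staging/"}, "expiration": {"all": {"days": 3}}},
]

# Hypothetical object committed to master. It is also visible from staging,
# but only its canonical name (through master) is used for matching.
canonical_name = "master/data/events.csv"
matching_prefixes = [
    rule["filter"]["prefix"] for rule in applicable_rules(canonical_name, rules)
]
```

Here matching_prefixes contains only "master/", so the staging/ rule never applies to this object even though the object is visible on staging.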

Filters

Filters currently consist of a single type, prefix. This is a filename prefix that must match the start of the object name. Trailing slashes are treated as part of the filename, so prefix /master/logs/ matches only objects inside “directory” /master/logs, but prefix /master/logs will also match objects inside “directory” /master/logstash/.
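
The trailing-slash behavior amounts to plain string-prefix matching. A minimal Python sketch, where prefix_matches and the object names are hypothetical:

```python
def prefix_matches(prefix: str, object_name: str) -> bool:
    """Plain string-prefix test; a trailing slash is part of the prefix."""
    return object_name.startswith(prefix)


# With the trailing slash, only objects under /master/logs/ match.
in_logs = prefix_matches("/master/logs/", "/master/logs/app.log")
in_logstash = prefix_matches("/master/logs/", "/master/logstash/out.log")

# Without it, objects under /master/logstash/ match as well.
loose_match = prefix_matches("/master/logs", "/master/logstash/out.log")
```

In this sketch in_logs and loose_match are True while in_logstash is False, which is why a prefix meant to cover one "directory" should end with a slash.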

Action

These action types are supported:

  • expiration: All currently committed objects (whether latest or not) with that prefix will expire after the given length of time.
  • uncommitted_expiration: Any uncommitted objects with that prefix will expire after the given length of time.
  • noncurrent_expiration: Any “previous” (not currently visible) objects will expire after the given length of time.

Each action takes a time specification. These two time specification types are supported:

  • days: Number of days after which to expire.
  • weeks: Number of weeks after which to expire.

Differences from S3 Lifecycle Configuration

  1. Object lifecycles respect the underlying branch model.
  2. Only expiration is supported.
  3. Lifecycle is configured in JSON format, not XML.
  4. S3 object versioning is not supported by lakeFS (lakeFS versions are, of course, supported).
  5. Expiration on a specific date is not supported.
  6. Retention filtering in lakeFS currently supports only prefixes; S3 additionally supports tags.