Lua Hooks
lakeFS supports running hooks without relying on external components using an embedded Lua VM
Using Lua hooks, it is possible to pass a Lua script to be executed directly by the lakeFS server when an action occurs.
The Lua runtime embedded in lakeFS is limited for security reasons. It provides a narrow set of APIs and functions that by default do not allow:
- Accessing any of the running lakeFS server’s environment
- Accessing the local filesystem available the lakeFS process
Action File Lua Hook Properties
See the Action configuration for overall configuration schema and details.
Property | Description | Data Type | Required | Default Value |
---|---|---|---|---|
args |
One or more arguments to pass to the hook | Dictionary | false | |
script |
An inline Lua script | String | either this or script_file must be specified |
|
script_file |
The lakeFS path to a Lua script | String | either this or script must be specified |
Example Lua Hooks
For more examples and configuration samples, check out the examples/hooks/ directory in the lakeFS repository. You’ll also find step-by-step examples of hooks in action in the lakeFS samples repository.
Display information about an event
This example will print out a JSON representation of the event that occurred:
name: dump_all
on:
post-commit:
post-merge:
post-create-tag:
post-create-branch:
hooks:
- id: dump_event
type: lua
properties:
script: |
json = require("encoding/json")
print(json.marshal(action))
Ensure that a commit includes a mandatory metadata field
A more useful example: ensure every commit contains a required metadata field:
name: pre commit metadata field check
on:
pre-commit:
branches:
- main
- dev
hooks:
- id: ensure_commit_metadata
type: lua
properties:
args:
notebook_url: {"pattern": "my-jupyter.example.com/.*"}
spark_version: {}
script: |
regexp = require("regexp")
for k, props in pairs(args) do
current_value = action.commit.metadata[k]
if current_value == nil then
error("missing mandatory metadata field: " .. k)
end
if props.pattern and not regexp.match(props.pattern, current_value) then
error("current value for commit metadata field " .. k .. " does not match pattern: " .. props.pattern .. " - got: " .. current_value)
end
end
For more examples and configuration samples, check out the examples/hooks/ directory in the lakeFS repository.
Lua Library reference
The Lua runtime embedded in lakeFS is limited for security reasons. The provided APIs are shown below.
aws/s3.get_object(bucket, key)
Returns the body (as a Lua string) of the requested object and a boolean value that is true if the requested object exists
aws/s3.put_object(bucket, key, value)
Sets the object at the given bucket and key to the value of the supplied value string
aws/s3.delete_object(bucket [, key])
Deletes the object at the given key
aws/s3.list_objects(bucket [, prefix, continuation_token, delimiter])
Returns a table of results containing the following structure:
is_truncated
: (boolean) whether there are more results to paginate through using the continuation tokennext_continuation_token
: (string) to pass in the next request to get the next page of resultsresults
(table of tables) information about the objects (and prefixes if a delimiter is used)
a result could in one of the following structures
{
["key"] = "a/common/prefix/",
["type"] = "prefix"
}
or:
{
["key"] = "path/to/object",
["type"] = "object",
["etag"] = "etagString",
["size"] = 1024,
["last_modified"] = "2023-12-31T23:10:00Z"
}
aws/s3.delete_recursive(bucket, prefix)
Deletes all objects under the given prefix
crypto/aes/encryptCBC(key, plaintext)
Returns a ciphertext for the aes encrypted text
crypto/aes/decryptCBC(key, ciphertext)
Returns the decrypted (plaintext) string for the encrypted ciphertext
crypto/hmac/sign_sha256(message, key)
Returns a SHA256 hmac signature for the given message with the supplied key (using the SHA256 hashing algorithm)
crypto/hmac/sign_sha1(message, key)
Returns a SHA1 hmac signature for the given message with the supplied key (using the SHA1 hashing algorithm)
crypto/md5/digest(data)
Returns the MD5 digest (string) of the given data
crypto/sha256/digest(data)
Returns the SHA256 digest (string) of the given data
encoding/base64/encode(data)
Encodes the given data to a base64 string
encoding/base64/decode(data)
Decodes the given base64 encoded data and return it as a string
encoding/base64/url_encode(data)
Encodes the given data to an unpadded alternate base64 encoding defined in RFC 4648.
encoding/base64/url_decode(data)
Decodes the given unpadded alternate base64 encoding defined in RFC 4648 and return it as a string
encoding/hex/encode(value)
Encode the given value string to hexadecimal values (string)
encoding/hex/decode(value)
Decode the given hexadecimal string back to the string it represents (UTF-8)
encoding/json/marshal(table)
Encodes the given table into a JSON string
encoding/json/unmarshal(string)
Decodes the given string into the equivalent Lua structure
encoding/yaml/marshal(table)
Encodes the given table into a YAML string
encoding/yaml/unmarshal(string)
Decodes the given YAML encoded string into the equivalent Lua structure
encoding/parquet/get_schema(payload)
Read the payload (string) as the contents of a Parquet file and return its schema in the following table structure:
{
{ ["name"] = "column_a", ["type"] = "INT32" },
{ ["name"] = "column_b", ["type"] = "BYTE_ARRAY" }
}
gcloud
gcloud/gs_client(gcs_credentials_json_string)
Create a new Google Cloud Storage client using a string that contains a valid credentials.json
file content.
gcloud/gs.write_fuse_symlink(source, destination, mount_info)
Will create a gcsfuse symlink from the source (typically a lakeFS physical address for an object) to a given destination.
mount_info
is a Lua table with "from"
and "to"
keys - since symlinks don’t work for gs://...
URIs, they need to point
to the mounted location instead. from
will be removed from the beginning of source
, and destination
will be added instead.
Example:
source = "gs://bucket/lakefs/data/abc/def"
destination = "gs://bucket/exported/path/to/object"
mount_info = {
["from"] = "gs://bucket",
["to"] = "/home/user/gcs-mount"
}
gs.write_fuse_symlink(source, destination, mount_info)
-- Symlink: "/home/user/gcs-mount/exported/path/to/object" -> "/home/user/gcs-mount/lakefs/data/abc/def"
lakefs
The Lua Hook library allows calling back to the lakeFS API using the identity of the user that triggered the action.
For example, if user A tries to commit and triggers a pre-commit
hook - any call made inside that hook to the lakeFS
API, will automatically use user A’s identity for authorization and auditing purposes.
lakefs/create_tag(repository_id, reference_id, tag_id)
Create a new tag for the given reference
lakefs/diff_refs(repository_id, lef_reference_id, right_reference_id [, after, prefix, delimiter, amount])
Returns an object-wise diff between left_reference_id
and right_reference_id
.
lakefs/list_objects(repository_id, reference_id [, after, prefix, delimiter, amount])
List objects in the specified repository and reference (branch, tag, commit ID, etc.).
If delimiter is empty, will default to a recursive listing. Otherwise, common prefixes up to delimiter
will be shown as a single entry.
lakefs/get_object(repository_id, reference_id, path)
Returns 2 values:
- The HTTP status code returned by the lakeFS API
- The content of the specified object as a lua string
lakefs/diff_branch(repository_id, branch_id [, after, amount, prefix, delimiter])
Returns an object-wise diff of uncommitted changes on branch_id
.
path/parse(path_string)
Returns a table for the given path string with the following structure:
> require("path")
> path.parse("a/b/c.csv")
{
["parent"] = "a/b/"
["base_name"] = "c.csv"
}
path/join(*path_parts)
Receives a variable number of strings and returns a joined string that represents a path:
> require("path")
> path.join("path/", "to", "a", "file.data")
path/o/a/file.data
path/is_hidden(path_string [, seperator, prefix])
returns a boolean - true
if the given path string is hidden (meaning it starts with prefix
) - or if any of its parents start with prefix
.
> require("path")
> path.is_hidden("a/b/c") -- false
> path.is_hidden("a/b/_c") -- true
> path.is_hidden("a/_b/c") -- true
> path.is_hidden("a/b/_c/") -- true
path/default_separator()
Returns a constant string (/
)
regexp/match(pattern, s)
Returns true if the string s
matches pattern
.
This is a thin wrapper over Go’s regexp.MatchString.
regexp/quote_meta(s)
Escapes any meta-characters in string s
and returns a new string
regexp/compile(pattern)
Returns a regexp match object for the given pattern
regexp/compiled_pattern.find_all(s, n)
Returns a table list of all matches for the pattern, (up to n
matches, unless n == -1
in which case all possible matches will be returned)
regexp/compiled_pattern.find_all_submatch(s, n)
Returns a table list of all sub-matches for the pattern, (up to n
matches, unless n == -1
in which case all possible matches will be returned).
Submatches are matches of parenthesized subexpressions (also known as capturing groups) within the regular expression,
numbered from left to right in order of opening parenthesis.
Submatch 0 is the match of the entire expression, submatch 1 is the match of the first parenthesized subexpression, and so on
regexp/compiled_pattern.find(s)
Returns a string representing the left-most match for the given pattern in string s
regexp/compiled_pattern.find_submatch(s)
find_submatch returns a table of strings holding the text of the leftmost match of the regular expression in s
and the matches, if any, of its submatches
strings/split(s, sep)
returns a table of strings, the result of splitting s
with sep
.
strings/trim(s)
Returns a string with all leading and trailing white space removed, as defined by Unicode
strings/replace(s, old, new, n)
Returns a copy of the string s with the first n non-overlapping instances of old
replaced by new
.
If old
is empty, it matches at the beginning of the string and after each UTF-8 sequence, yielding up to k+1 replacements for a k-rune string.
If n < 0, there is no limit on the number of replacements
strings/has_prefix(s, prefix)
Returns true
if s
begins with prefix
strings/has_suffix(s, suffix)
Returns true
if s
ends with suffix
strings/contains(s, substr)
Returns true
if substr
is contained anywhere in s
time/now()
Returns a float64
representing the amount of nanoseconds since the unix epoch (01/01/1970 00:00:00).
time/format(epoch_nano, layout, zone)
Returns a string representation of the given epoch_nano timestamp for the given Timezone (e.g. "UTC"
, "America/Los_Angeles"
, …)
The layout
parameter should follow Go’s time layout format.
time/format_iso(epoch_nano, zone)
Returns a string representation of the given epoch_nano
timestamp for the given Timezone (e.g. "UTC"
, "America/Los_Angeles"
, …)
The returned string will be in ISO8601 format.
time/sleep(duration_ns)
Sleep for duration_ns
nanoseconds
time/since(epoch_nano)
Returns the amount of nanoseconds elapsed since epoch_nano
time/add(epoch_time, duration_table)
Returns a new timestamp (in nanoseconds passed since 01/01/1970 00:00:00) for the given duration
.
The duration
should be a table with the following structure:
> require("time")
> time.add(time.now(), {
["hour"] = 1,
["minute"] = 20,
["second"] = 50
})
You may omit any of the fields from the table, resulting in a default value of 0
for omitted fields
time/parse(layout, value)
Returns a float64
representing the amount of nanoseconds since the unix epoch (01/01/1970 00:00:00).
This timestamp will represent date value
parsed using the layout
format.
The layout
parameter should follow Go’s time layout format
time/parse_iso(value)
Returns a float64
representing the amount of nanoseconds since the unix epoch (01/01/1970 00:00:00 for value
.
The value
string should be in ISO8601 format
uuid/new()
Returns a new 128-bit RFC 4122 UUID in string representation.
net/url
Provides a parse
function parse a URL string into parts, returns a table with the URL’s host, path, scheme, query and fragment.
> local url = require("net/url")
> url.parse("https://example.com/path?p1=a#section")
{
["host"] = "example.com"
["path"] = "/path"
["scheme"] = "https"
["query"] = "p1=a"
["fragment"] = "section"
}
net/http
(optional)
Provides a request
function that performs an HTTP request.
For security reasons, this package is not available by default as it enables http requests to be sent out from the lakeFS instance network. The feature should be enabled under actions.lua.net_http_enabled
configuration.
Request will time out after 30 seconds.
http.request(url [, body])
http.request{
url = string,
[method = string,]
[headers = header-table,]
[body = string,]
}
Returns a code (number), body (string), headers (table) and status (string).
- code - status code number
- body - string with the response body
- headers - table with the response request headers (key/value or table of values)
- status - status code text
The first form of the call will perform GET requests or POST requests if the body parameter is passed.
The second form accepts a table and allows you to customize the request method and headers.
Example of a GET request
local http = require("net/http")
local code, body = http.request("https://example.com")
if code == 200 then
print(body)
else
print("Failed to get example.com - status code: " .. code)
end
Example of a POST request
local http = require("net/http")
local code, body = http.request{
url="https://httpbin.org/post",
method="POST",
body="custname=tester",
headers={["Content-Type"]="application/x-www-form-urlencoded"},
}
if code == 200 then
print(body)
else
print("Failed to post data - status code: " .. code)
end