Actions and Hooks in lakeFS
When we interact with lakeFS it can be useful to have certain checks performed at stages along the way. Let’s see how actions in lakeFS can be of benefit here.
We’re going to enforce a rule that when a commit is made to any branch that begins with etl:
- the commit message must not be blank
- there must be job_name and version metadata
- the version must be numeric
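As a rough illustration of what these three rules mean in practice, here is a minimal sketch in plain Python. It is not lakeFS code: check_commit is a hypothetical helper, and the real check in this walkthrough is a Lua hook. Note two deliberate simplifications: this version collects every violation instead of stopping at the first one, and Python's float() accepts a slightly different set of strings than Lua's tonumber().

```python
def check_commit(message, metadata):
    """Return a list of rule violations; an empty list means the commit passes.

    Hypothetical local mirror of the rules, for illustration only.
    """
    problems = []

    # Rule 1: the commit message must not be blank.
    if not message:
        problems.append("A commit message must be provided")

    # Rule 2: job_name and version metadata must both be present.
    if "job_name" not in metadata:
        problems.append("Commit metadata must include job_name")
    version = metadata.get("version")
    if version is None:
        problems.append("Commit metadata must include version")
    else:
        # Rule 3: the version must be numeric (approximated with float()).
        try:
            float(version)
        except ValueError:
            problems.append("Version metadata must be numeric: " + version)

    return problems


# Example values below are made up for illustration.
print(check_commit("", {}))
print(check_commit("Add top 10 lakes", {"job_name": "top10", "version": "v1.00ß"}))
print(check_commit("Add top 10 lakes", {"job_name": "top10", "version": "1.00"}))
```

An empty list from the last call corresponds to a commit that would be allowed through.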
To do this we’ll create an action. In lakeFS, an action specifies one or more events that will trigger it, and references one or more hooks to run when triggered. Actions are YAML files written to lakeFS under the _lakefs_actions/ folder of the lakeFS repository.
A hook can be a Lua script that lakeFS executes itself, an external webhook, or an Airflow DAG. In this example, we’re using a Lua hook.
Configuring the Action
- In lakeFS create a new branch called add_action. You can do this through the UI or with lakectl:

    docker exec lakefs lakectl branch create lakefs://quickstart/add_action --source lakefs://quickstart/main
- Open up your favorite text editor (or emacs), and paste the following YAML:

    name: Check Commit Message and Metadata
    on:
      pre-commit:
        branches:
          - etl**
    hooks:
      - id: check_metadata
        type: lua
        properties:
          script: |
            commit_message=action.commit.message
            if commit_message and #commit_message>0 then
                print("✅ The commit message exists and is not empty: " .. commit_message)
            else
                error("\n\n❌ A commit message must be provided")
            end

            job_name=action.commit.metadata["job_name"]
            if job_name == nil then
                error("\n❌ Commit metadata must include job_name")
            else
                print("✅ Commit metadata includes job_name: " .. job_name)
            end

            version=action.commit.metadata["version"]
            if version == nil then
                error("\n❌ Commit metadata must include version")
            else
                print("✅ Commit metadata includes version: " .. version)
                if tonumber(version) then
                    print("✅ Commit metadata version is numeric")
                else
                    error("\n❌ Version metadata must be numeric: " .. version)
                end
            end
- Save this file as /tmp/check_commit_metadata.yml
  - You can save it elsewhere, but make sure you change the path below when uploading
- Upload the check_commit_metadata.yml file to the add_action branch under _lakefs_actions/. As above, you can do this through the UI (make sure you select the correct branch when you do) or with lakectl:

    docker exec lakefs lakectl fs upload lakefs://quickstart/add_action/_lakefs_actions/check_commit_metadata.yml --source /tmp/check_commit_metadata.yml
- Go to the Uncommitted Changes tab in the UI, and make sure that you see the new file under the _lakefs_actions/ path.
  Click Commit Changes and enter a suitable message to commit this new file to the branch.
- Now we’ll merge this new branch into main. From the Compare tab in the UI, compare the main branch with add_action and click Merge.
Testing the Action
Let’s remind ourselves what the rules are that the action is going to enforce.
When a commit is made to any branch that begins with etl:
- the commit message must not be blank
- there must be job_name and version metadata
- the version must be numeric
We’ll start by creating a branch that’s going to match the etl pattern, and then go ahead and commit a change and see how the action works.
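To get a feel for which branch names the etl** pattern selects, the matching can be approximated locally. This sketch uses Python's fnmatch module as a stand-in: it is an assumption that lakeFS's own glob matching behaves the same way for simple names like these, and triggers_action is a hypothetical helper, not part of lakeFS.

```python
from fnmatch import fnmatchcase

# Branch pattern taken from the action's `branches` list.
PATTERN = "etl**"


def triggers_action(branch):
    """Approximate check of whether a branch name matches the action's pattern."""
    return fnmatchcase(branch, PATTERN)


# Branch names used in this walkthrough, plus a bare "etl" for comparison.
for branch in ["etl_20230504", "etl", "main", "add_action"]:
    print(branch, triggers_action(branch))
```

Under this approximation, only the names starting with etl match, so commits to main or add_action would not run the hook.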
- Create a new branch (see above instructions on how to do this if necessary) called etl_20230504. Make sure you use main as the source branch.
  In your new branch you should see the action that you created and merged above.
- To simulate an ETL job we’ll use the built-in DuckDB editor to run some SQL and write the result back to the lakeFS branch.
  Open the lakes.parquet file on the etl_20230504 branch from the Objects tab. Replace the SQL statement with the following:

    COPY (
        WITH src AS (
            SELECT lake_name, country, depth_m,
                   RANK() OVER (ORDER BY depth_m DESC) AS lake_rank
            FROM READ_PARQUET('lakefs://quickstart/etl_20230504/lakes.parquet'))
        SELECT * FROM src WHERE lake_rank <= 10
    ) TO 'lakefs://quickstart/etl_20230504/top10_lakes.parquet'
- Head to the Uncommitted Changes tab in the UI and notice that there is now a file called top10_lakes.parquet waiting to be committed.
  Now we’re ready to start trying out the commit rules, and seeing what happens if we violate them.
- Click on Commit Changes, leave the Commit message blank, and click Commit Changes to confirm.
  Note that the commit fails because the hook did not succeed (pre-commit hook aborted), with the output from the hook’s code displayed:

    ❌ A commit message must be provided
- Do the same as the previous step, but provide a message this time.
  The commit still fails as we need to include metadata too, which is what the error tells us:

    ❌ Commit metadata must include job_name
- Repeat the Commit Changes dialog and use the Add Metadata field to add the required metadata.
  We’re almost there, but this still fails (as it should), since the version is not entirely numeric but includes v and ß:

    ❌ Version metadata must be numeric: v1.00ß

  Repeat the commit attempt, specifying the version as 1.00 this time, and rejoice as the commit succeeds.
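The same successful commit could also be made from the command line instead of the UI. The sketch below only builds and prints the command rather than running it. It assumes that lakectl commit accepts -m for the message and a repeatable --meta key=value flag (worth confirming with lakectl commit --help); build_commit_command is a hypothetical helper, and the commit message and job_name value are made up for illustration.

```python
import shlex


def build_commit_command(branch_uri, message, metadata):
    """Build the argument list for `lakectl commit`, run inside the `lakefs` container.

    Assumes `-m` and repeatable `--meta key=value` flags on `lakectl commit`.
    """
    cmd = ["docker", "exec", "lakefs", "lakectl", "commit", branch_uri, "-m", message]
    for key, value in metadata.items():
        cmd += ["--meta", f"{key}={value}"]
    return cmd


cmd = build_commit_command(
    "lakefs://quickstart/etl_20230504",
    "Add top 10 lakes",  # made-up commit message
    {"job_name": "top10_lakes", "version": "1.00"},  # job_name value is made up
)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
# To execute it from Python instead: subprocess.run(cmd, check=True)
```

Because the metadata is passed as a dictionary, the same helper can be reused to reproduce the failing cases, for example by omitting job_name or passing v1.00ß as the version.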
You can view the history of all action runs from the Actions tab.