
Actions and Hooks in lakeFS

When we interact with lakeFS, it can be useful to have certain checks performed at stages along the way. Let’s see how actions in lakeFS can help here.

We’re going to enforce a rule that when a commit is made to any branch that begins with etl:

  • the commit message must not be blank
  • there must be job_name and version metadata
  • the version must be numeric

To do this we’ll create an action. In lakeFS, an action specifies one or more events that will trigger it, and references one or more hooks to run when triggered. Actions are YAML files written to lakeFS under the _lakefs_actions/ folder of the lakeFS repository.

A hook can be a Lua script that lakeFS executes itself, an external webhook, or an Airflow DAG. In this example, we’re using a Lua hook.
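Put another way, an action is configuration that pairs a trigger (an event plus the branches it applies to) with the hooks to run. The Python sketch below is a hypothetical model of that idea, not lakeFS code. The `Action` and `Hook` classes and the `triggered_by` method are illustrative names. Branch patterns are matched with Python's `fnmatch`, which is only assumed to behave like lakeFS's own glob matching for simple patterns such as `etl**`. The values mirror the action built in the steps that follow.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Hook:
    id: str
    type: str  # "lua" for the hook used on this page


@dataclass
class Action:
    name: str
    # Maps an event name (e.g. "pre-commit") to the branch glob patterns it applies to.
    on: dict[str, list[str]]
    hooks: list[Hook] = field(default_factory=list)

    def triggered_by(self, event: str, branch: str) -> bool:
        """Return True if this event on this branch should run the action's hooks."""
        patterns = self.on.get(event)
        if patterns is None:
            return False  # the action does not listen for this event at all
        # Assumption: shell-style glob matching, case-sensitive.
        return any(fnmatchcase(branch, pattern) for pattern in patterns)


check_commit = Action(
    name="Check Commit Message and Metadata",
    on={"pre-commit": ["etl**"]},
    hooks=[Hook(id="check_metadata", type="lua")],
)

print(check_commit.triggered_by("pre-commit", "etl_20230504"))  # True
print(check_commit.triggered_by("pre-commit", "main"))          # False
print(check_commit.triggered_by("pre-merge", "etl_20230504"))   # False
```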

Configuring the Action

  1. In lakeFS, create a new branch called add_action. You can do this through the UI or with lakectl:

     docker exec lakefs \
         lakectl branch create \
             lakefs://quickstart/add_action \
             --source lakefs://quickstart/main
  2. Open up your favorite text editor (or emacs), and paste the following YAML:

    name: Check Commit Message and Metadata
    on:
      pre-commit:
        branches:
        - etl**
    hooks:
    - id: check_metadata
      type: lua
      properties:
        script: |
          commit_message=action.commit.message
          if commit_message and #commit_message>0 then
              print("✅ The commit message exists and is not empty: " .. commit_message)
          else
              error("\n\n❌ A commit message must be provided")
          end
          job_name=action.commit.metadata["job_name"]
          if job_name == nil then
              error("\n❌ Commit metadata must include job_name")
          else
              print("✅ Commit metadata includes job_name: " .. job_name)
          end
          version=action.commit.metadata["version"]
          if version == nil then
              error("\n❌ Commit metadata must include version")
          else
              print("✅ Commit metadata includes version: " .. version)
              if tonumber(version) then
                  print("✅ Commit metadata version is numeric")
              else
                  error("\n❌ Version metadata must be numeric: " .. version)
              end
          end
  3. Save this file as /tmp/check_commit_metadata.yml

    • You can save it elsewhere, but make sure you change the path below when uploading.
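  4. Upload the check_commit_metadata.yml file to the add_action branch under _lakefs_actions/. As above, you can use the UI (make sure you select the correct branch when you do), or use lakectl. Because docker exec runs lakectl inside the container, first copy the file into the container so that the --source path exists there:

     docker cp /tmp/check_commit_metadata.yml lakefs:/tmp/check_commit_metadata.yml

     docker exec lakefs \
         lakectl fs upload \
             lakefs://quickstart/add_action/_lakefs_actions/check_commit_metadata.yml \
             --source /tmp/check_commit_metadata.yml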
  5. Go to the Uncommitted Changes tab in the UI, and make sure that you see the new file in the path shown:

    lakeFS Uncommitted Changes view showing a file called `check_commit_metadata.yml` under the path `_lakefs_actions/`

    Click Commit Changes and enter a suitable message to commit this new file to the branch.

  6. Now we’ll merge this new branch into main. From the Compare tab in the UI, compare the main branch with add_action and click Merge.

    lakeFS Compare view showing the difference between `main` and `add_action` branches
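Before uploading an action, it can help to try the rules locally. The Python function below is a hypothetical, hand-written mirror of the Lua script above (not something lakeFS runs or provides). It returns the first error the hook would raise, or `None` when the commit would pass. Lua's `tonumber()` is approximated with a regular expression for decimal numbers, so hexadecimal strings such as `0x1A`, which Lua would accept, are treated as non-numeric here. The commit message and `job_name` value in the usage lines are invented for illustration.

```python
from __future__ import annotations

import re

# Rough stand-in for Lua's tonumber() on decimal strings: optional sign, digits with an
# optional fraction, optional exponent, surrounding whitespace allowed.
# Hexadecimal forms such as "0x1A", which Lua also accepts, are NOT recognized here.
_WS = r"[ \t\n\r\f\v]*"
_LUA_DECIMAL = re.compile(_WS + r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?" + _WS)


def first_violation(message: str | None, metadata: dict[str, str]) -> str | None:
    """Return the first error the check_metadata hook would raise, or None if it passes.

    Checks run in the same order as the Lua script, which stops at the first error().
    Messages omit the leading newlines the Lua script prepends.
    """
    if not message:  # Lua: commit_message and #commit_message > 0
        return "❌ A commit message must be provided"
    if metadata.get("job_name") is None:
        return "❌ Commit metadata must include job_name"
    version = metadata.get("version")
    if version is None:
        return "❌ Commit metadata must include version"
    if not _LUA_DECIMAL.fullmatch(version):  # Lua: tonumber(version)
        return f"❌ Version metadata must be numeric: {version}"
    return None


# Hypothetical message and job_name values, replaying the attempts in "Testing the Action".
print(first_violation("", {}))  # → ❌ A commit message must be provided
print(first_violation("Add top 10 lakes", {}))  # → ❌ Commit metadata must include job_name
print(first_violation("Add top 10 lakes", {"job_name": "top10", "version": "v1.00ß"}))
# → ❌ Version metadata must be numeric: v1.00ß
print(first_violation("Add top 10 lakes", {"job_name": "top10", "version": "1.00"}))  # → None
```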

Testing the Action

Let’s remind ourselves of the rules that the action is going to enforce.

When a commit is made to any branch that begins with etl:

  • the commit message must not be blank
  • there must be job_name and version metadata
  • the version must be numeric

We’ll start by creating a branch whose name matches the etl pattern, then commit a change to it and see how the action responds.

  1. Create a new branch called etl_20230504 (see the instructions above if necessary). Make sure you use main as the source branch.

    In your new branch you should see the action that you created and merged above:

    lakeFS branch etl_20230504 with object /_lakefs_actions/check_commit_metadata.yml

  2. To simulate an ETL job we’ll use the built-in DuckDB editor to run some SQL and write the result back to the lakeFS branch.

    Open the lakes.parquet file on the etl_20230504 branch from the Objects tab. Replace the SQL statement with the following:

     COPY (
         WITH src AS (
             SELECT lake_name, country, depth_m,
                 RANK() OVER (ORDER BY depth_m DESC) AS lake_rank
             FROM READ_PARQUET('lakefs://quickstart/etl_20230504/lakes.parquet'))
         SELECT * FROM src WHERE lake_rank <= 10
     ) TO 'lakefs://quickstart/etl_20230504/top10_lakes.parquet'
  3. Head to the Uncommitted Changes tab in the UI and notice that there is now a file called top10_lakes.parquet waiting to be committed.

    lakeFS branch etl_20230504 with uncommitted file top10_lakes.parquet

    Now we’re ready to start trying out the commit rules, and seeing what happens if we violate them.

  4. Click on Commit Changes, leave the Commit message blank, and click Commit Changes to confirm.

    Note that the commit fails because the hook did not succeed:

    pre-commit hook aborted

    with the output from the hook’s code displayed:

    ❌ A commit message must be provided

    lakeFS blocking an attempt to commit with no commit message

  5. Do the same as the previous step, but provide a message this time:

    A commit to lakeFS with commit message in place

    The commit still fails because we also need to include metadata, which is what the error tells us:

    ❌ Commit metadata must include job_name

  6. Repeat the Commit Changes dialog and use the Add Metadata field to add the required metadata:

    A commit to lakeFS with commit message and metadata in place

    We’re almost there, but this still fails (as it should), since the version is not entirely numeric but includes v and ß:

    ❌ Version metadata must be numeric: v1.00ß

    Repeat the commit attempt, specifying the version as 1.00 this time, and rejoice as the commit succeeds.

    Commit history in lakeFS showing that the commit met the rules set by the action and completed successfully.
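The commit attempts above can also be made from the command line. As a sketch, the hypothetical Python helper below assembles a `lakectl commit` call that passes the message with `-m` and each metadata entry as a `--meta key=value` flag, run through `docker exec` in the `lakefs` container as elsewhere on this page. The message and `job_name` value are invented for illustration. lakectl is expected to refuse an empty message by itself unless `--allow-empty-message` is given, so the blank-message case is easiest to reproduce in the UI.

```python
from __future__ import annotations

import shlex
import subprocess


def lakectl_commit_argv(branch_uri: str, message: str, metadata: dict[str, str]) -> list[str]:
    """Build argv for `lakectl commit`, run inside the quickstart's `lakefs` container."""
    argv = ["docker", "exec", "lakefs", "lakectl", "commit", branch_uri, "-m", message]
    for key, value in metadata.items():
        argv += ["--meta", f"{key}={value}"]  # one key=value pair per --meta flag
    return argv


def run_commit(argv: list[str]) -> int:
    """Run the command and return its exit code; a rejected commit should be non-zero."""
    return subprocess.run(argv).returncode


argv = lakectl_commit_argv(
    "lakefs://quickstart/etl_20230504",
    "Add top 10 lakes",                        # hypothetical commit message
    {"job_name": "top10", "version": "1.00"},  # hypothetical job_name value
)
print(shlex.join(argv))
# docker exec lakefs lakectl commit lakefs://quickstart/etl_20230504 -m 'Add top 10 lakes' --meta job_name=top10 --meta version=1.00
```

Passing an argument list rather than a single shell string keeps the message intact even when it contains spaces or quotes; call `run_commit(argv)` only when the lakeFS container is running.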

You can view the history of all action runs from the Actions tab:

Action run history in lakeFS

Bonus Challenge

And so with that, this quickstart for lakeFS draws to a close. If you’re simply having too much fun to stop, then here’s an exercise for you.

Implement the requirement from the beginning of this quickstart correctly, such that you write denmark-lakes.parquet in the respective branch and successfully merge it back into main. Look up how to list the contents of the main branch and verify that it looks like this:

object          2023-03-21 17:33:51 +0000 UTC    20.9 kB         denmark-lakes.parquet
object          2023-03-21 14:45:38 +0000 UTC    916.4 kB        lakes.parquet
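To check the result programmatically rather than by eye, you can compare the object names in the listing with the expected set. The Python sketch below assumes the column layout of the sample output above (type, a four-field timestamp, size, unit, then path) and paths without spaces. `object_names` is a hypothetical helper, and rows whose first field is not `object` are ignored.

```python
from __future__ import annotations


def object_names(listing: str) -> set[str]:
    """Extract object paths from a listing in the format shown above."""
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        # Expected fields: type, date, time, offset, zone, size, unit, path
        if len(fields) >= 8 and fields[0] == "object":
            names.add(fields[7])  # assumes the path contains no spaces
    return names


listing = """\
object          2023-03-21 17:33:51 +0000 UTC    20.9 kB         denmark-lakes.parquet
object          2023-03-21 14:45:38 +0000 UTC    916.4 kB        lakes.parquet
"""
expected = {"denmark-lakes.parquet", "lakes.parquet"}
print(object_names(listing) == expected)  # True
```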

Finishing Up

Once you’ve finished the quickstart, shut down your local environment with the following command:

docker stop lakefs