Let’s Query Something
The lakeFS server has been loaded with a sample Parquet data file. Fittingly for a piece of software that helps users of data lakes, the lakes.parquet file holds data about lakes around the world.
You’ll notice that the branch is set to main. This is conceptually the same as the main branch in Git against which you develop software code.
Let’s have a look at the data, ahead of making some changes to it on a branch in the following steps.
Click on lakes.parquet and notice that the built-in DuckDB runs a query to show a preview of the file’s contents.
Now we’ll run our own query on it to look at the top five countries represented in the data.
Copy and paste the following SQL statement into the DuckDB query panel and click on Execute.
SELECT country, COUNT(*)
FROM READ_PARQUET('lakefs://quickstart/main/lakes.parquet')
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT 5;
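To see what this query shape does without a lakeFS server at hand, here is a minimal sketch that runs the same GROUP BY / ORDER BY / LIMIT pattern locally. It is not how the lakeFS UI executes the query: Python's built-in sqlite3 module stands in for DuckDB, an in-memory table named lakes stands in for lakes.parquet, and the sample rows, the name column, and the top_countries helper are all made up for illustration. A secondary sort on country is added only so that ties come back in a predictable order.

```python
import sqlite3

# Hypothetical stand-in rows; the real lakes.parquet has different contents.
SAMPLE_ROWS = [
    ("Lake A", "Canada"),
    ("Lake B", "Canada"),
    ("Lake C", "Canada"),
    ("Lake D", "Finland"),
    ("Lake E", "Finland"),
    ("Lake F", "Chile"),
]


def top_countries(rows, limit=5):
    """Return (country, count) pairs, most frequent country first."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed two-column layout; only 'country' appears in the tutorial query.
        conn.execute("CREATE TABLE lakes (name TEXT, country TEXT)")
        conn.executemany("INSERT INTO lakes VALUES (?, ?)", rows)
        cur = conn.execute(
            """
            SELECT country, COUNT(*)
            FROM lakes
            GROUP BY country
            ORDER BY COUNT(*) DESC, country
            LIMIT ?
            """,
            (limit,),
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for country, lake_count in top_countries(SAMPLE_ROWS):
        print(country, lake_count)
```

Each result row pairs a country with the number of lakes recorded for it, which is the same shape of result the DuckDB panel returns for the real file.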
Next we’re going to make some changes to the data, but on a development branch so that the data in the main branch remains untouched.