
Let’s Query Something

The lakeFS server has been loaded with a sample Parquet data file. Fittingly enough for software built to help users of data lakes, the lakes.parquet file holds data about lakes around the world.

You’ll notice that the branch is set to main. This is conceptually the same as the main branch you develop software code against in Git.

The lakeFS objects list with a highlight to indicate that the branch is set to main.

Let’s have a look at the data, ahead of making some changes to it on a branch in the following steps.

Click on lakes.parquet and notice that the built-in DuckDB runs a query to show a preview of the file’s contents.

The lakeFS object viewer with embedded DuckDB to query parquet files. A query has run automagically to preview the contents of the selected parquet file.

Now we’ll run our own query on it to look at the top five countries represented in the data.

Copy and paste the following SQL statement into the DuckDB query panel and click on Execute.

SELECT   country, COUNT(*)
FROM     READ_PARQUET('lakefs://quickstart/main/lakes.parquet')
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT    5;

An embedded DuckDB query showing a count of rows per country in the dataset.
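The query groups the rows by country, counts the rows in each group, sorts those counts from highest to lowest and keeps the first five. As a minimal sketch of those semantics (not of how DuckDB executes the query), the Python below does the same over a small in-memory list. The `rows` data and the `top_countries` helper are hypothetical and are not the real contents of lakes.parquet.

```python
from collections import Counter

# Hypothetical sample rows, one per lake. NOT the real contents of lakes.parquet.
countries = (["Canada"] * 6 + ["Finland"] * 5 + ["Sweden"] * 4
             + ["Russia"] * 3 + ["United States"] * 2 + ["Chile"] * 1)
rows = [{"country": c} for c in countries]


def top_countries(rows, n=5):
    """Mirror of: SELECT country, COUNT(*) ... GROUP BY country
    ORDER BY COUNT(*) DESC LIMIT n."""
    # GROUP BY country + COUNT(*): tally how many rows each country has.
    counts = Counter(row["country"] for row in rows)
    # ORDER BY COUNT(*) DESC LIMIT n: the n highest tallies, largest first.
    return counts.most_common(n)


print(top_countries(rows))
```

One behavioural difference is worth knowing: SQL makes no promise about the order of countries with equal counts unless you add a tiebreaker to ORDER BY, whereas `Counter.most_common` lists equal counts in the order they were first seen.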

Next we’re going to make some changes to the data—but on a development branch so that the data in the main branch remains untouched.
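The path passed to READ_PARQUET names the repository (quickstart), the branch (main) and the object (lakes.parquet). Assuming the lakefs://<repository>/<ref>/<path> layout that this path follows, reading the same file from another branch only means swapping the middle segment. The sketch below splits such a URI into its parts and rebuilds it for a different branch; the helper names and the branch name `my-dev-branch` are hypothetical, chosen for illustration only.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class LakeFSPath(NamedTuple):
    repository: str
    ref: str   # branch name (or other ref) such as "main"
    key: str   # object path inside the repository


def parse_lakefs_uri(uri: str) -> LakeFSPath:
    """Split lakefs://<repository>/<ref>/<path> into its three parts
    (assumed layout; a key containing '?' or '#' would need extra handling)."""
    parsed = urlparse(uri)
    if parsed.scheme != "lakefs":
        raise ValueError(f"not a lakefs:// URI: {uri!r}")
    # The first path segment is the ref; everything after it is the key.
    ref, _, key = parsed.path.lstrip("/").partition("/")
    if not parsed.netloc or not ref or not key:
        raise ValueError(f"expected lakefs://<repository>/<ref>/<path>: {uri!r}")
    return LakeFSPath(parsed.netloc, ref, key)


def with_ref(uri: str, ref: str) -> str:
    """Return the same object path, addressed on a different ref (hypothetical helper)."""
    p = parse_lakefs_uri(uri)
    return f"lakefs://{p.repository}/{ref}/{p.key}"


print(with_ref("lakefs://quickstart/main/lakes.parquet", "my-dev-branch"))
```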