Mount CSI Driver (Everest on Kubernetes)
Available on lakeFS Cloud and lakeFS Enterprise.
The lakeFS CSI (Container Storage Interface) Driver is an extension for Kubernetes that enables seamless access to data within a lakeFS repository, allowing Pods to interact with lakeFS data as if it were part of the local filesystem. This driver builds on the functionality of Everest, which provides a read-only view of lakeFS data by virtually mounting a repository.
Note ⚠️ The CSI Driver is in private preview. Please contact us to get access.
How mount is executed on a Host
- While the `csi` driver is responsible for mounting and unmounting the volume on the host, it does not need permissions to execute the `mount` and `umount` commands directly.
- The `everest` commands are executed by a `systemd` service on the Host itself (i.e. `everest mount ...`).
- The `csi` driver communicates with the `systemd` service via a unix socket to execute the `mount` and `umount` commands, as shown in the sketch below.
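To see this from the node's side, you can list the `systemd` units that manage the active mounts (a quick sketch; the unit name format is taken from the troubleshooting example at the end of this page):

```sh
# On the underlying K8S node: list the everest mount units managed by the CSI driver
systemctl list-units --type=service | grep everest-lakefs-mount

# Example output (see Troubleshooting below):
# everest-lakefs-mount-0.0.8-everest-123.service loaded active running CSI driver FUSE daemon
```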
Status and Limitations
- Tested OS: Bottlerocket OS, Amazon Linux 2 and RHEL 8.
- Minimal Kubernetes version: `>=1.23.0`.
- Tested cluster providers: EKS, OpenShift (partially).
- Static provisioning only (explained below).
- Setting the Pod `securityContext` UID and GID (i.e. `runAsUser: 1000`, `runAsGroup: 2000`) is nuanced and does not have wide coverage currently; it is not supported but might work.
- Pods only support the access mode `ReadOnlyMany`.
Static Provisioning only (Relevant for pods)
When requesting a mount from the CSI driver, the driver will create a `PersistentVolume` (PV) and `PersistentVolumeClaim` (PVC) for the Pod.
The driver only supports Static Provisioning as of today, and you need an existing lakeFS repository to use it.
To use Static Provisioning, set the `storageClassName` field of your `PersistentVolume` (PV) and `PersistentVolumeClaim` (PVC) to `""` (empty string). Also, to make sure no other PVCs can claim your PV, define a one-to-one mapping using `claimRef`.
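For illustration, a minimal sketch of such a one-to-one binding with `claimRef` (the resource names here are hypothetical; the complete examples below achieve the pairing with `volumeName` on the PVC):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-everest-pv              # hypothetical name
spec:
  capacity:
    storage: 100Gi # ignored, required
  accessModes:
    - ReadOnlyMany
  storageClassName: ""             # required for static provisioning
  claimRef:                        # one-to-one mapping: only this PVC can claim this PV
    namespace: default
    name: my-everest-claim
  csi:
    driver: csi.everest.lakefs.io
    volumeHandle: my-everest-csi-volume
    volumeAttributes:
      lakeFSMountUri: <LAKEFS_MOUNT_URI>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-everest-claim           # hypothetical name
  namespace: default
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""             # required for static provisioning
  volumeName: my-everest-pv
  resources:
    requests:
      storage: 5Gi # ignored, required
```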
Requirements
- For enterprise installations: lakeFS version `1.25.0` or higher.
- A Kubernetes cluster with version `>=1.23.0` and Helm installed.
- A lakeFS server that can be accessed from pods in the cluster.
- Access to download `treeverse/everest-lakefs-csi-driver` from Docker Hub. Contact us to gain access to lakeFS Enterprise features.
Deploy the CSI Driver
The CSI Driver is deployed to the K8S cluster using a dedicated Helm chart, `everest-lakefs-csi-driver`.

1. Update your Helm repositories with the chart

Add the lakeFS Helm repository if not already added:

```sh
helm repo add lakefs https://charts.lakefs.io
```

Fetch the chart from the lakeFS repository:

```sh
helm repo update lakefs
```

Verify the chart is available and updated:

```sh
helm show chart lakefs/everest-lakefs-csi-driver
```

List all available chart versions:

```sh
helm search repo lakefs/everest-lakefs-csi-driver -l
```
2. Configure the values for the CSI Driver in a `values.yaml` file

Helm chart default values:

```sh
helm show values lakefs/everest-lakefs-csi-driver --version <version>
```

CSI driver config: all the driver CLI flags can be configured via environment variables (prefixed `CSI_DRIVER_`) and passed to the driver.

Example `values.yaml` file (the minimal required arguments are the ones left uncommented):
```yaml
# image:
#   repository: treeverse/everest-lakefs-csi-driver
#   # Optional CSI Driver override version (default .Chart.AppVersion)
#   tag: 1.2.3

# Same as fluffy https://github.com/treeverse/fluffy?tab=readme-ov-file#1-dockerhub-token-for-fluffy
imagePullSecret:
  token: <dockerhub-token>
  username: <dockerhub-user>

# Credentials that will be used by everest as a default to access lakeFS mount paths
lakeFSAccessSecret:
  keyId: <lakefs-key-id>
  accessKey: <lakefs-access-key>
  endpoint: <lakefs-endpoint>

node:
  # verbosity level of the driver (normal values are 0-4, 5 would be most verbose)
  logLevel: 4
  # Only set if having issues with running or installing the everest binary.
  # Path of the directory where the everest binary is accessed by the underlying K8S Nodes (${everestInstallPath}/everest).
  # The binary will be copied from the CSI pod into that location by the init container job in node.yaml.
  # This path will be a host path on the K8S Nodes.
  # Depending on the underlying OS and the SELinux policy, the binary will be executed by systemd on the Host.
  # Known issue when using Bottlerocket OS https://github.com/bottlerocket-os/bottlerocket/pull/3779
  # everestInstallPath: /opt/everest-mount/bin/ # should end with "/"

  # Additional environment variables that will be passed to the driver; can be used to configure the csi driver
  # extraEnvVars:
  #   - name: CSI_DRIVER_MOUNT_TIMEOUT
  #     value: "30s"
  #   - name: CSI_DRIVER_EVEREST_DEFAULT_CACHE_SIZE
  #     value: "10000000000"
  #   - name: VALUE_FROM_SECRET
  #     valueFrom:
  #       secretKeyRef:
  #         name: secret_name
  #         key: secret_key
```
3. Install the chart to the K8S cluster

Install the chart with the values file:

```sh
helm install -f values.yaml lakefs lakefs/everest-lakefs-csi-driver --version <version>
```
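As a quick sanity check after installation (a hedged suggestion, not taken from the chart documentation), verify that the driver is registered in the cluster and that its pods are running:

```sh
# The driver name (csi.everest.lakefs.io) is the one referenced by the PV examples below
kubectl get csidrivers.storage.k8s.io

# Driver pods are named after the release and chart, so filtering on "everest" usually finds them
kubectl get pods -A | grep -i everest
```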
Use in Pods
Once the CSI Driver is installed, we can start using it the same way any `PersistentVolume` (PV) and `PersistentVolumeClaim` (PVC) are used in Kubernetes.
The only required argument to set is `lakeFSMountUri` in the PV (see the examples below).
Mount Options
Most of the options are optional and can be omitted, but each mount request can be configured with the `everest mount` CLI options. They are passed as `mountOptions` in the PV spec.
Examples
The examples demonstrate different mount scenarios with the CSI Driver.
All of them essentially run `ls <mount-dir>` and `tail -f /dev/null` in a centos container.
If the mount succeeded, you will see the contents of your mount directory.

- Set `lakeFSMountUri` (i.e. `lakefs://<repo>/<ref>/[prefix/]`) to the lakeFS mount URI you want to mount.
- Save the example manifest to a file and run `kubectl apply -f <example-file>.yaml`.
- View the example pod logs to see the mount output: `kubectl logs -f <pod-name>`
- Single Pod and mount
- Multiple Pods, one mount (Deployment)
- Multiple mounts, single Pod
- StatefulSet (Advanced)
- Mount Options
Single Pod and mount

Configure `lakeFSMountUri` to the target URI.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: everest-pv
spec:
  capacity:
    storage: 100Gi # ignored, required
  accessModes:
    - ReadOnlyMany # supported options: ReadOnlyMany
  # everest mount options go under mountOptions and are forwarded to the everest mount command
  # mountOptions:
  #   # set cache size in bytes
  #   - cache-size 1000000000
  csi:
    driver: csi.everest.lakefs.io # required
    volumeHandle: everest-csi-driver-volume
    volumeAttributes:
      # mount target, replace with your lakeFS mount URI
      lakeFSMountUri: <LAKEFS_MOUNT_URI>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: everest-claim
spec:
  accessModes:
    - ReadOnlyMany # supported options: ReadOnlyMany
  storageClassName: "" # required for static provisioning
  resources:
    requests:
      storage: 5Gi # ignored, required
  volumeName: everest-pv
---
apiVersion: v1
kind: Pod
metadata:
  name: everest-app
spec:
  containers:
    - name: app
      image: centos
      command: ["/bin/sh"]
      args: ["-c", "ls /data/; tail -f /dev/null"]
      volumeMounts:
        - name: persistent-storage-isan
          mountPath: /data
  volumes:
    - name: persistent-storage-isan
      persistentVolumeClaim:
        claimName: everest-claim
```
Multiple Pods, one mount (Deployment)

Configure `lakeFSMountUri` to the target URI.
In this example a Deployment is created with 3 replicas, all sharing a single `PersistentVolume` and PVC.
Behind the scenes each pod gets its own mount: even if on the same K8S node, each pod will get its own mount directory.
Unlike a StatefulSet, this can scale up and down with no additional interference and can be deleted easily the same way.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: multiple-pods-one-pv
spec:
  capacity:
    storage: 1200Gi # ignored, required
  accessModes:
    - ReadOnlyMany # supported options: ReadOnlyMany
  # everest mount options go under mountOptions and are forwarded to the everest mount command
  # mountOptions:
  #   - cache-size 1000000555
  csi:
    driver: csi.everest.lakefs.io # required
    volumeHandle: everest-csi-driver-volume
    volumeAttributes:
      # mount target, replace with your lakeFS mount URI
      lakeFSMountUri: <LAKEFS_MOUNT_URI>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: multiple-pods-one-claim
spec:
  accessModes:
    - ReadOnlyMany # supported options: ReadOnlyMany
  storageClassName: "" # required for static provisioning
  resources:
    requests:
      storage: 1200Gi # ignored, required
  volumeName: multiple-pods-one-pv
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multiple-pods-one-pv-app
  labels:
    app: multiple-pods-one-pv-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: multiple-pods-one-pv-app
  template:
    metadata:
      labels:
        app: multiple-pods-one-pv-app
    spec:
      containers:
        - name: multiple-pods-one-pv-app
          image: centos
          command: ["/bin/sh"]
          args: ["-c", "ls /data/; tail -f /dev/null"]
          volumeMounts:
            - name: persistent-storage
              mountPath: /data
          ports:
            - containerPort: 80
      volumes:
        - name: persistent-storage
          persistentVolumeClaim:
            claimName: multiple-pods-one-claim
```
Multiple mounts, single Pod

Deploy a pod with two mounts to different mount points.
Configure `lakeFSMountUri` for each `PersistentVolume`.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: multiple-mounts-one-pod-pv
spec:
  capacity:
    storage: 1200Gi # ignored, required
  accessModes:
    - ReadOnlyMany # supported options: ReadOnlyMany
  mountOptions:
    - cache-size 1000000111
  csi:
    driver: csi.everest.lakefs.io # required
    volumeHandle: everest-csi-driver-volume # must be unique
    volumeAttributes:
      # mount target local-lakefs dir, replace with your lakeFS mount URI
      lakeFSMountUri: <LAKEFS_URI_1>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: multple-mounts-one-pod-claim
spec:
  accessModes:
    - ReadOnlyMany # supported options: ReadOnlyMany
  storageClassName: "" # required for static provisioning
  resources:
    requests:
      storage: 1200Gi # ignored, required
  volumeName: multiple-mounts-one-pod-pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: multiple-mounts-one-pod-pv-2
spec:
  capacity:
    storage: 1200Gi # ignored, required
  accessModes:
    - ReadOnlyMany # supported options: ReadOnlyMany
  mountOptions:
    - cache-size 1000000555
  csi:
    driver: csi.everest.lakefs.io # required
    volumeHandle: everest-csi-driver-volume-2 # must be unique
    volumeAttributes:
      # mount target images dir, replace with your lakeFS mount URI
      lakeFSMountUri: <LAKEFS_URI_2>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: multple-mounts-one-pod-claim-2
spec:
  accessModes:
    - ReadOnlyMany # supported options: ReadOnlyMany
  storageClassName: "" # required for static provisioning
  resources:
    requests:
      storage: 1200Gi # ignored, required
  volumeName: multiple-mounts-one-pod-pv-2
---
apiVersion: v1
kind: Pod
metadata:
  name: everest-multi-mounts-one-pod
spec:
  containers:
    - name: app
      image: centos
      command: ["/bin/sh"]
      args: ["-c", "ls /data/; ls /data2/; tail -f /dev/null"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
        - name: persistent-storage-2
          mountPath: /data2
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: multple-mounts-one-pod-claim
    - name: persistent-storage-2
      persistentVolumeClaim:
        claimName: multple-mounts-one-pod-claim-2
```
StatefulSet (Advanced)

Configure `lakeFSMountUri` to the target URI.
Because of the nuances described below, it is best to avoid using a `StatefulSet` if not required.

Deletion: it is intended behavior for a StatefulSet in K8S that the PVC is not deleted automatically when the pod is deleted, since the StatefulSet controller does not manage the PVC. To completely delete it, use `kubectl delete` with the `--force` flag, or first delete the PVC: `kubectl delete pvc -l app=sts-app-simple-everest`.

Scale Down: `replicas: 0` can be set to scale down the StatefulSet and bring it back up with `replicas: 1`.

Replicas > 1: not supported in this example, since the driver only supports static provisioning. To use a StatefulSet with replicas > 1, the PersistentVolume(s) must be added manually (see the sketch after the example below).
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sts-simple-mount
  labels:
    app: sts-app-simple-everest
spec:
  capacity:
    storage: 100Gi # ignored, required
  accessModes:
    - ReadOnlyMany # supported options: ReadOnlyMany
  mountOptions:
    # override default cache size for the mount (in bytes)
    - cache-size 1000000555
    - log-level debug
  csi:
    driver: csi.everest.lakefs.io # required
    volumeHandle: everest-csi-driver-volume
    volumeAttributes:
      # mount target, replace with your lakeFS mount URI
      lakeFSMountUri: <LAKEFS_MOUNT_URI>
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts-app-simple-everest
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sts-app-simple-everest
  template:
    metadata:
      labels:
        app: sts-app-simple-everest
    spec:
      containers:
        - name: app
          image: centos
          command: ["/bin/sh"]
          args: ["-c", "ls /data/; tail -f /dev/null"]
          volumeMounts:
            - name: sts-simple-mount
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: sts-simple-mount
      spec:
        selector:
          matchLabels:
            app: sts-app-simple-everest
        storageClassName: "" # required for static provisioning
        accessModes: [ "ReadOnlyMany" ]
        resources:
          requests:
            storage: 5Gi # ignored, required
```
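For replicas > 1, each replica's PVC created from `volumeClaimTemplates` needs an unbound `PersistentVolume` of its own to bind to. A hedged sketch of one additional, manually created PV (the name is hypothetical; the label and spec mirror the PV above):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sts-simple-mount-1          # hypothetical: one extra PV per extra replica
  labels:
    app: sts-app-simple-everest     # must match the volumeClaimTemplates selector
spec:
  capacity:
    storage: 100Gi # ignored, required
  accessModes:
    - ReadOnlyMany
  mountOptions:
    - cache-size 1000000555
  csi:
    driver: csi.everest.lakefs.io # required
    volumeHandle: everest-csi-driver-volume-1 # must be unique per PV
    volumeAttributes:
      lakeFSMountUri: <LAKEFS_MOUNT_URI>
```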
Mount Options

This demonstrates common and uncommon flags that can be used for a mount.
In general, the flags are set in `mountOptions` and are passed to the `everest mount` command.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: options-demo-pv
spec:
  capacity:
    storage: 100Gi # ignored, required
  accessModes:
    - ReadOnlyMany # supported options: ReadOnlyMany
  # everest mount options go under mountOptions and are forwarded to the everest mount command
  mountOptions:
    # set cache size in bytes
    - cache-size 10000000
    # set log level to debug when inspecting mount logs (very noisy!)
    - log-level trace
    # WARN: lakeFS credentials / endpoint should be managed securely by the CSI-driver, this is an advanced flag use-case
    # override default lakeFS credentials (for use-cases where the default csi-driver credentials are not sufficient)
    - lakectl-access-key-id <LAKEFS_ACCESS_KEY_ID>
    - lakectl-secret-access-key <LAKEFS_SECRET_ACCESS_KEY>
    - lakectl-server-url <LAKEFS_ENDPOINT>
    # WARN: an advanced flag, rarely needed if at all; performs the mount directly using fuser, relying on it to exist on the host server, without using FUSE syscalls
    # by default fuse-direct-mount is true
    # - fuse-direct-mount false
    # - mount-gid 2000
    # - mount-uid 1000
    # - presign false
    # - log-enable-syslog false
  csi:
    driver: csi.everest.lakefs.io # required
    volumeHandle: everest-csi-driver-volume
    volumeAttributes:
      # mount target, staging org (non default credentials on csi), replace with your lakeFS mount URI
      lakeFSMountUri: <LAKEFS_MOUNT_URI>

# REST OF THE RESOURCES
# ...
```
Troubleshooting
- Use `kubectl` to check the CSI driver pod and the failed Pod for logs and events.
- If a specific mount request failed, inspect the csi-node pod on the K8S node where the failing pod was deployed.
- Check the events and status of the `PVC` and `PV` of the failing pod: `kubectl get pv && kubectl get pvc` (see the example below).
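For example, using the resource names from the single Pod example above (substitute your own names):

```sh
# List PVs/PVCs, then inspect events on the PVC, PV and the failing pod
kubectl get pv && kubectl get pvc
kubectl describe pvc everest-claim
kubectl describe pv everest-pv
kubectl describe pod everest-app
```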
Advanced: SSH into the underlying K8S node:

Find the failed mount service with `systemctl list-units --type=service`:

```
everest-lakefs-mount-0.0.8-everest-123.service loaded active running CSI driver FUSE daemon
```

Get the systemd service status:

```sh
# service name example: everest-lakefs-mount-0.0.8-everest-123.service
systemctl status <service>

# output contains many things including the exec command to run, example:
# ExecStart=/opt/bin/everest mount lakefs://test-mount/main/local-lakefs/ /var/lib/kubelet/pods/123/volumes/kubernetes.io~csi/everest-pv/mount --log-level=trace --cache-dir=/var/lib/kubelet/pods/123/volumes/kubernetes.io~csi/everest-pv/.everest --lakectl-config=/opt/mountpoint-s3-csi/bin/lakectl.yaml
```

See systemd logs of a service:

```sh
journalctl -f -u <service>

# example:
journalctl -f -u everest-lakefs-mount-0.0.8-everest-123.service
```