diff --git a/docs/src/reference/mount.md b/docs/src/reference/mount.md index 23a4ddcc4c9..3b0d40ef8ef 100644 --- a/docs/src/reference/mount.md +++ b/docs/src/reference/mount.md @@ -1,6 +1,6 @@ --- title: Mount (Everest) -description: This section covers the Everest feature for mounting a lakeFS path to your local filesystem. +description: Mount a lakeFS path to your local filesystem or in Kubernetes. status: enterprise --- @@ -9,968 +9,859 @@ status: enterprise !!! info Available in **lakeFS Cloud** and **lakeFS Enterprise** -Everest is a complementary binary to lakeFS that allows users to virtually mount a remote lakeFS repository onto a local directory. -Once mounted, users can access the data as if it resides on their local filesystem, using any tool, library, or framework that reads from a local filesystem. +Everest is a complementary binary to lakeFS that allows you to virtually mount a remote lakeFS repository onto a local directory or within a Kubernetes environment. Once mounted, you can access data as if it resides on your local filesystem, using any tool, library, or framework. -!!! note - No installation is required. Please [contact us](http://info.lakefs.io/thanks-lakefs-mounts) to get access to the Everest binary. - -!!! tip - Everest mount supports writing to the file system for both NFS and FUSE protocols starting version **0.2.0**! - - [Everest mount write mode semantics →](#mount-write-mode-file-system-behavior). - - - + ## Use Cases -* **Simplified Data Loading**: With lakeFS Mount, there's no need to write custom data loaders or use special SDKs. You can use your existing tools to read and write files directly from the filesystem. -* **Handle Large-scale Data Without changing Work Habits**: Seamlessly scale from a few local files to millions without changing your tools or workflow. Use the same code from early experimentation all the way to production. 
-* **Enhanced Data Loading Efficiency**: lakeFS Mount supports billions of files and offers fast data fetching, making it ideal for optimizing GPU utilization and other performance-sensitive tasks. - -## Requirements - -- For enterprise installations: lakeFS Version `1.25.0` or higher. - -### OS and Protocol Support - -Currently, the implemented protocols are `nfs` and `fuse`. +- **Simplified Data Loading**: Use your existing tools to read and write files directly from the filesystem with no need for custom data loaders or SDKs. +- **Seamless Scalability**: Scale from a few local files to billions without changing your tools or workflow. Use the same code from experimentation to production. +- **Enhanced Performance**: Everest supports billions of files and offers fast, lazy data fetching, making it ideal for optimizing GPU utilization and other performance-sensitive tasks. -- NFS V3 (Network File System) is supported on macOS. +--- -## Authentication Chain for lakeFS +## Getting Started -When running an Everest `mount` command, authentication occurs in the following order: +This guide will walk you through setting up and using Everest to mount a lakeFS repository on your local machine. -1. **Session token** from the environment variable `EVEREST_LAKEFS_CREDENTIALS_SESSION_TOKEN` or `LAKECTL_CREDENTIALS_SESSION_TOKEN`. - If the token is expired, authentication will fail. -2. **lakeFS key pair**, using lakeFS access key ID and secret key. (picked up from lakectl if Everest not provided) -3. **IAM authentication**, if configured and **no static credentials are set**. +!!! tip "New to Everest?" + After completing this getting started guide, we recommend reading the [Core Concepts](#core-concepts) section to understand caching, consistency, and performance characteristics. -## Authenticate with lakeFS Credentials +### Prerequisites -The authentication with the target lakeFS server is equal to [lakectl CLI][lakectl]. 
-Searching for lakeFS credentials and server endpoint in the following order: +- lakeFS Cloud account or lakeFS Enterprise Version `1.25.0` or higher. +- **Supported OS:** macOS (with NFS V3) or Linux. +- **Get the Everest Binary:** Everest is a self-contained binary with no installation required. Please [contact us](http://info.lakefs.io/thanks-lakefs-mounts) to get access. -- Command line flags `--lakectl-access-key-id`, `--lakectl-secret-access-key` and `--lakectl-server-url` -- `LAKECTL_*` Environment variables -- `~/.lakectl.yaml` Configuration file or via `--lakectl-config` flag +### Authentication & Configuration -## Authenticating with AWS IAM Role +Everest uses the same configuration and authentication methods as `lakectl`. It discovers credentials and the server endpoint in the following order: -Starting from **lakeFS ≥ v1.57.0** and **Everest ≥ v0.4.0**, authenticating with IAM roles is supported! -When IAM authentication is configured, Everest will use AWS SDK default behavior that will pick your **AWS environment** to generate a **session token** used for authenticating against lakeFS (i.e use `AWS_PROFILE`, `AWS_ACCESS_KEY_ID`, etc). This token is seamlessly refreshed as long as the AWS session remains valid. +1. **Command-Line Flags:** `--lakectl-access-key-id`, `--lakectl-secret-access-key`, and `--lakectl-server-url`. +2. **Environment Variables:** `LAKECTL_*` or `EVEREST_LAKEFS_*` prefixed variables. +3. **Configuration File:** `~/.lakectl.yaml` (or the file specified by `--lakectl-config`). -### Prerequisites +

Authentication Methods

-1. Make sure your lakeFS server supports [AWS IAM Role Login](../security/external-principals-aws.md). -2. Make sure your IAM role is attached to lakeFS. See [Administration of IAM Roles in lakeFS](../security/external-principals-aws.md#administration-of-iam-roles-in-lakefs) +Everest will attempt to authenticate in the following order: -### Configure everest to use IAM +1. **Session Token:** From `EVEREST_LAKEFS_CREDENTIALS_SESSION_TOKEN` or `LAKECTL_CREDENTIALS_SESSION_TOKEN`. If the token is expired, authentication will fail. +2. **lakeFS Key Pair:** Standard access key ID and secret access key (credentials are picked up from lakectl configuration if Everest-specific credentials are not provided). +3. **IAM Authentication:** If your lakeFS environment is configured for [AWS IAM Role Login](../security/external-principals-aws.md), Everest (≥ v0.4.0) can authenticate using your AWS environment (e.g., `AWS_PROFILE`). IAM authentication is only attempted when no static credentials are set. To enable this, [configure your .lakectl.yaml](../security/external-principals-aws.md#lakectl-configuration) with `provider_type: aws_iam`. The token is seamlessly refreshed as long as the AWS session remains valid. -Everest uses the lakectl configuration file. -To enable IAM authentication, refer to the guide on how to [configure .lakectl.yaml](../security/external-principals-aws.md#lakectl-configuration) accordingly. + To configure IAM authentication using environment variables, use the `EVEREST_LAKEFS_*` or `LAKECTL_*` prefix: + ```bash + export EVEREST_LAKEFS_CREDENTIALS_PROVIDER_TYPE=aws_iam + # or + export LAKECTL_CREDENTIALS_PROVIDER_TYPE=aws_iam + ``` -To set using environment variables - those will start with the prefix `EVEREST_LAKEFS_*` or `LAKECTL_*`. -For example, setting the provider type using env vars: -`export EVEREST_LAKEFS_CREDENTIALS_PROVIDER_TYPE=aws_iam` or `LAKECTL_CREDENTIALS_PROVIDER_TYPE=aws_iam`. +!!! 
warning "lakectl Version Compatibility" + If you configure the IAM provider using the same `lakectl.yaml` file that you use for the lakectl CLI, you must upgrade lakectl to version `≥ v1.57.0`. Otherwise, lakectl will raise errors when using it. -!!! tip - To troubleshoot presign request issues, you can enable debug logging for presign requests using the environment variable: - +!!! tip "Troubleshooting IAM Presign Requests" + To troubleshoot presign request issues with IAM authentication, you can enable debug logging for presign requests using the environment variable: ```bash - EVEREST_LAKEFS_CREDENTIALS_PROVIDER_AWS_IAM_CLIENT_LOG_PRE_SIGNING_REQUEST=true + export EVEREST_LAKEFS_CREDENTIALS_PROVIDER_AWS_IAM_CLIENT_LOG_PRE_SIGNING_REQUEST=true ``` -!!! warning - If you choose to configure IAM provider using the same lakectl file (i.e `lakectl.yaml`) that you use for the **lakectl cli**, - you must upgrade lakectl to version (`≥ v1.57.0`) otherwise lakectl will raise errors when using it. +### Create Your First Mount +Let's mount a prefix from a lakeFS repository to a local directory. In read-only mode, Everest mounts a specific commit ID. If you provide a branch name, it will resolve to the HEAD commit at the time of mounting. -## Consistency Model +1. **Mount the repository:** + This command mounts the `datasets/pets/` prefix from the `main` branch of the `image-repo` repository into a new local directory named `./pets`. -### File System Consistency - -Everest mount provides a strong read-after-write consistency model within a single mount point. -Once a write operation is done, the data is guaranteed to be available for subsequent read operations. + ```bash + everest mount "lakefs://image-repo/main/datasets/pets/" "./pets" + ``` -### lakeFS Consistency +2. **Explore the data:** + You can now use standard filesystem commands to interact with your data. Files are downloaded lazily only when you access their content. 
-Local changes are reflected in lakeFS only after the changes are **committed**. Until then, the data is not visible to other users. -If, for example, two users mount the same branch, they will not see each other's changes until they are committed. + ```bash + # List files - this only fetches metadata + ls -l "./pets/dogs/" -### Sync local changes to lakeFS + # Find files + find ./pets -name "*.small.jpg" -- As part of `commit` and `diff` commands a `sync` operation will upload all the local changes to a temporary write-branch. + # Open a file - this triggers a download + open -a Preview "./pets/dogs/golden_retrievers/cute.jpg" + ``` -## Mount Write Mode File System Behavior +3. **Unmount the directory:** + When you are finished, unmount the directory. -### Functionality Limitations + ```bash + everest umount "./pets" + ``` -- Newly created empty directories will not reflect as directory markers in lakeFS. -- lakeFS allows having 2 path keys that one is a "directory" prefix of the other, for example the following 2 lakeFS keys are valid: `animals/cat.png` and `animals` (empty object) but since a file system cannot contain both a file and a directory of the same name it will lead to an undefined behavior depending on the Filesystem type (e.g., dir and dir/file). +--- -### File System Behavior +## Core Concepts -#### Not Supported +This section will help you understand how Everest manages performance, consistency, and caching in both local and Kubernetes deployments. -- Rename is not supported. -- Temporary files are not supported. -- Hard/symbolic links are not supported. -- POSIX file locks (lockf) are not supported. -- POSIX permissions are not supported - default permissions are given to files and dirs. -- A deleted file's name cannot be used as a directory type later and the same for opposite types (e.g, Not allowed: touch foo; rm foo; mkdir foo;). -- Calling remove on a directory type will fail explicitly with an error. 
+### Cache Behavior -#### Behavior modified +Everest uses a local cache to improve performance when accessing files from lakeFS. Understanding how the cache works will help you optimize performance for your specific use case. -- Modifying file metadata (chmod, chown, chgrp, time) will result in noop (the file metadata will not be changed). -- When calling `remove` we mark a file as a tombstone using [Extended Attributes](https://en.wikipedia.org/wiki/Extended_file_attributes) API's. -- Removal is not an atomic operation, calling remove and open at the same time might result in a race condition where the open might succeed. +

How Caching Works

+When you access a file through a mounted lakeFS path, Everest follows this process:
-## Command Line Interface
+1. **Lazy Fetching**: Files are only downloaded when their content is accessed (e.g., reading a file, not just listing it with `ls`).
+2. **Cache Storage**: When an object is not found in the local cache, Everest fetches the data from the object store and stores it in the cache for subsequent access.
+3. **Cache Reuse**: Subsequent reads of the same file are served directly from the cache, eliminating network requests and improving performance. The cache cannot be shared between different mount instances.
-### Mount Command
+
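The lazy-fetch and cache-reuse steps above can be observed from a shell. This is a minimal sketch, not part of Everest: `timed_read` is a hypothetical helper, the file path is the illustrative one used in this guide, and whole-second timing is only meaningful for reasonably large files. The operating system's page cache may also speed up the second read, so treat the numbers as a rough indication.

```bash
# Hypothetical helper (not an Everest command): print the whole seconds
# taken to read a file from start to end.
timed_read() {
  local start end
  start=$(date +%s)
  cat "$1" > /dev/null || return 1
  end=$(date +%s)
  echo $((end - start))
}

# Illustrative path from this guide; assumes ./pets is already mounted.
f="./pets/dogs/golden_retrievers/cute.jpg"
if [ -f "$f" ]; then
  # The first read is expected to fetch the object, the second to hit the cache.
  echo "first read:  $(timed_read "$f")s"
  echo "second read: $(timed_read "$f")s"
fi
```

Listing the directory with `ls` before the first read should not change the result, since listing only fetches metadata.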

Default Cache Behavior

-The `mount` command is used to mount a lakeFS repository to a local directory, it does it in 2 steps:
+By default, Everest creates a temporary cache directory when you run `everest mount`. This directory is automatically cleared when the mount is terminated via `everest umount`.
-1. Starting a server that listens on a local address and serves the data from the remote lakeFS repository.
-2. Running the required mount command on the OS level to connect the server to the local directory.
+**Key points:**
-#### Tips:
+- Each new mount creates a fresh cache directory.
+- By default, the cache location is managed by Everest and cleaned up automatically.
+- The cache is ephemeral and does not persist between mount sessions unless you specify a cache directory with `--cache-dir`.
-- Since the server runs in the background set `--log-output /some/file` to view the logs in a file.
-- Cache: Everest uses a local cache to store the data and metadata of the lakeFS repository. The optimal cache size is the size of the data you are going to read/write.
-- Reusing Cache: between restarts of the same mount endpoint, set `--cache-dir` to make sure the cache is reused.
-- Mounted data consistency (read-mode): When providing lakeFS URI mount endpoint `lakefs:////` the `` should be a specific commit ID. If a branch/tag is provided, Everest will use the HEAD commit instead.
-- When running mount in write-mode, the lakeFS URI must be a branch name, not a commit ID or a tag.
+

Persistent Cache

-#### Usage +To reuse cache data across multiple mount sessions, you can specify a custom cache directory using the `--cache-dir` flag: ```bash -everest mount - -Flags ---presign: Use presign for downloading. ---cache-dir: Directory to cache read files in. ---cache-size: Size of the local cache in bytes. ---cache-create-provided-dir: If cache-dir is explicitly provided and does not exist, create it. ---listen: Address to listen on. ---no-spawn: Do not spawn a new server, assume one is already running. ---protocol: Protocol to use (default: nfs). ---log-level: Set logging level. ---log-format: Set logging output format. ---log-output: Set logging output(s). ---write-mode: Enable write mode (default: false). +everest mount lakefs://image-repo/main/datasets/ ./datasets --cache-dir ~/.everest-cache ``` -### Umount Command +**Benefits of persistent cache:** -The `umount` command is used to unmount a currently mounted lakeFS repository. +- Faster startup times when remounting the same data. +- Reduced bandwidth usage by reusing previously downloaded files. +- Useful for iterative workflows where you repeatedly mount and unmount the same repository. -```bash -everest umount -``` +
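One way to check that a persistent cache is actually being reused is to mount the same URI twice with the same `--cache-dir` and watch the directory size. The sketch below is an assumption-laden illustration: `dir_kb` is a hypothetical helper, the internal layout of the cache directory is not documented here, and the loop reads every file under the mount, which may be a lot of data.

```bash
# Cache location from the example above; override by exporting CACHE_DIR.
CACHE_DIR="${CACHE_DIR:-$HOME/.everest-cache}"

# Hypothetical helper: size of a directory in KiB, or 0 if it does not exist.
dir_kb() {
  if [ -d "$1" ]; then
    du -sk "$1" | cut -f1
  else
    echo 0
  fi
}

# Only run the mount sessions when the everest binary is available.
if command -v everest > /dev/null 2>&1; then
  mkdir -p "$CACHE_DIR"
  for session in 1 2; do
    everest mount lakefs://image-repo/main/datasets/ ./datasets --cache-dir "$CACHE_DIR"
    # Read every file once so its content is fetched (or served from cache).
    find ./datasets -type f -exec cat {} + > /dev/null
    everest umount ./datasets
    echo "cache after session $session: $(dir_kb "$CACHE_DIR") KiB"
  done
fi
```

If the cache is reused, the second session should report roughly the same size as the first instead of growing again, provided the branch HEAD did not move between the two mounts.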

Cache Management

-### Diff Command (write-mode only) +Everest manages cached data based on the commit ID of the mounted reference: -The `diff` command Show the diff between the source branch and the current mount directory. -If `` not specified, the command searches for the mount directory in the current working directory and upwards based on `.everest` directory existence. -Please note that the diffs are from the source branch state at the time of mounting and not the current state of the source branch, i.e., changes to the source branch from other operations will not be reflected in the diff result. +- **Commit-Based Caching**: Each commit ID has its own cache namespace. This ensures that cached data always corresponds to the correct version of your files. +- **Cache Invalidation on Commit**: When you commit changes in write mode using `everest commit`, the mount point's source commit ID is updated to the new HEAD of the branch. As a result, the cache associated with the old commit ID is no longer used, and new data will be cached under the new commit ID. -```bash -everest diff +!!! tip "Optimizing Cache Size" + Set `--cache-size` to match the amount of data you plan to read or write. A larger cache reduces the need to evict and re-fetch files, improving performance for workloads that access many files. -#Example output: -# - removed datasets/pets/cats/persian/cute.jpg -# ~ modified datasets/pets/dogs/golden_retrievers/cute.jpg -# + added datasets/pets/birds/parrot/cute.jpg -``` +### Consistency & Data Behavior -### Commit Command (write-mode only) +

File System Consistency

-The `commit` command commits the changes made in the mounted directory to the original lakeFS branch. -If `` not specified, the command searches for the mount directory in the current working directory and upwards based on `.everest` directory existence. -The new commit will be merged to the original branch with the `source-wins` strategy. -After the commit is successful, the mounted directory source commit will be updated to the HEAD of the latest commit at the source branch; that means that changes made to the source branch out of the mount scope will also be reflected in the mounted directory. +Everest mount provides **strong read-after-write consistency** within a single mount point. Once a write operation completes, the data is guaranteed to be available for subsequent read operations on that same mount. -!!! warning - Writes to a mount directory during commit may be lost. +

lakeFS Consistency

-```bash -everest commit -m -``` +Local changes are reflected in lakeFS only after they are **committed** using the `everest commit` command. Until then: -### mount-server Command (Advanced) +- Changes are only visible within your local mount point +- Other users or mounts will not see your changes +- If two users mount the same branch, they will not see each other's changes until those changes are committed -!!! note - The `mount-server` command is for advanced use cases and will only spin the server without calling OS mount command. +

Sync Operation

-The mount-server command starts a mount server manually. Generally, users would use the mount command which handles server operations automatically. +When you run `everest diff` or `everest commit`, Everest performs a **sync operation** that uploads all local changes to a temporary location in lakeFS for processing. This ensures your changes are safely transferred before being committed to the branch. -```bash -everest mount-server -Flags ---cache-dir: Directory to cache read files and metadata. ---cache-create-provided-dir: Create the cache directory if it does not exist. ---listen: Address to listen on. ---protocol: Protocol to use (nfs | webdav). ---callback-addr: Callback address to report back to. ---log-level: Set logging level. ---log-format: Set logging output format. ---log-output: Set logging output(s). ---cache-size: Size of the local cache in bytes. ---parallelism: Number of parallel downloads for metadata. ---presign: Use presign for downloading. ---write-mode: Enable write mode (default: false). -``` +See the [Write-Mode Operations](#write-mode-operations) section for more details on working with writable mounts. -### Partial Reads +### Performance Considerations -!!! warning "Experimental" +Everest achieves high-performance data access through: -When reading large files, Everest can fetch from lakeFS only the parts actually accessed. -This can be useful for streaming workloads or for applications handling file formats such as Parquet, m4a, zip, tar that do not need to read the entire file. +- **Direct Object Store Access**: By default, Everest uses pre-signed URLs to read and write data directly to and from the underlying object store, bypassing the lakeFS server for data transfer. Only metadata operations go through the lakeFS server. +- **Lazy Metadata Loading**: Directory listings are fetched on-demand, allowing you to work with repositories containing billions of files without upfront overhead. 
+- **Partial Reads**: The experimental `--partial-reads` flag enables reading only the accessed portions of large files, which is useful for file formats like Parquet that support column pruning. +- **Cache Sizing**: Setting an appropriate `--cache-size` prevents frequent eviction and re-fetching. As a rule of thumb, size your cache to accommodate your working set. +- **Network Bandwidth**: Since data is fetched directly from object storage, ensure your network connection has adequate bandwidth for your workload. -To enable partial reads, pass the `--partial-reads` flag to the `mount` (or `mount-server`) command: +!!! tip "Optimizing for ML Workloads" + For training jobs, consider using a persistent cache directory (`--cache-dir`) and sizing the cache to fit your entire dataset. This eliminates repeated downloads across training epochs. -```bash -everest mount --partial-reads "lakefs://image-repo/main/datasets/pets/" "./pets" -``` - -## Examples +--- -### Read-Only Mode (default) +## Working with Data (Local Mount) -!!! info - For simplicity, the examples show `main` as the ref, Everest will always mount a specific commit ID when using read-only mode, given a ref it will use the HEAD (e.g the most recent commit). +### Read-Only Operations -!!! example "Data Exploration" - Mount the lakeFS repository and explore data as if it's on the local filesystem. +Read-only mode is the default and is ideal for data exploration, analysis, and feeding data into local applications without the risk of accidental changes. - ```bash - everest mount "lakefs://image-repo/main/datasets/pets/" "./pets" - ls -l "./pets/dogs/" - find ./pets -name "*.small.jpg" - open -a Preview "./pets/dogs/golden_retrievers/cute.jpg" - everest umount "./pets" - ``` +For information about how data is cached and accessed, see the [Cache Behavior](#cache-behavior) section. !!! example "Working with Data Locally" - Mount the remote lakeFS server and use all familiar tools without changing the workflow. 
+ Mount a repository and use your favorite tools directly on the data. ```bash everest mount lakefs://image-repo/main/datasets/pets/ ./pets - pytorch_train.py --input ./pets - duckdb "SELECT * FROM read_parquet('pets/labels.parquet')" - everest umount ./pets - ``` -### Write Mode - -!!! example "Changing Data Locally" - Mount the remote lakeFS server in write mode and change data locally. + # Run a Python script + pytorch_train.py --input ./pets - ```bash - everest mount lakefs://image-repo/main/datasets/pets/ ./pets --write-mode - # Add a new file - echo "new data" >> ./pets/birds/parrot/cute.jpg - # Update an existing file - echo "new data" >> ./pets/dogs/golden_retrievers/cute.jpg - # Delete a file - rm ./pets/cats/persian/cute.jpg - - # Check the changes - everest diff ./pets - # - removed datasets/pets/cats/persian/cute.jpg - # ~ modified datasets/pets/dogs/golden_retrievers/cute.jpg - # + added datasets/pets/birds/parrot/cute.jpg - - # Commit the changes to the original lakeFS branch - everest commit ./pets - - everest diff ./pets - # No changes + # Query data with DuckDB + duckdb -c "SELECT * FROM read_parquet('pets/labels.parquet')" everest umount ./pets ``` -To learn more, read about [Mount Write Mode Semantics](#mount-write-mode-file-system-behavior). +### Write-Mode Operations +By enabling write mode (`--write-mode`), you can modify, add, and delete files locally and then commit those changes back to the lakeFS branch. +When running in write mode, the lakeFS URI must point to a branch, not a commit ID or a tag. -[lakectl]: ./cli.md +
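Because write mode requires a branch, a script that builds mount URIs can run a quick pre-flight check before calling `everest mount --write-mode`. The sketch below is a heuristic only: `lakefs_ref` and `is_commit_id` are hypothetical helpers, the check assumes full lakeFS commit IDs are 64-character hex strings, and it cannot tell a tag or a shortened commit ID from a branch name.

```bash
# Hypothetical helper: print the ref (second component) of a lakefs:// URI.
lakefs_ref() {
  local rest
  rest="${1#lakefs://}"
  [ "$rest" != "$1" ] || return 1      # not a lakefs:// URI
  case "$rest" in
    */*) ;;
    *) return 1 ;;                     # repository only, no ref
  esac
  rest="${rest#*/}"                    # drop the repository name
  [ -n "${rest%%/*}" ] || return 1     # empty ref
  printf '%s\n' "${rest%%/*}"
}

# Hypothetical helper: true if the ref has the shape of a full commit ID
# (assumed here to be 64 lowercase hex characters).
is_commit_id() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{64}$'
}

uri="lakefs://image-repo/main/datasets/pets/"
if ref=$(lakefs_ref "$uri") && ! is_commit_id "$ref"; then
  echo "ref '$ref' is not a commit ID; candidate for --write-mode"
else
  echo "'$uri' does not point at a branch; write mode requires one" >&2
fi
```

A stricter check would ask the lakeFS server whether the ref is a branch; this sketch deliberately stays offline.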

Example of changing data locally

-## Mount CSI Driver (Everest on Kubernetes) + 1. **Mount in write mode:** + Use the `--write-mode` flag to enable writes. -!!! warning "Private Preview" - The CSI Driver is in private preview. Please [contact us](http://info.lakefs.io/thanks-lakefs-mounts) to get access. - -The lakeFS CSI (Container Storage Interface) Driver is an extension for Kubernetes that enables seamless access to data within a lakeFS repository, allowing Pods to interact with lakeFS data as if it were part of the local filesystem. This driver builds on the functionality of [Everest](./mount.md), which provides a read-only view of lakeFS data by virtually mounting a repository. + ```bash + everest mount lakefs://image-repo/main/datasets/pets/ ./pets --write-mode + ``` -### How mount is executed on a Host + 2. **Modify files:** + Make any changes you need using standard shell commands. -- While the `csi` driver is responsible for mounting and unmounting the volume on the host, it does not need permissions to execute the `mount` and `umount` commands directly. -- The `everest` commands are executed by `systemd` service on the Host itself (i.e `everest mount...`). -- The `csi` driver communicates with the `systemd` service via a unix socket to execute the `mount` and `umount` commands. + ```bash + # Add a new file + echo "new data" > ./pets/birds/parrot/cute.jpg -### Status and Limitations + # Update an existing file + echo "new data" >> ./pets/dogs/golden_retrievers/cute.jpg -- Tested OS: BottleRocket-OS, Amazon Linux 2 and RHEL 8. -- Minimal Kubernetes versions `>=1.23.0`. -- Tested Cluster providers EKS, Openshift (Partially). -- Static provisioning only explain below. -- Setting Pods `securityContext` UID and GID (i.e `runAsUser: 1000`, `runAsGroup: 2000`) is very nuanced in nature and does not have wide coverage currently, not supported but might work. 
-- Pod only supports access mode `ReadOnlyMany` + # Delete a file + rm ./pets/cats/persian/cute.jpg + ``` -**Static Provisioning only (Relevant for pods)** + 3. **Review your changes:** + The `diff` command shows the difference between your local state and the branch's state at the time of mounting. -When requesting a mount from the CSI driver, the driver will create a `PersistentVolume` (PV) and `PersistentVolumeClaim` (PVC) for the Pod. -The driver only supports Static Provisioning as of today, and you need an existing lakeFS repository to use. + ```bash + everest diff ./pets + # Output: + # + added datasets/pets/birds/parrot/cute.jpg + # ~ modified datasets/pets/dogs/golden_retrievers/cute.jpg + # - removed datasets/pets/cats/persian/cute.jpg + ``` -To use Static Provisioning, you should set `storageClassName` field of your `PersistentVolume (PV)` and `PersistentVolumeClaim (PVC)` to `""` (empty string). Also, in order to make sure no other PVCs can claim your PV, you should define a one-to-one mapping using `claimRef`. + 4. **Commit your changes:** + The `commit` command uploads your local changes and commits them to the source branch in lakeFS. -### Requirements + ```bash + everest commit ./pets -m "Updated pet images" + ``` + After committing, your local mount will be synced to the new HEAD of the branch. Running `diff` again will show no changes. -1. For enterprise installations: lakeFS Version `1.25.0` or higher. -2. You have a Kubernetes cluster with version `>=1.23.0` and [Helm](https://helm.sh/docs/intro/install/) installed. -3. lakeFS Server that can be access from pods in the cluster. -4. Access to download *treeverse/everest-lakefs-csi-driver* from [Docker Hub](https://hub.docker.com/u/treeverse). [Contact us](https://lakefs.io/contact-sales/) to gain access to lakeFS Enterprise features. + 5. **Unmount when finished:** + ```bash + everest umount ./pets + ``` -### Deploy the CSI Driver +!!! 
info "Write Mode Limitations" + Write mode has some limitations on supported operations. See [Write Mode Limitations](#write-mode-limitations) for details on unsupported operations and modified behaviors. -The CSI Driver is deployed to K8S cluster using a dedicated Helm chart [everest-lakefs-csi-driver](https://github.com/treeverse/charts/releases). -#### 1. Update your helm with the chart: +--- -Add lakeFS Helm repository if not already added: +## Everest on Kubernetes (CSI Driver) -```bash -helm repo add lakefs https://charts.lakefs.io -``` - -Fetch the chart from lakeFS repository: +!!! warning "Private Preview" + The CSI Driver is in private preview. Please [contact us](http://info.lakefs.io/thanks-lakefs-mounts) to get access. + The driver currently provides only read-only access. -```bash -helm repo update lakefs -``` +The lakeFS CSI (Container Storage Interface) Driver allows Kubernetes Pods to mount and interact with data in a lakeFS repository as if it were a local filesystem. -Verify the chart is available and updated: +**In this section:** -```bash -helm show chart lakefs/everest-lakefs-csi-driver -``` +- [How it Works](#how-it-works) - Understanding the CSI driver architecture +- [Status and Limitations](#status-and-limitations) - Supported platforms and current limitations +- [Prerequisites](#1-prerequisites) - Requirements for deploying the CSI driver +- [Deploy the CSI Driver](#2-deploy-the-csi-driver) - Installation instructions using Helm +- [Use in Pods](#3-use-in-pods) - How to mount lakeFS URIs in your Kubernetes workloads +- [Troubleshooting](#4-troubleshooting) - Common issues and debugging steps -List all available chart versions: +### How it Works -```bash -helm search repo lakefs/everest-lakefs-csi-driver -l -``` +The CSI driver, installed in your cluster, orchestrates mount operations on each Kubernetes node. It does not execute `mount` commands directly. 
Instead, it communicates via a Unix socket with a `systemd` service running on the host. This service is responsible for executing the `everest mount` and `umount` commands, making lakeFS URIs available to Pods as persistent volumes. -#### 2. Configure the values for the CSI Driver in a `values.yaml` file +### Status and Limitations -**Helm Chart default values:** +- **Tested OS:** BottleRocket-OS, Amazon Linux 2, RHEL 8. +- **Kubernetes:** Version `>=1.23.0`. +- **Provisioning:** Static provisioning only. +- **Access Modes:** `ReadOnlyMany` is supported. +- **Security Context:** Setting Pod `securityContext` (e.g., `runAsUser`) is not currently supported. -```bash -helm show values lakefs/everest-lakefs-csi-driver --version -``` +### Prerequisites -**CSI driver configuration:** +1. lakeFS Enterprise Version `1.25.0` or higher. +2. A Kubernetes cluster (`>=1.23.0`) with [Helm](https://helm.sh/docs/intro/install/) installed. +3. Network access from the cluster pods to your lakeFS server. +4. Access to the `treeverse/everest-lakefs-csi-driver` Docker Hub image. -All the driver CLI flags can be configured via environment variables (prefixed `CSI_DRIVER_`) and can be passed to the driver. +### Deploy the CSI Driver -!!! example "`values.yaml` example (minimal required arguments not commented)" +The driver is deployed using a Helm chart. 
- ```yaml - # image: - # repository: treeverse/everest-lakefs-csi-driver - # # Optional CSI Driver override version (default .Chart.AppVersion) - # tag: 1.2.3 - - # Same as fluffy https://github.com/treeverse/fluffy?tab=readme-ov-file#1-dockerhub-token-for-fluffy - imagePullSecret: - token: - username: - - # Credentials that will be used by everest as a default to access lakeFS mount paths - lakeFSAccessSecret: - keyId: - accessKey: - endpoint: - - node: - # verbosity level of the driver (normal values are 0-4, 5 would be most verbose) - logLevel: 4 - # Only set if having issues with running or installing the everest binary - # Path directory where the everest binary accessed by the underlying K8S Nodes (${everestInstallPath}/everest) - # The binary will copied from the CSI pod into that location by the init container job in the node.yaml - # This path will be a host path on the K8S Nodes - # depending on the underlying OS and the SELinux policy the binary will be executed by systemd on the Host. - # Known issue when using Bottlerocket OS https://github.com/bottlerocket-os/bottlerocket/pull/3779 - # everestInstallPath: /opt/everest-mount/bin/ # should end with "/" - - # Additional environment variables that will be passed to the driver can be used to configure the csi driver - # extraEnvVars: - # - name: CSI_DRIVER_MOUNT_TIMEOUT - # value: "30s" - # - name: CSI_DRIVER_EVEREST_DEFAULT_CACHE_SIZE - # value: "10000000000" - # - name: VALUE_FROM_SECRET - # valueFrom: - # secretKeyRef: - # name: secret_name - # key: secret_key +1. **Add the lakeFS Helm repository:** + ```bash + helm repo add lakefs https://charts.lakefs.io + helm repo update lakefs + ``` + Verify the chart is available and see the latest version: + ```bash + helm search repo lakefs/everest-lakefs-csi-driver + ``` + To see all available chart versions, use the `-l` flag: + ```bash + helm search repo lakefs/everest-lakefs-csi-driver -l ``` -#### 3. 
Install the Chart to K8S cluster - -Install the chart with the values file: - -```bash -helm install -f values.yaml lakefs lakefs/everest-lakefs-csi-driver --version -``` +2. **Configure `values.yaml`:** + Create a `values.yaml` file to configure the driver. At a minimum, you must provide credentials for Docker Hub and your lakeFS server. + You can view the complete list of configuration options by running `helm show values lakefs/everest-lakefs-csi-driver --version `. + + !!! example "`values.yaml` example" + ```yaml + # Docker Hub credentials to pull the CSI driver image + imagePullSecret: + token: + username: + + # Default lakeFS credentials for Everest to use when mounting volumes + lakeFSAccessSecret: + keyId: + accessKey: + endpoint: + + node: + # Logging verbosity (0-4 is normal, 5 is most verbose) + logLevel: 4 + # (Advanced) Only set if you have issues with the Everest binary installation on the node. + # This path must end with a "/" + # everestInstallPath: /opt/everest-mount/bin/ + + # (Advanced) Additional environment variables for the CSI driver pod + # extraEnvVars: + # - name: CSI_DRIVER_MOUNT_TIMEOUT + # value: "30s" + ``` + +3. **Install the chart:** + ```bash + helm install -f values.yaml lakefs lakefs/everest-lakefs-csi-driver --version + ``` ### Use in Pods -Once the CSI Driver is installed, we can start using it similarly to how all `PersistentVolume` (PV) and `PersistentVolumeClaim` (PVC) are used in Kubernetes. - -The only required argument to set is `lakeFSMountUri` in the `PV` (See examples below). +To use the driver, you create a `PersistentVolume` (PV) and a `PersistentVolumeClaim` (PVC) to mount a lakeFS URI into your Pod. -#### Mount Options +- **Static Provisioning:** You must set `storageClassName: ""` in your PVC. To ensure a PVC is bound to a specific PV, you can use a `claimRef` in the PV definition to create a one-to-one mapping. +- **Mount URI:** The `lakeFSMountUri` is the only required attribute in the PV spec. 
+- **Mount Options:** Additional `everest mount` flags can be passed via `mountOptions` in the PV spec. -Most of the options are optional and can be omitted, but each mount request can be configured with [everest mount cli options](./mount.md#mount-command), they are passed as `mountOptions` in the `PVC` spec. +

Examples

-#### Examples +The following examples demonstrate how to mount a lakeFS URI in different Kubernetes scenarios. -The examples demonstrates different mount scenarios with the CSI Driver. -All of them are essentially running `ls ` and `tail -f /dev/null` in a centos container. -If the mount succeeded you will see the contents of your mount directory. - -1. Set `lakeFSMountUri` (i.e `lakefs:////[prefix/]`) to the lakeFS mount URI you want to mount. -1. Run `kubectl apply -f values.yaml` -1. View the example pod logs to see the mount output `kubectl logs -f ` - - -=== "Single Pod and mount" - - Configure `lakeFSMountUri` to the target URI. +=== "Single Pod and Mount" + This example mounts a single lakeFS URI into one Pod. ```yaml apiVersion: v1 kind: PersistentVolume metadata: - name: everest-pv + name: everest-pv spec: - capacity: - storage: 100Gi # ignored, required - accessModes: - - ReadOnlyMany # supported options: ReadOnlyMany - # everest mount options goes under mountOptions and forwarded to the everest mount command - # mountOptions: - # set cache size in bytes - # - cache-size 1000000000 - csi: - driver: csi.everest.lakefs.io # required - volumeHandle: everest-csi-driver-volume + capacity: + storage: 100Gi # Required by Kubernetes, but ignored by Everest + accessModes: + - ReadOnlyMany + csi: + driver: csi.everest.lakefs.io + volumeHandle: everest-csi-driver-volume-1 # Must be unique volumeAttributes: - # mount target, replace with your lakeFS mount URI - lakeFSMountUri: - + # Replace with your lakeFS mount URI + lakeFSMountUri: lakefs://// --- apiVersion: v1 kind: PersistentVolumeClaim metadata: - name: everest-claim + name: everest-claim spec: - accessModes: - - ReadOnlyMany # supported options: ReadOnlyMany - storageClassName: "" # required for static provisioning - resources: + accessModes: + - ReadOnlyMany + storageClassName: "" # Required for static provisioning + resources: requests: - storage: 5Gi # ignored, required - volumeName: everest-pv + storage: 
5Gi # Required by Kubernetes, but ignored by Everest + volumeName: everest-pv --- apiVersion: v1 kind: Pod metadata: - name: everest-app + name: everest-app spec: - containers: + containers: - name: app - image: centos - command: ["/bin/sh"] - args: ["-c", "ls /data/; tail -f /dev/null"] - volumeMounts: - - name: persistent-storage-isan - mountPath: /data - volumes: - - name: persistent-storage-isan - persistentVolumeClaim: + image: centos + command: ["/bin/sh", "-c", "ls /data/; tail -f /dev/null"] + volumeMounts: + - name: my-lakefs-data + mountPath: /data + volumes: + - name: my-lakefs-data + persistentVolumeClaim: claimName: everest-claim - ``` -=== "Multiple Pods, one mount (Deployment)" - Configure `lakeFSMountUri` to the target URI. - - In this example a deployment is created with 3 replicas, all sharing a single `PersistentVolume` and PVC - Behind the scenes each pod get's their own mount, even if on the same k8s node, each pod will get their own mount directory. - - Unlike in StatefulSet, this can scale-up-down with no additional interference and deleted easily the same way. +=== "Multiple Pods, One Mount (Deployment)" + A Deployment where multiple Pods share a single `PersistentVolume` and PVC. Behind the scenes, each Pod gets its own independent mount, even when Pods run on the same node.
```yaml apiVersion: v1 kind: PersistentVolume metadata: - name: multiple-pods-one-pv + name: multiple-pods-one-pv spec: - capacity: - storage: 1200Gi # ignored, required - accessModes: - - ReadOnlyMany # supported options: ReadOnlyMany - # everest mount options goes under mountOptions and forwarded to the everest mount command - # mountOptions: - # - cache-size 1000000555 - csi: - driver: csi.everest.lakefs.io # required - volumeHandle: everest-csi-driver-volume + capacity: + storage: 100Gi + accessModes: + - ReadOnlyMany + csi: + driver: csi.everest.lakefs.io + volumeHandle: everest-csi-driver-volume-2 # Must be unique volumeAttributes: - # mount target, replace with your lakeFS mount URI - lakeFSMountUri: + lakeFSMountUri: lakefs://// --- apiVersion: v1 kind: PersistentVolumeClaim metadata: - name: multiple-pods-one-claim + name: multiple-pods-one-claim spec: - accessModes: - - ReadOnlyMany # supported options: ReadOnlyMany - storageClassName: "" # required for static provisioning - resources: + accessModes: + - ReadOnlyMany + storageClassName: "" + resources: requests: - storage: 1200Gi # ignored, required - volumeName: multiple-pods-one-pv + storage: 5Gi + volumeName: multiple-pods-one-pv --- apiVersion: apps/v1 kind: Deployment metadata: - name: multiple-pods-one-pv-app - labels: - app: multiple-pods-one-pv-app + name: multi-pod-app spec: - replicas: 3 - selector: + replicas: 3 + selector: matchLabels: - app: multiple-pods-one-pv-app - template: + app: multi-pod-app + template: metadata: - labels: - app: multiple-pods-one-pv-app + labels: + app: multi-pod-app spec: - containers: - - name: multiple-pods-one-pv-app + containers: + - name: app image: centos - command: ["/bin/sh"] - args: ["-c", "ls /data/; tail -f /dev/null"] + command: ["/bin/sh", "-c", "ls /data/; tail -f /dev/null"] volumeMounts: - - name: persistent-storage - mountPath: /data - ports: - - containerPort: 80 - volumes: - - name: persistent-storage + - name: lakefs-storage + mountPath: /data + 
volumes: + - name: lakefs-storage persistentVolumeClaim: - claimName: multiple-pods-one-claim - + claimName: multiple-pods-one-claim ``` +=== "Multiple Mounts, Single Pod" -=== "Multiple mounts, single Pod" - - Deploy a pod with two mounts to different mount points. - Configure `lakeFSMountUri` for each `PersistentVolume`. - + A single Pod with two different lakeFS URIs mounted to two different paths. ```yaml + # Define two PVs and two PVCs, one for each mount. + # PV 1 apiVersion: v1 kind: PersistentVolume metadata: - name: multiple-mounts-one-pod-pv + name: multi-mount-pv-1 spec: - capacity: - storage: 1200Gi # ignored, required - accessModes: - - ReadOnlyMany # supported options: ReadOnlyMany - mountOptions: - - cache-size 1000000111 - csi: - driver: csi.everest.lakefs.io # required - volumeHandle: everest-csi-driver-volume # must be unique + capacity: { storage: 100Gi } + accessModes: [ReadOnlyMany] + csi: + driver: csi.everest.lakefs.io + volumeHandle: everest-csi-driver-volume-3 # Must be unique volumeAttributes: - # mount target local-lakefs dir, replace with your lakeFS mount URI - lakeFSMountUri: + lakeFSMountUri: lakefs://// --- + # PVC 1 apiVersion: v1 kind: PersistentVolumeClaim metadata: - name: multple-mounts-one-pod-claim + name: multi-mount-claim-1 spec: - accessModes: - - ReadOnlyMany # supported options: ReadOnlyMany - storageClassName: "" # required for static provisioning - resources: - requests: - storage: 1200Gi # ignored, required - volumeName: multiple-mounts-one-pod-pv + accessModes: [ReadOnlyMany] + storageClassName: "" + resources: { requests: { storage: 5Gi } } + volumeName: multi-mount-pv-1 --- + # PV 2 apiVersion: v1 kind: PersistentVolume metadata: - name: multiple-mounts-one-pod-pv-2 + name: multi-mount-pv-2 spec: - capacity: - storage: 1200Gi # ignored, required - accessModes: - - ReadOnlyMany # ReadOnlyMany - mountOptions: - - cache-size 1000000555 - csi: - driver: csi.everest.lakefs.io # required - volumeHandle: 
everest-csi-driver-volume-2 # must be unique + capacity: { storage: 100Gi } + accessModes: [ReadOnlyMany] + csi: + driver: csi.everest.lakefs.io + volumeHandle: everest-csi-driver-volume-4 # Must be unique volumeAttributes: - # mount target images dir, replace with your lakeFS mount URI - lakeFSMountUri: + lakeFSMountUri: lakefs://// --- + # PVC 2 apiVersion: v1 kind: PersistentVolumeClaim metadata: - name: multple-mounts-one-pod-claim-2 + name: multi-mount-claim-2 spec: - accessModes: - - ReadOnlyMany # supported options: ReadOnlyMany - storageClassName: "" # required for static provisioning - resources: - requests: - storage: 1200Gi # ignored, required - volumeName: multiple-mounts-one-pod-pv-2 + accessModes: [ReadOnlyMany] + storageClassName: "" + resources: { requests: { storage: 5Gi } } + volumeName: multi-mount-pv-2 --- + # Pod apiVersion: v1 kind: Pod metadata: - name: everest-multi-mounts-one-pod + name: multi-mount-pod spec: - containers: + containers: - name: app - image: centos - command: ["/bin/sh"] - args: ["-c", "ls /data/; ls /data2/; tail -f /dev/null"] - volumeMounts: - - name: persistent-storage - mountPath: /data - - name: persistent-storage-2 - mountPath: /data2 - volumes: - - name: persistent-storage - persistentVolumeClaim: - claimName: multple-mounts-one-pod-claim - - name: persistent-storage-2 - persistentVolumeClaim: - claimName: multple-mounts-one-pod-claim-2 - + image: centos + command: ["/bin/sh", "-c", "echo 'Path 1:'; ls /data1; echo 'Path 2:'; ls /data2; tail -f /dev/null"] + volumeMounts: + - name: lakefs-data-1 + mountPath: /data1 + - name: lakefs-data-2 + mountPath: /data2 + volumes: + - name: lakefs-data-1 + persistentVolumeClaim: + claimName: multi-mount-claim-1 + - name: lakefs-data-2 + persistentVolumeClaim: + claimName: multi-mount-claim-2 ``` === "StatefulSet (Advanced)" - Configure `lakeFSMountUri` to the target URI. - Because of the neuances described below, if not required it is best to avoid using a `StatefulSet`. 
- - **Deletion:** - - It's [intended behavior](https://kubernetes.io/docs/tasks/run-application/delete-stateful-set/#complete-deletion-of-a-statefulset) for StatefulSet in K8S that the PVC is not deleted automatically when the pod is deleted since the StatefulSet controller does not manage the PVC. - To completley delete use k delete with --force flag or first delete the PVC: 'kubectl delete pvc -l app=sts-app-simple-everest' - - **Scale Down:** - - replicas: 0 can be set to scale down the StatefulSet and bring back up with replicas: 1. - - **Replicas > 1:** + Due to the nuances of how StatefulSets manage PersistentVolumeClaims, it is often simpler to use a `Deployment`. - not supported in this example, since the driver only supports static provisioning. - to use Statefulset with replica > 1 we need to add PersistentVolume(s) manually. + * **Deletion:** When you delete a StatefulSet, its PVCs are not automatically deleted. You must delete them manually. + * **Replicas > 1:** Using more than one replica requires manually creating a corresponding number of `PersistentVolume` resources, as static provisioning does not automatically create them. 
```yaml apiVersion: v1 kind: PersistentVolume metadata: - name: sts-simple-mount - labels: + name: sts-simple-mount + labels: app: sts-app-simple-everest spec: - capacity: + capacity: storage: 100Gi # ignored, required - accessModes: - - ReadOnlyMany # supported options: ReadOnlyMany - mountOptions: - # override default cache size for the mount (in bytes) - - cache-size 1000000555 - - log-level debug - csi: - driver: csi.everest.lakefs.io # required - volumeHandle: everest-csi-driver-volume + accessModes: + - ReadOnlyMany + csi: + driver: csi.everest.lakefs.io + volumeHandle: everest-csi-driver-volume-5 # Must be unique volumeAttributes: - # mount target, replace with your lakeFS mount URI - lakeFSMountUri: + lakeFSMountUri: --- apiVersion: apps/v1 kind: StatefulSet metadata: - name: sts-app-simple-everest + name: sts-app-simple-everest spec: - replicas: 1 - selector: + replicas: 1 + selector: matchLabels: - app: sts-app-simple-everest - template: + app: sts-app-simple-everest + template: metadata: - labels: + labels: app: sts-app-simple-everest spec: - containers: + containers: - name: app - image: centos - command: ["/bin/sh"] - args: ["-c", "ls /data/; tail -f /dev/null"] - volumeMounts: + image: centos + command: ["/bin/sh", "-c", "ls /data/; tail -f /dev/null"] + volumeMounts: - name: sts-simple-mount - mountPath: /data - volumeClaimTemplates: - - metadata: - name: sts-simple-mount + mountPath: /data + volumeClaimTemplates: + - metadata: + name: sts-simple-mount spec: - selector: + selector: matchLabels: - app: sts-app-simple-everest - storageClassName: "" # required for static provisioning - accessModes: [ "ReadOnlyMany" ] - resources: + app: sts-app-simple-everest + storageClassName: "" # required for static provisioning + accessModes: [ "ReadOnlyMany" ] + resources: requests: - storage: 5Gi # ignored, required - + storage: 5Gi # ignored, required ``` === "Mount Options" - This demonstrates common flags and uncommon flags that can be used for a mount. 
- In general, the flags are set in `mountOptions` and are passed to the everest [mount command](https://docs.lakefs.io/reference/mount.html#mount-command). + This example demonstrates how to pass various `everest mount` flags via `mountOptions` in the `PersistentVolume` spec. ```yaml apiVersion: v1 kind: PersistentVolume metadata: - name: options-demo-pv + name: options-demo-pv spec: - capacity: + capacity: storage: 100Gi # ignored, required - accessModes: - - ReadOnlyMany # supported options: ReadOnlyMany - # everest mount options goes under mountOptions and forwarded to the everest mount command - mountOptions: - # set cache size in bytes + accessModes: + - ReadOnlyMany + # everest mount flags are passed here + mountOptions: + # set cache size in bytes - cache-size 10000000 - # set log level to debug when inspecting mount logs (very noisy!) + # set log level to trace for debugging (very noisy!) - log-level trace - # WARN: lakeFS credentials / endpoint should be managed securely by the CSI-driver, this is an advanced flag use-case - # override default lakeFS credentials (for use-cases where the default csi-driver credentials are not sufficient) + # WARN: Overriding credentials should only be used in advanced cases. + # It is more secure to rely on the default credentials configured in the CSI driver. 
- lakectl-access-key-id - lakectl-secret-access-key - lakectl-server-url - # WARN: an advanced flag and rarely needed if at all, performs mount directly using fuser relying on it to exist on the host server without using FUSE syscalls - # be default fuse-direct-mount is true - # - fuse-direct-mount false - # - mount-gid 2000 - # - mount-uid 1000 - # - presign false - # - log-enable-syslog false - csi: - driver: csi.everest.lakefs.io # required - volumeHandle: everest-csi-driver-volume - volumeAttributes: - # mount target, staging org (non default credentials on csi), replace with your lakeFS mount URI + driver: csi.everest.lakefs.io + volumeHandle: everest-csi-driver-volume-6 # Must be unique + volumeAttributes: lakeFSMountUri: - - # REST OF THE RESOURCES - # ... + --- + # PVC and Pod definitions follow... ``` - ### Troubleshooting -- Use `kubectl` and check the CSI driver pod and failed Pod for logs and events. -- If a specific mount request failed, specifically inspect csi-node that the failed mount pod was deployed on. -- Check the events and status of the `PVC` and `PV` of the failing pod `kubectl get pv && kubectl get pvc` +- Check logs from the CSI driver pods and the application pod that failed to mount. +- Inspect the events and status of the `PV` and `PVC` (`kubectl get pv`, `kubectl get pvc`, `kubectl describe ...`). +- **Advanced: SSH into the Kubernetes node** to inspect the `systemd` service logs for the specific mount operation: + 1. Find the failed mount service: + ```sh + systemctl list-units --type=service | grep everest-lakefs-mount + # Example output: + # everest-lakefs-mount-0.0.8-everest-123.service loaded active running CSI driver FUSE daemon + ``` + 2. Get the status and view the exact command that was executed: + ```sh + systemctl status everest-lakefs-mount-0.0.8-everest-123.service + ``` + 3.
View the logs for the service: + ```sh + journalctl -f -u everest-lakefs-mount-0.0.8-everest-123.service + ``` -**Advanced: SSH into the underlying K8S node:** +--- + +## Command-Line Reference -Find the failed mount service `systemctl list-units --type=service`: +This section provides detailed documentation for all Everest CLI commands. For conceptual information about how Everest works, see the [Core Concepts](#core-concepts) section. -```sh -everest-lakefs-mount-0.0.8-everest-123.service loaded active running CSI driver FUSE daemon +### `everest mount` +Mounts a lakeFS URI to a local directory. + +```bash +everest mount [flags] ``` -Get systemd service status: +**Tips:** + +- Since the server runs in the background, use `--log-output /path/to/file` to view logs. +- The optimal cache size is the size of the data you are going to read/write. +- To reuse the cache between restarts of the same mount, set the `--cache-dir` flag. +- In read-only mode, if you provide a branch or tag, Everest will resolve and mount the HEAD commit. For a stable mount, use a specific commit ID in the URI. + +**Flags:** + +- `--write-mode`: Enable write mode (default: `false`). +- `--cache-dir`: Directory to cache files. +- `--cache-size`: Size of the local cache in bytes. +- `--cache-create-provided-dir`: If `cache-dir` is provided and does not exist, create it. +- `--listen`: Address for the mount server to listen on. +- `--no-spawn`: Do not spawn a new server; assume one is already running. +- `--protocol`: Protocol to use (default: `nfs`). +- `--log-level`: Set logging level. +- `--log-format`: Set logging output format. +- `--log-output`: Set logging output(s). +- `--presign`: Use pre-signed URLs for direct object store access (default: `true`). +- `--partial-reads`: (Experimental) Fetch only the accessed parts of large files. 
This can be useful for streaming workloads or for applications that work with file formats such as Parquet, m4a, zip, and tar and do not need to read the entire file. -```sh +

`everest umount`

-# output contains many things including the exec command to run, example: -# ExecStart=/opt/bin/everest mount lakefs://test-mount/main/local-lakefs/ /var/lib/kubelet/pods/123/volumes/kubernetes.io~csi/everest-pv/mount --log-level=trace --cache-dir=/var/lib/kubelet/pods/123/volumes/kubernetes.io~csi/everest-pv/.everest --lakectl-config=/opt/mountpoint-s3-csi/bin/lakectl.yaml +Unmounts a lakeFS directory. + +```bash +everest umount ``` -See systemd logs of a service: +

`everest diff` (Write Mode Only)

-```sh -journalctl -f -u +Shows the difference between the local mount directory and the source branch. -# example: -journalctl -f -u everest-lakefs-mount-0.0.8-everest-123.service +```bash +everest diff [mount_directory] ``` +

`everest commit` (Write Mode Only)

+ +Commits local changes to the source lakeFS branch. The new commit is merged to the original branch using a `source-wins` strategy. After the commit succeeds, the mounted directory's source commit is updated to the new HEAD of the branch. + +!!! warning + Writes to a mount directory during a commit operation may be lost. + +```bash +everest commit [mount_directory] -m +``` + +

`everest mount-server` (Advanced)

+ +Starts the mount server without performing the OS-level mount. This is intended for advanced use cases where you want to manage the server process and the OS mount command separately. + +```bash +everest mount-server [flags] +``` + +**Flags:** + +- `--cache-dir`: Directory to cache read files and metadata. +- `--cache-create-provided-dir`: Create the cache directory if it does not exist. +- `--listen`: Address to listen on. +- `--protocol`: Protocol to use (nfs | webdav). +- `--callback-addr`: Callback address to report back to. +- `--log-level`: Set logging level. +- `--log-format`: Set logging output format. +- `--log-output`: Set logging output(s). +- `--cache-size`: Size of the local cache in bytes. +- `--parallelism`: Number of parallel downloads for metadata. +- `--presign`: Use presign for downloading. +- `--write-mode`: Enable write mode (default: false). + +--- -## FAQs +## Advanced Topics -

How do I get started with lakeFS Mount (Everest)?

+### Write Mode Limitations -lakeFS Mount is available for lakeFS Cloud and lakeFS Enterprise customers. Once your setup is complete, [contact us](http://info.lakefs.io/thanks-lakefs-mounts) to access the lakeFS Mounts binary and follow the provided docs. +When using write mode (`--write-mode`), be aware of the following limitations and modified behaviors. For more details on write mode operations, see the [Write-Mode Operations](#write-mode-operations) section. -* Want to try lakeFS Cloud? [Signup](https://lakefs.cloud/register) for a 30-day free trial. -* Interested in lakeFS Enterprise? [Contact sales](https://lakefs.io/contact-sales/) for a 30-day free license. +

Unsupported Operations

-

What operating systems are supported by lakeFS Mount?

+- **Rename:** File and directory rename operations are not supported. +- **Temporary Files:** Temporary files are not supported. +- **Hard/Symbolic Links:** Hard links and symbolic links are not supported. +- **POSIX File Locks:** POSIX file locks (`lockf`) are not supported. +- **POSIX Permissions:** POSIX permissions are not supported. Default permissions are assigned to files and directories. -lakeFS Mount supports Linux and MacOS. Windows support is on the roadmap. +

Modified Behavior

-

How can I control access to my data when using lakeFS Mount?

+- **Metadata Operations:** Modifying file metadata (`chmod`, `chown`, `chgrp`, time attributes) results in a no-op. The file metadata will not be changed. +- **Deletion Implementation:** When calling `remove`, Everest marks a file as a tombstone using [Extended Attributes](https://en.wikipedia.org/wiki/Extended_file_attributes) APIs. +- **Deletion Race Conditions:** Removal is not an atomic operation. Calling `remove` and `open` simultaneously on the same file may result in a race condition where the `open` operation might succeed. +- **Type Reuse Restriction:** A deleted file's name cannot be reused as a directory, and vice-versa. For example, this sequence is not allowed: `touch foo; rm foo; mkdir foo;`. +- **Directory Removal:** Calling `remove` on a directory will fail explicitly with an error. Use appropriate directory removal commands instead. -You can use lakeFS's existing [Role-Based Access Control mechanism](../security/rbac.md), which includes repository and path-level policies. lakeFS Mount translates filesystem operations into lakeFS API operations and authorizes them based on these policies. +

Functionality Limitations

-The minimal RBAC permissions required for mounting a prefix from a lakeFS repository in read-only mode: +- **Empty Directories:** Newly created empty directories are not represented as directory markers in lakeFS. +- **Path Conflicts:** lakeFS allows having two path keys where one is a "directory" prefix of the other (e.g., both `animals/cat.png` and `animals` as an empty object are valid in lakeFS). However, since a filesystem cannot contain both a file and a directory with the same name, this will lead to undefined behavior depending on the filesystem type. + +### Integration with Git + +It is safe to mount a lakeFS path inside a Git repository. Everest automatically creates a virtual `.gitignore` file in the mount directory. This file instructs Git to ignore all mounted content *except* for a single file: `.everest/source`. + +By committing the `.everest/source` file, which contains the `lakefs://` URI, you ensure that anyone who clones your Git repository and uses Everest will mount the exact same version of the data, making your project fully reproducible. + +!!! tip "Reproducible Data Science Projects" + This feature is particularly useful for data science projects where you want to version both your code (in Git) and your data (in lakeFS). Team members can clone the repository and automatically mount the correct data version. + +--- + +## FAQ + +### How does data access work? Does it stream through the lakeFS server? + +No. By default (`--presign=true`), Everest uses pre-signed URLs to read and write data directly to and from the underlying object store, ensuring high performance. Metadata operations still go through the lakeFS server. + +For more details, see [Performance Considerations](#performance-considerations). + +### What happens if the lakeFS branch is updated after I mount it? + +In read-only mode, your mount points to the commit that was at the HEAD of the branch *at the time of mounting*.
It will not reflect subsequent commits to that branch unless you unmount and remount. In write mode, after a successful `commit`, the mount is updated to the new HEAD of the branch. + +### When are files downloaded? + +Everest uses a lazy fetching strategy. Files are only downloaded when their content is accessed (e.g., with `cat`, `open`, or reading in a script). Metadata-only operations like `ls` do not trigger downloads. + +Downloaded files are cached locally for performance. See [Cache Behavior](#cache-behavior) for details on how caching works and how to configure it. + +### What are the RBAC permissions required for mounting? + +You can use lakeFS's [Role-Based Access Control](../security/rbac.md) to manage access. + +**Minimal Read-Only Permissions:** ```json { - "id": "MountPolicy", + "id": "MountReadOnlyPolicy", "statement": [ { - "action": [ - "fs:ReadObject" - ], + "action": ["fs:ReadObject"], "effect": "allow", - "resource": "arn:lakefs:fs:::repository//object//*" + "resource": "arn:lakefs:fs:::repository//object//*" }, { - "action": [ - "fs:ListObjects", - "fs:ReadCommit", - "fs:ReadBranch", - "fs:ReadTag", - "fs:ReadRepository" - ], + "action": ["fs:ListObjects", "fs:ReadCommit", "fs:ReadBranch", "fs:ReadTag", "fs:ReadRepository"], "effect": "allow", - "resource": "arn:lakefs:fs:::repository/" + "resource": "arn:lakefs:fs:::repository/" }, - { - "action": ["fs:ReadConfig"], - "effect": "allow", - "resource": "*" - } + { "action": ["fs:ReadConfig"], "effect": "allow", "resource": "*" } ] } ``` -The minimal RBAC permissions required for mounting a prefix from a lakeFS repository in write mode: +**Minimal Write-Mode Permissions:** ```json { - "id": "MountPolicy", + "id": "MountWritePolicy", "statement": [ { - "action": [ - "fs:ReadObject", - "fs:WriteObject", - "fs:DeleteObject" - ], + "action": ["fs:ReadObject", "fs:WriteObject", "fs:DeleteObject"], "effect": "allow", - "resource": "arn:lakefs:fs:::repository//object//*" + "resource": 
"arn:lakefs:fs:::repository//object//*" }, { "action": [ - "fs:ListObjects", - "fs:ReadCommit", - "fs:ReadBranch", - "fs:ReadRepository", - "fs:CreateCommit", - "fs:CreateBranch", - "fs:DeleteBranch", - "fs:RevertBranch" + "fs:ListObjects", "fs:ReadCommit", "fs:ReadBranch", "fs:ReadRepository", + "fs:CreateCommit", "fs:CreateBranch", "fs:DeleteBranch", "fs:RevertBranch" ], "effect": "allow", - "resource": "arn:lakefs:fs:::repository/" + "resource": "arn:lakefs:fs:::repository/" }, - { - "action": ["fs:ReadConfig"], - "effect": "allow", - "resource": "*" - } + { "action": ["fs:ReadConfig"], "effect": "allow", "resource": "*" } ] } ``` -

Does data pass through the lakeFS server when using lakeFS Mount?

- -lakeFS Mount leverages pre-signed URLs to read data directly from the underlying object store, meaning data doesn't pass through the lakeFS server. By default, presign is enabled. To disable it, use: - -```shell -everest mount --presign=false -``` - -

What happens if a lakeFS branch is updated after I mount it?

- -lakeFS Mount points to the commit that was the HEAD commit of the branch at the time of mounting. This means the local directory reflects the branch state at the time of mounting and does not update with subsequent branch changes. - -

When are files downloaded to my local environment?

- -lakeFS Mount uses a lazy prefetch strategy. Files are not downloaded at mount time or during operations that only inspect file metadata (e.g., `ls`). Files are downloaded only when commands that require file access (e.g., `cat`) are used. - -

What are the scale limitations of lakeFS Mount, and what are the recommended configurations for dealing with large datasets?

- -When using lakeFS Mount, the volume of data accessed by the local machine influences the scale limitations more than the total size of the dataset under the mounted prefix. This is because lakeFS Mount uses a lazy downloading approach, meaning it only downloads the accessed files. - -
Recommended Configuration
- -Ensure your **cache size** is large enough to accommodate the volume of files being accessed. - -

How does lakeFS Mount integrate with a Git repository?

- -It is perfectly safe to mount a lakeFS path within a Git repository. -lakeFS Mount prevents git from adding mounted objects to the git repository (i.e when running `git add -A`) by adding a virtual `.gitignore` file to the mounted directory. - - -The `.gitignore` file will also instruct Git to ignore all files except `.everest/source` and in its absence, it will try to find a `.everest/source` file in the destination folder, and read the lakeFS URI from there. -Since `.everest/source` is in source control, it will mount the same lakeFS commit every time! - -

I'm already using lakectl local for working with lakeFS data locally, why should I use lakeFS Mount?

- -While both lakectl local and lakeFS Mount enable working with lakeFS data locally, they serve different purposes: - -
Use lakectl local
- -* For enabling lakeFS writes with [lakectl local commit](../reference/cli.md#lakectl-local-commit). -* To integrate seamlessly with [Git](../integrations/git.md). - -
Use lakeFS Mount
- -For local data access, lakeFS Mount offers several benefits over lakectl local: +### Why use lakeFS Mount instead of `lakectl local`? -* **Optimized selective data access**: The lazy prefetch strategy saves storage space and reduces latency by only fetching the required data. -* **Reduced initial latency**: Start working on your data immediately without waiting for downloads. +While both tools work with local data, they serve different needs. Use `lakectl local` for Git-like workflows where you need to pull and push entire directories. Use **lakeFS Mount** when you need **immediate, on-demand access to a large repository without downloading it first**, making it ideal for exploration, training ML models, or any task that benefits from lazy loading.
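
The on-demand access pattern described above can be illustrated with ordinary filesystem calls. The following is a minimal sketch, not part of Everest: the helper names `list_metadata` and `read_head` and the `./my-mount` path are hypothetical, and it assumes that `os.walk` and `stat()` behave like the metadata-only `ls` mentioned in the FAQ while `open()`/`read()` count as content access.

```python
import os
from pathlib import Path


def list_metadata(root):
    """Return sorted (relative_path, size_in_bytes) pairs for files under root.

    Only directory listings and stat() calls are used, so no file content
    is read (assumed to be a metadata-only operation on a mount).
    """
    root = Path(root)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            entries.append((str(path.relative_to(root)), path.stat().st_size))
    return sorted(entries)


def read_head(path, num_bytes=1024):
    """Read up to num_bytes from the start of a file (this accesses content)."""
    with open(path, "rb") as f:
        return f.read(num_bytes)
```

For example, `list_metadata("./my-mount")` would enumerate a mounted directory, and `read_head("./my-mount/some/file")` would then access a single file's content. Nothing here is specific to Everest; the same functions work on any local directory, which is the point of mounting.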