
Conversation

@nopcoder
Contributor

@nopcoder nopcoder commented Oct 25, 2025

  • improved structure of mount documentation
  • more information about how everest mount cache works

@github-actions

github-actions bot commented Oct 25, 2025

📚 Documentation preview at https://pr-9601.docs-lakefs-preview.io/

(Updated: 10/30/2025, 10:11:18 AM - Commit: 8075bcb)

@nopcoder nopcoder self-assigned this Oct 25, 2025
@nopcoder nopcoder added the docs, exclude-changelog, and minor-change labels Oct 25, 2025
@nopcoder
Contributor Author

@talSofer updated the documentation structure - let me know if it is better now.
@yonipeleg33 added information on how everest v1 cache works

@nopcoder nopcoder marked this pull request as ready for review October 26, 2025 09:37
Contributor

@talSofer talSofer left a comment

Thank you for improving the docs!!

I suggested multiple changes to the structure; let me know what you think.

### OS and Protocol Support
- **Simplified Data Loading**: Use your existing tools to read and write files directly from the filesystem with no need for custom data loaders or SDKs.
- **Seamless Scalability**: Scale from a few local files to billions without changing your tools or workflow. Use the same code from experimentation to production.
- **Enhanced Performance**: Everest supports billions of files and offers fast, lazy data fetching, making it ideal for optimizing GPU utilization and other performance-sensitive tasks.
Contributor

The use case title sounds like a technical benefit of mount. But there is a use case for performant data loading which I believe should be highlighted instead. WDYT?


### OS and Protocol Support
- **Simplified Data Loading**: Use your existing tools to read and write files directly from the filesystem with no need for custom data loaders or SDKs.
- **Seamless Scalability**: Scale from a few local files to billions without changing your tools or workflow. Use the same code from experimentation to production.
Contributor

Can the use case be called "workflow scalability"? If you can describe what scales with lakeFS mount it will add clarity.

After completing this getting started guide, we recommend reading the [Core Concepts](#core-concepts) section to understand caching, consistency, and performance characteristics.

## Authenticate with lakeFS Credentials
### 1. Prerequisites
Contributor

nit: I would remove the numbers; they reduce clarity.

3. **Configuration File:** `~/.lakectl.yaml` (or the file specified by `--lakectl-config`).

### Prerequisites
#### Authentication Methods
Contributor

Can we keep this heading outside the toc?

If you choose to configure IAM provider using the same lakectl file (i.e `lakectl.yaml`) that you use for the **lakectl cli**,
you must upgrade lakectl to version (`≥ v1.57.0`) otherwise lakectl will raise errors when using it.

### 3. Your First Mount (Read-Only)
Contributor

Suggested change
- ### 3. Your First Mount (Read-Only)
+ ### Create Your First Mount

### Consistency & Data Behavior

### Commit Command (write-mode only)
Understanding how Everest handles data consistency is crucial for working effectively with mounted lakeFS repositories.
Contributor

I would remove this line

- **Security Context:** Setting Pod `securityContext` (e.g., `runAsUser`) is not currently supported.

**Helm Chart default values:**
### 1. Prerequisites
Contributor

It's confusing to have this prerequisites section after we have a general prerequisites section

Contributor Author

The prerequisites at this level are part of the CSI driver section.

---

## Authentication Chain for lakeFS
## Getting Started
Contributor

IIUC the getting started part is only relevant to local mounts (as opposed to CSI mounts).
I think that it will be easier to follow the docs if we:

  1. Change the overall outline (see suggested outline below)
  2. Exclude headings 4+ from the toc

Suggested outline:

  • Use Cases
  • Core Concepts
    • Cache Behavior
    • Consistency & Data Behavior
    • Performance Considerations
  • Mount a local filesystem or Working with local data (whatever works better)
    • Getting Started
      • Prerequisites
      • Authentication & Configuration
      • Create Your First Mount
    • Mount Modes
      • Read-Only
      • Write
  • Mount on Kubernetes (CSI Driver)
    • How it Works
    • Getting Started
      • Prerequisites
      • Deploy the CSI Driver
      • Use in Pods
      • Troubleshooting
    • Limitations
  • Command-Line Reference
  • Advanced Topics
    • Write Mode Limitations
    • Integration with Git
  • FAQ

WDYT?

* **Optimized selective data access**: The lazy prefetch strategy saves storage space and reduces latency by only fetching the required data.
* **Reduced initial latency**: Start working on your data immediately without waiting for downloads.
While both tools work with local data, they serve different needs. Use `lakectl local` for Git-like workflows where you need to pull and push entire directories. Use **lakeFS Mount** for cases where you want immediate, on-demand access to a large repository without downloading it first, making it ideal for exploration, training ML models, or any task that benefits from lazy loading.
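
To make the contrast in the quoted paragraph concrete, here is a minimal sketch of on-demand access versus downloading everything up front. It is a toy model only: `LazyMount`, `download_all`, and the in-memory `remote` dictionary are hypothetical names invented for illustration and do not reflect how Everest or `lakectl local` are implemented.

```python
class LazyMount:
    """Toy model of on-demand access: an object is fetched only when first read."""

    def __init__(self, remote: dict[str, bytes]):
        self._remote = remote                 # stands in for the repository
        self._local: dict[str, bytes] = {}    # objects fetched so far

    def listdir(self) -> list[str]:
        # Listing needs only names (metadata), not object contents.
        return sorted(self._remote)

    def read(self, path: str) -> bytes:
        # Fetch the object on first access, then serve it locally.
        if path not in self._local:
            self._local[path] = self._remote[path]
        return self._local[path]

    @property
    def fetched(self) -> list[str]:
        return sorted(self._local)


def download_all(remote: dict[str, bytes]) -> dict[str, bytes]:
    """Toy model of pulling a whole directory before working on it."""
    return dict(remote)


remote = {
    "train/0001.jpg": b"a",
    "train/0002.jpg": b"b",
    "val/0001.jpg": b"c",
}

mount = LazyMount(remote)
names = mount.listdir()            # all three names are visible immediately
first = mount.read("train/0001.jpg")
local_copy = download_all(remote)  # every object is transferred up front
```

In this sketch, `mount.fetched` contains only the one object that was read, while `local_copy` holds all three.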
Contributor

Here I would not highlight any advantages of lakectl local, because mount can do anything it does. I would say that mount enables anything lakectl local enables plus all the advantages you mentioned here.

Contributor Author

Updated. I don't think we say that lakectl is better, since we wrote that it downloads all the files. I tried to emphasize the part where mount suits cases where you want fast, transparent access for ML work.

Contributor

@yonipeleg33 yonipeleg33 left a comment

Only reviewed cache-related parts - LGTM, thanks!

**Benefits of persistent cache:**
The `umount` command is used to unmount a currently mounted lakeFS repository.
- Faster startup times when remounting the same data.
Contributor

Just to clarify - Is this referring to downloading metadata?

Contributor Author

It is relevant for both. If we remount using the same cache directory, all the data we accessed should already be available in the cache.

Comment on lines +147 to +148
- **Commit-Based Caching**: Each commit ID has its own cache namespace. This ensures that cached data always corresponds to the correct version of your files.
- **Cache Invalidation on Commit**: When you commit changes in write mode using `everest commit`, the mount point's source commit ID is updated to the new HEAD of the branch. As a result, the cache associated with the old commit ID is no longer used, and new data will be cached under the new commit ID.
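
The two bullets above can be illustrated with a small sketch of a cache whose entries are namespaced by commit ID. This is a toy model under stated assumptions, not the Everest implementation: `CommitScopedCache`, `on_commit`, and `fake_fetch` are hypothetical names, and the real cache lives on disk rather than in a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class CommitScopedCache:
    """Toy model: cached objects are namespaced by the commit ID they were read from."""

    source_commit: str
    # commit ID -> {object path -> cached bytes}
    namespaces: dict[str, dict[str, bytes]] = field(default_factory=dict)

    def read(self, path: str, fetch) -> bytes:
        """Return cached bytes for the current commit, fetching on a miss."""
        namespace = self.namespaces.setdefault(self.source_commit, {})
        if path not in namespace:
            namespace[path] = fetch(self.source_commit, path)
        return namespace[path]

    def on_commit(self, new_head: str) -> None:
        """After a commit the mount points at the new HEAD, so lookups use a fresh namespace."""
        self.source_commit = new_head


calls = []


def fake_fetch(commit_id: str, path: str) -> bytes:
    # Hypothetical stand-in for reading an object from lakeFS.
    calls.append((commit_id, path))
    return f"{commit_id}:{path}".encode()


cache = CommitScopedCache(source_commit="c1")
cache.read("data/a.csv", fake_fetch)  # miss: fetched and stored under c1
cache.read("data/a.csv", fake_fetch)  # hit: served from the c1 namespace
cache.on_commit("c2")                 # source commit moves to the new HEAD
cache.read("data/a.csv", fake_fetch)  # miss again: c2 has its own namespace
```

In this model the entries cached under the old commit are kept but no longer consulted, which matches the behavior the bullets describe. Sharing cached objects across commits, as raised in the review comment on these lines, would require a different keying scheme.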
Contributor

This is currently true, but might not be in the foreseeable future (we might want to share cached objects across commits) - so just remember to update the docs accordingly

Contributor Author

Sure. With each release we will need to update the documentation so that it reflects what users are running on their machines.

@nopcoder nopcoder requested a review from talSofer October 30, 2025 09:56
@nopcoder
Contributor Author

@talSofer I addressed some of the feedback. I would like to do this incrementally and unblock other updates for Everest for Windows before addressing more layout changes.

@nopcoder nopcoder requested review from talSofer and removed request for talSofer October 30, 2025 09:58
@nopcoder nopcoder dismissed talSofer’s stale review October 30, 2025 09:59

Will open a new PR to address the rest of the comments.

@nopcoder nopcoder enabled auto-merge (squash) October 30, 2025 09:59
@nopcoder nopcoder merged commit 4a26109 into master Oct 30, 2025
41 checks passed
@nopcoder nopcoder deleted the docs/mount-refresh branch October 30, 2025 10:10

3 participants