docs: everest mount page refresh #9601
📚 Documentation preview at https://pr-9601.docs-lakefs-preview.io/ (Updated: 10/30/2025, 10:11:18 AM - Commit: 8075bcb)
@talSofer updated the documentation structure. Let me know if it is better now.
talSofer left a comment
Thank you for improving the docs!
I suggested multiple changes to the structure; let me know what you think.
| ### OS and Protocol Support | ||
| - **Simplified Data Loading**: Use your existing tools to read and write files directly from the filesystem with no need for custom data loaders or SDKs. | ||
| - **Seamless Scalability**: Scale from a few local files to billions without changing your tools or workflow. Use the same code from experimentation to production. | ||
| - **Enhanced Performance**: Everest supports billions of files and offers fast, lazy data fetching, making it ideal for optimizing GPU utilization and other performance-sensitive tasks. |
The use case title sounds like a technical benefit of mount. But there is a use case for performant data loading which I believe should be highlighted instead. WDYT?
|
|
||
| ### OS and Protocol Support | ||
| - **Simplified Data Loading**: Use your existing tools to read and write files directly from the filesystem with no need for custom data loaders or SDKs. | ||
| - **Seamless Scalability**: Scale from a few local files to billions without changing your tools or workflow. Use the same code from experimentation to production. |
Can the use case be called "workflow scalability"? If you can describe what scales with lakeFS mount it will add clarity.
docs/src/reference/mount.md
Outdated
| After completing this getting started guide, we recommend reading the [Core Concepts](#core-concepts) section to understand caching, consistency, and performance characteristics. | ||
|
|
||
| ## Authenticate with lakeFS Credentials | ||
| ### 1. Prerequisites |
nit: I would remove the numbers; they reduce clarity.
docs/src/reference/mount.md
Outdated
| 3. **Configuration File:** `~/.lakectl.yaml` (or the file specified by `--lakectl-config`). | ||
|
|
||
| ### Prerequisites | ||
| #### Authentication Methods |
Can we keep this heading out of the TOC?
docs/src/reference/mount.md
Outdated
| If you choose to configure IAM provider using the same lakectl file (i.e `lakectl.yaml`) that you use for the **lakectl cli**, | ||
| you must upgrade lakectl to version (`≥ v1.57.0`) otherwise lakectl will raise errors when using it. | ||
|
|
||
| ### 3. Your First Mount (Read-Only) |
Suggested change:
| - ### 3. Your First Mount (Read-Only) |
| + ### Create Your First Mount |
docs/src/reference/mount.md
Outdated
| ### Consistency & Data Behavior | ||
|
|
||
| ### Commit Command (write-mode only) | ||
| Understanding how Everest handles data consistency is crucial for working effectively with mounted lakeFS repositories. |
I would remove this line.
docs/src/reference/mount.md
Outdated
| - **Security Context:** Setting Pod `securityContext` (e.g., `runAsUser`) is not currently supported. | ||
|
|
||
| **Helm Chart default values:** | ||
| ### 1. Prerequisites |
It's confusing to have this prerequisites section after we already have a general prerequisites section.
The prerequisites at this level are part of the CSI driver section.
| --- | ||
|
|
||
| ## Authentication Chain for lakeFS | ||
| ## Getting Started |
IIUC the getting started part is only relevant to local mounts (as opposed to CSI mounts).
I think that it will be easier to follow the docs if we:
- Change the overall outline (see suggested outline below)
- Exclude headings 4+ from the TOC

Suggested outline:
- Use Cases
- Core Concepts
  - Cache Behavior
  - Consistency & Data Behavior
  - Performance Considerations
- Mount a local filesystem or Working with local data (whatever works better)
  - Getting Started
    - Prerequisites
    - Authentication & Configuration
    - Create Your First Mount
  - Mount Modes
    - Read-Only
    - Write
  - Getting Started
- Mount on Kubernetes (CSI Driver)
  - How it Works
  - Getting Started
    - Prerequisites
    - Deploy the CSI Driver
    - Use in Pods
- Troubleshooting
- Limitations
- Command-Line Reference
- Advanced Topics
  - Write Mode Limitations
  - Integration with Git
- FAQ

WDYT?
docs/src/reference/mount.md
Outdated
| * **Optimized selective data access**: The lazy prefetch strategy saves storage space and reduces latency by only fetching the required data. | ||
| * **Reduced initial latency**: Start working on your data immediately without waiting for downloads. | ||
| While both tools work with local data, they serve different needs. Use `lakectl local` for Git-like workflows where you need to pull and push entire directories. Use **lakeFS Mount** for cases where you want immediate, on-demand access to a large repository without downloading it first, making it ideal for exploration, training ML models, or any task that benefits from lazy loading. |
Here I would not highlight any advantages of lakectl local, because mount can do anything it does. I would say that mount enables anything lakectl local enables, plus all the advantages you mentioned here.
Update: I don't think we say that lakectl is better, since we wrote that it will download all the files. Try to emphasize the part where mount can be used for cases where you want fast and transparent work for ML.
yonipeleg33 left a comment
Only reviewed cache-related parts - LGTM, thanks!
| **Benefits of persistent cache:** | ||
| The `umount` command is used to unmount a currently mounted lakeFS repository. | ||
| - Faster startup times when remounting the same data. |
Just to clarify - Is this referring to downloading metadata?
It is relevant for both. If we remount using the same cache directory, all the data we accessed should already be available in the cache.
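To make the remount point concrete, here is a minimal toy sketch of a directory-backed cache. It is not Everest's implementation: the `DiskCache` class, its hashing scheme, and the fake "remote" read are all hypothetical and exist only to show why reusing the same cache directory avoids refetching data that was already accessed.

```python
import hashlib
import tempfile
from pathlib import Path


class DiskCache:
    """Hypothetical directory-backed cache; NOT Everest's real code."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.remote_reads = 0  # counts simulated fetches from object storage

    def _entry(self, path):
        # One cache file per object path, named by a hash of the path.
        return self.cache_dir / hashlib.sha256(path.encode()).hexdigest()

    def read(self, path):
        entry = self._entry(path)
        if entry.exists():
            return entry.read_bytes()  # served from the persistent cache
        self.remote_reads += 1  # simulated remote fetch
        data = f"remote bytes for {path}".encode()
        entry.write_bytes(data)
        return data


cache_dir = tempfile.mkdtemp()

# First "mount": the object is not cached yet, so it is fetched once.
first_mount = DiskCache(cache_dir)
first_mount.read("images/0001.png")

# Second "mount" reusing the same cache directory: no remote fetch needed.
second_mount = DiskCache(cache_dir)
second_mount.read("images/0001.png")
```

In this sketch only the first mount touches the simulated remote; pointing the second mount at a different directory would make it fetch again.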
| - **Commit-Based Caching**: Each commit ID has its own cache namespace. This ensures that cached data always corresponds to the correct version of your files. | ||
| - **Cache Invalidation on Commit**: When you commit changes in write mode using `everest commit`, the mount point's source commit ID is updated to the new HEAD of the branch. As a result, the cache associated with the old commit ID is no longer used, and new data will be cached under the new commit ID. |
This is currently true, but it might change in the foreseeable future (we might want to share cached objects across commits), so just remember to update the docs accordingly.
Sure. With each release we will need to update the documentation so that it reflects what is running on the user's machine.
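For readers of this thread, the commit-based caching behavior described in the quoted lines can be modeled roughly as below. This is a simplified sketch under stated assumptions, not Everest's code: `CommitScopedCache`, `on_commit`, and `fake_fetch` are hypothetical names, and the commit IDs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CommitScopedCache:
    """Toy model: one cache namespace per commit ID (hypothetical, not Everest's code)."""

    source_commit: str
    _namespaces: dict = field(default_factory=dict)

    def get(self, path, fetch):
        # Look up the path only inside the namespace of the current source commit.
        namespace = self._namespaces.setdefault(self.source_commit, {})
        if path not in namespace:
            namespace[path] = fetch(path)  # cache miss: fetch and store
        return namespace[path]

    def on_commit(self, new_commit):
        # After a commit, the mount's source commit moves to the new branch HEAD,
        # so entries cached under the old commit ID are no longer consulted.
        self.source_commit = new_commit


fetch_calls = []


def fake_fetch(path):
    # Stand-in for reading an object from lakeFS.
    fetch_calls.append(path)
    return f"contents of {path}"


cache = CommitScopedCache(source_commit="commit-a")
cache.get("data/train.csv", fake_fetch)  # miss: fetched under commit-a
cache.get("data/train.csv", fake_fetch)  # hit: served from commit-a namespace
cache.on_commit("commit-b")
cache.get("data/train.csv", fake_fetch)  # miss again: commit-b namespace is empty
```

If cached objects are later shared across commits, as suggested above, the lookup in `get` would need a key that is independent of the commit ID, and the last call in this sketch would become a hit.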
Co-authored-by: talSofer <[email protected]>
@talSofer addressed some of the feedback. I would like to do this incrementally and enable other updates for Everest for Windows before addressing more layout changes.
Will open a new PR to address the rest of the comments.