Fallback when mlx5dv is not supported. #1665
Closed
953ecfc to
7fa6733
Compare
casteryh
added a commit
to casteryh/monarch
that referenced
this pull request
Oct 27, 2025
Summary: This change adds fallback support when mlx5dv (Mellanox device-specific extensions) is not available for RDMA operations. It modifies the queue pair creation logic to conditionally use either extended mlx5dv-based queue pairs (when supported) or standard ibverbs queue pairs (as fallback). The pt_cuda_alloc flag is updated to require mlx5dv support since it's necessary for merging memory segments when using PyTorch's CUDA allocator. The change adds a new `is_extended` parameter to control whether to create extended or standard queue pairs at runtime. Adds an env variable `MONARCH_RDMA_MLX5DV_DISABLED` to test the new code path on a dev machine. Differential Revision: D85504061
7fa6733 to
c3bcb7a
Compare
casteryh
added a commit
to casteryh/monarch
that referenced
this pull request
Oct 28, 2025
c3bcb7a to
cebd0b8
Compare
casteryh
added a commit
to casteryh/monarch
that referenced
this pull request
Oct 29, 2025
Summary:
This change adds fallback support when mlx5dv (Mellanox device-specific extensions) is not available for RDMA operations. It modifies the queue pair creation logic to conditionally use either extended mlx5dv-based queue pairs (when supported) or standard ibverbs queue pairs (as fallback). The pt_cuda_alloc flag is updated to require mlx5dv support since it's necessary for merging memory segments when using PyTorch's CUDA allocator. The change adds a new `is_extended` parameter to control whether to create extended or standard queue pairs at runtime.
Adds an env variable `MONARCH_RDMA_MLX5DV_DISABLED` to test the new code path on a dev machine.
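The dependency between `pt_cuda_alloc` and mlx5dv can be pictured with a minimal sketch. The function name `effective_pt_cuda_alloc` and the boolean capability flag are hypothetical and do not come from the Monarch source; the summary also does not say whether the real code silently disables the flag or reports an error, so the sketch assumes it is simply disabled.

```rust
/// Hypothetical helper (not from the Monarch source): decides whether the
/// PyTorch CUDA allocator integration may actually be used.
///
/// Assumption: when mlx5dv is unavailable the flag is turned off rather
/// than treated as an error. The PR summary only says the flag "requires"
/// mlx5dv because segment merging depends on it.
pub fn effective_pt_cuda_alloc(requested: bool, mlx5dv_supported: bool) -> bool {
    // Segment merging needs mlx5dv, so both conditions must hold.
    requested && mlx5dv_supported
}
```

Under this assumption, a caller that requests `pt_cuda_alloc` on a device without mlx5dv would fall back to the path that does not merge memory segments.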
## Changes in Latest Revision
Based on reviewer feedback, the implementation has been updated with a cleaner, configuration-based approach:
**API Changes:**
- Replaced `uint8_t is_extended` parameter with `rdma_qp_type_t` enum in C API
- Added `RdmaQpType` enum to Rust with three variants:
  - `Auto`: Auto-detect based on device capabilities (default)
  - `Standard`: Force standard ibverbs queue pairs
  - `Mlx5dv`: Force mlx5dv extended queue pairs
- Added `qp_type` field to `IbverbsConfig` for explicit QP type control
- C code uses switch statement with proper default case for unknown types
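A rough sketch of what the Rust side of these API changes might look like. The names `RdmaQpType`, its three variants, `IbverbsConfig`, and `qp_type` come from the description above; the derives, the doc comments, and the reduction of `IbverbsConfig` to a single field are assumptions made for illustration.

```rust
/// Queue pair type selection. Variant names follow the PR description;
/// the derives are an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum RdmaQpType {
    /// Auto-detect based on device capabilities (the default).
    #[default]
    Auto,
    /// Force standard ibverbs queue pairs.
    Standard,
    /// Force mlx5dv extended queue pairs.
    Mlx5dv,
}

/// Reduced stand-in for `IbverbsConfig`. The real struct has other
/// fields, which are omitted here because the PR text does not list them.
#[derive(Debug, Clone, Default)]
pub struct IbverbsConfig {
    /// Explicit queue pair type control; `Auto` unless overridden.
    pub qp_type: RdmaQpType,
}
```

With this shape, a test or caller could force the fallback path by building `IbverbsConfig { qp_type: RdmaQpType::Standard }`, while the default configuration keeps auto-detection.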
**Architecture:**
- Rust resolves `Auto` mode before calling C (single source of truth for detection)
- C function becomes a pure executor - no capability detection logic
- Removed environment variable approach in favor of configuration
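The division of labour described above, where Rust resolves `Auto` and C only executes, could be sketched as follows. `ResolvedQpType`, `resolve_qp_type`, and the `mlx5dv_supported` flag are hypothetical names introduced for this example; how Monarch actually detects mlx5dv support and what values `rdma_qp_type_t` carries are not stated in the PR text.

```rust
/// Requested queue pair type, as named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RdmaQpType {
    Auto,
    Standard,
    Mlx5dv,
}

/// Hypothetical concrete type handed to the C layer after resolution.
/// It deliberately has no `Auto` variant, so the C side never has to
/// perform capability detection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolvedQpType {
    Standard,
    Mlx5dv,
}

/// Resolves the requested type against the detected device capability.
/// An explicit `Mlx5dv` request is passed through unchanged even when the
/// device lacks support (an assumption: queue pair creation would then be
/// expected to fail instead of silently falling back).
pub fn resolve_qp_type(requested: RdmaQpType, mlx5dv_supported: bool) -> ResolvedQpType {
    match requested {
        RdmaQpType::Auto if mlx5dv_supported => ResolvedQpType::Mlx5dv,
        RdmaQpType::Auto => ResolvedQpType::Standard,
        RdmaQpType::Standard => ResolvedQpType::Standard,
        RdmaQpType::Mlx5dv => ResolvedQpType::Mlx5dv,
    }
}
```

Keeping the resolution in one Rust function means the detection logic lives in a single place, which matches the "single source of truth" point in the list above.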
**Testing:**
- Added `setup_with_qp_type()` helper function in test utilities
- Added 4 new unit tests to verify standard QP fallback path:
  - `test_rdma_read_into_standard_qp` (CPU-to-CPU)
  - `test_rdma_write_from_standard_qp` (CPU-to-CPU)
  - `test_rdma_read_into_standard_qp_cuda` (GPU-to-GPU)
  - `test_rdma_write_from_standard_qp_cuda` (GPU-to-GPU)
Reviewed By: dstaay-fb
Differential Revision: D85504061
casteryh
added a commit
to casteryh/monarch
that referenced
this pull request
Oct 29, 2025
cebd0b8 to
2210944
Compare
casteryh
added a commit
to casteryh/monarch
that referenced
this pull request
Oct 30, 2025
2210944 to
907a9d3
Compare
907a9d3 to
9828943
Compare
This pull request has been merged in 434e447.
AlirezaShamsoshoara
pushed a commit
to AlirezaShamsoshoara/monarch
that referenced
this pull request
Oct 30, 2025
Summary:
Pull Request resolved: meta-pytorch#1665
This change adds fallback support when mlx5dv (Mellanox device-specific extensions) is not available for RDMA operations. It modifies the queue pair creation logic to conditionally use either extended mlx5dv-based queue pairs (when supported) or standard ibverbs queue pairs (as fallback). The pt_cuda_alloc flag is updated to require mlx5dv support since it's necessary for merging memory segments when using PyTorch's CUDA allocator. The change adds a new `is_extended` parameter to control whether to create extended or standard queue pairs at runtime.
Adds an env variable `MONARCH_RDMA_MLX5DV_DISABLED` to test the new code path on a dev machine.
## Changes in Latest Revision
Based on reviewer feedback, the implementation has been updated with a cleaner, configuration-based approach:
**API Changes:**
- Replaced `uint8_t is_extended` parameter with `rdma_qp_type_t` enum in C API
- Added `RdmaQpType` enum to Rust with three variants:
  - `Auto`: Auto-detect based on device capabilities (default)
  - `Standard`: Force standard ibverbs queue pairs
  - `Mlx5dv`: Force mlx5dv extended queue pairs
- Added `qp_type` field to `IbverbsConfig` for explicit QP type control
- C code uses switch statement with proper default case for unknown types
**Architecture:**
- Rust resolves `Auto` mode before calling C (single source of truth for detection)
- C function becomes a pure executor - no capability detection logic
- Removed environment variable approach in favor of configuration
**Testing:**
- Added `setup_with_qp_type()` helper function in test utilities
- Added 4 new unit tests to verify standard QP fallback path:
  - `test_rdma_read_into_standard_qp` (CPU-to-CPU)
  - `test_rdma_write_from_standard_qp` (CPU-to-CPU)
  - `test_rdma_read_into_standard_qp_cuda` (GPU-to-GPU)
  - `test_rdma_write_from_standard_qp_cuda` (GPU-to-GPU)
Reviewed By: dstaay-fb
Differential Revision: D85504061
fbshipit-source-id: a54466a309ff086eae96a63f7edf994655664826