Commit 2210944
Fallback when mlx5dv is not supported. (#1665)
Summary:
This change adds fallback support when mlx5dv (Mellanox device-specific extensions) is not available for RDMA operations. It modifies the queue pair creation logic to conditionally use either extended mlx5dv-based queue pairs (when supported) or standard ibverbs queue pairs (as fallback). The pt_cuda_alloc flag is updated to require mlx5dv support since it's necessary for merging memory segments when using PyTorch's CUDA allocator. The change adds a new `is_extended` parameter to control whether to create extended or standard queue pairs at runtime.
Adds an environment variable, `MONARCH_RDMA_MLX5DV_DISABLED`, to exercise the new fallback code path on a dev machine.
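As a rough illustration of the gating described in the summary, the sketch below treats mlx5dv as usable only when the device supports it and it has not been disabled for testing, and lets `pt_cuda_alloc` take effect only when mlx5dv is usable. The function names and boolean parameters are hypothetical and are not taken from the diff; how the real code reads the environment variable is not shown here.

```rust
/// Hypothetical helper: mlx5dv is usable only if the device supports it
/// and it has not been switched off (e.g. for testing the fallback path).
pub fn mlx5dv_usable(device_supports_mlx5dv: bool, disabled_for_testing: bool) -> bool {
    device_supports_mlx5dv && !disabled_for_testing
}

/// Hypothetical helper: the PyTorch CUDA allocator path needs mlx5dv to
/// merge memory segments, so the flag only takes effect when mlx5dv is usable.
pub fn pt_cuda_alloc_enabled(requested: bool, mlx5dv_usable: bool) -> bool {
    requested && mlx5dv_usable
}
```

With these assumptions, a device without mlx5dv falls back to standard queue pairs and `pt_cuda_alloc` stays off even if it was requested.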
## Changes in Latest Revision
Based on reviewer feedback, the implementation has been updated with a cleaner, configuration-based approach:
**API Changes:**
- Replaced `uint8_t is_extended` parameter with `rdma_qp_type_t` enum in C API
- Added `RdmaQpType` enum to Rust with three variants:
  - `Auto`: Auto-detect based on device capabilities (default)
  - `Standard`: Force standard ibverbs queue pairs
  - `Mlx5dv`: Force mlx5dv extended queue pairs
- Added `qp_type` field to `IbverbsConfig` for explicit QP type control
- C code uses switch statement with proper default case for unknown types
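A minimal sketch of what the Rust side of this API might look like, assuming only what the list above states: an enum with three variants defaulting to `Auto`, and a `qp_type` field on the config. The struct here is a trimmed-down stand-in (the real `IbverbsConfig` has other fields not shown), and `standard_qp_config` is a hypothetical helper used only for illustration.

```rust
/// Sketch of the `RdmaQpType` enum described above; derives are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum RdmaQpType {
    /// Auto-detect based on device capabilities (the default).
    #[default]
    Auto,
    /// Force standard ibverbs queue pairs.
    Standard,
    /// Force mlx5dv extended queue pairs.
    Mlx5dv,
}

/// Trimmed-down stand-in for `IbverbsConfig`; only the new field is shown.
#[derive(Debug, Clone, Default)]
pub struct IbverbsConfig {
    pub qp_type: RdmaQpType,
}

/// Hypothetical helper: a config that forces the standard QP fallback path.
pub fn standard_qp_config() -> IbverbsConfig {
    IbverbsConfig {
        qp_type: RdmaQpType::Standard,
    }
}
```

Leaving `qp_type` at its default keeps auto-detection, so existing callers that build the config via `Default` would not need to change under this sketch.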
**Architecture:**
- Rust resolves `Auto` mode before calling C (single source of truth for detection)
- C function becomes a pure executor - no capability detection logic
- Removed environment variable approach in favor of configuration
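The split of responsibilities above can be sketched as follows: Rust collapses `Auto` into a concrete choice, and only a concrete value crosses into C. `CQpType` and its integer values are hypothetical stand-ins for `rdma_qp_type_t` (the actual C enumerators are not shown here), and `mlx5dv_supported` is assumed to come from a capability check performed elsewhere on the Rust side.

```rust
/// Requested QP type, as configured by the caller.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RdmaQpType {
    Auto,
    Standard,
    Mlx5dv,
}

/// Hypothetical stand-in for the C `rdma_qp_type_t`; values are assumptions.
/// There is deliberately no `Auto` here: C never sees an unresolved type.
#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CQpType {
    Standard = 1,
    Mlx5dv = 2,
}

/// Resolve the requested type to a concrete one before calling into C.
/// Detection happens only here, so C can act as a pure executor.
pub fn resolve_qp_type(requested: RdmaQpType, mlx5dv_supported: bool) -> CQpType {
    match requested {
        RdmaQpType::Auto if mlx5dv_supported => CQpType::Mlx5dv,
        RdmaQpType::Auto => CQpType::Standard,
        RdmaQpType::Standard => CQpType::Standard,
        RdmaQpType::Mlx5dv => CQpType::Mlx5dv,
    }
}
```

In this sketch an explicit `Mlx5dv` request is passed through unchanged even on a device without mlx5dv support; whether the real code rejects that combination earlier is not stated in the commit message.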
**Testing:**
- Added `setup_with_qp_type()` helper function in test utilities
- Added 4 new unit tests to verify the standard QP fallback path:
  - `test_rdma_read_into_standard_qp` (CPU-to-CPU)
  - `test_rdma_write_from_standard_qp` (CPU-to-CPU)
  - `test_rdma_read_into_standard_qp_cuda` (GPU-to-GPU)
  - `test_rdma_write_from_standard_qp_cuda` (GPU-to-GPU)
Reviewed By: dstaay-fb
Differential Revision: D855040611
Parent: a75d8b3
7 files changed (+293, -62 lines) under monarch_rdma/src and rdmaxcel-sys/src.