Commit 08fcc98
FP8 Grouped Gemm Optimization (#3655)
Summary:
Pull Request resolved: #3655
X-link: facebookresearch/FBGEMM#731
While optimizing MoE, we found that small overheads were a major bottleneck for grouped gemm performance. This diff tackles two of them: the overhead of torch.dynamo wrapping `quantize_fp8_row`, and the need to slice input tensors before calling `f8f8bf16_rowwise_grouped`.
To fix the former, we allow `triton_quantize_fp8_row` to be called directly, bypassing the dynamo-compatible wrapper. In cases where AOTI isn't needed, this removes a bit of overhead.
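For readers unfamiliar with the operation being called, here is a minimal pure-Python sketch of row-wise FP8 scaling. It is a reference for the arithmetic only, not FBGEMM's Triton kernel: the function name `quantize_rowwise_reference` is hypothetical, the constant assumes the e4m3fn format (maximum finite value 448), and the final rounding to FP8 is omitted.

```python
# Illustrative reference for row-wise scaling; not FBGEMM's implementation.
FP8_E4M3_MAX = 448.0  # assumption: e4m3fn format, largest finite value


def quantize_rowwise_reference(x, eps=1e-12):
    """Scale each row so that its largest magnitude maps to FP8_E4M3_MAX.

    Returns (scaled_rows, row_scales) such that
    x[i][j] ~= scaled_rows[i][j] * row_scales[i].
    Values stay Python floats; a real kernel would also round them to FP8.
    """
    scaled_rows, row_scales = [], []
    for row in x:
        # eps guards against division by zero for an all-zero row.
        row_max = max(max(abs(v) for v in row), eps)
        scale = row_max / FP8_E4M3_MAX
        scaled_rows.append([v / scale for v in row])
        row_scales.append(scale)
    return scaled_rows, row_scales


if __name__ == "__main__":
    q, s = quantize_rowwise_reference([[1.0, -2.0, 0.5], [0.0, 0.25, -0.125]])
    print(s)  # one scale per row
```

Because the work per row is this small, any fixed per-call cost added by a wrapper is a large fraction of the total for small inputs, which is why calling the kernel directly can matter.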
To fix the latter, we templatize `f8f8bf16_rowwise_grouped_dynamic` to accept `at::Tensor` instead of lists. We introduce a new wrapper, `f8f8bf16_rowwise_grouped_stacked`, to maintain the behavior where `zero_start_index_M` isn't provided but the user wants a single contiguous output tensor.
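The difference between a list-based and a stacked interface can be sketched in plain Python. This is only an illustration of the semantics under stated assumptions: the function names, the `m_sizes` argument, and the `[N, K]` weight layout are hypothetical and are not taken from FBGEMM's signatures, and quantization scales are left out.

```python
def matmul_nt(x, w):
    """Return x @ w^T for x of shape [M, K] and w of shape [N, K] (nested lists)."""
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


def grouped_gemm_list(x_groups, w_groups):
    """List-style interface: one input matrix and one output matrix per group."""
    return [matmul_nt(x, w) for x, w in zip(x_groups, w_groups)]


def grouped_gemm_stacked(x_stacked, w_groups, m_sizes):
    """Stacked-style interface: all groups share one [sum(m_sizes), K] input.

    m_sizes[g] is the number of consecutive rows that belong to group g.
    The result is a single [sum(m_sizes), N] matrix, so the caller neither
    slices the input nor concatenates per-group outputs.
    """
    assert sum(m_sizes) == len(x_stacked), "m_sizes must cover every input row"
    out = []
    start = 0
    for w, m in zip(w_groups, m_sizes):
        # This reference walks row offsets in Python; a fused kernel would
        # resolve the same offsets internally.
        out.extend(matmul_nt(x_stacked[start:start + m], w))
        start += m
    return out


if __name__ == "__main__":
    x = [[1, 2], [3, 4], [5, 6]]
    w = [[[1, 0], [0, 1]], [[1, 1], [2, 0]]]
    print(grouped_gemm_stacked(x, w, m_sizes=[2, 1]))
```

With the list-style call, the caller has to build `x_groups` by slicing and later stitch the outputs back together; the stacked form moves that bookkeeping out of the caller's hot path.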
In microbenchmarks, we've found these seemingly small changes can improve TFLOPs by 2X for small workloads.
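To see why fixed overhead weighs so heavily on small workloads, it helps to write out how throughput is computed. The sketch below uses the conventional count of 2·M·N·K floating-point operations per GEMM; the shapes and timings are made-up illustrative numbers, not measurements from this commit.

```python
def grouped_gemm_tflops(m_sizes, n, k, seconds):
    """Throughput of a grouped GEMM in TFLOPs, counting 2*M*N*K flops per group."""
    flops = sum(2 * m * n * k for m in m_sizes)
    return flops / seconds / 1e12


if __name__ == "__main__":
    # Hypothetical small workload: 4 groups of 64 rows each, N = K = 1024.
    m_sizes, n, k = [64, 64, 64, 64], 1024, 1024
    kernel_time = 8e-6  # hypothetical time spent in the GEMM kernel itself
    for overhead in (12e-6, 2e-6):  # hypothetical fixed per-call overheads
        total = kernel_time + overhead
        print(f"overhead={overhead:.0e}s -> {grouped_gemm_tflops(m_sizes, n, k, total):.1f} TFLOPs")
```

In this toy setup the kernel time is unchanged, yet cutting the fixed overhead from 12 µs to 2 µs halves the end-to-end time and therefore doubles the reported TFLOPs. For large workloads the same overhead would be a negligible share of the total.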
Reviewed By: jiawenliu64
Differential Revision: D69072529
1 parent dced756 commit 08fcc98
File tree
79 files changed (+6719, -4713 lines changed)
- fbgemm_gpu/experimental
  - gemm/triton_gemm
  - gen_ai
    - bench
    - src/quantize
      - ck_extensions/fp8_rowwise_grouped
        - kernels
      - cutlass_extensions
    - test/quantize
Lines changed: 0 additions & 103 deletions
This file was deleted.