Commit 606449f
FP8 Grouped Gemm Optimization
Summary:
X-link: facebookresearch/FBGEMM#731
While optimizing MoE, we found that small overheads were a major bottleneck for grouped GEMM performance. This diff tackles two of them: the overhead of torch.dynamo wrapping `quantize_fp8_row`, and the need to slice input tensors before calling `f8f8bf16_rowwise_grouped`.
To fix the former, we allow `triton_quantize_fp8_row` to be called directly, skipping the dynamo compatibility layer. In cases where AOTI isn't needed, this removes a bit of overhead.
To fix the latter, we templatize `f8f8bf16_rowwise_grouped_dynamic` to accept `at::Tensor` instead of lists. We also introduce a new wrapper, `f8f8bf16_rowwise_grouped_stacked`, to maintain the behavior where `zero_start_index_M` isn't provided but the user wants a single contiguous output tensor.
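The difference between list-style and stacked-style inputs can be sketched in plain Python. This is a reference for the semantics only, not the FBGEMM kernels: the function names, the `m_sizes` argument, and the `[N, K]` weight layout are assumptions made for illustration, and nested lists stand in for tensors.

```python
# Reference semantics only: plain Python lists stand in for tensors, and
# every name here is hypothetical, not the FBGEMM API.

def matmul_nt(x, w):
    """Return x @ w^T for x of shape [M, K] and w of shape [N, K] (assumed layout)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in w] for row in x]


def grouped_gemm_list(xs, ws):
    """List-style call: the caller passes one activation block per group
    and gets one output block per group back."""
    return [matmul_nt(x, w) for x, w in zip(xs, ws)]


def grouped_gemm_stacked(x_stacked, ws, m_sizes):
    """Stacked-style call: rows of all groups live in one [sum(M), K] buffer.

    m_sizes[g] is the number of consecutive rows that belong to group g, so
    the caller never slices x_stacked, and the result is a single
    [sum(M), N] output. The slice below is only how this Python reference
    walks the buffer; a real kernel would use row offsets instead.
    """
    assert sum(m_sizes) == len(x_stacked)
    out = []
    start = 0
    for w, m in zip(ws, m_sizes):
        out.extend(matmul_nt(x_stacked[start:start + m], w))
        start += m
    return out


# Two groups with K = 2 and N = 2; group 0 owns 2 rows, group 1 owns 1 row.
x_stacked = [[1, 2], [3, 4], [5, 6]]
ws = [[[1, 0], [0, 1]], [[1, 1], [2, 0]]]
m_sizes = [2, 1]

stacked_out = grouped_gemm_stacked(x_stacked, ws, m_sizes)
# The list-style path needs the caller to slice the inputs first.
list_out = grouped_gemm_list([x_stacked[0:2], x_stacked[2:3]], ws)
```

Both paths compute the same values; the stacked form just moves the per-group bookkeeping out of the caller, which is where the slicing overhead described above comes from.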
In microbenchmarks, we've found these seemingly small changes can improve TFLOPs by 2X for small workloads.
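To see why a fixed per-call overhead matters so much for small workloads, here is a minimal sketch of how throughput is usually derived for a grouped GEMM, counting 2·M·N·K floating-point operations per group. The helper name, shapes, and timings are hypothetical illustrations, not measurements from this diff.

```python
def grouped_gemm_tflops(shapes, seconds):
    """Throughput in TFLOPs for a grouped GEMM.

    shapes is a list of (M, N, K) tuples, one per group; each group costs
    2 * M * N * K floating-point operations (one multiply and one add per
    term of every dot product).
    """
    flops = sum(2 * m * n * k for m, n, k in shapes)
    return flops / seconds / 1e12


# Hypothetical small workload: 4 groups of 128 x 1024 x 1024.
shapes = [(128, 1024, 1024)] * 4

# Made-up timings: 10 us of kernel time, with and without 10 us of
# host-side overhead (wrapper dispatch, input slicing).
with_overhead = grouped_gemm_tflops(shapes, 20e-6)
without_overhead = grouped_gemm_tflops(shapes, 10e-6)
speedup = without_overhead / with_overhead
```

When the kernel itself runs for only a few microseconds, removing an overhead of comparable size doubles the reported TFLOPs; for large workloads the same overhead would be a negligible fraction of the runtime.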
Reviewed By: jiawenliu64
Differential Revision: D69072529
1 parent 00c43b4 commit 606449f
File tree
5 files changed, +200 -107 lines changed
- fbgemm_gpu/experimental
  - gemm/triton_gemm
  - gen_ai
    - bench
    - src/quantize
      - cutlass_extensions
    - test/quantize