[AMD][ROCm] Improve support of AMD #7448
@hwchen2017 I've addressed your comments; could you please take another look?
The patch delivers several fixes for building issues for CUDA part of DeepSpeed library.
Percentage of passed unit tests improved (tested on RDNA hardware, gfx110x and gfx12x).
Before:
collected 5298 items / 15 skipped
2773 failed, 862 passed, 1665 skipped, 13 errors
After:
collected 5851 items / 11 skipped
4187 failed, 1373 passed, 292 skipped, 10 errors
Signed-off-by: Artem Kuzmitckii <[email protected]>
Signed-off-by: Artem Kuzmitckii <[email protected]>
part 2 Signed-off-by: Artem Kuzmitckii <[email protected]>
Signed-off-by: Artem Kuzmitckii <[email protected]>
Signed-off-by: Artem Kuzmitckii <[email protected]>
@k-artem - is this ready for final review? @hwchen2017 - any remaining review requests?
Can you share the error messages you get on AMD GPUs and explain how these changes fix the issues? That would help us better understand this PR. Thanks!
Signed-off-by: Artem Kuzmitckii <[email protected]>
Signed-off-by: Artem Kuzmitckii <[email protected]>
Hi @hwchen2017 @loadams, apologies for the delay on this PR. I've updated the code according to the last set of comments. Please review the changes. I've enabled bf16 library-wide; however, I've disabled it for
This patch delivers several fixes for build issues in the CUDA part of the DeepSpeed library.
The percentage of passing unit tests improved (tested on RDNA hardware, gfx110x and gfx12x).
Before:
collected 5298 items / 15 skipped
2773 failed, 862 passed, 1665 skipped, 13 errors
After:
collected 5851 items / 11 skipped
4187 failed, 1373 passed, 292 skipped, 10 errors
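To make the improvement concrete, here is a minimal Python sketch that computes the pass rate from the pytest summaries above. The choice of denominator is an assumption: it counts every reported outcome (failed, passed, skipped, errors), which is only one reasonable way to read "percentage of passed unit tests".

```python
def pass_rate(passed, failed, skipped, errors):
    """Fraction of all reported outcomes that passed (assumed definition)."""
    total = passed + failed + skipped + errors
    return passed / total

# Counts copied from the "Before" and "After" pytest summaries.
before = dict(failed=2773, passed=862, skipped=1665, errors=13)
after = dict(failed=4187, passed=1373, skipped=292, errors=10)

before_rate = pass_rate(**before)
after_rate = pass_rate(**after)

print(f"before: {before_rate:.1%}")  # before: 16.2%
print(f"after:  {after_rate:.1%}")   # after:  23.4%
```

Note that the number of failures also rises, because far fewer tests are skipped after the patch (292 versus 1665), so more tests actually run.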
Regarding testing of fp_quantizer (DS_BUILD_FP_QUANTIZER) via tests/unit/ops/fp_quantizer/test_fp_quant.py: this test depends on QPyTorch, which must be patched before running on AMD. Please apply Tiiiger/QPyTorch#71.