-
Notifications
You must be signed in to change notification settings - Fork 255
Disable igemm bwd v1r1 on gfx942 #3682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -645,6 +645,9 @@ bool ConvHipImplicitGemmBwdDataV1R1::IsApplicable(const ExecutionContext& ctx, | |||
return false; | |||
if(!ctx.use_hip_kernels) | |||
return false; | |||
const std::string name = ctx.GetStream().GetDeviceName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good, but need to repeat with 5 more
ConvHipImplicitGemmBwdDataV1R1
ConvHipImplicitGemmBwdDataV4R1
ConvHipImplicitGemmV4R1Fwd
ConvHipImplicitGemmV4R1WrW
ConvHipImplicitGemmV4R4Fwd
ConvHipImplicitGemmV4R4WrW
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kamil! I included those, and I'm running lots of additional tests in search of any more.
if(name == "gfx942" || name == "gfx950") | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to explicitly enable this solver for some gpus, otherwise we'll have to keep updating the list for every new device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. The primary goal of this was to give Adam ideas for temporary workarounds to get his stuff working. The proper fixes are obviously much more involved , and I expect this will never go in. I'll leave this up for reference for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes have a temporary feel to them, so I shall not be making this extra change at this time.
Fixing these test crashes was determined to be necessary for ROCm 6.5. Time does not allow correcting the kernels in question, so the solvers are being disabled. Note: all of the failures were with FP32 and this change causes some other types to be disabled unnecessarily. This issue will be discussed among the team and shall not be addressed in this PR nor planned at this time. |
if(problem.IsFp32() && (name == "gfx942" || name == "gfx950")) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this solver only an issue for FP32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are actually a few type-specific issues in these solvers. I originally had them all properly flagged, but then I couldn't find a quick way to replicate it in the tests and removed them. This one got missed. I'll clean it up.
Basically :
May be add this to description too. LGTM |
Did not respond in a timely fashion
This reverts commit 8a81e7a.
This reverts commit 8a81e7a.
Kernel contains asm instructions which do not exist on these asics. Disables the following solvers and the associated tests:
ConvHipImplicitGemmBwdDataV1R1
ConvHipImplicitGemmBwdDataV4R1
ConvHipImplicitGemmV4R1Fwd
ConvHipImplicitGemmV4R1WrW
ConvHipImplicitGemmV4R4Fwd
ConvHipImplicitGemmV4R4WrW