Disable algo caching in ROCM EP #19567
Conversation
Description

Similar to the work done by Liangxijun-1001 in apache/tvm#16178, the ROCm spec mandates calling miopenFindConvolution*Algorithm() before using any convolution API. The porting guide describing this requirement is here: https://rocmdocs.amd.com/projects/MIOpen/en/latest/MIOpen_Porting_Guide.html. This change therefore disables the algo cache and enforces the official API semantics.

Motivation and Context

Without this change, onnxruntime fails subsequent calls to convolution APIs with errors like 'MIOpen Error: No invoker was registered for convolution forward.' This was reproduced, for example, while porting Immich to use ROCm acceleration.

Fixes #19566

Signed-off-by: David Nieto <[email protected]>
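For reference, a minimal sketch of the call order the porting guide mandates: the Find step must run before the convolution itself. The `FindThenRunConvolution` helper is hypothetical; it assumes the handle, tensor/convolution descriptors, device buffers, and workspace are already created, and omits error checking.

```cpp
#include <miopen/miopen.h>

// Sketch only: miopenFindConvolution*Algorithm() must be called before the
// corresponding convolution API, per the MIOpen porting guide.
miopenConvFwdAlgorithm_t FindThenRunConvolution(
    miopenHandle_t handle,
    miopenTensorDescriptor_t x_desc, const void* x,
    miopenTensorDescriptor_t w_desc, const void* w,
    miopenConvolutionDescriptor_t conv_desc,
    miopenTensorDescriptor_t y_desc, void* y,
    void* workspace, size_t workspace_size) {
  // 1. Find: benchmarks candidate algorithms for this exact problem.
  miopenConvAlgoPerf_t perf;
  int returned = 0;
  miopenFindConvolutionForwardAlgorithm(
      handle, x_desc, x, w_desc, w, conv_desc, y_desc, y,
      /*requestAlgoCount=*/1, &returned, &perf,
      workspace, workspace_size, /*exhaustiveSearch=*/false);

  // 2. Run: only after Find is calling the convolution itself valid.
  const float alpha = 1.0f, beta = 0.0f;
  miopenConvolutionForward(handle, &alpha, x_desc, x, w_desc, w, conv_desc,
                           perf.fwd_algo, &beta, y_desc, y,
                           workspace, workspace_size);
  return perf.fwd_algo;
}
```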
My intuition is that algo caching can be used in ROCm. For example, PyTorch also caches algos.
It does look like the cache in PyTorch checks quite a few parameters (https://github.com/pytorch/pytorch/blob/26fbbc3e844c8267aa8cf78f152c9612da40dc89/aten/src/ATen/native/miopen/Conv_miopen.cpp#L172), among them the MIOpen handle. I am going to instrument a bit to see if there is a corner case I have not considered.
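To illustrate the idea, here is a hypothetical `ConvCacheKey`, loosely modeled on PyTorch's ConvolutionParams/ParamsHash pattern rather than on onnxruntime's actual code: keying the cache on every parameter that can affect the found algorithm, including the handle, means stale entries can never be returned and the cache never needs clearing.

```cpp
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <miopen/miopen.h>

// Hypothetical key: callers must memset it to zero before filling it so that
// padding bytes compare deterministically (PyTorch does the same).
struct ConvCacheKey {
  miopenHandle_t handle;      // per-stream/per-thread handle is part of the key
  miopenDataType_t data_type;
  int input_dims[5];
  int weight_dims[5];
  int pads[3], strides[3], dilations[3];
  int groups;

  bool operator==(const ConvCacheKey& o) const {
    return std::memcmp(this, &o, sizeof(ConvCacheKey)) == 0;
  }
};

struct ConvCacheKeyHash {
  size_t operator()(const ConvCacheKey& k) const {
    // Byte-wise FNV-1a over the POD key, similar to PyTorch's ParamsHash.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&k);
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < sizeof(ConvCacheKey); ++i) {
      h = (h ^ p[i]) * 0x100000001b3ULL;
    }
    return static_cast<size_t>(h);
  }
};

using FwdAlgoCache =
    std::unordered_map<ConvCacheKey, miopenConvFwdAlgorithm_t, ConvCacheKeyHash>;
```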
I stumbled into the same issue while trying to use onnxruntime with Immich. I hit it only while running ML jobs in parallel (I describe everything in the link above); single-threaded execution worked fine. However, I've managed to get something working by applying the content of this PR, plus a small patch needed after a rebase:

```diff
diff --git a/onnxruntime/core/providers/rocm/nn/conv_transpose.cc b/onnxruntime/core/providers/rocm/nn/conv_transpose.cc
index 45ed4c8ac..79a11c0b1 100644
--- a/onnxruntime/core/providers/rocm/nn/conv_transpose.cc
+++ b/onnxruntime/core/providers/rocm/nn/conv_transpose.cc
@@ -76,7 +76,6 @@ Status ConvTranspose<T, NHWC>::DoConvTranspose(OpKernelContext* context, bool dy
   if (w_dims_changed) {
     s_.last_w_dims = gsl::make_span(w_dims);
-    s_.cached_benchmark_bwd_results.clear();
   }
 
   ConvTransposeAttributes::Prepare p;
```

I don't have any skills in all this; I just wanted to make use of my AMD RX 6400 for ML.
@Zelnes, I can understand that the cache is not thread safe. Basically, when w changes it clears the cache, but another thread might still be using the cache. A quick fix is to add a mutex to guard the cache, like the sketch below. A more complete solution is to add more parameters (like the weight shape, conv_desc, etc., as PyTorch does) to the key of the hash table, so that we never need to clear the cache. The current implementation assumes that those non-shape parameters do not change. That is not always true, which leads to `MIOpen Error: No invoker was registered for convolution forward.` The cache is needed for good performance (finding an algo takes time, so it does not make sense to do it every time). @PeixuanZuo for awareness of the issue.
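A sketch of what such a mutex guard could look like. The `AlgoPerfCache` type is hypothetical, not the actual onnxruntime code; it just shows every access to the shared map, including `Clear()`, going through one lock.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <miopen/miopen.h>

// Hypothetical guarded cache: one thread can no longer clear the map while
// another thread is reading from it.
struct AlgoPerfCache {
  std::mutex mtx;  // guards every access to `results` below
  std::unordered_map<uint64_t, miopenConvFwdAlgorithm_t> results;

  bool Find(uint64_t key, miopenConvFwdAlgorithm_t* algo) {
    std::lock_guard<std::mutex> lock(mtx);
    auto it = results.find(key);
    if (it == results.end()) return false;
    *algo = it->second;
    return true;
  }

  void Insert(uint64_t key, miopenConvFwdAlgorithm_t algo) {
    std::lock_guard<std::mutex> lock(mtx);
    results[key] = algo;
  }

  void Clear() {  // e.g. when w_dims change
    std::lock_guard<std::mutex> lock(mtx);
    results.clear();
  }
};
```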