[BugFix][Ansor] Fixing BroadcastShape function #17627
include/tvm/topi/detail/broadcast.h

```cpp
      << " in: " << tvm::Array<tvm::PrimExpr>(shape1.begin(), shape1.end()) << " and "
      << tvm::Array<tvm::PrimExpr>(shape2.begin(), shape2.end());
  LOG(WARNING) << "Incompatible broadcast dims: " << shape1[s1_size - i] << " and "
               << shape2[s2_size - i] << ". Automatically cutting the larger dimension.";
```
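For intuition, here is a minimal Python model of the relaxed rule (a sketch, not the real implementation; it reads "cutting/trimming the larger dimension" as keeping the smaller of the two mismatched dims):

```python
import warnings
from itertools import zip_longest

def broadcast_shape(shape1, shape2):
    """Align shapes from the trailing axis, NumPy-style."""
    out = []
    for d1, d2 in zip_longest(reversed(shape1), reversed(shape2), fillvalue=1):
        if d1 == d2 or d2 == 1:
            out.append(d1)
        elif d1 == 1:
            out.append(d2)
        else:
            # Relaxed behavior from this patch: warn and keep the smaller
            # dimension (trim the larger one) instead of aborting.
            warnings.warn(f"Incompatible broadcast dims: {d1} and {d2}. "
                          "Automatically trimming the larger dimension.")
            out.append(min(d1, d2))
    return list(reversed(out))

print(broadcast_shape([1, 7, 7, 144], [1, 1, 1, 128]))  # -> [1, 7, 7, 128]
```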
You relax the constraint here, leaving an explicit warning, which is maybe enough for the broadcasting case.
I'm not a native English speaker, but maybe s/cutting/trimming/ would sound better? I leave it up to your consideration.
Trimming sounds better to me (I'm also not a native English speaker). I've made the correction.
Hi @thaisacs, thanks for your contribution!
Shouldn't this fix instead add a trim operation (or something similar) to your model to cope with that shape mismatch?
Updating with more reasoning: if metaschedule iterates through and bails out/aborts the whole program, then maybe we should relax the check as you proposed. But let's hear others' opinions on this. Cc @Hzfengsy @MasterJH5574. Also, for acceptance, a small test case for such a broadcast scenario would be nice in test_topi_broadcast.py; see the sketch below.
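Something along these lines might serve as the requested test (a sketch only; the expected output shape assumes the trimmed dimension becomes the smaller of the two):

```python
import tvm
from tvm import te, topi

def test_broadcast_incompatible_dims_trimmed():
    # Shapes from the reported failure: 144 vs 128 on the last axis.
    a = te.placeholder((1, 7, 7, 144), name="a")
    b = te.placeholder((1, 1, 1, 128), name="b")
    # Before this patch, BroadcastShape raised InternalError here; with
    # the relaxed rule it should only warn and trim the larger dimension.
    c = topi.add(a, b)
    assert [int(d) for d in c.shape] == [1, 7, 7, 128]
```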
Hello, I'm conducting experiments with several models, including GoogLeNet and ResNet-152, imported directly from PyTorch to Relay, roughly as in the sketch below.
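For reference, the import path looks roughly like this (a sketch; the model choice and input name are illustrative):

```python
import torch
import torchvision
import tvm
from tvm import relay

# e.g., GoogLeNet; the same pattern applies to the other models.
model = torchvision.models.googlenet(weights=None).eval()
input_shape = (1, 3, 224, 224)
scripted = torch.jit.trace(model, torch.randn(input_shape))
mod, params = relay.frontend.from_pytorch(scripted, [("input0", input_shape)])
```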
Thank you, I see now. But from your description the extent of the issue is still not clear: does metaschedule (a) bail out/abort the whole program, or (b) only skip some specific layers?
Let's again hear others' opinions on it. Could you attach outputs (as text.gz) here with the complete schedule process from your side?
When I execute the following function with the GoogLeNet network, for example, and
the program is aborted during compilation, specifically at the following lines:
Then, we are unable to generate code and perform model inference. Note that during tuning to generate the
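My understanding of the failing step, as a sketch (reusing mod/params from the import example above; the log path is hypothetical):

```python
import tvm
from tvm import relay, auto_scheduler

# Applying the tuning log at build time; this relay.build call is where
# compilation used to abort with the BroadcastShape InternalError.
log_file = "googlenet_tuning.json"  # hypothetical log path
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target="llvm", params=params)
```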
Thanks for the details, I see now what you are trying.
See my attached relax-metasched-resnet18.log.gz test script; more online here. Notes:
Please let me know if the issue still persists for you via
Later update:
Back to your proposed fix,
Thank you for your response and help. I am using Relay and the auto-scheduler (a.k.a. Ansor), but I don't think the issue is in Relay. I believe that addressing this issue is crucial for maintaining support for the auto-scheduler. Will support for the auto-scheduler be discontinued in the next release?
A pinpoint example would help here to understand the root cause; the relay and relax flows differ in the way passes are applied. Cc @Hzfengsy
The auto-scheduler (metaschedule/ansor) stays; the older autotune with the relay flow goes (it has been unmaintained for a while). I suggest adapting your code to an equivalent modern relax flow; if this issue still pops up, we must look at it.
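A rough sketch of the suggested relax flow (API names vary between TVM versions; this assumes a recent TVM with the torch.fx frontend, and ResNet-18 is used because it traces cleanly with fx):

```python
import torch
from torch import fx
import torchvision
import tvm
from tvm import relax
from tvm.relax.frontend.torch import from_fx

# Trace the model with torch.fx and import it into Relax.
model = torchvision.models.resnet18(weights=None).eval()
graph = fx.symbolic_trace(model)
mod = from_fx(graph, [((1, 3, 224, 224), "float32")])

# Legalize high-level ops to TIR, then build for the Relax VM.
mod = relax.transform.LegalizeOps()(mod)
ex = relax.build(mod, target="llvm")
vm = relax.VirtualMachine(ex, tvm.cpu())
```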
Thanks. I'll test it on the relax flow.
Thanks for the great discussion!
What do you mean by "more frequently"? IIUC, if it's a TOPI issue, it will always or never happen; it's not a probabilistic event.
The error seems reasonable, so I wonder where the root of the issue is. To be clear: is it TOPI or Ansor?
I think it's in the interaction of the two: the auto-scheduler and TOPI. The auto-scheduler can find schedules for some layers, but not all. When attempting to compile the entire model, TOPI needs to handle the case where internal tensors have different dimensions. Currently, TOPI handles this by stopping the compilation process. Note that when a tensor's dimension is dynamic and cannot be determined at compile time, TOPI prematurely commits to a shape for the output tensor.
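For intuition, the same shapes under NumPy's strict broadcasting rule, which is the rule BroadcastShape enforced before this patch:

```python
import numpy as np

# The shapes from the reported failure.
x = np.zeros((1, 7, 7, 144), dtype="float32")
y = np.zeros((1, 1, 1, 128), dtype="float32")
try:
    x + y
except ValueError as e:
    print(e)  # operands could not be broadcast together with shapes ...
```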
As a practical note, if the sample limit per layer is too low, metaschedule will fail to propose any valid sketches for some layers. If we "shunt" things like this
I think the broadcast shape function has no impact on the search; it is only used for the final compilation of the model. Aren't invalid schedules removed by the evolutionary search? The searches I performed considered 1000 points per layer of the model. For example, in resnet_152 with 27 layers, the evolutionary search explored 26,968 schedules.
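For context, this is roughly how that per-layer budget is configured (a sketch; mod/params as in the earlier import example, and the log path is hypothetical):

```python
from tvm import auto_scheduler

# "1000 points per layer", expressed via the task scheduler's total budget.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, "llvm")
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=1000 * len(tasks),  # total trials shared across tasks
    measure_callbacks=[auto_scheduler.RecordToFile("googlenet_tuning.json")],
)
tuner.tune(tune_option)
```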
Behavior before correction
When Ansor doesn't find a schedule for a layer during tuning, like this:
I was getting model compilation failures like this:
InternalError: Check failed: (false) is false: Incompatible broadcast dims: 144 and 128 in: [1, 7, 7, 144] and [1, 1, 1, 128]
Behavior after correction
Now, when Ansor doesn't find a schedule for a layer during tuning, like this:
the code reports the corresponding warning ("Incompatible broadcast dims: 144 and 128. Automatically trimming the larger dimension.").
And I can compile the models without compilation or execution errors, and the accuracy of the model is maintained.