[AMDAIEFuseFillIntoForall] Handle case where fill output is not sliced #976
I think in applications when we see loops like
The forall should be canonicalized away and we don't have to fuse the fill op into the loop. However, I can see why this is needed here. We want to keep the thread mapping attribute?
The canonicalizer doesn't ever remove scf.forall ops with thread/block ids afaik. Which is, as I think you're saying, exactly what we want.
We can always modify the canonicalization patterns to include such a case. But the real question is: do we want such a loop to be canonicalized away? In our current pipeline, a lot of passes depend on the thread mapping attribute.
I agree that if it were canonicalized away, we'd have many issues in our passes. That is why this PR doesn't try to eliminate the scf.forall.
The reason I'm discussing this here is that when we initially created these individual passes, the general principle was to reuse as many upstream functions as possible, such as
36bb093 to 98f610c
I've updated it to use the upstream function if there is a slice present. I don't see a way to use that function if there is no slice.
if (std::distance(fillUses.begin(), fillUses.end()) != 1) return;
OpOperand &fillUse = *fillUses.begin();
auto forallOp = dyn_cast<scf::ForallOp>(fillUse.getOwner());
if (!forallOp) return;
For these "return" situations, I would add some debug messages instead of returning silently.
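To make the suggestion concrete, here is a minimal, self-contained sketch of "log, then bail" in place of a bare return. It is a toy model, not the pass itself: `ToyOp`, `FuseStatus`, `debugLog`, and `tryFuse` are hypothetical names invented for illustration. In an LLVM-based pass the logging would more likely go through `LLVM_DEBUG(llvm::dbgs() << ...)`, which only prints when debug output is enabled.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for an MLIR op: a name plus the names of the ops
// that use its result. This is not an MLIR type.
struct ToyOp {
  std::string name;
  std::vector<std::string> userNames;
};

// Hypothetical outcome type, so callers can see why fusion stopped.
enum class FuseStatus { Fused, NotExactlyOneUse, UserIsNotForall };

// Hypothetical debug sink. A real pass would likely use
// LLVM_DEBUG(llvm::dbgs() << ...) instead of writing to std::cerr.
inline void debugLog(const std::string &msg) {
  std::cerr << "[fuse-fill-into-forall] " << msg << '\n';
}

// Mirrors the two early exits in the snippet under review, but each exit
// reports its reason before returning.
inline FuseStatus tryFuse(const ToyOp &fillOp) {
  assert(!fillOp.name.empty() && "toy op needs a name");
  if (fillOp.userNames.size() != 1) {
    debugLog("bailing on " + fillOp.name +
             ": expected exactly one use, found " +
             std::to_string(fillOp.userNames.size()));
    return FuseStatus::NotExactlyOneUse;
  }
  const std::string &user = fillOp.userNames.front();
  if (user != "scf.forall") {
    debugLog("bailing on " + fillOp.name + ": user is " + user +
             ", not scf.forall");
    return FuseStatus::UserIsNotForall;
  }
  // The real pass would perform the rewrite here; this sketch only reports.
  return FuseStatus::Fused;
}
```

Returning a status instead of `void` is a choice made for this sketch only, so that each exit path can be checked; the point under review is just that every early exit says why it fired.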
ResultRange::use_range fillUses = fillOp->getUses();
if (std::distance(fillUses.begin(), fillUses.end()) != 1) return;
I think you can just check if (fillOp.hasOneUse()).
Doesn't look like that is a method on use_range
It should work on ops. fillOp->hasOneUse() doesn't work?
Done
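For readers following this thread, the difference between the two checks can be sketched without any MLIR at all. The toy below assumes a singly linked use list (`ToyUseList` is a hypothetical stand-in, not an MLIR type): counting with `std::distance` walks every use, while a "has one use" query only needs to look at the first element and the one after it.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>

// Hypothetical stand-in for a value's use list. A forward_list is used
// because it is singly linked, so counting its elements is O(n).
using ToyUseList = std::forward_list<int>;

// Shape of the original check: count all uses, then compare with 1.
inline bool hasOneUseByCounting(const ToyUseList &uses) {
  return std::distance(uses.begin(), uses.end()) == 1;
}

// Shape of the suggested check: non-empty, and nothing after the first use.
// This inspects at most two positions regardless of the list length.
inline bool hasOneUse(const ToyUseList &uses) {
  auto it = uses.begin();
  return it != uses.end() && std::next(it) == uses.end();
}

// Convenience helper for this sketch: both formulations must agree.
inline bool bothAgree(const ToyUseList &uses) {
  return hasOneUse(uses) == hasOneUseByCounting(uses);
}
```

Both functions return the same answer; the second just states the intent directly, which is what calling `hasOneUse()` on the op does in the revised code.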
tensor::ExtractSliceOp extractSliceOp;
for (Operation *user : bbArg.getUsers()) {
  if (auto nxt = dyn_cast<tensor::ExtractSliceOp>(user)) {
    if (extractSliceOp) return;
I don't understand why you return here. Shouldn't it be a break?
I prefer the original function for this purpose.
auto itBBArgUsers = llvm::find_if(bbArg.getUsers(), [&](Operation *user) {
  auto sliceOp = dyn_cast<tensor::ExtractSliceOp>(user);
  return sliceOp;
});
Multiple extract_slice ops -- bailing as this is unexpected.
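The disagreement in this thread comes down to whether "first match" or "unique match" is wanted: a `find_if` stops at the first extract_slice user, whereas the loop with an early return also notices a second one. The sketch below models that three-way outcome in plain C++. All names (`ToyOpKind`, `SliceLookup`, `findUniqueSliceUser`, `FusePlan`, `planFor`) are hypothetical, and the mapping from outcome to action is only my reading of the comments in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical op kinds for the users of the forall's block argument.
enum class ToyOpKind { ExtractSlice, ParallelInsertSlice, Other };

enum class SliceLookup { None, Unique, Multiple };

struct SliceResult {
  SliceLookup status;
  std::optional<std::size_t> index;  // position of the unique slice user
};

// Scans every user. Unlike a find_if, it keeps going after the first match
// so that a second extract_slice is detected and reported as Multiple.
inline SliceResult findUniqueSliceUser(const std::vector<ToyOpKind> &users) {
  SliceResult result{SliceLookup::None, std::nullopt};
  for (std::size_t i = 0; i < users.size(); ++i) {
    if (users[i] != ToyOpKind::ExtractSlice) continue;
    if (result.status == SliceLookup::Unique)
      return {SliceLookup::Multiple, std::nullopt};
    result = {SliceLookup::Unique, i};
  }
  return result;
}

// Hypothetical summary of what the pass does with each outcome.
enum class FusePlan { CreateFillAtStartOfBody, UseUpstreamFusion, Bail };

inline FusePlan planFor(SliceLookup status) {
  switch (status) {
    case SliceLookup::None:     return FusePlan::CreateFillAtStartOfBody;
    case SliceLookup::Unique:   return FusePlan::UseUpstreamFusion;
    case SliceLookup::Multiple: return FusePlan::Bail;
  }
  assert(false && "unhandled SliceLookup value");
  return FusePlan::Bail;
}
```

With a plain `break` after the first match, the `Multiple` case would be indistinguishable from `Unique`, which is why the early return is kept and now carries an explanatory comment.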
return forallOp;
// In the case where there are no extract_slice ops, we manually create the
// fill at the beginning of the forall body.
assert(!extractSliceOp);
This is unnecessary.
I find code with lots of asserts easier to read, but I've changed it to an if-else
// In the case where there are no extract_slice ops, we manually create the
// fill at the beginning of the forall body.
It would be better to add a comment explaining when this situation happens (i.e., when the scf.forall loop count is 1).
Value scalar = fillOp.value();
Location loc = fillOp.getLoc();
auto fusedFill = rewriter.create<linalg::FillOp>(loc, scalar, bbArg);
Suggested change:
- Value scalar = fillOp.value();
- Location loc = fillOp.getLoc();
- auto fusedFill = rewriter.create<linalg::FillOp>(loc, scalar, bbArg);
+ auto fusedFill = rewriter.create<linalg::FillOp>(fillOp.getLoc(), fillOp.value(), bbArg);
This change does not show up in the latest revision.
compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEFuseFillIntoForall.cpp
// Do not use the result of the old fill.
rewriter.replaceAllUsesWith(fillOp.getResults()[0], fillOp.getOutputs()[0]);
Can't this be included in the above replaceUsesWithIf?
No, different ops
// check that the operand of scf.forall is not the filled tensor, because the
// fill will take place inside the scf.forall.
// CHECK: %[[FORALL:.*]] = scf.forall (%[[ARG1:.*]]) in (1)
// CHECK-SAME: shared_outs(%[[ARG2:.*]] = %[[FUNCARG]])

// check for the new fill
// CHECK: %[[NEWFILL:.*]] = linalg.fill
// CHECK-SAME: outs(%[[ARG2]] : tensor<8xi8>) -> tensor<8xi8>
I feel it's not readable if you mix CHECK lines with the comments. I'd prefer you put all the comments above and keep the // CHECK section compact.
Updated
6b856b1 to 494fcd5
494fcd5 to a85fb26
For 2x2 or 4x4 tiling the chain of ops after the fill looks like
i.e. the filled value enters an extract_slice inside the scf.forall. But for 1x1 tiling, it looks like
i.e. there is no intermediate extract_slice.
Before this PR, the logic was hardcoded to look for an extract_slice; this PR relaxes that requirement.
Before this PR, 1x1 tiling hits line 71 of iree-amd-aie/compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEFuseFillIntoForall.cpp (at f5ab91e).
After this PR, compilation progresses further (it fails much later, in the objectfifo pipeline, for reasons unrelated to this change).