-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LowerToAIE] fix for op could not fold repetition counts
#1055
Conversation
80b9697
to
e6fa5b0
Compare
op could not fold repetition counts
op could not fold repetition counts
// Sort and unique-ify the repetition counts: | ||
std::sort(repetitionCounts.begin(), repetitionCounts.end()); | ||
auto last = std::unique(repetitionCounts.begin(), repetitionCounts.end()); | ||
repetitionCounts.erase(last, repetitionCounts.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the sort + uniquify needed? Can't you just look for the smallest repetitionCount
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed, I've removed it.
std::string arrString(ArrayRef<int64_t> vs) { | ||
return std::string("[") | ||
.append(llvm::join( | ||
llvm::map_range(vs, [](int64_t v) { return std::to_string(v); }), | ||
",")) | ||
.append("]"); | ||
} | ||
|
||
std::string intOpFoldResultsString(ArrayRef<OpFoldResult> ofrs) { | ||
auto maybeValues = mlir::getConstantIntValues(ofrs); | ||
if (maybeValues.has_value()) return "[not all constant]"; | ||
return arrString(maybeValues.value()); | ||
} | ||
} // namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's useful to put these in AMDAIEUtils
?
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [] [] []) | ||
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was happening was that the lower-to-aie pass is assumed a stride-0 leading dimension was always or never present, but that wasn't the case
It's about the [] [] []
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'm not sure. Here are some more examples of pass/fail before this PR. But I'll remove my comment from the summary (which seems wrong).
// Fail (original):
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [] [] [])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])
// Fail: make source of first explicit access pattern
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0] [512] [1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])
// Fail: another attempt to make the source of first explicit access pattern
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0, 0] [2, 512] [0, 1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])
// Fail: another attempt to make the source of first explicit access pattern
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0, 0] [1, 512] [512, 1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])
// Pass:
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [] [] [])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [1, 16, 4, 8] [1, 8, 128, 1], [0, 0] [1, 512] [1, 1])
// Pass:
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0, 0] [1, 512] [0, 1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [1, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 256] [256, 1])
// Pass:
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0, 0] [1, 512] [0, 1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 8, 4, 8] [0, 8, 128, 1], [0, 0] [2, 256] [256, 1])
// Fail:
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0, 0] [2, 512] [0, 1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])
// Fail:
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [] [] [])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see, could you add test cases for some of the other distinct failure cases as well? I know it's a bit cumbersome as this pass needs a lot of collateral information to work, but you could reuse the same module with additional connections for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I added all the failure cases above. I tried doing it in a new .mlir file using sed to substitute in the access patterns, which made it nice and compact but also hard to understand.
09b0e9c
to
901e008
Compare
This is a fix for the compilation error
that is observed some transposed matmuls with the new pack-peel pipeline. What was happening was that the
lower-to-aie
pass is assumed a stride-0 leading dimension was always or never present, but that wasn't the case