Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LowerToAIE] fix for op could not fold repetition counts #1055

Merged
merged 5 commits into from
Jan 29, 2025

Conversation

newling
Copy link
Contributor

@newling newling commented Jan 23, 2025

This is a fix for the compilation error

<unknown>:0: error: 'amdaie.npu.circular_dma_cpy_nd' op could not fold repetition counts

that is observed some transposed matmuls with the new pack-peel pipeline. What was happening was that the lower-to-aie pass is assumed a stride-0 leading dimension was always or never present, but that wasn't the case

@newling newling force-pushed the fold_repetition_count_fix branch 4 times, most recently from 80b9697 to e6fa5b0 Compare January 28, 2025 00:54
@newling newling changed the title [WIP][LowerToAIE] fix for op could not fold repetition counts [LowerToAIE] fix for op could not fold repetition counts Jan 28, 2025
Comment on lines 119 to 122
// Sort and unique-ify the repetition counts:
std::sort(repetitionCounts.begin(), repetitionCounts.end());
auto last = std::unique(repetitionCounts.begin(), repetitionCounts.end());
repetitionCounts.erase(last, repetitionCounts.end());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the sort + uniquify needed? Can't you just look for the smallest repetitionCount?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed, I've removed it.

Comment on lines 41 to 54
std::string arrString(ArrayRef<int64_t> vs) {
return std::string("[")
.append(llvm::join(
llvm::map_range(vs, [](int64_t v) { return std::to_string(v); }),
","))
.append("]");
}

std::string intOpFoldResultsString(ArrayRef<OpFoldResult> ofrs) {
auto maybeValues = mlir::getConstantIntValues(ofrs);
if (maybeValues.has_value()) return "[not all constant]";
return arrString(maybeValues.value());
}
} // namespace
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's useful to put these in AMDAIEUtils?

Comment on lines +1184 to +1418
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [] [] [])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was happening was that the lower-to-aie pass is assumed a stride-0 leading dimension was always or never present, but that wasn't the case

It's about the [] [] [] right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm not sure. Here are some more examples of pass/fail before this PR. But I'll remove my comment from the summary (which seems wrong).

// Fail (original):
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [] [] [])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])

// Fail: make source of first explicit access pattern
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0] [512] [1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])

// Fail: another attempt to make the source of first explicit access pattern
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0, 0] [2, 512] [0, 1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])

// Fail: another attempt to make the source of first explicit access pattern
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0, 0] [1, 512] [512, 1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])

// Pass:
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [] [] [])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [1, 16, 4, 8] [1, 8, 128, 1], [0, 0] [1, 512] [1, 1])

// Pass:
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0, 0] [1, 512] [0, 1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [1, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 256] [256, 1])

// Pass:
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0, 0] [1, 512] [0, 1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 8, 4, 8] [0, 8, 128, 1], [0, 0] [2, 256] [256, 1])

// Fail:
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [0, 0] [2, 512] [0, 1])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])

// Fail:
%7 = amdaie.npu.circular_dma_cpy_nd %3([0] [512] [1], [] [] [])
%8 = amdaie.npu.circular_dma_cpy_nd %6([0, 0, 0, 0] [2, 16, 4, 8] [0, 8, 128, 1], [0, 0] [2, 512] [0, 1])

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see, could you add test cases for some of the other distinct failure cases as well? I know it's a bit cumbersome as this pass needs a lot of collateral information to work, but you could reuse the same module with additional connections for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I added all the failure cases above. I tried doing it in a new .mlir file using sed to substitute in the access patterns, which made it nice and compact but also hard to understand.

@newling newling force-pushed the fold_repetition_count_fix branch from 09b0e9c to 901e008 Compare January 29, 2025 01:38
@jtuyls jtuyls merged commit 9438dd2 into nod-ai:main Jan 29, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants