@Sameeranjoshi

Depends on #548

1. Use a custom legalizer for bf16 (only works for vectors with fewer than 32 elements).
2. Use the underlying G_FPEXT to first convert bf16 -> f32, then perform the add:
  G_FADD(G_FPEXT(V1), G_FPEXT(V2))
  Convert back to the original types and shapes if needed.
3. Fully vectorized code is now generated, which wasn't the case before.
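The widen, add, narrow sequence from step 2 can be modelled per lane in plain C++. This is only a reference sketch, not the legalizer code: the names `bf16_bits`, `bf16_to_f32`, `f32_to_bf16` and `bf16_add` are made up for illustration, and round-to-nearest-even on the way back to bf16 is an assumption about the narrowing step.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// bf16 holds the upper 16 bits of an IEEE-754 binary32 value.
using bf16_bits = std::uint16_t;  // hypothetical name for this sketch

// Widening (the role G_FPEXT plays here) is exact: shift left by 16 bits.
inline float bf16_to_f32(bf16_bits v) {
  std::uint32_t bits = static_cast<std::uint32_t>(v) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Narrowing with round-to-nearest-even (an assumption of this sketch).
inline bf16_bits f32_to_bf16(float f) {
  assert(f == f && "NaN inputs are not handled in this sketch");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  std::uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<bf16_bits>((bits + rounding) >> 16);
}

// Scalar model of G_FADD(G_FPEXT(a), G_FPEXT(b)) followed by the narrowing.
inline bf16_bits bf16_add(bf16_bits a, bf16_bits b) {
  return f32_to_bf16(bf16_to_f32(a) + bf16_to_f32(b));
}
```

A model like this could serve as a lane-by-lane oracle when comparing against the vectorized output.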
Break <64xbf16> into 2 chunks of <32xbf16>.
Pending check: I am not sure how to verify the pad and unpad logic;
it seems to unroll into a lot of boilerplate code.
Less than
Vectors of f32: 16 and 32 elements get converted to 64xf32, as those are legal.
Vectors of bf16: 32xbf16 is the custom case; other sizes are converted into
this vector type.
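One way to sanity-check the pad and unpad logic is a small reference model that does the same thing in scalar code. The sketch below is an assumption about the intended behaviour, not the legalizer implementation: `add_in_chunks` is a hypothetical helper, `chunk` stands in for the legal lane count (for example 32 for bf16), padding lanes are zero-filled, and `float` lanes are used for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Lane-wise add on vectors of any length, performed in fixed-width chunks.
std::vector<float> add_in_chunks(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 std::size_t chunk) {
  assert(a.size() == b.size() && chunk > 0);
  std::vector<float> out;
  out.reserve(a.size());
  for (std::size_t base = 0; base < a.size(); base += chunk) {
    std::size_t live = std::min(chunk, a.size() - base);
    // Pad: copy the live lanes into zero-filled, chunk-wide buffers.
    std::vector<float> pa(chunk, 0.0f), pb(chunk, 0.0f);
    std::copy_n(a.begin() + base, live, pa.begin());
    std::copy_n(b.begin() + base, live, pb.begin());
    // The "legal" operation: always exactly `chunk` lanes wide.
    for (std::size_t i = 0; i < chunk; ++i) pa[i] += pb[i];
    // Unpad: keep only the lanes that existed in the input.
    out.insert(out.end(), pa.begin(), pa.begin() + live);
  }
  return out;
}
```

With `chunk = 32`, a 64-lane input takes the two-chunk path with no padding, while a 40-lane input exercises padding on the second chunk.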
Sameeranjoshi added a commit to Sameeranjoshi/iree-amd-aie that referenced this pull request Jul 21, 2025
Patch adds tests to make sure all the tests with `vector.multi_reduction`
successfully pass the Peano legalizer and generate efficient vectorized code.
This patch checks only the IREE side to keep the dependency on Peano to a minimum.
(Depends on Peano:
1. Xilinx/llvm-aie#548
2. Xilinx/llvm-aie#557
)

1. `reassociateFpReductions=true` is required; otherwise the code is scalarized. This flag
could be added to the IREE vectorization pipeline so that it is triggered automatically.
2. bf16/i32/f32 all work now, with different vector sizes.
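Why reassociation matters for point 1: without permission to reassociate, a floating-point add reduction has to be evaluated strictly in order, which is a serial chain and tends to end up scalarized. A vectorized reduction adds lanes pairwise, and that can change the rounded result. The sketch below illustrates the difference; `reduce_ordered` and `reduce_tree` are hypothetical names, and the tree version assumes a power-of-two length.

```cpp
#include <cstddef>
#include <vector>

// Ordered reduction: one long dependency chain, left to right.
float reduce_ordered(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v) acc += x;
  return acc;
}

// Reassociated reduction: repeatedly add the upper half onto the lower half,
// the way a vector unit would. Assumes v.size() is a power of two.
float reduce_tree(std::vector<float> v) {
  for (std::size_t width = v.size() / 2; width >= 1; width /= 2)
    for (std::size_t i = 0; i < width; ++i) v[i] += v[i + width];
  return v.empty() ? 0.0f : v[0];
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` evaluated in strict binary32 arithmetic, the ordered sum loses the first `1.0f` to rounding and the two functions disagree, so a compiler may only pick the tree form when it has been told reassociation is acceptable.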
Sameeranjoshi added a commit to Sameeranjoshi/llvm-aie that referenced this pull request Jul 22, 2025
…d code.

This patch is dependent on Xilinx#548 and Xilinx#557. Previously, bf16 and f32 failed to
generate fully vectorized code and were scalarized; this test makes sure
different types and vector sizes work and are fully vectorized.

This is a supplementary patch for verifying below pipeline:
Part 1: `vector.multi_reduction` to `vector.reduction` to `llvm.vector.reduce.fadd.*`
nod-ai/iree-amd-aie#1336
Part 2: Further lowers to AIE2P instructions. (This patch)
@@ -0,0 +1,56 @@
; RUN: llc -mtriple=aie2p -O0 -stop-after=legalizer %s -o - 2>&1 | FileCheck %s
Collaborator

For the tests, it is better to run just the pass of interest. For example, you can create a MIR test that includes just the operation on the illegal type and run llc with -run-pass=legalizer. This way we can easily spot the specific legalization change in action.

Collaborator

The good part is that you can use llvm/utils/update_mir_test_checks.py to update the tests.

@Sameeranjoshi
Author

Squashed into #604

mgehre-amd pushed a commit that referenced this pull request Aug 21, 2025
[AutoBump] Merge with fixes of 8388040 (Jan 23) (19)