
Conversation

@Sameeranjoshi

No description provided.

1. Use a custom legalizer for bf16 (only works for vectors smaller than 32 elements).
2. Use the underlying G_FPEXT to first convert bf16 -> f32, then perform the add:
  G_FADD(G_FPEXT(V1), G_FPEXT(V2))
  Convert back to the original types and shapes if needed.
3. Fully vectorized code is now generated, which was not seen before.

Break <64 x bf16> into 2 chunks of <32 x bf16>.
Pending check: not sure how to verify the pad and unpad logic; it
seems to unroll into a lot of boilerplate code.

For vectors smaller than the legal sizes:
- f32: vectors of 16 or 32 elements get converted to <64 x f32>, since that size is legal.
- bf16: <32 x bf16> is the custom case; other sizes are converted into this vector.
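The extend/add/truncate strategy in step 2 can be modeled numerically as follows. This is a standalone sketch of the intended semantics, not the legalizer itself: bf16 is treated as the high 16 bits of an IEEE f32, the round-to-nearest-even truncation is an assumption about the lowering, and the legal width of 32 elements used for the pad/unpad model is taken from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Model of G_FPEXT bf16 -> f32: bf16 is the high 16 bits of an IEEE f32.
static float bf16_to_f32(uint16_t b) {
  uint32_t bits = static_cast<uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Model of G_FPTRUNC f32 -> bf16. Round-to-nearest-even is an assumption;
// the actual lowering may use a different rounding mode.
static uint16_t f32_to_bf16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  bits += 0x7FFFu + ((bits >> 16) & 1); // round to nearest, ties to even
  return static_cast<uint16_t>(bits >> 16);
}

// G_FADD(G_FPEXT(V1), G_FPEXT(V2)) applied elementwise, followed by a
// truncate back to the original bf16 element type.
std::vector<uint16_t> bf16_vec_fadd(std::vector<uint16_t> a,
                                    std::vector<uint16_t> b) {
  assert(a.size() == b.size());
  const size_t kLegalWidth = 32; // hypothetical legal vector width
  assert(a.size() <= kLegalWidth);
  size_t n = a.size();
  // Pad both operands with zeros up to the legal width, mirroring the
  // pad/unpad logic described above (zero is the fadd identity here).
  a.resize(kLegalWidth, 0);
  b.resize(kLegalWidth, 0);
  std::vector<uint16_t> out(kLegalWidth);
  for (size_t i = 0; i < kLegalWidth; ++i)
    out[i] = f32_to_bf16(bf16_to_f32(a[i]) + bf16_to_f32(b[i]));
  out.resize(n); // unpad back to the original shape
  return out;
}
```

This only checks the value-level behavior; the generated AIE2P code must match these results element by element.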

This patch depends on Xilinx#548 and Xilinx#557. Previously, bf16 and f32 failed to
generate fully vectorized code and were scalarized instead; this test makes sure
different types and vector sizes work and are fully vectorized.

This is a supplementary patch for verifying the pipeline below:
Part 1: `vector.multi_reduction` to `vector.reduction` to `llvm.vector.reduce.fadd.*`
(nod-ai/iree-amd-aie#1336)
Part 2: Further lowering to AIE2P instructions (this patch).
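As a reference for what Part 2 must preserve: `llvm.vector.reduce.fadd` without reassociation flags is defined as a strict left-to-right accumulation starting from the scalar operand. A minimal model of that semantics:

```cpp
#include <cassert>
#include <vector>

// Reference semantics of an ordered llvm.vector.reduce.fadd(acc, v):
// strict sequential accumulation starting from the scalar start value.
// The AIE2P lowering may reassociate only when the reassoc flag is set.
float reduce_fadd(float acc, const std::vector<float> &v) {
  for (float x : v)
    acc += x;
  return acc;
}
```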
Implement support in legalizer for Float32 types.
@Sameeranjoshi
Author

Squashed into #604

