[AIE2] Combiners for 8x8->8x8 and 8x4->4x8 matrix transposes #76
base: vvandebe.shufflevector.pattern.optimization
The shuffle modes description is difficult to read at first. Each element type and matrix dimension uses its own shuffle mode.
llvm/test/CodeGen/AIE/aie2/GlobalISel/prelegalizercombiner-shufflevector.mir
%1:_(<64 x s8>) = COPY $x0
%2:_(<64 x s8>) = COPY $x1
%0:_(<64 x s8>) = G_SHUFFLE_VECTOR %1:_(<64 x s8>), %2:_, shufflemask(0, 16, 32, 48, 1, 17, 33, 49, 2, 18, 34, 50, 3, 19, 35, 51, 4, 20, 36, 52, 5, 21, 37, 53, 6, 22, 38, 54, 7, 23, 39, 55, 8, 24, 40, 56, 9, 25, 41, 57, 10, 26, 42, 58, 11, 27, 43, 59, 12, 28, 44, 60, 13, 29, 45, 61, 14, 30, 46, 62, 15, 31, 47, 63)
Mode 35 operates on an 8x8 matrix of 8-bit elements.
Wouldn't that need to match shufflemask(0, 8, 16, 24, 1, 9, 17, 25, ...)?
Yes, it does, but vshuffle takes 1024 bits of input (two 512-bit vectors) and ignores the higher bits of the input. Mode 35 operates on an 8x8 matrix of 8-bit elements, i.e. a <64 x s8> vector, which is 512 bits. In the first case, those two are split into two 4x8 8-bit vectors; the second case is the "common" case where we just ignore the higher-order bits.
What you propose would be a 4x8 8-bit match, which is a <32 x s8> vector, or 256 bits.
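To make the mask arithmetic in this thread concrete, here is a minimal Python sketch that is not part of the PR. It assumes a row-major matrix layout, and `transpose_mask` and `vector_bits` are hypothetical helper names. It is pure index arithmetic and makes no claim about how vshuffle encodes its modes.

```python
def transpose_mask(rows, cols):
    """Shuffle mask that transposes a row-major rows x cols matrix.

    Assumption: row-major layout, so input element (r, c) lives at r * cols + c.
    """
    # Element (c, r) of the cols x rows result reads input element (r, c).
    return [r * cols + c for c in range(cols) for r in range(rows)]


def vector_bits(rows, cols, elem_bits):
    """Total width in bits of a rows x cols matrix of elem_bits-wide elements."""
    return rows * cols * elem_bits


# The mask proposed in the review comment is a 4x8 transpose: 32 elements, 256 bits.
print(transpose_mask(4, 8)[:8])   # [0, 8, 16, 24, 1, 9, 17, 25]
print(vector_bits(4, 8, 8))       # 256

# An 8x8 matrix of 8-bit elements fills a 512-bit vector.
print(vector_bits(8, 8, 8))       # 512

# The mask in the test above starts like a 4x16 transpose under this layout.
print(transpose_mask(4, 16)[:8])  # [0, 16, 32, 48, 1, 17, 33, 49]
```

Under the row-major assumption, the full 64-entry mask in the test above equals `transpose_mask(4, 16)`, while `transpose_mask(8, 8)` would start with 0, 8, 16, 24, 32, 40, 48, 56.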
Yeah, it is a strange part of the ISA. At the moment I am orienting myself by looking at the image descriptions, since from the perspective of …
We check for iterative shift masks, which correspond to the G_CONCAT_VECTORS instruction.
…hunks of a vector
… of two vectors together
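As an illustration of the concat check described in these commits, the following Python sketch classifies a two-source shuffle mask as a concatenation. It is not the PR's implementation: `classify_concat` is a hypothetical name, "iterative" is assumed to mean runs of consecutive indices, and undefined mask entries (-1 in LLVM) are not handled.

```python
def classify_concat(mask, num_src_elts):
    """Return the operand order if the mask is a plain concatenation, else None.

    Assumption: indices 0..n-1 select from the first source (a) and
    n..2n-1 from the second source (b), as in a two-operand shuffle.
    """
    n = num_src_elts
    if len(mask) != 2 * n:
        return None
    forward = list(range(2 * n))                      # a's elements, then b's
    reverse = list(range(n, 2 * n)) + list(range(n))  # b's elements, then a's
    if list(mask) == forward:
        return ("a", "b")
    if list(mask) == reverse:
        return ("b", "a")
    return None


# Two <8 x s32> sources producing a <16 x s32> result.
print(classify_concat(list(range(16)), 8))                         # ('a', 'b')
print(classify_concat(list(range(8, 16)) + list(range(8)), 8))     # ('b', 'a')
print(classify_concat([0, 2, 1, 3] + list(range(4, 16)), 8))       # None
```

The `('b', 'a')` result corresponds to a concatenation with swapped operands, i.e. the second source first.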
---
name: concat_vector_reverse_32_512_random
legalized: false
body: |
  bb.1.entry:
    liveins: $wl2, $wl4
    ; CHECK-LABEL: name: concat_vector_reverse_32_512_random
    ; CHECK: liveins: $wl2, $wl4
    ; CHECK-NEXT: {{ $}}
    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<8 x s32>) = COPY $wl2
    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<8 x s32>) = COPY $wl4
    ; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<16 x s32>) = G_CONCAT_VECTORS [[COPY1]](<8 x s32>), [[COPY]](<8 x s32>)
These got moved somehow; I will fix it when I get to it.
This merge request adds the same framework for Shuffle Vector combinations used in #41 to the AIE2 backend. It also defines a new generator that produces the pattern used for a decent subset of the shuffle vector modes. Concretely, it matches mode 35 (4x4 -> 4x4 matrix transpose) and mode 29 (8x4 -> 4x8 matrix transpose). Finally, it adds a generic opcode for the AIE vshuffle instruction.