-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIE2P] Fix combine G_SHUFFLE_VECTOR into G_AIE_VSEL #294
Conversation
@@ -1791,32 +1791,57 @@ bool llvm::matchShuffleToVSel( | |||
|
|||
const LLT DstTy = MRI.getType(DstReg); | |||
const LLT Src1Ty = MRI.getType(Src1Reg); | |||
if (Src1Ty.getSizeInBits() != 512) | |||
if (Src1Ty.getSizeInBits() != 512 || Src1Ty == LLT::scalar(64)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be checking Src1Ty.getElementType() == LLT::scalar(64)
, I mean we should bail out for<8 x s64>
. Also, good idea to have a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see there is a test already, not sure how it works. May be mask value is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, the mask was wrong. I fixed it.
int CurrIdx = Mask[I] % NumSrcElems; | ||
if (CurrIdx <= PrevIdx) | ||
return false; | ||
|
||
PrevIdx = CurrIdx; | ||
++I; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not really required, for loop below should handle.
; CHECK-NEXT: PseudoRET implicit $lr, implicit [[SHUF]](<8 x s64>) | ||
%1:_(<8 x s64>) = COPY $x2 | ||
%8:_(<8 x s64>) = G_IMPLICIT_DEF | ||
%0:_(<8 x s64>) = G_SHUFFLE_VECTOR %8(<8 x s64>), %1, shufflemask(8, 1, 2, 3, 11, 12, 13, 14) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mask value should be shufflemask(8, 1, 2, 3, 12, 13, 14, 15)
e6f8920
to
8945459
Compare
8945459
to
3bd5bd2
Compare
%4:_(<8 x s32>) = COPY $wl4 | ||
%3:_(<4 x s32>) = G_AIE_UNPAD_VECTOR %4(<8 x s32>) | ||
%8:_(<16 x s32>) = G_AIE_PAD_VECTOR_UNDEF %3(<4 x s32>) | ||
%8:_(<16 x s32>) = G_IMPLICIT_DEF | ||
%0:_(<16 x s32>) = G_SHUFFLE_VECTOR %8(<16 x s32>), %1, shufflemask(0, 1, 2, 3, 16, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the name of this test strange. No index is repeated, rather, vsel can't reach index 16 in position 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
16 is the index 0 of the second vector, that's what I meant here :) Any suggestions for the better name? Nothing comes to my mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I view it as element 16 not suitable for position 4. lane_mismatch? unaligned_index?
; CHECK-NEXT: PseudoRET implicit $lr, implicit [[SHUF]](<8 x s64>) | ||
%1:_(<8 x s64>) = COPY $x2 | ||
%8:_(<8 x s64>) = G_IMPLICIT_DEF | ||
%0:_(<8 x s64>) = G_SHUFFLE_VECTOR %8(<8 x s64>), %1, shufflemask(8, 1, 2, 3, 12, 13, 14, 15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this equivalent to {16, 17, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31} on 16 x 32 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is but here I wanted to test that we never substitute G_SHUFFLE_VECTOR
with G_AIE_VSEL
for ``s64because during instruction selection we map
G_AIE_VSEL` to `VSEL` where the words can be 8, 16, or 32-bit wide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Park it for now as an improvement story?
@@ -1810,18 +1811,37 @@ bool llvm::matchShuffleToVSel( | |||
// the i-th element from Src2 is used. | |||
// 2. The mask indices modulo the number of elements are in strictly ascending | |||
// order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be simplified by saying that shufflemask[i] should be -1, i or i + elemcount?
I'm assuming -1 is don't care, and we choose it to be i, leading to a zero bit in the select mask.
We then just need to run one loop, returning false as soon as we find an illegal index
3bd5bd2
to
f17984b
Compare
8da0d8e
to
b95532f
Compare
if (Idx >= (int)NumSrcElems) { | ||
unsigned long long ElemMask = 1 << I; | ||
DstMask |= ElemMask; | ||
} | ||
} | ||
|
||
MatchInfo = std::make_tuple(DstReg, Src1Reg, Src2Reg, DstMask); | ||
MatchInfo = [=, &TII](MachineIRBuilder &B) { | ||
auto Cst = B.buildConstant(LLT::scalar(32), DstMask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Register MaskReg =
++I; | ||
return (Value == -1 || Value == I || Value == I + (int)NumSrcElems); | ||
})) { | ||
return false; | ||
} | ||
|
||
// Create the mask | ||
unsigned long long DstMask = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should replace this (and the next one) type by the standard type uint64_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although unsigned long long has been longer in the C standard and is guaranteed to be at least 64 bits.
b95532f
to
3b5e98f
Compare
int Idx = Mask[I]; | ||
if (Idx == -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I had meant to put the full check of the above std::all_of
in here; we can safely return false if the condition doesn't hold; Value == I + (int)NumSrcElems
can be reused as the test to insert a one bit.
Also, the fact that we have to capture I
by reference makes it a bit unnatural to use std::all_of.
if (Idx == -1 || Idx == I)
continue;
else if (Idx == I + NumSrcElements)
DstMask |= uint64_t(1) << I;
else
return false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh, I got it. Thanks! I changed it.
3b5e98f
to
222db34
Compare
There are some new comments in this merged PR: #272
This PR addresses all comments from the previous PR.