Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIE2P] Fix combine G_SHUFFLE_VECTOR into G_AIE_VSEL #294

Merged
merged 1 commit into from
Jan 29, 2025

Conversation

katerynamuts
Copy link
Collaborator

There are some new comments in this merged PR: #272
This PR addresses all comments from the previous PR.

@@ -1791,32 +1791,57 @@ bool llvm::matchShuffleToVSel(

const LLT DstTy = MRI.getType(DstReg);
const LLT Src1Ty = MRI.getType(Src1Reg);
if (Src1Ty.getSizeInBits() != 512)
if (Src1Ty.getSizeInBits() != 512 || Src1Ty == LLT::scalar(64))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be checking Src1Ty.getElementType() == LLT::scalar(64), I mean we should bail out for<8 x s64>. Also, good idea to have a test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see there is a test already, not sure how it works. May be mask value is wrong.

Copy link
Collaborator Author

@katerynamuts katerynamuts Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the mask was wrong. I fixed it.

Comment on lines 1820 to 1825
int CurrIdx = Mask[I] % NumSrcElems;
if (CurrIdx <= PrevIdx)
return false;

PrevIdx = CurrIdx;
++I;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not really required, for loop below should handle.

; CHECK-NEXT: PseudoRET implicit $lr, implicit [[SHUF]](<8 x s64>)
%1:_(<8 x s64>) = COPY $x2
%8:_(<8 x s64>) = G_IMPLICIT_DEF
%0:_(<8 x s64>) = G_SHUFFLE_VECTOR %8(<8 x s64>), %1, shufflemask(8, 1, 2, 3, 11, 12, 13, 14)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mask value should be shufflemask(8, 1, 2, 3, 12, 13, 14, 15)

%4:_(<8 x s32>) = COPY $wl4
%3:_(<4 x s32>) = G_AIE_UNPAD_VECTOR %4(<8 x s32>)
%8:_(<16 x s32>) = G_AIE_PAD_VECTOR_UNDEF %3(<4 x s32>)
%8:_(<16 x s32>) = G_IMPLICIT_DEF
%0:_(<16 x s32>) = G_SHUFFLE_VECTOR %8(<16 x s32>), %1, shufflemask(0, 1, 2, 3, 16, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the name of this test strange. No index is repeated, rather, vsel can't reach index 16 in position 4.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

16 is the index 0 of the second vector, that's what I meant here :) Any suggestions for the better name? Nothing comes to my mind.

Copy link
Collaborator

@martien-de-jong martien-de-jong Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I view it as element 16 not suitable for position 4. lane_mismatch? unaligned_index?

; CHECK-NEXT: PseudoRET implicit $lr, implicit [[SHUF]](<8 x s64>)
%1:_(<8 x s64>) = COPY $x2
%8:_(<8 x s64>) = G_IMPLICIT_DEF
%0:_(<8 x s64>) = G_SHUFFLE_VECTOR %8(<8 x s64>), %1, shufflemask(8, 1, 2, 3, 12, 13, 14, 15)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this equivalent to {16, 17, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31} on 16 x 32 ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is but here I wanted to test that we never substitute G_SHUFFLE_VECTOR with G_AIE_VSEL for ``s64because during instruction selection we mapG_AIE_VSEL` to `VSEL` where the words can be 8, 16, or 32-bit wide.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Park it for now as an improvement story?

@@ -1810,18 +1811,37 @@ bool llvm::matchShuffleToVSel(
// the i-th element from Src2 is used.
// 2. The mask indices modulo the number of elements are in strictly ascending
// order.
Copy link
Collaborator

@martien-de-jong martien-de-jong Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be simplified by saying that shufflemask[i] should be -1, i or i + elemcount?
I'm assuming -1 is don't care, and we choose it to be i, leading to a zero bit in the select mask.
We then just need to run one loop, returning false as soon as we find an illegal index

@katerynamuts katerynamuts changed the title [AIE2P] !Fixup Combine G_SHUFFLE_VECTOR into G_AIE_VSEL [AIE2P] Fixup combine G_SHUFFLE_VECTOR into G_AIE_VSEL Jan 28, 2025
@katerynamuts katerynamuts changed the title [AIE2P] Fixup combine G_SHUFFLE_VECTOR into G_AIE_VSEL [AIE2P] Fix combine G_SHUFFLE_VECTOR into G_AIE_VSEL Jan 28, 2025
@katerynamuts katerynamuts force-pushed the katemuts.vsel branch 2 times, most recently from 8da0d8e to b95532f Compare January 29, 2025 07:19
if (Idx >= (int)NumSrcElems) {
unsigned long long ElemMask = 1 << I;
DstMask |= ElemMask;
}
}

MatchInfo = std::make_tuple(DstReg, Src1Reg, Src2Reg, DstMask);
MatchInfo = [=, &TII](MachineIRBuilder &B) {
auto Cst = B.buildConstant(LLT::scalar(32), DstMask);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Register MaskReg =

++I;
return (Value == -1 || Value == I || Value == I + (int)NumSrcElems);
})) {
return false;
}

// Create the mask
unsigned long long DstMask = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should replace this (and the next one) type by the standard type uint64_t.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although unsigned long long has been longer in the C standard and is guaranteed to be at least 64 bits.

int Idx = Mask[I];
if (Idx == -1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I had meant to put the full check of the above std::all_of in here; we can safely return false if the condition doesn't hold; Value == I + (int)NumSrcElems can be reused as the test to insert a one bit.
Also, the fact that we have to capture I by reference makes it a bit unnatural to use std::all_of.

if (Idx == -1 || Idx == I)
  continue;
else if (Idx == I + NumSrcElements)
 DstMask |= uint64_t(1) << I;
else
  return false;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh, I got it. Thanks! I changed it.

@katerynamuts katerynamuts enabled auto-merge (rebase) January 29, 2025 12:35
@katerynamuts katerynamuts merged commit 0f3e26a into aie-public Jan 29, 2025
8 checks passed
@katerynamuts katerynamuts deleted the katemuts.vsel branch January 29, 2025 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants