Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIEX] Combine G_SHUFFLE_VECTOR to COPY #346

Merged
merged 2 commits into from
Feb 18, 2025

Conversation

katerynamuts
Copy link
Collaborator

No description provided.

return false;

// If the mask has only -1 (undef), do nothing
auto allUndefs = [&NumSrcElems](const ArrayRef<int> &Mask) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capture by value?

body: |
bb.1:
; CHECK-LABEL: name: shuffle_vector_to_copy_scalar
; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not really a copy

%1:_(<16 x s32>) = COPY $x0
%2:_(<16 x s32>) = COPY $x1
%0:_(<16 x s32>) = G_SHUFFLE_VECTOR %1(<16 x s32>), %2(<16 x s32>), shufflemask(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
PseudoRET implicit $lr, implicit %0
...
# Note: currently not commutable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is exactly the same as matching a sawtooth with period, amplitude and height parameters that we saw recently. May I suggest to factor that code out, and build the commutable version by checking the resulting height and select the appropriate operand?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip: I would create a base class that generalizes a mask match. The instances will fix free parameters, like the height in this example, while matching. A base class for a sawtooth has period, amplitude and height as free parameters. Specializations require for instance Period == amplitude. Different constructors can pre-fix specific free parameters.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A negative amplitude would indicate possibilities for shuffle modes that reverse.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

%2:_(<2 x s32>) = COPY $l1
%0:_(<2 x s32>) = G_SHUFFLE_VECTOR %1(<2 x s32>), %2(<2 x s32>), shufflemask(1, 2)
PseudoRET implicit $lr, implicit %0
...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a boundary case test, e.g. a mask with eight elements, which only match in the first seven positions

return false;

// If the mask has only -1 (undef), do nothing
auto allUndefs = [&NumSrcElems](const ArrayRef<int> &Mask) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllUndefs.

BuildFnTy &MatchInfo) {
assert(MI.getOpcode() == TargetOpcode::G_SHUFFLE_VECTOR);

const Register DstReg = MI.getOperand(0).getReg();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use something like this:

const auto [DstReg, DstTy, Src1Reg, Src1Ty] = MI.getFirst2RegLLTs();

?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I guess that this could trigger some clang frontend bug....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this I am getting the following error: " error: reference to local binding 'DstReg' declared in enclosing function 'llvm::matchShuffleToCopy'
MatchInfo = [=](MachineIRBuilder &B) { B.buildCopy(DstReg, Src1Reg); };"

and the same error for Src1Reg.

Copy link
Collaborator

@andcarminati andcarminati Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is a clang bug.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means that you can explain every line from the C++ standard related to structural bindings? Respect.

@katerynamuts katerynamuts force-pushed the kmuts.shuffle-to-copy branch 2 times, most recently from fa49412 to dca2a49 Compare February 12, 2025 14:21
public:
SequentialPeriodicMaskMatch(unsigned Height, unsigned Period)
: BaseMaskMatch{Height, Period} {}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These classes don't add much in this way. We could also have created BaseMaskMatch with the constructor used here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I removed the derived classes for now, if we need them, we can add them later.

}

unsigned Period = 0;
unsigned Height = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had imagined these as std::optional, indicating that they were free parameters. Then, the match function, isValidMask, would use this freedom to make it match the given mask. Matching on a mask without dont-care values, it would fix Height on the first element, the Amplitude on the second, and period when the sawtooth ended. It would return false the first time it couldn't adjust a free parameter. If you would know the period that you were looking for, you would initialize it with concrete values using a constructor, or a setter function.
I guess that for our purposes, this generalized sawtooth is a reasonable base, although we could make the base class even more generic. For instance, we could match the different transpose modes for VSHUFFLE in this form as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But don't pursue this idea at this point. It is already good to have it factored this far.

// Find the start value of the mask
if (FirstNotUndef) {
// Get the start value
const unsigned MaskStart = Mask[Idx] - Idx % Period;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could reproduce the test fails with gcc build and the issue is here. Period can be zero and it results in divide by zero.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, thanks, @niwinanto! After cleaning the ccache, I still couldn't reproduce it.

@martien-de-jong
Copy link
Collaborator

I think I'll generalize this a bit more after it has been merged.

@martien-de-jong martien-de-jong merged commit 42774ee into aie-public Feb 18, 2025
8 checks passed
@martien-de-jong martien-de-jong deleted the kmuts.shuffle-to-copy branch February 18, 2025 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants