[LoadStoreVectorizer] Propagate alignment through contiguous chain #145733
Changes from all commits: 15600f9, b905f1c, cd9e9f6
```diff
@@ -343,6 +343,9 @@ class Vectorizer {
   /// Postcondition: For all i, ret[i][0].second == 0, because the first instr
   /// in the chain is the leader, and an instr touches distance 0 from itself.
   std::vector<Chain> gatherChains(ArrayRef<Instruction *> Instrs);
+
+  /// Propagates the best alignment in a chain of contiguous accesses
+  void propagateBestAlignmentsInChain(ArrayRef<ChainElem> C) const;
 };

 class LoadStoreVectorizerLegacyPass : public FunctionPass {
```
```diff
@@ -716,6 +719,14 @@ std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &C) {
   unsigned AS = getLoadStoreAddressSpace(C[0].Inst);
   unsigned VecRegBytes = TTI.getLoadStoreVecRegBitWidth(AS) / 8;

+  // We know that the accesses are contiguous. Propagate alignment
+  // information so that slices of the chain can still be vectorized.
+  propagateBestAlignmentsInChain(C);
+  LLVM_DEBUG({
+    dbgs() << "LSV: Chain after alignment propagation:\n";
+    dumpChain(C);
+  });
+
   std::vector<Chain> Ret;
   for (unsigned CBegin = 0; CBegin < C.size(); ++CBegin) {
     // Find candidate chains of size not greater than the largest vector reg.
```
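An aside on the "slices of the chain" comment above (a standalone toy sketch, not code from this patch; `commonPow2` is a hypothetical stand-in for LLVM's `commonAlignment`, and the 16-aligned leader and 8-byte vector registers are assumed for illustration): once the chain is known to be contiguous, each element's alignment can be derived from its byte distance to the best-aligned element, which is what lets a slice starting mid-chain still be vectorized.

```cpp
#include <cstdint>
#include <cstdio>

// Greatest power of two dividing both A and Delta; this is the arithmetic
// behind LLVM's commonAlignment(Align, uint64_t). Assumes A is a power of 2.
static uint64_t commonPow2(uint64_t A, uint64_t Delta) {
  if (Delta == 0)
    return A; // zero distance: keep the full alignment
  return (A | Delta) & -(A | Delta);
}

int main() {
  // Four contiguous i32 accesses whose leader (offset 0) is 16-aligned.
  const uint64_t LeaderAlign = 16;
  for (uint64_t Offset : {0u, 4u, 8u, 12u})
    std::printf("offset %2llu -> derived align %llu\n",
                (unsigned long long)Offset,
                (unsigned long long)commonPow2(LeaderAlign, Offset));
  // Prints 16, 4, 8, 4. On a hypothetical target with 8-byte vector
  // registers, the slice at offsets 8..15 is now known to be 8-aligned and
  // can be vectorized even though the full chain exceeds one register.
  return 0;
}
```

Before propagation, only the leader carried the align-16 fact, so that same slice would have had to fall back on the element's original (possibly align-4) annotation.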
```diff
@@ -823,6 +834,7 @@ std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &C) {
                           << Alignment.value() << " to " << NewAlign.value()
                           << "\n");
         Alignment = NewAlign;
+        setLoadStoreAlignment(C[CBegin].Inst, Alignment);
       }
     }
```
```diff
@@ -880,14 +892,6 @@ bool Vectorizer::vectorizeChain(Chain &C) {
       VecElemTy, 8 * ChainBytes / DL.getTypeSizeInBits(VecElemTy));

-  Align Alignment = getLoadStoreAlignment(C[0].Inst);
-  // If this is a load/store of an alloca, we might have upgraded the alloca's
-  // alignment earlier. Get the new alignment.
-  if (AS == DL.getAllocaAddrSpace()) {
-    Alignment = std::max(
-        Alignment,
-        getOrEnforceKnownAlignment(getLoadStorePointerOperand(C[0].Inst),
-                                   MaybeAlign(), DL, C[0].Inst, nullptr, &DT));
-  }

   // All elements of the chain must have the same scalar-type size.
 #ifndef NDEBUG
```
```diff
@@ -1634,3 +1638,32 @@ std::optional<APInt> Vectorizer::getConstantOffset(Value *PtrA, Value *PtrB,
            .sextOrTrunc(OrigBitWidth);
   return std::nullopt;
 }
+
+void Vectorizer::propagateBestAlignmentsInChain(ArrayRef<ChainElem> C) const {
+  auto PropagateAlignments = [](auto ChainIt) {
```

Reviewer comment on the lambda: Make this a proper helper function instead of a lambda.

Author reply: Hmm, I can't figure out a way to have this take in both the iterator and the reverse iterator without it being a templated helper function. I was leaning towards the lambda approach because it is only used in this one function. Do you think a separate helper would still be preferred even if we need to use a template? Open to ideas.
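For reference, a hedged sketch of what that refactor might look like (the helper name `propagateAlignmentsOverRange` is hypothetical; this is not code from the patch): since `C` is an `ArrayRef<ChainElem>` while `reverse(C)` yields a different range type, a file-local helper would have to be a function template over the range type.

```cpp
// Sketch only: hoisting the lambda into a file-local function template
// inside LoadStoreVectorizer.cpp (which already has the needed includes).
// RangeT deduces to ArrayRef<ChainElem> for the forward walk and to the
// reversed-range type produced by llvm::reverse(C) for the backward walk.
template <typename RangeT>
static void propagateAlignmentsOverRange(RangeT &&ChainRange) {
  ChainElem BestAlignedElem = *ChainRange.begin();
  Align BestAlignSoFar = getLoadStoreAlignment(BestAlignedElem.Inst);

  for (const ChainElem &E : ChainRange) {
    Align OrigAlign = getLoadStoreAlignment(E.Inst);
    if (OrigAlign > BestAlignSoFar) {
      BestAlignedElem = E;
      BestAlignSoFar = OrigAlign;
      continue;
    }
    APInt Delta =
        APIntOps::abdu(E.OffsetFromLeader, BestAlignedElem.OffsetFromLeader);
    Align NewAlign = commonAlignment(BestAlignSoFar, Delta.getLimitedValue());
    if (NewAlign > OrigAlign)
      setLoadStoreAlignment(E.Inst, NewAlign);
  }
}
```

The member function would then reduce to the two calls `propagateAlignmentsOverRange(C)` and `propagateAlignmentsOverRange(reverse(C))`; the template is the cost the author is weighing against keeping the generic lambda.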
```diff
+    ChainElem BestAlignedElem = *ChainIt.begin();
+    Align BestAlignSoFar = getLoadStoreAlignment(BestAlignedElem.Inst);
+
+    for (const ChainElem &E : ChainIt) {
+      Align OrigAlign = getLoadStoreAlignment(E.Inst);
+      if (OrigAlign > BestAlignSoFar) {
+        BestAlignedElem = E;
+        BestAlignSoFar = OrigAlign;
+        continue;
+      }
+
+      APInt DeltaFromBestAlignedElem =
+          APIntOps::abdu(E.OffsetFromLeader, BestAlignedElem.OffsetFromLeader);
+      // commonAlignment is equivalent to a greatest common power-of-two
+      // divisor; it returns the largest power of 2 that divides both A and B.
+      Align NewAlign = commonAlignment(
+          BestAlignSoFar, DeltaFromBestAlignedElem.getLimitedValue());
+      if (NewAlign > OrigAlign)
+        setLoadStoreAlignment(E.Inst, NewAlign);
+    }
+  };
+
+  // Propagate forwards and backwards.
+  PropagateAlignments(C);
+  PropagateAlignments(reverse(C));
+}
```
Reviewer comment on lines +1667 to +1668 (the two `PropagateAlignments` calls): Why does this need to go through twice? I'd expect each load and store to have started with an optimally computed alignment.

Author reply: I haven't thought of a specific end-to-end test that would trigger this, but I think it can't hurt to include.
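To make the two-pass question concrete, here is a standalone toy model (plain integers in place of `Align`/`APInt`; the `Elem` and `propagateOnce` names are hypothetical, not from the patch) in which the best-aligned element sits in the middle of the chain. The forward pass can only improve elements that come after it; improving the leader requires the backward pass.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy chain element: byte offset from the leader and current alignment.
struct Elem {
  uint64_t Offset;
  uint64_t Alignment;
};

// Greatest power of two dividing both A and B (B == 0 keeps A), mirroring
// LLVM's commonAlignment.
static uint64_t commonPow2(uint64_t A, uint64_t B) {
  return B == 0 ? A : (A | B) & -(A | B);
}

// One directional pass, templated over the iterator so forward and reverse
// iterators both work (the same reason the patch uses a generic lambda).
template <typename It> static void propagateOnce(It Begin, It End) {
  It Best = Begin;
  for (It I = Begin; I != End; ++I) {
    if (I->Alignment > Best->Alignment) {
      Best = I; // found a better-aligned element; propagate from it instead
      continue;
    }
    uint64_t Delta = I->Offset > Best->Offset ? I->Offset - Best->Offset
                                              : Best->Offset - I->Offset;
    uint64_t NewAlign = commonPow2(Best->Alignment, Delta);
    if (NewAlign > I->Alignment)
      I->Alignment = NewAlign;
  }
}

int main() {
  // Five contiguous i32 accesses; only the middle one is known 8-aligned.
  std::vector<Elem> C = {{0, 4}, {4, 4}, {8, 8}, {12, 4}, {16, 4}};

  propagateOnce(C.begin(), C.end());   // forward: raises offset 16 to 8
  propagateOnce(C.rbegin(), C.rend()); // backward: raises offset 0 to 8

  for (const Elem &E : C)
    std::printf("offset %2llu -> align %llu\n",
                (unsigned long long)E.Offset,
                (unsigned long long)E.Alignment);
  return 0;
}
```

The result is alignments 8, 4, 8, 4, 8: offset 16 improves only in the forward pass and offset 0 only in the backward pass, so a single direction would leave one end of the chain under-annotated.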
Author comment on the `vectorizeChain` hunk above:

Regarding this change: I don't totally understand why we delay the upgrade of load/store alignment, but I'd imagine it is preferred to minimize IR changes before vectorization is certain.

If this change gets accepted, we are OK with eagerly upgrading load/store alignment. So, for consistency, we should also be OK with doing so eagerly for the existing alloca alignment-upgrade optimization. Hence, I am bundling this simplification with this change, which lets us remove this handling in `vectorizeChain` by instead calling `setLoadStoreAlignment` on line 837.

There is one test-case change that results from this, in `massive_indirection.ll`. It is not a regression, as the target supports unaligned loads. The upgrade from an alignment of 8 to an alignment of 4294967296 that happened with this call to `getOrEnforceKnownAlignment` was never the intended purpose of this block of code, merely an unintended side effect, as evidenced by the fact that there are no allocas in the test case.