
[LV][EVL] Restore inbounds flag in VPVectorEndPointerRecipe when using EVL tail folding #145306


Open

wants to merge 1 commit into main

Conversation

Mel-Chen
Contributor

When tail folding by mask, the inbounds flag on VPVectorEndPointerRecipe must be removed because we may compute addresses that would not be accessed in the original scalar loop.
However, when tail folding by EVL, we are guaranteed to stay within the bounds of the original loop, since the VF operand of VPVectorEndPointerRecipe is replaced by EVL, which limits the access range.
This patch restores the inbounds flag in VPVectorEndPointerRecipe when EVL tail folding is enabled.
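To make the bounds argument concrete, here is a minimal arithmetic sketch (illustrative name, single unroll part, i32 elements) of the end pointer the updated test below checks: the recipe materializes the lowest address of a reverse access as Base + (1 - NumElts). With mask-based tail folding NumElts is the full VF, so the address may fall outside the object even though the extra lanes are masked off; with EVL tail folding NumElts is the EVL, so every element in [Base + 1 - EVL, Base] is one the scalar loop would also have accessed.

// Illustrative sketch only: the lowest address computed for a reverse vector
// access when the per-part offset is zero, mirroring the mul/sub/GEP sequence
// in the updated test. With NumElts == EVL the result never drops below an
// address the original scalar loop dereferenced, so inbounds can be kept.
int *reverseAccessEndPointer(int *Base, long NumElts) {
  return Base + (1 - NumElts);
}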

@llvmbot
Member

llvmbot commented Jun 23, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Mel Chen (Mel-Chen)

Changes

When tail folding by mask, the inbounds flag on VPVectorEndPointerRecipe must be removed because we may compute addresses that would not be accessed in the original scalar loop.
However, when tail folding by EVL, we are guaranteed to stay within the bounds of the original loop, since the VF operand of VPVectorEndPointerRecipe is replaced by EVL, which limits the access range.
This patch restores the inbounds flag in VPVectorEndPointerRecipe when EVL tail folding is enabled.


Full diff: https://github.com/llvm/llvm-project/pull/145306.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+5-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+24-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll (+8-8)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index f4163b0743a9a..34ac50f286c5f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1700,9 +1700,13 @@ class VPVectorEndPointerRecipe : public VPRecipeWithIRFlags,
 
   VP_CLASSOF_IMPL(VPDef::VPVectorEndPointerSC)
 
+  VPValue *getPtr() const { return getOperand(0); }
+
   VPValue *getVFValue() { return getOperand(1); }
   const VPValue *getVFValue() const { return getOperand(1); }
 
+  Type *getIndexedTy() const { return IndexedTy; };
+
   void execute(VPTransformState &State) override;
 
   bool onlyFirstLaneUsed(const VPValue *Op) const override {
@@ -1727,7 +1731,7 @@ class VPVectorEndPointerRecipe : public VPRecipeWithIRFlags,
   }
 
   VPVectorEndPointerRecipe *clone() override {
-    return new VPVectorEndPointerRecipe(getOperand(0), getVFValue(), IndexedTy,
+    return new VPVectorEndPointerRecipe(getPtr(), getVFValue(), IndexedTy,
                                         getGEPNoWrapFlags(), getDebugLoc());
   }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index ac6be09ef271d..c46d004df9be7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2171,6 +2171,29 @@ static VPRecipeBase *createEVLRecipe(VPValue *HeaderMask,
       .Default([&](VPRecipeBase *R) { return nullptr; });
 }
 
+// Replace VF operands with EVL in recipes that use VF.
+static void replaceVFWithEVL(VPValue &VF, VPValue &EVL) {
+  for (VPUser *U : to_vector(VF.users())) {
+    if (auto *VEP = dyn_cast<VPVectorEndPointerRecipe>(U)) {
+      VPValue *Ptr = VEP->getPtr();
+      GEPNoWrapFlags NewFlags = VEP->getGEPNoWrapFlags();
+      // Replacing VF with EVL ensures that VPVectorEndPointer will not compute
+      // out-of-bounds, as long as the original pointer had the inbounds flag.
+      if (!NewFlags.isInBounds()) {
+        if (auto *GEP = dyn_cast<GetElementPtrInst>(
+                Ptr->getUnderlyingValue()->stripPointerCasts()))
+          if (GEP->isInBounds())
+            NewFlags = GEPNoWrapFlags::inBounds();
+      }
+      auto *NewVEP = new VPVectorEndPointerRecipe(
+          Ptr, &EVL, VEP->getIndexedTy(), NewFlags, VEP->getDebugLoc());
+      NewVEP->insertBefore(VEP);
+      VEP->replaceAllUsesWith(NewVEP);
+      VEP->eraseFromParent();
+    }
+  }
+}
+
 /// Replace recipes with their EVL variants.
 static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
   Type *CanonicalIVType = Plan.getCanonicalIV()->getScalarType();
@@ -2198,12 +2221,8 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
     PrevEVL = Builder.createScalarPhi({MaxEVL, &EVL}, DebugLoc(), "prev.evl");
   }
 
-  for (VPUser *U : to_vector(Plan.getVF().users())) {
-    if (auto *R = dyn_cast<VPVectorEndPointerRecipe>(U))
-      R->setOperand(1, &EVL);
-  }
-
   SmallVector<VPRecipeBase *> ToErase;
+  replaceVFWithEVL(Plan.getVF(), EVL);
 
   for (VPValue *HeaderMask : collectAllHeaderMasks(Plan)) {
     for (VPUser *U : collectUsersRecursively(HeaderMask)) {
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll
index 96db5bf4e9acc..dc2683e20ade9 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll
@@ -34,16 +34,16 @@ define void @reverse_load_store(i64 %startval, ptr noalias %ptr, ptr noalias %pt
 ; IF-EVL-NEXT:    [[TMP18:%.*]] = zext i32 [[TMP5]] to i64
 ; IF-EVL-NEXT:    [[TMP9:%.*]] = mul i64 0, [[TMP18]]
 ; IF-EVL-NEXT:    [[TMP10:%.*]] = sub i64 1, [[TMP18]]
-; IF-EVL-NEXT:    [[TMP16:%.*]] = getelementptr i32, ptr [[TMP8]], i64 [[TMP9]]
-; IF-EVL-NEXT:    [[TMP12:%.*]] = getelementptr i32, ptr [[TMP16]], i64 [[TMP10]]
+; IF-EVL-NEXT:    [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[TMP8]], i64 [[TMP9]]
+; IF-EVL-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[TMP11]], i64 [[TMP10]]
 ; IF-EVL-NEXT:    [[VP_OP_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.vp.load.nxv4i32.p0(ptr align 4 [[TMP12]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
 ; IF-EVL-NEXT:    [[VP_REVERSE:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vp.reverse.nxv4i32(<vscale x 4 x i32> [[VP_OP_LOAD]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
 ; IF-EVL-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[PTR2:%.*]], i64 [[TMP7]]
 ; IF-EVL-NEXT:    [[TMP19:%.*]] = zext i32 [[TMP5]] to i64
 ; IF-EVL-NEXT:    [[TMP14:%.*]] = mul i64 0, [[TMP19]]
 ; IF-EVL-NEXT:    [[TMP15:%.*]] = sub i64 1, [[TMP19]]
-; IF-EVL-NEXT:    [[TMP22:%.*]] = getelementptr i32, ptr [[TMP13]], i64 [[TMP14]]
-; IF-EVL-NEXT:    [[TMP17:%.*]] = getelementptr i32, ptr [[TMP22]], i64 [[TMP15]]
+; IF-EVL-NEXT:    [[TMP22:%.*]] = getelementptr inbounds i32, ptr [[TMP13]], i64 [[TMP14]]
+; IF-EVL-NEXT:    [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[TMP22]], i64 [[TMP15]]
 ; IF-EVL-NEXT:    [[VP_REVERSE3:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vp.reverse.nxv4i32(<vscale x 4 x i32> [[VP_REVERSE]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
 ; IF-EVL-NEXT:    call void @llvm.vp.store.nxv4i32.p0(<vscale x 4 x i32> [[VP_REVERSE3]], ptr align 4 [[TMP17]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
 ; IF-EVL-NEXT:    [[TMP20:%.*]] = zext i32 [[TMP5]] to i64
@@ -137,8 +137,8 @@ define void @reverse_load_store_masked(i64 %startval, ptr noalias %ptr, ptr noal
 ; IF-EVL-NEXT:    [[TMP26:%.*]] = zext i32 [[TMP5]] to i64
 ; IF-EVL-NEXT:    [[TMP17:%.*]] = mul i64 0, [[TMP26]]
 ; IF-EVL-NEXT:    [[TMP18:%.*]] = sub i64 1, [[TMP26]]
-; IF-EVL-NEXT:    [[TMP19:%.*]] = getelementptr i32, ptr [[TMP16]], i64 [[TMP17]]
-; IF-EVL-NEXT:    [[TMP20:%.*]] = getelementptr i32, ptr [[TMP19]], i64 [[TMP18]]
+; IF-EVL-NEXT:    [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[TMP16]], i64 [[TMP17]]
+; IF-EVL-NEXT:    [[TMP20:%.*]] = getelementptr inbounds i32, ptr [[TMP15]], i64 [[TMP18]]
 ; IF-EVL-NEXT:    [[VP_REVERSE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.experimental.vp.reverse.nxv4i1(<vscale x 4 x i1> [[TMP14]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
 ; IF-EVL-NEXT:    [[VP_OP_LOAD4:%.*]] = call <vscale x 4 x i32> @llvm.vp.load.nxv4i32.p0(ptr align 4 [[TMP20]], <vscale x 4 x i1> [[VP_REVERSE_MASK]], i32 [[TMP5]])
 ; IF-EVL-NEXT:    [[VP_REVERSE:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vp.reverse.nxv4i32(<vscale x 4 x i32> [[VP_OP_LOAD4]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
@@ -146,8 +146,8 @@ define void @reverse_load_store_masked(i64 %startval, ptr noalias %ptr, ptr noal
 ; IF-EVL-NEXT:    [[TMP27:%.*]] = zext i32 [[TMP5]] to i64
 ; IF-EVL-NEXT:    [[TMP22:%.*]] = mul i64 0, [[TMP27]]
 ; IF-EVL-NEXT:    [[TMP23:%.*]] = sub i64 1, [[TMP27]]
-; IF-EVL-NEXT:    [[TMP24:%.*]] = getelementptr i32, ptr [[TMP21]], i64 [[TMP22]]
-; IF-EVL-NEXT:    [[TMP25:%.*]] = getelementptr i32, ptr [[TMP24]], i64 [[TMP23]]
+; IF-EVL-NEXT:    [[TMP24:%.*]] = getelementptr inbounds i32, ptr [[TMP21]], i64 [[TMP22]]
+; IF-EVL-NEXT:    [[TMP25:%.*]] = getelementptr inbounds i32, ptr [[TMP24]], i64 [[TMP23]]
 ; IF-EVL-NEXT:    [[VP_REVERSE5:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vp.reverse.nxv4i32(<vscale x 4 x i32> [[VP_REVERSE]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
 ; IF-EVL-NEXT:    [[VP_REVERSE_MASK6:%.*]] = call <vscale x 4 x i1> @llvm.experimental.vp.reverse.nxv4i1(<vscale x 4 x i1> [[TMP14]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
 ; IF-EVL-NEXT:    call void @llvm.vp.store.nxv4i32.p0(<vscale x 4 x i32> [[VP_REVERSE5]], ptr align 4 [[TMP25]], <vscale x 4 x i1> [[VP_REVERSE_MASK6]], i32 [[TMP5]])

@@ -2171,6 +2171,29 @@ static VPRecipeBase *createEVLRecipe(VPValue *HeaderMask,
.Default([&](VPRecipeBase *R) { return nullptr; });
}

// Replace VF operands with EVL in recipes that use VF.
Member

  1. Use /// style (a possible wording is sketched below).
  2. Should it be more specific about VPVectorEndPointerRecipe?
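A possible wording for the suggested /// comment, keeping the current function name and signature (illustrative only, not the final text):

/// Replace the VF operand of any VPVectorEndPointerRecipe users of \p VF with
/// \p EVL, restoring the inbounds flag on the pointer computation when it is
/// known to be safe.
static void replaceVFWithEVL(VPValue &VF, VPValue &EVL);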

// Replace VF operands with EVL in recipes that use VF.
static void replaceVFWithEVL(VPValue &VF, VPValue &EVL) {
  for (VPUser *U : to_vector(VF.users())) {
    if (auto *VEP = dyn_cast<VPVectorEndPointerRecipe>(U)) {
Member

Early exit, if the condition is false, to reduce structure complexity?
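One possible shape for that early-continue, built only from the calls already in this patch (whether this is the final form is up to the author):

static void replaceVFWithEVL(VPValue &VF, VPValue &EVL) {
  for (VPUser *U : to_vector(VF.users())) {
    auto *VEP = dyn_cast<VPVectorEndPointerRecipe>(U);
    if (!VEP)
      continue; // Skip users that are not VPVectorEndPointerRecipe.

    VPValue *Ptr = VEP->getPtr();
    GEPNoWrapFlags NewFlags = VEP->getGEPNoWrapFlags();
    // Replacing VF with EVL ensures that VPVectorEndPointer will not compute
    // out-of-bounds, as long as the original pointer had the inbounds flag.
    if (!NewFlags.isInBounds())
      if (auto *GEP = dyn_cast<GetElementPtrInst>(
              Ptr->getUnderlyingValue()->stripPointerCasts()))
        if (GEP->isInBounds())
          NewFlags = GEPNoWrapFlags::inBounds();

    auto *NewVEP = new VPVectorEndPointerRecipe(
        Ptr, &EVL, VEP->getIndexedTy(), NewFlags, VEP->getDebugLoc());
    NewVEP->insertBefore(VEP);
    VEP->replaceAllUsesWith(NewVEP);
    VEP->eraseFromParent();
  }
}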

      // Replacing VF with EVL ensures that VPVectorEndPointer will not compute
      // out-of-bounds, as long as the original pointer had the inbounds flag.
      if (!NewFlags.isInBounds()) {
        if (auto *GEP = dyn_cast<GetElementPtrInst>(
Contributor

We shouldn't rely on the underlying GEP to re-set inbounds. There might be other reasons why the underlying inbounds may no longer be valid, e.g. if the original GEP and memory op were executed conditionally in the scalar loop and are now executed unconditionally.

How critical is this? If it is important, it would be good to think of a better way to preserve the inbounds flag until the EVL recipes are added, or to recover it without looking at the underlying IR.
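A hypothetical scalar loop (not from this PR, names are illustrative) showing the kind of case being described: the inbounds GEP and load only execute when the guard holds, so the widened, unconditionally computed addresses cannot simply inherit the scalar GEP's inbounds.

// Hypothetical illustration: in the scalar loop, src[i] is only addressed when
// cond[i] is true, so src may be valid only for those elements. After
// vectorization the reverse end-pointer is computed on every iteration, so the
// underlying GEP's inbounds flag alone cannot justify marking it inbounds.
void conditionalReverseCopy(int *dst, const int *src, const bool *cond, long n) {
  for (long i = n - 1; i >= 0; --i)
    if (cond[i])
      dst[i] = src[i];
}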

Contributor Author

Makes sense. I think we may have two possible approaches to fix this:

  1. We can still restore the inbounds flag in VPlanTransforms::tryAddExplicitVectorLength, but only if we verify that all memory access users of the VPVectorEndPointer have had their masks converted to nullptr (rough sketch below). However, this still depends on the underlying GEP: if the original GEP doesn't have the inbounds flag, we can't guarantee that the VPVectorEndPointer is inbounds either.
  2. Alternatively, we could move this earlier and set the correct flag during VPRecipeBuilder::tryToWidenMemory when CM.foldTailByEVL() is true. Perhaps LoopVectorizationLegality::blockNeedsPredication can help determine whether a memory access is conditional.
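A rough sketch of the check described in option 1, where VEP is the VPVectorEndPointerRecipe handled in replaceVFWithEVL and assuming its memory users can still be queried through VPWidenMemoryRecipe::getMask() (nullptr meaning unmasked) at that point; illustrative only, not the final code:

// Restore inbounds only if every memory access using this VPVectorEndPointer
// is unmasked, i.e. its mask has already been folded to nullptr.
bool AllMemUsersUnmasked = all_of(VEP->users(), [](VPUser *U) {
  auto *MemR = dyn_cast<VPWidenMemoryRecipe>(U);
  return MemR && !MemR->getMask();
});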

This is a side project, and I'm not sure how critical it is. If implementing an acceptable fix would cost more than the two options above, maybe we can defer it.

4 participants