Skip to content

[OpenMP] [IR Builder] Changes to Support Scan Operation #136035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

anchuraj
Copy link
Contributor

@anchuraj anchuraj commented Apr 16, 2025

Scan reductions are supported in OpenMP with the help of scan directive. Reduction clause of the for loop/simd directive can take an inscan modifier along with the body of the directive specifying a scan directive. This PR implements the lowering logic for scan reductions in workshare loops of OpenMP.
The body of the for loop is split into two loops (Input phase loop and Scan Phase loop) and a scan reduction loop is added in the middle. The Input phase loop populates a temporary buffer with initial values that are to be reduced. The buffer is used by the reduction loop to perform scan reduction. Scan phase loop copies the values of the buffer to the reduction variable before executing the scan phase. Below is a high level view of the code generated.

<declare pointer to buffer> ptr
omp parallel {
  size num_iters = <num_iters>
  // temp buffer allocation
  omp masked {
    buff = malloc(num_iters*scanvarstype)
    *ptr = buff
  } 
 barrier;
  // input phase loop
  for (i: 0..<num_iters>) {
    <input phase>;
    buffer = *ptr;
    buffer[i] = red;
  }
  // scan reduction
  omp masked
  {
    for (int k = 0; k != ceil(log2(num_iters)); ++k) {
      i=pow(2,k)
      for (size cnt = last_iter; cnt >= i; --cnt) {
        buffer = *ptr;
        buffer[cnt] op= buffer[cnt-i];
      }
    }
  }
 barrier;
 // scan phase loop
  for (0..<num_iters>) {
    buffer = *ptr;
    red = buffer[i] ;
    <scan phase>;
  }
  // temp buffer deletion
  omp masked {
    free(*ptr)
  }
  barrier;
}

The temporary buffer needs to be shared between all threads performing reduction since it is read/written in Input and Scan workshare Loops. This is achieved by declaring a pointer to the buffer in the shared region and dynamically allocating the buffer by the master thread.
This is the reason why allocation, deallocation and scan reduction are performed within masked. The code is verified to produce correct results for Fortran programs with the code changes in the PR #133149

@llvmbot llvmbot added flang:openmp clang:openmp OpenMP related changes to Clang labels Apr 16, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 16, 2025

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-openmp

@llvm/pr-subscribers-flang-openmp

Author: Anchu Rajendran S (anchuraj)

Changes

Scan reductions are supported in OpenMP with the the help of scan directive. Reduction clause of the for loop/simd directive takes an inscan modifier if scan reduction is specified. With an inscan modifier, the body of the directive should specify a scan directive. This PR implements the lowering logic for scan reductions in workshare loops of OpenMP.
The body of the for loop is split into two loops (Input phase loop and Scan Phase loop) and a scan reduction loop is added in the middle. The Input phase loop populates a temporary buffer with initial values that are to be reduced. The buffer is used by the reduction loop to perform scan reduction. Scan phase loop copies the values of the buffer to the reduction variable before executing the scan phase.

&lt;declare pointer to buffer&gt; ptr
omp parallel {
  size num_iters = &lt;num_iters&gt;
  // temp buffer allocation
  omp masked {
    buff = malloc(num_iters*scanvarstype)
    *ptr = malloc
  } 
  // input phase loop
  for (i: 0..&lt;num_iters&gt;) {
    &lt;input phase&gt;;
    buffer = *ptr;
    buffer[i] = red;
  }
  // scan reduction
  omp masked
  {
    for (int k = 0; k != ceil(log2(num_iters)); ++k) {
      i=pow(2,k)
      for (size cnt = last_iter; cnt &gt;= i; --cnt) {
        buffer = *ptr;
        buffer[cnt] op= buffer[cnt-i];
      }
    }
  }
 // scan phase loop
  for (0..&lt;num_iters&gt;) {
    buffer = *ptr;
    red = buffer[i] ;
    &lt;scan phase&gt;;
  }
  // temp buffer deletion
  omp masked {
    free(*ptr)
  }

The buffer needs to be shared between all threads performing reduction since is read/write in Input and Scan Loops. This is achieved by declaring a pointer to the buffer in the shared region and dynamically allocating the buffer by the master thread.
This is why allocation, deallocation and scan reduction are performed within masked. The code is verified to run for Fortran programs with the code changes in the PR #133149


Patch is 31.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/136035.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+135-4)
  • (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+423-1)
  • (modified) llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp (+65)
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 6b104708bdb0d..682e1699078af 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -503,6 +503,31 @@ class OpenMPIRBuilder {
       return allocaInst;
     }
   };
+
+  struct ScanInformation {
+  public:
+    /// Dominates the body of the loop before scan directive
+    llvm::BasicBlock *OMPBeforeScanBlock = nullptr;
+    /// Dominates the body of the loop before scan directive
+    llvm::BasicBlock *OMPAfterScanBlock = nullptr;
+    /// Controls the flow to before or after scan blocks
+    llvm::BasicBlock *OMPScanDispatch = nullptr;
+    /// Exit block of loop body
+    llvm::BasicBlock *OMPScanLoopExit = nullptr;
+    /// Block before loop body where scan initializations are done
+    llvm::BasicBlock *OMPScanInit = nullptr;
+    /// Block after loop body where scan finalizations are done
+    llvm::BasicBlock *OMPScanFinish = nullptr;
+    /// If true, it indicates Input phase is lowered; else it indicates
+    /// ScanPhase is lowered
+    bool OMPFirstScanLoop = false;
+    // Maps the private reduction variable to the pointer of the temporary
+    // buffer
+    llvm::SmallDenseMap<llvm::Value *, llvm::Value *> ScanBuffPtrs;
+    llvm::Value *IV;
+    llvm::Value *Span;
+  } ScanInfo;
+
   /// Initialize the internal state, this will put structures types and
   /// potentially other helpers into the underlying module. Must be called
   /// before any other method and only once! This internal state includes types
@@ -729,6 +754,35 @@ class OpenMPIRBuilder {
                       LoopBodyGenCallbackTy BodyGenCB, Value *TripCount,
                       const Twine &Name = "loop");
 
+  /// Generator for the control flow structure of an OpenMP canonical loops if
+  /// the parent directive has an `inscan` modifier specified.
+  /// If the `inscan` modifier is specified, the region of the parent is
+  /// expected to have a `scan` directive. Based on the clauses in
+  /// scan directive, the body of the loop is split into two loops: Input loop
+  /// and Scan Loop. Input loop contains the code generated for input phase of
+  /// scan and Scan loop contains the code generated for scan phase of scan.
+  ///
+  /// \param Loc       The insert and source location description.
+  /// \param BodyGenCB Callback that will generate the loop body code.
+  /// \param Start     Value of the loop counter for the first iterations.
+  /// \param Stop      Loop counter values past this will stop the loop.
+  /// \param Step      Loop counter increment after each iteration; negative
+  ///                  means counting down.
+  /// \param IsSigned  Whether Start, Stop and Step are signed integers.
+  /// \param InclusiveStop Whether \p Stop itself is a valid value for the loop
+  ///                      counter.
+  /// \param ComputeIP Insertion point for instructions computing the trip
+  ///                  count. Can be used to ensure the trip count is available
+  ///                  at the outermost loop of a loop nest. If not set,
+  ///                  defaults to the preheader of the generated loop.
+  /// \param Name      Base name used to derive BB and instruction names.
+  ///
+  /// \returns A vector containing Loop Info of Input Loop and Scan Loop.
+  Expected<SmallVector<llvm::CanonicalLoopInfo *>> createCanonicalScanLoops(
+      const LocationDescription &Loc, LoopBodyGenCallbackTy BodyGenCB,
+      Value *Start, Value *Stop, Value *Step, bool IsSigned, bool InclusiveStop,
+      InsertPointTy ComputeIP, const Twine &Name);
+
   /// Calculate the trip count of a canonical loop.
   ///
   /// This allows specifying user-defined loop counter values using increment,
@@ -798,13 +852,16 @@ class OpenMPIRBuilder {
   ///                  at the outermost loop of a loop nest. If not set,
   ///                  defaults to the preheader of the generated loop.
   /// \param Name      Base name used to derive BB and instruction names.
+  /// \param InScan    Whether loop has a scan reduction specified.
   ///
   /// \returns An object representing the created control flow structure which
   ///          can be used for loop-associated directives.
-  Expected<CanonicalLoopInfo *> createCanonicalLoop(
-      const LocationDescription &Loc, LoopBodyGenCallbackTy BodyGenCB,
-      Value *Start, Value *Stop, Value *Step, bool IsSigned, bool InclusiveStop,
-      InsertPointTy ComputeIP = {}, const Twine &Name = "loop");
+  Expected<CanonicalLoopInfo *>
+  createCanonicalLoop(const LocationDescription &Loc,
+                      LoopBodyGenCallbackTy BodyGenCB, Value *Start,
+                      Value *Stop, Value *Step, bool IsSigned,
+                      bool InclusiveStop, InsertPointTy ComputeIP = {},
+                      const Twine &Name = "loop", bool InScan = false);
 
   /// Collapse a loop nest into a single loop.
   ///
@@ -1532,6 +1589,45 @@ class OpenMPIRBuilder {
       ArrayRef<OpenMPIRBuilder::ReductionInfo> ReductionInfos,
       Function *ReduceFn, AttributeList FuncAttrs);
 
+  /// Creates the runtime call specified
+  /// \param Callee Function Declaration Value
+  /// \param Args Arguments passed to the call
+  /// \param Name Optional param to specify the name of the call Instruction.
+  ///
+  /// \return The Runtime call instruction created.
+  llvm::CallInst *emitNoUnwindRuntimeCall(llvm::FunctionCallee Callee,
+                                          ArrayRef<llvm::Value *> Args,
+                                          const llvm::Twine &Name);
+
+  /// Helper function for CreateCanonicalScanLoops to create InputLoop
+  /// in the firstGen and Scan Loop in the SecondGen
+  /// \param InputLoopGen Callback for generating the loop for input phase
+  /// \param ScanLoopGen Callback for generating the loop for scan phase
+  ///
+  /// \return error if any produced, else return success.
+  Error emitScanBasedDirectiveIR(
+      llvm::function_ref<Error()> InputLoopGen,
+      llvm::function_ref<Error(LocationDescription Loc)> ScanLoopGen);
+
+  /// Creates the basic blocks required for scan reduction.
+  void createScanBBs();
+
+  /// Dynamically allocates the buffer needed for scan reduction.
+  /// \param AllocaIP The IP where possibly-shared pointer of buffer needs to be
+  /// declared. \param ScanVars Scan Variables.
+  ///
+  /// \return error if any produced, else return success.
+  Error emitScanBasedDirectiveDeclsIR(InsertPointTy AllocaIP,
+                                      ArrayRef<llvm::Value *> ScanVars,
+                                      ArrayRef<llvm::Type *> ScanVarsType);
+
+  /// Copies the result back to the reduction variable.
+  /// \param ReductionInfos Array type containing the ReductionOps.
+  ///
+  /// \return error if any produced, else return success.
+  Error emitScanBasedDirectiveFinalsIR(
+      SmallVector<llvm::OpenMPIRBuilder::ReductionInfo> ReductionInfos);
+
   /// This function emits a helper that gathers Reduce lists from the first
   /// lane of every active warp to lanes in the first warp.
   ///
@@ -2607,6 +2703,41 @@ class OpenMPIRBuilder {
                                     BodyGenCallbackTy BodyGenCB,
                                     FinalizeCallbackTy FiniCB, Value *Filter);
 
+  /// This function performs the scan reduction of the values updated in
+  /// the input phase. The reduction logic needs to be emitted between input
+  /// and scan loop returned by `CreateCanonicalScanLoops`. The following
+  /// is the code that is generated, `buffer` and `span` are expected to be
+  /// populated before executing the generated code.
+  ///
+  ///  for (int k = 0; k != ceil(log2(span)); ++k) {
+  ///    i=pow(2,k)
+  ///    for (size cnt = last_iter; cnt >= i; --cnt)
+  ///      buffer[cnt] op= buffer[cnt-i];
+  ///  }
+  /// \param Loc The insert and source location description.
+  /// \param ReductionInfos Array type containing the ReductionOps.
+  ///
+  /// \returns The insertion position *after* the masked.
+  InsertPointOrErrorTy emitScanReduction(
+      const LocationDescription &Loc,
+      SmallVector<llvm::OpenMPIRBuilder::ReductionInfo> ReductionInfos);
+
+  /// This directive split and directs the control flow to input phase
+  ///  blocks or scan phase blocks based on 1. whether input loop or scan loop
+  ///  is executed, 2. whether exclusive or inclusive scan is used.
+  ///
+  /// \param Loc The insert and source location description.
+  /// \param AllocaIP The IP where the temporary buffer for scan reduction
+  //                  needs to be allocated.
+  /// \param ScanVars Scan Variables.
+  /// \param IsInclusive Whether it is an inclusive or exclusive scan.
+  ///
+  /// \returns The insertion position *after* the scan.
+  InsertPointOrErrorTy createScan(const LocationDescription &Loc,
+                                  InsertPointTy AllocaIP,
+                                  ArrayRef<llvm::Value *> ScanVars,
+                                  ArrayRef<llvm::Type *> ScanVarsType,
+                                  bool IsInclusive);
   /// Generator for '#omp critical'
   ///
   /// \param Loc The insert and source location description.
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 28662efc02882..00dc571c6ea77 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -59,6 +59,8 @@
 #include "llvm/Transforms/Utils/LoopPeel.h"
 #include "llvm/Transforms/Utils/UnrollLoop.h"
 
+#include <cassert>
+#include <cstddef>
 #include <cstdint>
 #include <optional>
 
@@ -3981,6 +3983,337 @@ OpenMPIRBuilder::createMasked(const LocationDescription &Loc,
                               /*Conditional*/ true, /*hasFinalize*/ true);
 }
 
+llvm::CallInst *
+OpenMPIRBuilder::emitNoUnwindRuntimeCall(llvm::FunctionCallee Callee,
+                                         ArrayRef<llvm::Value *> Args,
+                                         const llvm::Twine &Name) {
+  llvm::CallInst *Call = Builder.CreateCall(
+      Callee, Args, SmallVector<llvm::OperandBundleDef, 1>(), Name);
+  Call->setDoesNotThrow();
+  return Call;
+}
+
+// Expects input basic block is dominated by BeforeScanBB.
+// Once Scan directive is encountered, the code after scan directive should be
+// dominated by AfterScanBB. Scan directive splits the code sequence to
+// scan and input phase. Based on whether inclusive or exclusive
+// clause is used in the scan directive and whether input loop or scan loop
+// is lowered, it adds jumps to input and scan phase. First Scan loop is the
+// input loop and second is the scan loop. The code generated handles only
+// inclusive scans now.
+OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createScan(
+    const LocationDescription &Loc, InsertPointTy AllocaIP,
+    ArrayRef<llvm::Value *> ScanVars, ArrayRef<llvm::Type *> ScanVarsType,
+    bool IsInclusive) {
+  if (ScanInfo.OMPFirstScanLoop) {
+    llvm::Error Err =
+        emitScanBasedDirectiveDeclsIR(AllocaIP, ScanVars, ScanVarsType);
+    if (Err) {
+      return Err;
+    }
+  }
+  if (!updateToLocation(Loc))
+    return Loc.IP;
+
+  llvm::Value *IV = ScanInfo.IV;
+
+  if (ScanInfo.OMPFirstScanLoop) {
+    // Emit buffer[i] = red; at the end of the input phase.
+    for (size_t i = 0; i < ScanVars.size(); i++) {
+      Value *BuffPtr = ScanInfo.ScanBuffPtrs[ScanVars[i]];
+      Value *Buff = Builder.CreateLoad(Builder.getPtrTy(), BuffPtr);
+      Type *DestTy = ScanVarsType[i];
+      Value *Val = Builder.CreateInBoundsGEP(DestTy, Buff, IV, "arrayOffset");
+      Value *Src = Builder.CreateLoad(DestTy, ScanVars[i]);
+
+      Builder.CreateStore(Src, Val);
+    }
+  }
+  Builder.CreateBr(ScanInfo.OMPScanLoopExit);
+  emitBlock(ScanInfo.OMPScanDispatch, Builder.GetInsertBlock()->getParent());
+
+  if (!ScanInfo.OMPFirstScanLoop) {
+    IV = ScanInfo.IV;
+    // Emit red = buffer[i]; at the entrance to the scan phase.
+    // TODO: if exclusive scan, the red = buffer[i-1] needs to be updated.
+    for (size_t i = 0; i < ScanVars.size(); i++) {
+      Value *BuffPtr = ScanInfo.ScanBuffPtrs[ScanVars[i]];
+      Value *Buff = Builder.CreateLoad(Builder.getPtrTy(), BuffPtr);
+      Type *DestTy = ScanVarsType[i];
+      Value *SrcPtr =
+          Builder.CreateInBoundsGEP(DestTy, Buff, IV, "arrayOffset");
+      Value *Src = Builder.CreateLoad(DestTy, SrcPtr);
+      Builder.CreateStore(Src, ScanVars[i]);
+    }
+  }
+
+  // TODO: Update it to CreateBr and remove dead blocks
+  llvm::Value *CmpI = Builder.getInt1(true);
+  if (ScanInfo.OMPFirstScanLoop == IsInclusive) {
+    Builder.CreateCondBr(CmpI, ScanInfo.OMPBeforeScanBlock,
+                         ScanInfo.OMPAfterScanBlock);
+  } else {
+    Builder.CreateCondBr(CmpI, ScanInfo.OMPAfterScanBlock,
+                         ScanInfo.OMPBeforeScanBlock);
+  }
+  emitBlock(ScanInfo.OMPAfterScanBlock, Builder.GetInsertBlock()->getParent());
+  Builder.SetInsertPoint(ScanInfo.OMPAfterScanBlock);
+  return Builder.saveIP();
+}
+
+Error OpenMPIRBuilder::emitScanBasedDirectiveDeclsIR(
+    InsertPointTy AllocaIP, ArrayRef<Value *> ScanVars,
+    ArrayRef<Type *> ScanVarsType) {
+
+  Builder.restoreIP(AllocaIP);
+  // Create the shared pointer at alloca IP.
+  for (size_t i = 0; i < ScanVars.size(); i++) {
+    llvm::Value *BuffPtr =
+        Builder.CreateAlloca(Builder.getPtrTy(), nullptr, "vla");
+    ScanInfo.ScanBuffPtrs[ScanVars[i]] = BuffPtr;
+  }
+
+  // Allocate temporary buffer by master thread
+  auto BodyGenCB = [&](InsertPointTy AllocaIP,
+                       InsertPointTy CodeGenIP) -> Error {
+    Builder.restoreIP(CodeGenIP);
+    Value *AllocSpan = Builder.CreateAdd(ScanInfo.Span, Builder.getInt32(1));
+    for (size_t i = 0; i < ScanVars.size(); i++) {
+      Type *IntPtrTy = Builder.getInt32Ty();
+      Constant *Allocsize = ConstantExpr::getSizeOf(ScanVarsType[i]);
+      Allocsize = ConstantExpr::getTruncOrBitCast(Allocsize, IntPtrTy);
+      Value *Buff = Builder.CreateMalloc(IntPtrTy, ScanVarsType[i], Allocsize,
+                                         AllocSpan, nullptr, "arr");
+      Builder.CreateStore(Buff, ScanInfo.ScanBuffPtrs[ScanVars[i]]);
+    }
+    return Error::success();
+  };
+  // TODO: Perform finalization actions for variables. This has to be
+  // called for variables which have destructors/finalizers.
+  auto FiniCB = [&](InsertPointTy CodeGenIP) { return llvm::Error::success(); };
+
+  Builder.SetInsertPoint(ScanInfo.OMPScanInit->getTerminator());
+  llvm::Value *FilterVal = Builder.getInt32(0);
+  llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP =
+      createMasked(Builder.saveIP(), BodyGenCB, FiniCB, FilterVal);
+
+  if (!AfterIP)
+    return AfterIP.takeError();
+  Builder.restoreIP(*AfterIP);
+  BasicBlock *InputBB = Builder.GetInsertBlock();
+  if (InputBB->getTerminator())
+    Builder.SetInsertPoint(Builder.GetInsertBlock()->getTerminator());
+  AfterIP = createBarrier(Builder.saveIP(), llvm::omp::OMPD_barrier);
+  if (!AfterIP)
+    return AfterIP.takeError();
+  Builder.restoreIP(*AfterIP);
+
+  return Error::success();
+}
+
+Error OpenMPIRBuilder::emitScanBasedDirectiveFinalsIR(
+    SmallVector<ReductionInfo> ReductionInfos) {
+  auto BodyGenCB = [&](InsertPointTy AllocaIP,
+                       InsertPointTy CodeGenIP) -> Error {
+    Builder.restoreIP(CodeGenIP);
+    for (ReductionInfo RedInfo : ReductionInfos) {
+      Value *PrivateVar = RedInfo.PrivateVariable;
+      Value *OrigVar = RedInfo.Variable;
+      Value *BuffPtr = ScanInfo.ScanBuffPtrs[PrivateVar];
+      Value *Buff = Builder.CreateLoad(Builder.getPtrTy(), BuffPtr);
+
+      Type *SrcTy = RedInfo.ElementType;
+      Value *Val =
+          Builder.CreateInBoundsGEP(SrcTy, Buff, ScanInfo.Span, "arrayOffset");
+      Value *Src = Builder.CreateLoad(SrcTy, Val);
+
+      Builder.CreateStore(Src, OrigVar);
+      Builder.CreateFree(Buff);
+    }
+    return Error::success();
+  };
+  // TODO: Perform finalization actions for variables. This has to be
+  // called for variables which have destructors/finalizers.
+  auto FiniCB = [&](InsertPointTy CodeGenIP) { return llvm::Error::success(); };
+
+  if (ScanInfo.OMPScanFinish->getTerminator())
+    Builder.SetInsertPoint(ScanInfo.OMPScanFinish->getTerminator());
+  else
+    Builder.SetInsertPoint(ScanInfo.OMPScanFinish);
+
+  llvm::Value *FilterVal = Builder.getInt32(0);
+  llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP =
+      createMasked(Builder.saveIP(), BodyGenCB, FiniCB, FilterVal);
+
+  if (!AfterIP)
+    return AfterIP.takeError();
+  Builder.restoreIP(*AfterIP);
+  BasicBlock *InputBB = Builder.GetInsertBlock();
+  if (InputBB->getTerminator())
+    Builder.SetInsertPoint(Builder.GetInsertBlock()->getTerminator());
+  AfterIP = createBarrier(Builder.saveIP(), llvm::omp::OMPD_barrier);
+  if (!AfterIP)
+    return AfterIP.takeError();
+  Builder.restoreIP(*AfterIP);
+  return Error::success();
+}
+
+OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitScanReduction(
+    const LocationDescription &Loc,
+    SmallVector<llvm::OpenMPIRBuilder::ReductionInfo> ReductionInfos) {
+
+  if (!updateToLocation(Loc))
+    return Loc.IP;
+  auto BodyGenCB = [&](InsertPointTy AllocaIP,
+                       InsertPointTy CodeGenIP) -> Error {
+    Builder.restoreIP(CodeGenIP);
+    Function *CurFn = Builder.GetInsertBlock()->getParent();
+    // for (int k = 0; k <= ceil(log2(n)); ++k)
+    llvm::BasicBlock *LoopBB =
+        BasicBlock::Create(CurFn->getContext(), "omp.outer.log.scan.body");
+    llvm::BasicBlock *ExitBB =
+        splitBB(Builder, false, "omp.outer.log.scan.exit");
+    llvm::Function *F = llvm::Intrinsic::getOrInsertDeclaration(
+        Builder.GetInsertBlock()->getModule(),
+        (llvm::Intrinsic::ID)llvm::Intrinsic::log2, Builder.getDoubleTy());
+    llvm::BasicBlock *InputBB = Builder.GetInsertBlock();
+    llvm::Value *Arg =
+        Builder.CreateUIToFP(ScanInfo.Span, Builder.getDoubleTy());
+    llvm::Value *LogVal = emitNoUnwindRuntimeCall(F, Arg, "");
+    F = llvm::Intrinsic::getOrInsertDeclaration(
+        Builder.GetInsertBlock()->getModule(),
+        (llvm::Intrinsic::ID)llvm::Intrinsic::ceil, Builder.getDoubleTy());
+    LogVal = emitNoUnwindRuntimeCall(F, LogVal, "");
+    LogVal = Builder.CreateFPToUI(LogVal, Builder.getInt32Ty());
+    llvm::Value *NMin1 = Builder.CreateNUWSub(
+        ScanInfo.Span, llvm::ConstantInt::get(ScanInfo.Span->getType(), 1));
+    Builder.SetInsertPoint(InputBB);
+    Builder.CreateBr(LoopBB);
+    emitBlock(LoopBB, Builder.GetInsertBlock()->getParent());
+    Builder.SetInsertPoint(LoopBB);
+
+    PHINode *Counter = Builder.CreatePHI(Builder.getInt32Ty(), 2);
+    //// size pow2k = 1;
+    PHINode *Pow2K = Builder.CreatePHI(Builder.getInt32Ty(), 2);
+    Counter->addIncoming(llvm::ConstantInt::get(Builder.getInt32Ty(), 0),
+                         InputBB);
+    Pow2K->addIncoming(llvm::ConstantInt::get(Builder.getInt32Ty(), 1),
+                       InputBB);
+    //// for (size i = n - 1; i >= 2 ^ k; --i)
+    ////   tmp[i] op= tmp[i-pow2k];
+    llvm::BasicBlock *InnerLoopBB =
+        BasicBlock::Create(CurFn->getContext(), "omp.inner.log.scan.body");
+    llvm::BasicBlock *InnerExitBB =
+        BasicBlock::Create(CurFn->getContext(), "omp.inner.log.scan.exit");
+    llvm::Value *CmpI = Builder.CreateICmpUGE(NMin1, Pow2K);
+    Builder.CreateCondBr(CmpI, InnerLoopBB, InnerExitBB);
+    emitBlock(InnerLoopBB, Builder.GetInsertBlock()->getParent());
+    Builder.SetInsertPoint(InnerLoopBB);
+    auto *IVal = Builder.CreatePHI(Builder.getInt32Ty(), 2);
+    IVal->addIncoming(NMin1, LoopBB);
+    for (ReductionInfo RedInfo : ReductionInfos) {
+      Value *ReductionVal = RedInfo.PrivateVariable;
+      Value *BuffPtr = ScanInfo.ScanBuffPtrs[ReductionVal];
+      Value *Buff = Builder.CreateLoad(Builder.getPtrTy(), BuffPtr);
+      Type *DestTy = RedInfo.ElementType;
+      Value *IV = Builder.CreateAdd(IVal, Builder.getInt32(1));
+      Value *LHSPtr =
+          Builder.CreateInBoundsGEP(DestTy, Buff, IV, "arrayOffset");
+      Value *OffsetIval = Builder.CreateNUWSub(IV, Pow2K);
+      Value *RHSPtr =
+          Builder.CreateInBoundsGEP(DestTy, Buff, OffsetIval, "arrayOffset");
+      Value *LHS = Builder.CreateLoad(DestTy, LHSPtr);
+ ...
[truncated]

@anchuraj anchuraj changed the title IR Builder Changes to Support Scan Operation [OpenMP] [IR Builder] Changes to Support Scan Operation Apr 17, 2025
@anchuraj anchuraj requested a review from kparzysz April 17, 2025 15:55
@anchuraj anchuraj force-pushed the scan-IRBuilder-Support branch from 108b0ce to 6d59b7e Compare April 18, 2025 18:14
Copy link

github-actions bot commented Apr 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@anchuraj anchuraj force-pushed the scan-IRBuilder-Support branch from 78f656e to 6d59b7e Compare April 18, 2025 20:40
@anchuraj anchuraj requested a review from jsjodin April 21, 2025 14:47
@anchuraj anchuraj requested review from TIFitis and tblah April 30, 2025 21:12
@anchuraj anchuraj force-pushed the scan-IRBuilder-Support branch from 6d59b7e to 0ad35dc Compare May 1, 2025 21:41
@anchuraj anchuraj force-pushed the scan-IRBuilder-Support branch 2 times, most recently from 4b50e9c to 323d3e3 Compare June 13, 2025 20:19
@anchuraj anchuraj requested a review from abidh June 16, 2025 18:34
Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of relying on createScan to be called withing the BodyGen callback, have you considered createCanonicalScanLoops to receive two callbacks: one for the input phase, and another for the scan phase?

llvm::SmallDenseMap<llvm::Value *, llvm::Value *> ScanBuffPtrs;
llvm::Value *IV;
llvm::Value *Span;
} ScanInfo;
Copy link
Member

@Meinersbur Meinersbur Jul 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} ScanInfo;
};

Usually it is not a good idea to remember such data between methods calls. Better pass a handle object between methods calls, such as AtomicInfo and CanonicalLoopInfo.

The type would consistently be called ScanInfo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to object passing. Thank you!

Comment on lines 4459 to 4474
/// InputBlock
/// |
/// ContinueBlock
///
/// is transformed to:
///
/// InputBlock
/// |
/// OMPScanDispatch
///
/// OMPBeforeScanBlock
/// |
/// OMPScanLoopExit
/// |
/// ContinueBlock
///
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great as documentation of the ScanInformation type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the documentation to ScanInfo and added extra documentation

Comment on lines 4601 to 4602
if (InScan)
ScanInfo.IV = IndVar;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is as bad as using globals to pass information between calls. You can nest multiple inscan-loops. At least you would need to make it a stack like FinalizationStack if you need to keep track which outer loop we are currently in, but preferably pass CanonicalLoopInfo or ScanInformation result to whatever needs the IV.

#130135 was meant to generalize the concept of region nesting, avoiding each having nestable region construct manages its own stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated it to use ScanInfo.

@@ -5349,6 +5350,100 @@ TEST_F(OpenMPIRBuilderTest, CreateReductions) {
EXPECT_TRUE(findGEPZeroOne(ReductionFn->getArg(1), FirstRHS, SecondRHS));
}

void createScan(llvm::Value *scanVar, llvm::Type *scanType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not inline this into the lambda where it is called?

If there is a reason to keep it separate, add static

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated it to static. The body gen lambda expects return type of type Error. When the assert of createScan was used when inlined in bodygen lambda, it was complaining that returning void by assert is incompatible.

/// expected to have a `scan` directive. Based on the clauses in
/// scan directive, the body of the loop is split into two loops: Input loop
/// and Scan Loop. Input loop contains the code generated for input phase of
/// scan and Scan loop contains the code generated for scan phase of scan.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also document on how it is to be used, i.g. that createScan is supposed to be called within the body callback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thank you!

@anchuraj anchuraj force-pushed the scan-IRBuilder-Support branch 3 times, most recently from 2bcfb8e to 5343c53 Compare July 27, 2025 23:35
@anchuraj anchuraj requested a review from Meinersbur July 27, 2025 23:36
@anchuraj
Copy link
Contributor Author

Instead of relying on createScan to be called withing the BodyGen callback, have you considered createCanonicalScanLoops to receive two callbacks: one for the input phase, and another for the scan phase?

Thank you @Meinersbur for the review. I have tried to address the review comments.
For the above comment, I do not fully understand the design mentioned. I tried to create two lambdas in ScanInfoInitialize and stored it as two members (of type lambda) of the class ScanInfo, so the flang frontend calls these two lambdas when scan directive is lowered. However, that does not seem to simplify the design.

@anchuraj anchuraj force-pushed the scan-IRBuilder-Support branch from 5343c53 to a8fb6bd Compare July 27, 2025 23:44
Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the above comment, I do not fully understand the design mentioned. I tried to create two lambdas in ScanInfoInitialize and stored it as two members (of type lambda) of the class ScanInfo, so the flang frontend calls these two lambdas when scan directive is lowered. However, that does not seem to simplify the design.

createScanLoop(LoopBodyGenCallbackTy InputPhaseCb, LoopBodyGenCallbackTy ScanPhaseCb, llvm::Value *scanVar, llvm::Type *scanType, ...)

It certainly makes the implementation more complicated, the question is by how much.

The reason why I think this would be preferable is that it does not the entire scan construct is built by a single call. Compared to an outer createCanonicalScanLoops where createScan has to be called inside the callback. This forces frontends to structure their CodeGen a certain way, and may overlook weird combinations in-between createCanonicalScanLoops and createScan such as createCritical:

#pragma omp for reduction(inscan,+:x)
for (int i = 0; i < n; ++i) {
  #pragma omp critical
  { 
    x += A[i];
    #pragma omp scan inclusive(x)
    B[i] = x;
  }
}

My question is whether you have considered such a design and it seems now you have (and considered not worth the complexity). Given that (IMHO unfortunately) the entire OpenMP builder is already designed to have specific things happen within lambdas. your current design would not be unusual.

@@ -3774,6 +3900,84 @@ class CanonicalLoopInfo {
LLVM_ABI void invalidate();
};

/// ScanInfo holds the information to assist in lowering of Scan reduction.
/// Before lowering, body of the for loop specifying scan reduction is expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Before lowering, body of the for loop specifying scan reduction is expected
/// Before lowering, the body of the for loop specifying scan reduction is expected

Comment on lines +3905 to +3906
/// to have the following structure
/// Loop Body Entry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// to have the following structure
/// Loop Body Entry
/// to have the following structure
///
/// Loop Body Entry

an additional newline keeps clang-format/doxygen from merging the lines

Comment on lines +3945 to +3947
/// `ScanLoopFinish` block.

class ScanInfo {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// `ScanLoopFinish` block.
class ScanInfo {
/// `ScanLoopFinish` block.
class ScanInfo {

Comment on lines +3950 to +3952
llvm::BasicBlock *OMPBeforeScanBlock = nullptr;
/// Dominates the body of the loop before scan directive
llvm::BasicBlock *OMPAfterScanBlock = nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
llvm::BasicBlock *OMPBeforeScanBlock = nullptr;
/// Dominates the body of the loop before scan directive
llvm::BasicBlock *OMPAfterScanBlock = nullptr;
llvm::BasicBlock *OMPBeforeScanBlock = nullptr;
/// Dominates the body of the loop before scan directive
llvm::BasicBlock *OMPAfterScanBlock = nullptr;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space between doxygen comments and previous declaration

Builder.SetInsertPoint(LoopBB);

PHINode *Counter = Builder.CreatePHI(Builder.getInt32Ty(), 2);
//// size pow2k = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//// size pow2k = 1;
// size pow2k = 1;

};

const auto &&InputLoopGen = [&]() -> Error {
auto LoopInfo = createCanonicalLoop(Builder.saveIP(), BodyGen, Start, Stop,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto LoopInfo = createCanonicalLoop(Builder.saveIP(), BodyGen, Start, Stop,
CanonicalLoopInfo* LoopInfo = createCanonicalLoop(Builder.saveIP(), BodyGen, Start, Stop,

// <scan phase>;
// }
ScanRedInfo->OMPFirstScanLoop = false;
auto Result = ScanLoopGen(Builder.saveIP());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto Result = ScanLoopGen(Builder.saveIP());
Error Result = ScanLoopGen(Builder.saveIP());

@@ -1,3 +1,4 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

please avoid unrelated changes

unsigned NumLog = 0;
unsigned NumCeil = 0;
for (Instruction &I : instructions(F)) {
if (isa<CallInst>(I)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (isa<CallInst>(I)) {
if (!isa<CallInst>(I))
continue;

reduce indention

for (Instruction &I : instructions(F)) {
if (isa<CallInst>(I)) {
CallInst *Call = dyn_cast<CallInst>(&I);
auto Name = Call->getCalledFunction()->getName();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No almost-always-auto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants