-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[OpenMP] [IR Builder] Changes to Support Scan Operation #136035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-flang-openmp Author: Anchu Rajendran S (anchuraj) ChangesScan reductions are supported in OpenMP with the the help of scan directive. Reduction clause of the for loop/simd directive takes an
The buffer needs to be shared between all threads performing reduction since is read/write in Input and Scan Loops. This is achieved by declaring a pointer to the buffer in the shared region and dynamically allocating the buffer by the master thread. Patch is 31.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/136035.diff 3 Files Affected:
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 6b104708bdb0d..682e1699078af 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -503,6 +503,31 @@ class OpenMPIRBuilder {
return allocaInst;
}
};
+
+ struct ScanInformation {
+ public:
+ /// Dominates the body of the loop before scan directive
+ llvm::BasicBlock *OMPBeforeScanBlock = nullptr;
+ /// Dominates the body of the loop before scan directive
+ llvm::BasicBlock *OMPAfterScanBlock = nullptr;
+ /// Controls the flow to before or after scan blocks
+ llvm::BasicBlock *OMPScanDispatch = nullptr;
+ /// Exit block of loop body
+ llvm::BasicBlock *OMPScanLoopExit = nullptr;
+ /// Block before loop body where scan initializations are done
+ llvm::BasicBlock *OMPScanInit = nullptr;
+ /// Block after loop body where scan finalizations are done
+ llvm::BasicBlock *OMPScanFinish = nullptr;
+ /// If true, it indicates Input phase is lowered; else it indicates
+ /// ScanPhase is lowered
+ bool OMPFirstScanLoop = false;
+ // Maps the private reduction variable to the pointer of the temporary
+ // buffer
+ llvm::SmallDenseMap<llvm::Value *, llvm::Value *> ScanBuffPtrs;
+ llvm::Value *IV;
+ llvm::Value *Span;
+ } ScanInfo;
+
/// Initialize the internal state, this will put structures types and
/// potentially other helpers into the underlying module. Must be called
/// before any other method and only once! This internal state includes types
@@ -729,6 +754,35 @@ class OpenMPIRBuilder {
LoopBodyGenCallbackTy BodyGenCB, Value *TripCount,
const Twine &Name = "loop");
+ /// Generator for the control flow structure of an OpenMP canonical loops if
+ /// the parent directive has an `inscan` modifier specified.
+ /// If the `inscan` modifier is specified, the region of the parent is
+ /// expected to have a `scan` directive. Based on the clauses in
+ /// scan directive, the body of the loop is split into two loops: Input loop
+ /// and Scan Loop. Input loop contains the code generated for input phase of
+ /// scan and Scan loop contains the code generated for scan phase of scan.
+ ///
+ /// \param Loc The insert and source location description.
+ /// \param BodyGenCB Callback that will generate the loop body code.
+ /// \param Start Value of the loop counter for the first iterations.
+ /// \param Stop Loop counter values past this will stop the loop.
+ /// \param Step Loop counter increment after each iteration; negative
+ /// means counting down.
+ /// \param IsSigned Whether Start, Stop and Step are signed integers.
+ /// \param InclusiveStop Whether \p Stop itself is a valid value for the loop
+ /// counter.
+ /// \param ComputeIP Insertion point for instructions computing the trip
+ /// count. Can be used to ensure the trip count is available
+ /// at the outermost loop of a loop nest. If not set,
+ /// defaults to the preheader of the generated loop.
+ /// \param Name Base name used to derive BB and instruction names.
+ ///
+ /// \returns A vector containing Loop Info of Input Loop and Scan Loop.
+ Expected<SmallVector<llvm::CanonicalLoopInfo *>> createCanonicalScanLoops(
+ const LocationDescription &Loc, LoopBodyGenCallbackTy BodyGenCB,
+ Value *Start, Value *Stop, Value *Step, bool IsSigned, bool InclusiveStop,
+ InsertPointTy ComputeIP, const Twine &Name);
+
/// Calculate the trip count of a canonical loop.
///
/// This allows specifying user-defined loop counter values using increment,
@@ -798,13 +852,16 @@ class OpenMPIRBuilder {
/// at the outermost loop of a loop nest. If not set,
/// defaults to the preheader of the generated loop.
/// \param Name Base name used to derive BB and instruction names.
+ /// \param InScan Whether loop has a scan reduction specified.
///
/// \returns An object representing the created control flow structure which
/// can be used for loop-associated directives.
- Expected<CanonicalLoopInfo *> createCanonicalLoop(
- const LocationDescription &Loc, LoopBodyGenCallbackTy BodyGenCB,
- Value *Start, Value *Stop, Value *Step, bool IsSigned, bool InclusiveStop,
- InsertPointTy ComputeIP = {}, const Twine &Name = "loop");
+ Expected<CanonicalLoopInfo *>
+ createCanonicalLoop(const LocationDescription &Loc,
+ LoopBodyGenCallbackTy BodyGenCB, Value *Start,
+ Value *Stop, Value *Step, bool IsSigned,
+ bool InclusiveStop, InsertPointTy ComputeIP = {},
+ const Twine &Name = "loop", bool InScan = false);
/// Collapse a loop nest into a single loop.
///
@@ -1532,6 +1589,45 @@ class OpenMPIRBuilder {
ArrayRef<OpenMPIRBuilder::ReductionInfo> ReductionInfos,
Function *ReduceFn, AttributeList FuncAttrs);
+ /// Creates the runtime call specified
+ /// \param Callee Function Declaration Value
+ /// \param Args Arguments passed to the call
+ /// \param Name Optional param to specify the name of the call Instruction.
+ ///
+ /// \return The Runtime call instruction created.
+ llvm::CallInst *emitNoUnwindRuntimeCall(llvm::FunctionCallee Callee,
+ ArrayRef<llvm::Value *> Args,
+ const llvm::Twine &Name);
+
+ /// Helper function for CreateCanonicalScanLoops to create InputLoop
+ /// in the firstGen and Scan Loop in the SecondGen
+ /// \param InputLoopGen Callback for generating the loop for input phase
+ /// \param ScanLoopGen Callback for generating the loop for scan phase
+ ///
+ /// \return error if any produced, else return success.
+ Error emitScanBasedDirectiveIR(
+ llvm::function_ref<Error()> InputLoopGen,
+ llvm::function_ref<Error(LocationDescription Loc)> ScanLoopGen);
+
+ /// Creates the basic blocks required for scan reduction.
+ void createScanBBs();
+
+ /// Dynamically allocates the buffer needed for scan reduction.
+ /// \param AllocaIP The IP where possibly-shared pointer of buffer needs to be
+ /// declared. \param ScanVars Scan Variables.
+ ///
+ /// \return error if any produced, else return success.
+ Error emitScanBasedDirectiveDeclsIR(InsertPointTy AllocaIP,
+ ArrayRef<llvm::Value *> ScanVars,
+ ArrayRef<llvm::Type *> ScanVarsType);
+
+ /// Copies the result back to the reduction variable.
+ /// \param ReductionInfos Array type containing the ReductionOps.
+ ///
+ /// \return error if any produced, else return success.
+ Error emitScanBasedDirectiveFinalsIR(
+ SmallVector<llvm::OpenMPIRBuilder::ReductionInfo> ReductionInfos);
+
/// This function emits a helper that gathers Reduce lists from the first
/// lane of every active warp to lanes in the first warp.
///
@@ -2607,6 +2703,41 @@ class OpenMPIRBuilder {
BodyGenCallbackTy BodyGenCB,
FinalizeCallbackTy FiniCB, Value *Filter);
+ /// This function performs the scan reduction of the values updated in
+ /// the input phase. The reduction logic needs to be emitted between input
+ /// and scan loop returned by `CreateCanonicalScanLoops`. The following
+ /// is the code that is generated, `buffer` and `span` are expected to be
+ /// populated before executing the generated code.
+ ///
+ /// for (int k = 0; k != ceil(log2(span)); ++k) {
+ /// i=pow(2,k)
+ /// for (size cnt = last_iter; cnt >= i; --cnt)
+ /// buffer[cnt] op= buffer[cnt-i];
+ /// }
+ /// \param Loc The insert and source location description.
+ /// \param ReductionInfos Array type containing the ReductionOps.
+ ///
+ /// \returns The insertion position *after* the masked.
+ InsertPointOrErrorTy emitScanReduction(
+ const LocationDescription &Loc,
+ SmallVector<llvm::OpenMPIRBuilder::ReductionInfo> ReductionInfos);
+
+ /// This directive split and directs the control flow to input phase
+ /// blocks or scan phase blocks based on 1. whether input loop or scan loop
+ /// is executed, 2. whether exclusive or inclusive scan is used.
+ ///
+ /// \param Loc The insert and source location description.
+ /// \param AllocaIP The IP where the temporary buffer for scan reduction
+ // needs to be allocated.
+ /// \param ScanVars Scan Variables.
+ /// \param IsInclusive Whether it is an inclusive or exclusive scan.
+ ///
+ /// \returns The insertion position *after* the scan.
+ InsertPointOrErrorTy createScan(const LocationDescription &Loc,
+ InsertPointTy AllocaIP,
+ ArrayRef<llvm::Value *> ScanVars,
+ ArrayRef<llvm::Type *> ScanVarsType,
+ bool IsInclusive);
/// Generator for '#omp critical'
///
/// \param Loc The insert and source location description.
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 28662efc02882..00dc571c6ea77 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -59,6 +59,8 @@
#include "llvm/Transforms/Utils/LoopPeel.h"
#include "llvm/Transforms/Utils/UnrollLoop.h"
+#include <cassert>
+#include <cstddef>
#include <cstdint>
#include <optional>
@@ -3981,6 +3983,337 @@ OpenMPIRBuilder::createMasked(const LocationDescription &Loc,
/*Conditional*/ true, /*hasFinalize*/ true);
}
+llvm::CallInst *
+OpenMPIRBuilder::emitNoUnwindRuntimeCall(llvm::FunctionCallee Callee,
+ ArrayRef<llvm::Value *> Args,
+ const llvm::Twine &Name) {
+ llvm::CallInst *Call = Builder.CreateCall(
+ Callee, Args, SmallVector<llvm::OperandBundleDef, 1>(), Name);
+ Call->setDoesNotThrow();
+ return Call;
+}
+
+// Expects input basic block is dominated by BeforeScanBB.
+// Once Scan directive is encountered, the code after scan directive should be
+// dominated by AfterScanBB. Scan directive splits the code sequence to
+// scan and input phase. Based on whether inclusive or exclusive
+// clause is used in the scan directive and whether input loop or scan loop
+// is lowered, it adds jumps to input and scan phase. First Scan loop is the
+// input loop and second is the scan loop. The code generated handles only
+// inclusive scans now.
+OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createScan(
+ const LocationDescription &Loc, InsertPointTy AllocaIP,
+ ArrayRef<llvm::Value *> ScanVars, ArrayRef<llvm::Type *> ScanVarsType,
+ bool IsInclusive) {
+ if (ScanInfo.OMPFirstScanLoop) {
+ llvm::Error Err =
+ emitScanBasedDirectiveDeclsIR(AllocaIP, ScanVars, ScanVarsType);
+ if (Err) {
+ return Err;
+ }
+ }
+ if (!updateToLocation(Loc))
+ return Loc.IP;
+
+ llvm::Value *IV = ScanInfo.IV;
+
+ if (ScanInfo.OMPFirstScanLoop) {
+ // Emit buffer[i] = red; at the end of the input phase.
+ for (size_t i = 0; i < ScanVars.size(); i++) {
+ Value *BuffPtr = ScanInfo.ScanBuffPtrs[ScanVars[i]];
+ Value *Buff = Builder.CreateLoad(Builder.getPtrTy(), BuffPtr);
+ Type *DestTy = ScanVarsType[i];
+ Value *Val = Builder.CreateInBoundsGEP(DestTy, Buff, IV, "arrayOffset");
+ Value *Src = Builder.CreateLoad(DestTy, ScanVars[i]);
+
+ Builder.CreateStore(Src, Val);
+ }
+ }
+ Builder.CreateBr(ScanInfo.OMPScanLoopExit);
+ emitBlock(ScanInfo.OMPScanDispatch, Builder.GetInsertBlock()->getParent());
+
+ if (!ScanInfo.OMPFirstScanLoop) {
+ IV = ScanInfo.IV;
+ // Emit red = buffer[i]; at the entrance to the scan phase.
+ // TODO: if exclusive scan, the red = buffer[i-1] needs to be updated.
+ for (size_t i = 0; i < ScanVars.size(); i++) {
+ Value *BuffPtr = ScanInfo.ScanBuffPtrs[ScanVars[i]];
+ Value *Buff = Builder.CreateLoad(Builder.getPtrTy(), BuffPtr);
+ Type *DestTy = ScanVarsType[i];
+ Value *SrcPtr =
+ Builder.CreateInBoundsGEP(DestTy, Buff, IV, "arrayOffset");
+ Value *Src = Builder.CreateLoad(DestTy, SrcPtr);
+ Builder.CreateStore(Src, ScanVars[i]);
+ }
+ }
+
+ // TODO: Update it to CreateBr and remove dead blocks
+ llvm::Value *CmpI = Builder.getInt1(true);
+ if (ScanInfo.OMPFirstScanLoop == IsInclusive) {
+ Builder.CreateCondBr(CmpI, ScanInfo.OMPBeforeScanBlock,
+ ScanInfo.OMPAfterScanBlock);
+ } else {
+ Builder.CreateCondBr(CmpI, ScanInfo.OMPAfterScanBlock,
+ ScanInfo.OMPBeforeScanBlock);
+ }
+ emitBlock(ScanInfo.OMPAfterScanBlock, Builder.GetInsertBlock()->getParent());
+ Builder.SetInsertPoint(ScanInfo.OMPAfterScanBlock);
+ return Builder.saveIP();
+}
+
+Error OpenMPIRBuilder::emitScanBasedDirectiveDeclsIR(
+ InsertPointTy AllocaIP, ArrayRef<Value *> ScanVars,
+ ArrayRef<Type *> ScanVarsType) {
+
+ Builder.restoreIP(AllocaIP);
+ // Create the shared pointer at alloca IP.
+ for (size_t i = 0; i < ScanVars.size(); i++) {
+ llvm::Value *BuffPtr =
+ Builder.CreateAlloca(Builder.getPtrTy(), nullptr, "vla");
+ ScanInfo.ScanBuffPtrs[ScanVars[i]] = BuffPtr;
+ }
+
+ // Allocate temporary buffer by master thread
+ auto BodyGenCB = [&](InsertPointTy AllocaIP,
+ InsertPointTy CodeGenIP) -> Error {
+ Builder.restoreIP(CodeGenIP);
+ Value *AllocSpan = Builder.CreateAdd(ScanInfo.Span, Builder.getInt32(1));
+ for (size_t i = 0; i < ScanVars.size(); i++) {
+ Type *IntPtrTy = Builder.getInt32Ty();
+ Constant *Allocsize = ConstantExpr::getSizeOf(ScanVarsType[i]);
+ Allocsize = ConstantExpr::getTruncOrBitCast(Allocsize, IntPtrTy);
+ Value *Buff = Builder.CreateMalloc(IntPtrTy, ScanVarsType[i], Allocsize,
+ AllocSpan, nullptr, "arr");
+ Builder.CreateStore(Buff, ScanInfo.ScanBuffPtrs[ScanVars[i]]);
+ }
+ return Error::success();
+ };
+ // TODO: Perform finalization actions for variables. This has to be
+ // called for variables which have destructors/finalizers.
+ auto FiniCB = [&](InsertPointTy CodeGenIP) { return llvm::Error::success(); };
+
+ Builder.SetInsertPoint(ScanInfo.OMPScanInit->getTerminator());
+ llvm::Value *FilterVal = Builder.getInt32(0);
+ llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP =
+ createMasked(Builder.saveIP(), BodyGenCB, FiniCB, FilterVal);
+
+ if (!AfterIP)
+ return AfterIP.takeError();
+ Builder.restoreIP(*AfterIP);
+ BasicBlock *InputBB = Builder.GetInsertBlock();
+ if (InputBB->getTerminator())
+ Builder.SetInsertPoint(Builder.GetInsertBlock()->getTerminator());
+ AfterIP = createBarrier(Builder.saveIP(), llvm::omp::OMPD_barrier);
+ if (!AfterIP)
+ return AfterIP.takeError();
+ Builder.restoreIP(*AfterIP);
+
+ return Error::success();
+}
+
+Error OpenMPIRBuilder::emitScanBasedDirectiveFinalsIR(
+ SmallVector<ReductionInfo> ReductionInfos) {
+ auto BodyGenCB = [&](InsertPointTy AllocaIP,
+ InsertPointTy CodeGenIP) -> Error {
+ Builder.restoreIP(CodeGenIP);
+ for (ReductionInfo RedInfo : ReductionInfos) {
+ Value *PrivateVar = RedInfo.PrivateVariable;
+ Value *OrigVar = RedInfo.Variable;
+ Value *BuffPtr = ScanInfo.ScanBuffPtrs[PrivateVar];
+ Value *Buff = Builder.CreateLoad(Builder.getPtrTy(), BuffPtr);
+
+ Type *SrcTy = RedInfo.ElementType;
+ Value *Val =
+ Builder.CreateInBoundsGEP(SrcTy, Buff, ScanInfo.Span, "arrayOffset");
+ Value *Src = Builder.CreateLoad(SrcTy, Val);
+
+ Builder.CreateStore(Src, OrigVar);
+ Builder.CreateFree(Buff);
+ }
+ return Error::success();
+ };
+ // TODO: Perform finalization actions for variables. This has to be
+ // called for variables which have destructors/finalizers.
+ auto FiniCB = [&](InsertPointTy CodeGenIP) { return llvm::Error::success(); };
+
+ if (ScanInfo.OMPScanFinish->getTerminator())
+ Builder.SetInsertPoint(ScanInfo.OMPScanFinish->getTerminator());
+ else
+ Builder.SetInsertPoint(ScanInfo.OMPScanFinish);
+
+ llvm::Value *FilterVal = Builder.getInt32(0);
+ llvm::OpenMPIRBuilder::InsertPointOrErrorTy AfterIP =
+ createMasked(Builder.saveIP(), BodyGenCB, FiniCB, FilterVal);
+
+ if (!AfterIP)
+ return AfterIP.takeError();
+ Builder.restoreIP(*AfterIP);
+ BasicBlock *InputBB = Builder.GetInsertBlock();
+ if (InputBB->getTerminator())
+ Builder.SetInsertPoint(Builder.GetInsertBlock()->getTerminator());
+ AfterIP = createBarrier(Builder.saveIP(), llvm::omp::OMPD_barrier);
+ if (!AfterIP)
+ return AfterIP.takeError();
+ Builder.restoreIP(*AfterIP);
+ return Error::success();
+}
+
+OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitScanReduction(
+ const LocationDescription &Loc,
+ SmallVector<llvm::OpenMPIRBuilder::ReductionInfo> ReductionInfos) {
+
+ if (!updateToLocation(Loc))
+ return Loc.IP;
+ auto BodyGenCB = [&](InsertPointTy AllocaIP,
+ InsertPointTy CodeGenIP) -> Error {
+ Builder.restoreIP(CodeGenIP);
+ Function *CurFn = Builder.GetInsertBlock()->getParent();
+ // for (int k = 0; k <= ceil(log2(n)); ++k)
+ llvm::BasicBlock *LoopBB =
+ BasicBlock::Create(CurFn->getContext(), "omp.outer.log.scan.body");
+ llvm::BasicBlock *ExitBB =
+ splitBB(Builder, false, "omp.outer.log.scan.exit");
+ llvm::Function *F = llvm::Intrinsic::getOrInsertDeclaration(
+ Builder.GetInsertBlock()->getModule(),
+ (llvm::Intrinsic::ID)llvm::Intrinsic::log2, Builder.getDoubleTy());
+ llvm::BasicBlock *InputBB = Builder.GetInsertBlock();
+ llvm::Value *Arg =
+ Builder.CreateUIToFP(ScanInfo.Span, Builder.getDoubleTy());
+ llvm::Value *LogVal = emitNoUnwindRuntimeCall(F, Arg, "");
+ F = llvm::Intrinsic::getOrInsertDeclaration(
+ Builder.GetInsertBlock()->getModule(),
+ (llvm::Intrinsic::ID)llvm::Intrinsic::ceil, Builder.getDoubleTy());
+ LogVal = emitNoUnwindRuntimeCall(F, LogVal, "");
+ LogVal = Builder.CreateFPToUI(LogVal, Builder.getInt32Ty());
+ llvm::Value *NMin1 = Builder.CreateNUWSub(
+ ScanInfo.Span, llvm::ConstantInt::get(ScanInfo.Span->getType(), 1));
+ Builder.SetInsertPoint(InputBB);
+ Builder.CreateBr(LoopBB);
+ emitBlock(LoopBB, Builder.GetInsertBlock()->getParent());
+ Builder.SetInsertPoint(LoopBB);
+
+ PHINode *Counter = Builder.CreatePHI(Builder.getInt32Ty(), 2);
+ //// size pow2k = 1;
+ PHINode *Pow2K = Builder.CreatePHI(Builder.getInt32Ty(), 2);
+ Counter->addIncoming(llvm::ConstantInt::get(Builder.getInt32Ty(), 0),
+ InputBB);
+ Pow2K->addIncoming(llvm::ConstantInt::get(Builder.getInt32Ty(), 1),
+ InputBB);
+ //// for (size i = n - 1; i >= 2 ^ k; --i)
+ //// tmp[i] op= tmp[i-pow2k];
+ llvm::BasicBlock *InnerLoopBB =
+ BasicBlock::Create(CurFn->getContext(), "omp.inner.log.scan.body");
+ llvm::BasicBlock *InnerExitBB =
+ BasicBlock::Create(CurFn->getContext(), "omp.inner.log.scan.exit");
+ llvm::Value *CmpI = Builder.CreateICmpUGE(NMin1, Pow2K);
+ Builder.CreateCondBr(CmpI, InnerLoopBB, InnerExitBB);
+ emitBlock(InnerLoopBB, Builder.GetInsertBlock()->getParent());
+ Builder.SetInsertPoint(InnerLoopBB);
+ auto *IVal = Builder.CreatePHI(Builder.getInt32Ty(), 2);
+ IVal->addIncoming(NMin1, LoopBB);
+ for (ReductionInfo RedInfo : ReductionInfos) {
+ Value *ReductionVal = RedInfo.PrivateVariable;
+ Value *BuffPtr = ScanInfo.ScanBuffPtrs[ReductionVal];
+ Value *Buff = Builder.CreateLoad(Builder.getPtrTy(), BuffPtr);
+ Type *DestTy = RedInfo.ElementType;
+ Value *IV = Builder.CreateAdd(IVal, Builder.getInt32(1));
+ Value *LHSPtr =
+ Builder.CreateInBoundsGEP(DestTy, Buff, IV, "arrayOffset");
+ Value *OffsetIval = Builder.CreateNUWSub(IV, Pow2K);
+ Value *RHSPtr =
+ Builder.CreateInBoundsGEP(DestTy, Buff, OffsetIval, "arrayOffset");
+ Value *LHS = Builder.CreateLoad(DestTy, LHSPtr);
+ ...
[truncated]
|
108b0ce
to
6d59b7e
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
78f656e
to
6d59b7e
Compare
6d59b7e
to
0ad35dc
Compare
4b50e9c
to
323d3e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of relying on createScan
to be called withing the BodyGen
callback, have you considered createCanonicalScanLoops
to receive two callbacks: one for the input phase, and another for the scan phase?
llvm::SmallDenseMap<llvm::Value *, llvm::Value *> ScanBuffPtrs; | ||
llvm::Value *IV; | ||
llvm::Value *Span; | ||
} ScanInfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} ScanInfo; | |
}; |
Usually it is not a good idea to remember such data between methods calls. Better pass a handle object between methods calls, such as AtomicInfo
and CanonicalLoopInfo
.
The type would consistently be called ScanInfo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to object passing. Thank you!
/// InputBlock | ||
/// | | ||
/// ContinueBlock | ||
/// | ||
/// is transformed to: | ||
/// | ||
/// InputBlock | ||
/// | | ||
/// OMPScanDispatch | ||
/// | ||
/// OMPBeforeScanBlock | ||
/// | | ||
/// OMPScanLoopExit | ||
/// | | ||
/// ContinueBlock | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great as documentation of the ScanInformation type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the documentation to ScanInfo and added extra documentation
if (InScan) | ||
ScanInfo.IV = IndVar; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is as bad as using globals to pass information between calls. You can nest multiple inscan-loops. At least you would need to make it a stack like FinalizationStack
if you need to keep track which outer loop we are currently in, but preferably pass CanonicalLoopInfo
or ScanInformation
result to whatever needs the IV.
#130135 was meant to generalize the concept of region nesting, avoiding each having nestable region construct manages its own stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated it to use ScanInfo.
@@ -5349,6 +5350,100 @@ TEST_F(OpenMPIRBuilderTest, CreateReductions) { | |||
EXPECT_TRUE(findGEPZeroOne(ReductionFn->getArg(1), FirstRHS, SecondRHS)); | |||
} | |||
|
|||
void createScan(llvm::Value *scanVar, llvm::Type *scanType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not inline this into the lambda where it is called?
If there is a reason to keep it separate, add static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated it to static. The body gen lambda expects return type of type Error
. When the assert of createScan
was used when inlined in bodygen lambda, it was complaining that returning void by assert is incompatible.
/// expected to have a `scan` directive. Based on the clauses in | ||
/// scan directive, the body of the loop is split into two loops: Input loop | ||
/// and Scan Loop. Input loop contains the code generated for input phase of | ||
/// scan and Scan loop contains the code generated for scan phase of scan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also document on how it is to be used, i.g. that createScan
is supposed to be called within the body callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thank you!
2bcfb8e
to
5343c53
Compare
Thank you @Meinersbur for the review. I have tried to address the review comments. |
5343c53
to
a8fb6bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the above comment, I do not fully understand the design mentioned. I tried to create two lambdas in
ScanInfoInitialize
and stored it as two members (of type lambda) of the classScanInfo
, so the flang frontend calls these two lambdas when scan directive is lowered. However, that does not seem to simplify the design.
createScanLoop(LoopBodyGenCallbackTy InputPhaseCb, LoopBodyGenCallbackTy ScanPhaseCb, llvm::Value *scanVar, llvm::Type *scanType, ...)
It certainly makes the implementation more complicated, the question is by how much.
The reason why I think this would be preferable is that it does not the entire scan construct is built by a single call. Compared to an outer createCanonicalScanLoops
where createScan
has to be called inside the callback. This forces frontends to structure their CodeGen a certain way, and may overlook weird combinations in-between createCanonicalScanLoops
and createScan
such as createCritical
:
#pragma omp for reduction(inscan,+:x)
for (int i = 0; i < n; ++i) {
#pragma omp critical
{
x += A[i];
#pragma omp scan inclusive(x)
B[i] = x;
}
}
My question is whether you have considered such a design and it seems now you have (and considered not worth the complexity). Given that (IMHO unfortunately) the entire OpenMP builder is already designed to have specific things happen within lambdas. your current design would not be unusual.
@@ -3774,6 +3900,84 @@ class CanonicalLoopInfo { | |||
LLVM_ABI void invalidate(); | |||
}; | |||
|
|||
/// ScanInfo holds the information to assist in lowering of Scan reduction. | |||
/// Before lowering, body of the for loop specifying scan reduction is expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Before lowering, body of the for loop specifying scan reduction is expected | |
/// Before lowering, the body of the for loop specifying scan reduction is expected |
/// to have the following structure | ||
/// Loop Body Entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// to have the following structure | |
/// Loop Body Entry | |
/// to have the following structure | |
/// | |
/// Loop Body Entry |
an additional newline keeps clang-format/doxygen from merging the lines
/// `ScanLoopFinish` block. | ||
|
||
class ScanInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// `ScanLoopFinish` block. | |
class ScanInfo { | |
/// `ScanLoopFinish` block. | |
class ScanInfo { |
llvm::BasicBlock *OMPBeforeScanBlock = nullptr; | ||
/// Dominates the body of the loop before scan directive | ||
llvm::BasicBlock *OMPAfterScanBlock = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm::BasicBlock *OMPBeforeScanBlock = nullptr; | |
/// Dominates the body of the loop before scan directive | |
llvm::BasicBlock *OMPAfterScanBlock = nullptr; | |
llvm::BasicBlock *OMPBeforeScanBlock = nullptr; | |
/// Dominates the body of the loop before scan directive | |
llvm::BasicBlock *OMPAfterScanBlock = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space between doxygen comments and previous declaration
Builder.SetInsertPoint(LoopBB); | ||
|
||
PHINode *Counter = Builder.CreatePHI(Builder.getInt32Ty(), 2); | ||
//// size pow2k = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//// size pow2k = 1; | |
// size pow2k = 1; |
}; | ||
|
||
const auto &&InputLoopGen = [&]() -> Error { | ||
auto LoopInfo = createCanonicalLoop(Builder.saveIP(), BodyGen, Start, Stop, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto LoopInfo = createCanonicalLoop(Builder.saveIP(), BodyGen, Start, Stop, | |
CanonicalLoopInfo* LoopInfo = createCanonicalLoop(Builder.saveIP(), BodyGen, Start, Stop, |
// <scan phase>; | ||
// } | ||
ScanRedInfo->OMPFirstScanLoop = false; | ||
auto Result = ScanLoopGen(Builder.saveIP()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto Result = ScanLoopGen(Builder.saveIP()); | |
Error Result = ScanLoopGen(Builder.saveIP()); |
@@ -1,3 +1,4 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid unrelated changes
unsigned NumLog = 0; | ||
unsigned NumCeil = 0; | ||
for (Instruction &I : instructions(F)) { | ||
if (isa<CallInst>(I)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (isa<CallInst>(I)) { | |
if (!isa<CallInst>(I)) | |
continue; |
for (Instruction &I : instructions(F)) { | ||
if (isa<CallInst>(I)) { | ||
CallInst *Call = dyn_cast<CallInst>(&I); | ||
auto Name = Call->getCalledFunction()->getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No almost-always-auto
Scan reductions are supported in OpenMP with the help of scan directive. Reduction clause of the for loop/simd directive can take an
inscan
modifier along with the body of the directive specifying ascan
directive. This PR implements the lowering logic for scan reductions in workshare loops of OpenMP.The body of the for loop is split into two loops (Input phase loop and Scan Phase loop) and a scan reduction loop is added in the middle. The Input phase loop populates a temporary buffer with initial values that are to be reduced. The buffer is used by the reduction loop to perform scan reduction. Scan phase loop copies the values of the buffer to the reduction variable before executing the scan phase. Below is a high level view of the code generated.
The temporary buffer needs to be shared between all threads performing reduction since it is read/written in Input and Scan workshare Loops. This is achieved by declaring a pointer to the buffer in the shared region and dynamically allocating the buffer by the master thread.
This is the reason why allocation, deallocation and scan reduction are performed within
masked
. The code is verified to produce correct results for Fortran programs with the code changes in the PR #133149