[SYCL] Add barrier optimization pass #19353
Conversation
Force-pushed from 1a48836 to e5b70b9.
const uint64_t LHSWeight = LHSIt->second;
const uint64_t RHSWeight = RHSIt->second;

if (LHSWeight > RHSWeight)
Is it OK to compare the raw static_cast values of LHS and RHS and not use the weight map?
In general - yes. The idea I had in #16750 (where the function and map originate) is to make the pass forward compatible. That is, if new scopes are introduced, the pass won't break on them, as it will treat them as Unknown. Not sure if it's worth keeping, since for example the current version of the pass doesn't handle 'unknown' memory semantics masks.
Changed |= eliminateBoundaryBarriers(BarrierPtrs);
// Then remove redundant barriers within a single basic block.
for (auto &BarrierBBPair : BarriersByBB)
  Changed = eliminateBackToBackInBB(BarrierBBPair.first, BarrierBBPair.second,
Can eliminateBackToBackInBB be merged into eliminateDominatedBarriers? eliminateBackToBackInBB is just a special case of the latter in that all barriers are in a single BB, right?
Yes, it can be merged. Yet I've left them split, since the back-to-back elimination function is algorithmically simpler than the CFG elimination. And I thought it's a good idea to first optimize back-to-back barriers, then (not yet implemented) hoist two or more barriers into one in case their blocks share the same predecessor and their semantics match, and only then do CFG-aware removal/downgrade on the remaining barriers.

// If identical then drop Cur.
if (CmpExec == CompareRes::EQUAL && CmpMem == CompareRes::EQUAL) {
  if (noFencedMemAccessesBetween(Last.CI, Cur.CI, FenceLast, BBMemInfo)) {
Just a note: there could be repeated classifyMemScope calculation on some instructions in noFencedMemAccessesBetween, e.g. in the following case:
barrier(CrossDevice)
Instruction Set 1 (RegionMemScope == None)
barrier(Device)
Instruction Set 2 (RegionMemScope == None)
barrier(Workgroup)
Instruction Set 3 (RegionMemScope == None)
barrier(Subgroup)
I guess the case is rare, so probably no need to optimize.
True. But this would require extra memoization (by default), which might be worse than the extra computation in the rare case.
In general there are other ways to define fenced regions between barriers, but I hadn't thought about them until last Monday, when I found a similar work :) Re-making the scanning and re-defining fenced regions is a possible enhancement for the pass.
Sounds interesting, is there a link to the work?
@wenju-he I meant the CPU middle-end pass, which, while not doing the same as this pass, has quite an interesting idea for function preparation :)
I see. Right, it is not the same. I agree that a region-based algorithm would be better. Basically the pass here is merging equivalent regions.
if (Fence == RegionMemScope::Unknown)
  continue;

if (DT.dominates(B1->CI, B2->CI)) {
Is there a repeated calculation for this case?
- B1 = A0, B2 = A1, A0 dominates A1, noFencedAccessesCFG returns false
- B1 = A1, B2 = A0, A1 post-dominates A0, noFencedAccessesCFG is called again on the same instructions
Another potential repeated-calculation case: A0 dominates A1, and A1 dominates A2. noFencedAccessesCFG is called twice on the instructions between A0 and A1.
Yeah, this is something I'm refactoring and will continue to refactor by merging the CFG elimination and downgrade functions.
Should be partially resolved.
It removes redundant barriers (both back-to-back and in general in the CFG) and downgrades a global barrier to local if there are no global memory accesses 'between' them. See the description in SYCLOptimizeBackToBackBarrier.cpp for more details.
Signed-off-by: Sidorov, Dmitry <[email protected]>
Force-pushed from e5b70b9 to 6d51dad.
Force-pushed from 6d51dad to 7710333.
TODO: merge CFG elimination and barrier downgrade
Signed-off-by: Sidorov, Dmitry <[email protected]>
Force-pushed from 7710333 to 527e8e3.
cmake lgtm
Signed-off-by: Dmitry Sidorov <[email protected]>
Force-pushed from 8b88749 to 1b25b97.
Hi @MrSidims, can you please state your reasons for requiring such a move? That might help to get some ideas. Thanks
//     loads from __spirv_BuiltIn GVs)
//   – Unknown : any other mayReadOrWriteMemory() (intrinsics, calls,
//     generic addrspace)
// * Walk the function and record every barrier call into a list of
Lines 14-26 and 27-39 seem similar. Please check. Thanks
//     accesses >= B.MemScope, then remove B.
// - **Exit** : For each barrier B, if on *every* path from B to any function
//     return there are no
//     accesses >= B.MemScope, then remove B.
What does "no accesses >= B.MemScope" mean? Thanks
I'll clarify that (and it seems like clang-format interferes with readability - I'll disable it for the header), but basically it means that there are no accesses to memory wider than or equal to the memory scope and memory semantics of the barrier. That is, we look at the address space that accompanies the access, see which memory it belongs to, and compare it with the fenced memory.
@@ -808,6 +813,8 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
                       .sinkCommonInsts(true)));
   FPM.addPass(InstCombinePass());
   invokePeepholeEPCallbacks(FPM, Level);
+  if (SYCLOptimizationMode)
+    FPM.addPass(SYCLOptimizeBarriersPass());
Should the pass be run via a registerPipeline*Callback in EmitAssemblyHelper::RunOptimizationPipeline (llvm/clang/lib/CodeGen/BackendUtil.cpp, line 873 at a960c7f)? Most of the SYCL passes are added there.
Applied, thanks!
@@ -575,6 +578,8 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
   FPM.addPass(
       SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
   FPM.addPass(InstCombinePass());
   invokePeepholeEPCallbacks(FPM, Level);
+  if (SYCLOptimizationMode)
It would be interesting if non-SYCL-opt mode could use the pass as well. It seems not possible at the moment, as the libspirv ControlBarrier implementation is inlined before the optimization pass pipeline starts.
I'm probably misunderstanding what SYCLOptimizationMode is. I thought it basically means [O1, O2, O3] on the device code in the frontend (and no optnone passed to the BE), or is that not the case?
SYCLOptimizationMode disables some optimizations. SYCLOptimizationMode should be meant for the spir64 or SPIR-V target only, as the target lacks TTI.
Moved the pass from here.
// FIXME: this check is way too conservative.
if (Fence != RegionMemScope::Unknown && ADominatesB &&
    PDT.dominates(B->CI, A->CI) &&
    noFencedAccessesCFG(A->CI, B->CI, Fence, BBMemInfo)) {
I think there could be repeated calculation if this function is called multiple times and a basic block is re-scanned in multiple ranges (A, B), (A, C), etc. It might be enough to calculate the highest ExecScope and MemScope score of each basic block only once; then the score can be reused when the basic block is scanned multiple times.
Correct. Do you mean on line 532 to just compare with Required? Realistically that should be enough, though in some rare scenarios it will result in a missed optimization. For now I've added a shortcut in hasFencedAccesses for cases when the path from barrier A to barrier B looks like: BB-A -> other BB -> BB-B - with this shortcut, the scan of 'other BB' will be omitted.
I mean each basic block has its own score, i.e. the highest score of the instructions within the basic block. After calculating the score, there is no need to iterate over the instructions over and over again.
@@ -147,6 +147,9 @@
 #include "llvm/Transforms/Vectorize/SLPVectorizer.h"
 #include "llvm/Transforms/Vectorize/VectorCombine.h"

+// TODO: move it elsewhere
Is this something we should do before merging this PR?
Yeah, I'm thinking of just Transforms.
//     loads from __spirv_BuiltIn GVs)
//   – Unknown : any other mayReadOrWriteMemory() (intrinsics, calls,
//     generic addrspace)
// * Walk the function and record every barrier call into a list of
Is this comment not redundant with the block just above?
Signed-off-by: Sidorov, Dmitry <[email protected]>
Just a few more nits, but overall looks very good. Great job!
<< ") returned " << false << "\n"); | ||
return false; | ||
} | ||
// do not enqueue successors (there are none). |
Suggested change:
-// do not enqueue successors (there are none).
+// Do not enqueue successors (there are none).
if (semanticsSuperset(A.Semantic, B.Semantic) &&
    !semanticsSuperset(B.Semantic, A.Semantic))
  return false;
// then fall back to exec/mem‐scope width as before:
Suggested change:
-// then fall back to exec/mem‐scope width as before:
+// Then fall back to exec/mem‐scope width as before:
  }
  break;
}
if (Cur.CI) // still alive?
Suggested change:
-if (Cur.CI) // still alive?
+if (Cur.CI) // Still alive?
}

// True if BD is the first real instruction of the function.
static bool isAtKernelEntry(BarrierDesc &BD) {
This is still non-const, though. Can we add it?
}

// True if BD is immediately before a return/unreachable and nothing follows.
static bool isAtKernelExit(BarrierDesc &BD) {
Can we add it?
It removes redundant barriers (both back-to-back and in general in the CFG) and downgrades a global barrier to local if there are no global memory accesses 'between' them. See the description in SYCLOptimizeBackToBackBarrier.cpp for more details.