[mlir][nvgpu] Rename mbarrier.test.wait to mbarrier.test. #96505

Open · wants to merge 1 commit into main

Conversation

chr1sj0nes
Contributor

test.wait is inconsistent with try_wait and misleading as it is non-blocking.

`test.wait` is inconsistent with `try_wait` and misleading as it is non-blocking.
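For illustration, a minimal sketch of why the current name is misleading (`%barrier`, `%mbarId`, and `%token` are assumed to be defined earlier; the type follows the existing test in the diff below): the op returns an `i1` immediately rather than waiting.

```mlir
// Non-blocking: returns at once with an i1 saying whether the current phase has
// completed, so callers poll it; try_wait.parity, by contrast, actually waits.
%isComplete = nvgpu.mbarrier.test.wait %barrier[%mbarId], %token
    : !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>>, !nvgpu.mbarrier.token
```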
@llvmbot
Member

llvmbot commented Jun 24, 2024

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-gpu

Author: Chris Jones (chr1sj0nes)

Changes

test.wait is inconsistent with try_wait and misleading as it is non-blocking.


Full diff: https://github.com/llvm/llvm-project/pull/96505.diff

3 Files Affected:

  • (modified) mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td (+16-16)
  • (modified) mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp (+6-6)
  • (modified) mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir (+3-3)
diff --git a/mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td b/mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
index dda8f31e688fe..a128acd7d47dd 100644
--- a/mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
+++ b/mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
@@ -526,22 +526,6 @@ def NVGPU_MBarrierInitOp : NVGPU_Op<"mbarrier.init", []> {
   let assemblyFormat = "$barriers `[` $mbarId `]` `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type($barriers)";
 }
 
-def NVGPU_MBarrierTestWaitOp : NVGPU_Op<"mbarrier.test.wait", []> {
-  let summary = "Checks if the `nvgpu.mbarrier` has completed its current phase.";
-  let description = [{
-    Checks whether the mbarrier object has completed the phase. It is is a 
-    non-blocking instruction which tests for the completion of the phase.
-
-    Example:
-    ```mlir
-      %isComplete = nvgpu.mbarrier.test.wait %barrier, %token : !nvgpu.mbarrier.barrier<memorySpace = #gpu.address_space<workgroup>>, !nvgpu.mbarrier.token
-    ```
-  }];
-  let arguments = (ins NVGPU_MBarrierGroup:$barriers, NVGPU_MBarrierToken:$token, Index:$mbarId);
-  let results = (outs I1:$waitComplete);
-  let assemblyFormat = "$barriers `[` $mbarId `]` `,` $token attr-dict `:` type($barriers) `,` type($token)";
-}
-
 def NVGPU_MBarrierArriveOp : NVGPU_Op<"mbarrier.arrive", []> {
   let summary = "Performs arrive operation on the `nvgpu.mbarrier.arrive`.";
   let description = [{
@@ -601,6 +585,22 @@ def NVGPU_MBarrierArriveExpectTxOp : NVGPU_Op<"mbarrier.arrive.expect_tx", []> {
   let assemblyFormat = "$barriers `[` $mbarId `]` `,` $txcount  (`,` `predicate` `=` $predicate^)? attr-dict `:` type($barriers)";
 }
 
+def NVGPU_MBarrierTestOp : NVGPU_Op<"mbarrier.test", []> {
+  let summary = "Checks if the `nvgpu.mbarrier` has completed its current phase.";
+  let description = [{
+    Checks whether the mbarrier object has completed the phase. It is is a 
+    non-blocking instruction which tests for the completion of the phase.
+
+    Example:
+    ```mlir
+      %isComplete = nvgpu.mbarrier.test %barrier, %token : !nvgpu.mbarrier.barrier<memorySpace = #gpu.address_space<workgroup>>, !nvgpu.mbarrier.token
+    ```
+  }];
+  let arguments = (ins NVGPU_MBarrierGroup:$barriers, NVGPU_MBarrierToken:$token, Index:$mbarId);
+  let results = (outs I1:$waitComplete);
+  let assemblyFormat = "$barriers `[` $mbarId `]` `,` $token attr-dict `:` type($barriers) `,` type($token)";
+}
+
 def NVGPU_MBarrierTryWaitParityOp : NVGPU_Op<"mbarrier.try_wait.parity", []> {
   let summary = "Waits for the `nvgpu.mbarrier` to complete its current phase.";
   let description = [{
diff --git a/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp b/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
index 11d29754aa760..f610a0ecfdb7f 100644
--- a/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
+++ b/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
@@ -896,12 +896,12 @@ struct NVGPUMBarrierArriveNoCompleteLowering
   }
 };
 
-/// Lowers `nvgpu.mbarrier.test.wait` to `nvvm.mbarrier.test.wait`
-struct NVGPUMBarrierTestWaitLowering
-    : public MBarrierBasePattern<nvgpu::MBarrierTestWaitOp> {
-  using MBarrierBasePattern<nvgpu::MBarrierTestWaitOp>::MBarrierBasePattern;
+/// Lowers `nvgpu.mbarrier.test` to `nvvm.mbarrier.test.wait`
+struct NVGPUMBarrierTestLowering
+    : public MBarrierBasePattern<nvgpu::MBarrierTestOp> {
+  using MBarrierBasePattern<nvgpu::MBarrierTestOp>::MBarrierBasePattern;
   LogicalResult
-  matchAndRewrite(nvgpu::MBarrierTestWaitOp op, OpAdaptor adaptor,
+  matchAndRewrite(nvgpu::MBarrierTestOp op, OpAdaptor adaptor,
                   ConversionPatternRewriter &rewriter) const override {
     ImplicitLocOpBuilder b(op->getLoc(), rewriter);
     Value barrier =
@@ -1675,7 +1675,7 @@ void mlir::populateNVGPUToNVVMConversionPatterns(LLVMTypeConverter &converter,
       NVGPUMBarrierInitLowering,             // nvgpu.mbarrier.init
       NVGPUMBarrierArriveLowering,           // nvgpu.mbarrier.arrive
       NVGPUMBarrierArriveNoCompleteLowering, // nvgpu.mbarrier.arrive.no_complete
-      NVGPUMBarrierTestWaitLowering,         // nvgpu.mbarrier.test_wait_parity
+      NVGPUMBarrierTestLowering,             // nvgpu.mbarrier.test
       NVGPUMBarrierTryWaitParityLowering,    // nvgpu.mbarrier.try_wait_parity
       NVGPUTmaAsyncLoadOpLowering,           // nvgpu.tma.async.load
       NVGPUTmaAsyncStoreOpLowering,          // nvgpu.tma.async.store
diff --git a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
index 86a552c03a473..2454532d3e11e 100644
--- a/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
+++ b/mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
@@ -497,7 +497,7 @@ func.func @mbarrier() {
   // CHECK: %[[base3:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)> 
   // CHECK: %[[barPtr3:.+]] = llvm.getelementptr %[[base3]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
   // CHECK: nvvm.mbarrier.test.wait.shared %[[barPtr3]], %[[token]]
-  %isDone = nvgpu.mbarrier.test.wait %barrier[%c0], %token : !barrierType, !tokenType
+  %isDone = nvgpu.mbarrier.test %barrier[%c0], %token : !barrierType, !tokenType
 
   func.return 
 }
@@ -527,7 +527,7 @@ func.func @mbarrier_nocomplete() {
   // CHECK: %[[base3:.+]] = llvm.extractvalue %[[barStr]][1] : !llvm.struct<(ptr<3>, ptr<3>, i64, array<1 x i64>, array<1 x i64>)> 
   // CHECK: %[[barPtr3:.+]] = llvm.getelementptr %[[base3]][%[[mid]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
   // CHECK: nvvm.mbarrier.test.wait.shared %[[barPtr3]], %[[token]]
-  %isDone = nvgpu.mbarrier.test.wait %barrier[%c0], %token : !barrierType, !tokenType
+  %isDone = nvgpu.mbarrier.test %barrier[%c0], %token : !barrierType, !tokenType
 
   func.return 
 }
@@ -552,7 +552,7 @@ func.func @mbarrier_wait(%barriers : !nvgpu.mbarrier.group<memorySpace = #gpu.ad
 // CHECK: %[[S5:.+]] = llvm.getelementptr %[[S4]][%[[S3]]] : (!llvm.ptr<3>, i64) -> !llvm.ptr<3>, i64
 // CHECK: nvvm.mbarrier.test.wait.shared {{.*}}, %[[CARG1]]
     %mbarId = arith.remui %i, %numBarriers : index
-    %isDone = nvgpu.mbarrier.test.wait %barriers[%mbarId], %token : !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>, num_barriers = 5>, !tokenType
+    %isDone = nvgpu.mbarrier.test %barriers[%mbarId], %token : !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>, num_barriers = 5>, !tokenType
   }
   return
 }

@grypp
Member

grypp commented Jun 24, 2024

Hi, thanks for the patch.
I am not seeing the motivation for this patch: it copy-pastes the op and its lowering to a different place. We would like to keep the op name similar to PTX, so the current design looks consistent.
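For reference, the mapping visible in the existing test above (the CHECK line is quoted as a comment; `!barrierType` and `!tokenType` are aliases from that test file); the nvvm op name mirrors the PTX `mbarrier.test_wait` instruction:

```mlir
// nvgpu op as written in the test (before this PR's rename):
%isDone = nvgpu.mbarrier.test.wait %barrier[%c0], %token : !barrierType, !tokenType
// ...lowers to the nvvm op, whose name follows PTX:
// CHECK: nvvm.mbarrier.test.wait.shared %[[barPtr3]], %[[token]]
```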

@grypp
Member

grypp commented Jun 24, 2024

Nit: We use titles like [mlir][nvgpu]

@chr1sj0nes
Contributor Author

I guess I am misunderstanding the purpose of the nvgpu dialect. I thought the nvvm dialect maps 1:1 with PTX but, as nvgpu is slightly higher level, we are not bound by PTX naming here. And in this case, I think the PTX op is very poorly named.

However, if we are going for consistency with PTX, should it not be named `mbarrier.test_wait`? I guess that is an identical discussion to #96503.

@chr1sj0nes chr1sj0nes changed the title [mlir:nvgpu] Rename mbarrier.test.wait to mbarrier.test. [mlir][nvgpu] Rename mbarrier.test.wait to mbarrier.test. Jun 25, 2024
@grypp
Member

grypp commented Jun 25, 2024

We are not bound to PTX naming in the nvgpu dialect, but this PR changes the name of an existing op without a clear motivation.

The nvgpu dialect serves as a bridge between the nvvm dialect and higher-level dialects such as memref or vector. While the nvvm dialect is very close to PTX, it still abstracts some elements at a higher level than raw PTX. It's more like a 1:N mapping.

The high-level bit in nvgpu's mbarrier op family is its ability to encapsulate multiple memory barriers in a single SSA value. These barriers can then be indexed and accessed using another SSA value. This level of abstraction and dynamic interaction is not feasible within the nvvm dialect due to its low-level nature.
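A sketch of that abstraction, adapted from the test in this PR's diff (the pre-rename op name is used; `%i`, `%numBarriers`, and `%token` are assumed to be defined earlier):

```mlir
// A single SSA value (%barriers) holds 5 mbarriers in workgroup memory; the
// barrier to operate on is picked with another, dynamically computed SSA value.
%mbarId = arith.remui %i, %numBarriers : index
%isDone = nvgpu.mbarrier.test.wait %barriers[%mbarId], %token
    : !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>, num_barriers = 5>,
      !nvgpu.mbarrier.token
```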
