Skip to content

[flang] Use outermost fir.dummy_scope for TBAA of local allocations. #146006

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 30, 2025

Conversation

vzakhari
Copy link
Contributor

This change only matters when MLIR function inlining kicks in,
and I am still just experimenting with this.

The added LIT test shows the particular example where the current
TBAA assignment is incorrect:

  subroutine bar(this)
    ...
  end subroutine bar
  function foo() result(res)
    type(t) :: res
    call bar(res)
  end function foo
  subroutine test(arg)
    ! %0 = fir.alloca !fir.type<_QMmTt{x:f32}> {bindc_name = ".result"}
    type(t) :: var
    var = foo()
    ! fir.save_result foo's result to %0 (temp-alloca)
    ! fir.declare %0 {uniq_name = ".tmp.func_result"}
    ! load from %0 -> store into var
    arg = var%x
  end subroutine test

One way for FIR inlining to handle foo() call is to substitute
all uses of the res alloca (inside foo) with uses of the temp-alloca
(inside test). This means that the temp-alloca is then used
by the code inside bar (e.g. to store something into it).
After the inlining, the innermost dominating fir.dummy_scope
for fir.declare %0 is the one from bar. If we use this scope
for assigning TBAA to the load from temp-alloca, then it will be
in the same TBAA tree as the stores into this inside bar.
The load and the stores will have different sub-trees, because
this is a dummy argument, and temp-alloca is a local allocation.
So they will appear as non-conflicting while they are accessing
the same memory.

This change makes sure that all local allocations always
use the outermost scope.

This change only matters when MLIR function inlining kicks in,
and I am still just experimenting with this.

The added LIT test shows the particular example where the current
TBAA assignment is incorrect:
```
  subroutine bar(this)
    ...
  end subroutine bar
  function foo() result(res)
    type(t) :: res
    call bar(res)
  end function foo
  subroutine test(arg)
    ! %0 = fir.alloca !fir.type<_QMmTt{x:f32}> {bindc_name = ".result"}
    type(t) :: var
    var = foo()
    ! fir.save_result foo's result to %0 (temp-alloca)
    ! fir.declare %0 {uniq_name = ".tmp.func_result"}
    ! load from %0 -> store into var
    arg = var%x
  end subroutine test
```

One way for FIR inlining to handle `foo()` call is to substitute
all uses of the `res` alloca (inside `foo`) with uses of the temp-alloca
(inside `test`). This means that the temp-alloca is then used
by the code inside bar (e.g. to store something into it).
After the inlining, the innermost dominating fir.dummy_scope
for `fir.declare %0` is the one from `bar`. If we use this scope
for assigning TBAA to the load from temp-alloca, then it will be
in the same TBAA tree as the stores into `this` inside `bar`.
The load and the stores will have different sub-trees, because
`this` is a dummy argument, and temp-alloca is a local allocation.
So they will appear as non-conflicting while they are accessing
the same memory.

This change makes sure that all local allocations always
use the outermost scope.
@vzakhari vzakhari requested a review from jeanPerier June 27, 2025 02:55
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Jun 27, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 27, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Slava Zakharin (vzakhari)

Changes

This change only matters when MLIR function inlining kicks in,
and I am still just experimenting with this.

The added LIT test shows the particular example where the current
TBAA assignment is incorrect:

  subroutine bar(this)
    ...
  end subroutine bar
  function foo() result(res)
    type(t) :: res
    call bar(res)
  end function foo
  subroutine test(arg)
    ! %0 = fir.alloca !fir.type&lt;_QMmTt{x:f32}&gt; {bindc_name = ".result"}
    type(t) :: var
    var = foo()
    ! fir.save_result foo's result to %0 (temp-alloca)
    ! fir.declare %0 {uniq_name = ".tmp.func_result"}
    ! load from %0 -&gt; store into var
    arg = var%x
  end subroutine test

One way for FIR inlining to handle foo() call is to substitute
all uses of the res alloca (inside foo) with uses of the temp-alloca
(inside test). This means that the temp-alloca is then used
by the code inside bar (e.g. to store something into it).
After the inlining, the innermost dominating fir.dummy_scope
for fir.declare %0 is the one from bar. If we use this scope
for assigning TBAA to the load from temp-alloca, then it will be
in the same TBAA tree as the stores into this inside bar.
The load and the stores will have different sub-trees, because
this is a dummy argument, and temp-alloca is a local allocation.
So they will appear as non-conflicting while they are accessing
the same memory.

This change makes sure that all local allocations always
use the outermost scope.


Full diff: https://github.com/llvm/llvm-project/pull/146006.diff

3 Files Affected:

  • (modified) flang/lib/Optimizer/Transforms/AddAliasTags.cpp (+26-4)
  • (added) flang/test/Transforms/tbaa-for-local-vars.fir (+92)
  • (modified) flang/test/Transforms/tbaa-with-dummy-scope2.fir (+4-5)
diff --git a/flang/lib/Optimizer/Transforms/AddAliasTags.cpp b/flang/lib/Optimizer/Transforms/AddAliasTags.cpp
index 5cfbdc33285f9..4e788f75d9d77 100644
--- a/flang/lib/Optimizer/Transforms/AddAliasTags.cpp
+++ b/flang/lib/Optimizer/Transforms/AddAliasTags.cpp
@@ -73,7 +73,12 @@ class PassState {
   }
 
   void processFunctionScopes(mlir::func::FuncOp func);
-  fir::DummyScopeOp getDeclarationScope(fir::DeclareOp declareOp);
+  // For the given fir.declare returns the dominating fir.dummy_scope
+  // operation.
+  fir::DummyScopeOp getDeclarationScope(fir::DeclareOp declareOp) const;
+  // For the given fir.declare returns the outermost fir.dummy_scope
+  // in the current function.
+  fir::DummyScopeOp getOutermostScope(fir::DeclareOp declareOp) const;
 
 private:
   mlir::DominanceInfo &domInfo;
@@ -119,9 +124,8 @@ void PassState::processFunctionScopes(mlir::func::FuncOp func) {
   }
 }
 
-// For the given fir.declare returns the dominating fir.dummy_scope
-// operation.
-fir::DummyScopeOp PassState::getDeclarationScope(fir::DeclareOp declareOp) {
+fir::DummyScopeOp
+PassState::getDeclarationScope(fir::DeclareOp declareOp) const {
   auto func = declareOp->getParentOfType<mlir::func::FuncOp>();
   assert(func && "fir.declare does not have parent func.func");
   auto &scopeOps = sortedScopeOperations.at(func);
@@ -132,6 +136,15 @@ fir::DummyScopeOp PassState::getDeclarationScope(fir::DeclareOp declareOp) {
   return nullptr;
 }
 
+fir::DummyScopeOp PassState::getOutermostScope(fir::DeclareOp declareOp) const {
+  auto func = declareOp->getParentOfType<mlir::func::FuncOp>();
+  assert(func && "fir.declare does not have parent func.func");
+  auto &scopeOps = sortedScopeOperations.at(func);
+  if (!scopeOps.empty())
+    return scopeOps[0];
+  return nullptr;
+}
+
 class AddAliasTagsPass : public fir::impl::AddAliasTagsBase<AddAliasTagsPass> {
 public:
   void runOnOperation() override;
@@ -279,6 +292,15 @@ void AddAliasTagsPass::runOnAliasInterface(fir::FirAliasTagOpInterface op,
     else
       unknownAllocOp = true;
 
+    if (auto declOp = source.origin.instantiationPoint) {
+      // Use the outermost scope for local allocations,
+      // because using the innermost scope may result
+      // in incorrect TBAA, when calls are inlined in MLIR.
+      auto declareOp = mlir::dyn_cast<fir::DeclareOp>(declOp);
+      assert(declareOp && "Instantiation point must be fir.declare");
+      scopeOp = state.getOutermostScope(declareOp);
+    }
+
     if (unknownAllocOp) {
       LLVM_DEBUG(llvm::dbgs().indent(2)
                  << "WARN: unknown defining op for SourceKind::Allocate " << *op
diff --git a/flang/test/Transforms/tbaa-for-local-vars.fir b/flang/test/Transforms/tbaa-for-local-vars.fir
new file mode 100644
index 0000000000000..82058ffef290a
--- /dev/null
+++ b/flang/test/Transforms/tbaa-for-local-vars.fir
@@ -0,0 +1,92 @@
+// Test TBAA tags assignment for access off the fir.declare
+// placed in the middle of the routine (e.g. a temporary).
+// Original example:
+// module m
+//   type t
+//      real :: x
+//   end type t
+// contains
+//   subroutine bar(this)
+//     class(t), intent(out) :: this
+//     this%x = 1.0
+//   end subroutine bar
+//   function foo() result(res)
+//     type(t) :: res
+//     call bar(res)
+//   end function foo
+//   subroutine test(arg)
+//     type(t) :: var
+//     var = foo()
+//     arg = var%x
+//   end subroutine test
+// end module m
+//
+// The calls were manually inlined in FIR with fir.save_result's
+// destination operand being used instead of the temporary
+// alloca of the result inside foo. Runtime calls were removed
+// to reduce the test.
+// The temporary function result fir.declare is then dominated
+// by fir.dummy_scope of the foo function. If we use this scope
+// to assign the TBAA local-alloc tag to the accesses of
+// the temporary, then it won't conflict with the TBAA dummy tag
+// assigned to the accesses of this argument of bar.
+// That would be incorrect, because they access the same memory.
+// Check that the local-alloc tag is placed into the outermost
+// scope's TBAA tree.
+// RUN: fir-opt --fir-add-alias-tags %s | FileCheck %s
+
+// CHECK: #[[$ATTR_0:.+]] = #llvm.tbaa_root<id = "Flang function root _QMmPtest - Scope 2">
+// CHECK: #[[$ATTR_1:.+]] = #llvm.tbaa_root<id = "Flang function root _QMmPtest">
+// CHECK: #[[$ATTR_2:.+]] = #llvm.tbaa_type_desc<id = "any access", members = {<#[[$ATTR_0]], 0>}>
+// CHECK: #[[$ATTR_3:.+]] = #llvm.tbaa_type_desc<id = "any access", members = {<#[[$ATTR_1]], 0>}>
+// CHECK: #[[$ATTR_4:.+]] = #llvm.tbaa_type_desc<id = "any data access", members = {<#[[$ATTR_2]], 0>}>
+// CHECK: #[[$ATTR_5:.+]] = #llvm.tbaa_type_desc<id = "any data access", members = {<#[[$ATTR_3]], 0>}>
+// CHECK: #[[$ATTR_6:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data", members = {<#[[$ATTR_4]], 0>}>
+// CHECK: #[[$ATTR_7:.+]] = #llvm.tbaa_type_desc<id = "target data", members = {<#[[$ATTR_5]], 0>}>
+// CHECK: #[[$ATTR_9:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data/_QMmFbarEthis", members = {<#[[$ATTR_6]], 0>}>
+// CHECK: #[[$ATTR_10:.+]] = #llvm.tbaa_type_desc<id = "allocated data", members = {<#[[$ATTR_7]], 0>}>
+// CHECK: #[[$ATTR_12:.+]] = #llvm.tbaa_tag<base_type = #[[$ATTR_9]], access_type = #[[$ATTR_9]], offset = 0>
+// CHECK: #[[$ATTR_13:.+]] = #llvm.tbaa_tag<base_type = #[[$ATTR_10]], access_type = #[[$ATTR_10]], offset = 0>
+
+// CHECK-LABEL:   func.func @_QMmPtest(
+// CHECK-SAME:      %[[ARG0:.*]]: !fir.ref<f32> {fir.bindc_name = "arg"}) {
+// CHECK:           %[[VAL_0:.*]] = arith.constant 1.000000e+00 : f32
+// CHECK:           %[[VAL_1:.*]] = fir.alloca !fir.type<_QMmTt{x:f32}> {bindc_name = ".result"}
+// CHECK:           %[[VAL_2:.*]] = fir.dummy_scope : !fir.dscope
+// CHECK:           %[[VAL_3:.*]] = fir.declare %[[ARG0]] dummy_scope %[[VAL_2]] {uniq_name = "_QMmFtestEarg"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
+// CHECK:           %[[VAL_6:.*]] = fir.dummy_scope : !fir.dscope
+// CHECK:           %[[VAL_7:.*]] = fir.declare %[[VAL_1]] {uniq_name = "_QMmFfooEres"} : (!fir.ref<!fir.type<_QMmTt{x:f32}>>) -> !fir.ref<!fir.type<_QMmTt{x:f32}>>
+// CHECK:           %[[VAL_8:.*]] = fir.embox %[[VAL_7]] : (!fir.ref<!fir.type<_QMmTt{x:f32}>>) -> !fir.box<!fir.type<_QMmTt{x:f32}>>
+// CHECK:           %[[VAL_9:.*]] = fir.convert %[[VAL_8]] : (!fir.box<!fir.type<_QMmTt{x:f32}>>) -> !fir.class<!fir.type<_QMmTt{x:f32}>>
+// CHECK:           %[[VAL_10:.*]] = fir.dummy_scope : !fir.dscope
+// CHECK:           %[[VAL_11:.*]] = fir.declare %[[VAL_9]] dummy_scope %[[VAL_10]] {fortran_attrs = #fir.var_attrs<intent_out>, uniq_name = "_QMmFbarEthis"} : (!fir.class<!fir.type<_QMmTt{x:f32}>>, !fir.dscope) -> !fir.class<!fir.type<_QMmTt{x:f32}>>
+// CHECK:           %[[VAL_12:.*]] = fir.coordinate_of %[[VAL_11]], x : (!fir.class<!fir.type<_QMmTt{x:f32}>>) -> !fir.ref<f32>
+// CHECK:           fir.store %[[VAL_0]] to %[[VAL_12]] {tbaa = [#[[$ATTR_12]]]} : !fir.ref<f32>
+// CHECK:           %[[VAL_13:.*]] = fir.declare %[[VAL_1]] {uniq_name = ".tmp.func_result"} : (!fir.ref<!fir.type<_QMmTt{x:f32}>>) -> !fir.ref<!fir.type<_QMmTt{x:f32}>>
+// CHECK:           %[[VAL_14:.*]] = fir.coordinate_of %[[VAL_13]], x : (!fir.ref<!fir.type<_QMmTt{x:f32}>>) -> !fir.ref<f32>
+// CHECK:           %[[VAL_16:.*]] = fir.load %[[VAL_14]] {tbaa = [#[[$ATTR_13]]]} : !fir.ref<f32>
+func.func @_QMmPtest(%arg0: !fir.ref<f32> {fir.bindc_name = "arg"}) {
+  %cst = arith.constant 1.000000e+00 : f32
+  %0 = fir.alloca !fir.type<_QMmTt{x:f32}> {bindc_name = ".result"}
+  %1 = fir.dummy_scope : !fir.dscope
+  %2 = fir.declare %arg0 dummy_scope %1 {uniq_name = "_QMmFtestEarg"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
+  %3 = fir.alloca !fir.type<_QMmTt{x:f32}> {bindc_name = "var", uniq_name = "_QMmFtestEvar"}
+  %4 = fir.declare %3 {uniq_name = "_QMmFtestEvar"} : (!fir.ref<!fir.type<_QMmTt{x:f32}>>) -> !fir.ref<!fir.type<_QMmTt{x:f32}>>
+  %5 = fir.dummy_scope : !fir.dscope
+  %6 = fir.declare %0 {uniq_name = "_QMmFfooEres"} : (!fir.ref<!fir.type<_QMmTt{x:f32}>>) -> !fir.ref<!fir.type<_QMmTt{x:f32}>>
+  %7 = fir.embox %6 : (!fir.ref<!fir.type<_QMmTt{x:f32}>>) -> !fir.box<!fir.type<_QMmTt{x:f32}>>
+  %8 = fir.convert %7 : (!fir.box<!fir.type<_QMmTt{x:f32}>>) -> !fir.class<!fir.type<_QMmTt{x:f32}>>
+  %9 = fir.dummy_scope : !fir.dscope
+  %10 = fir.declare %8 dummy_scope %9 {fortran_attrs = #fir.var_attrs<intent_out>, uniq_name = "_QMmFbarEthis"} : (!fir.class<!fir.type<_QMmTt{x:f32}>>, !fir.dscope) -> !fir.class<!fir.type<_QMmTt{x:f32}>>
+  %14 = fir.coordinate_of %10, x : (!fir.class<!fir.type<_QMmTt{x:f32}>>) -> !fir.ref<f32>
+  fir.store %cst to %14 : !fir.ref<f32>
+  %15 = fir.declare %0 {uniq_name = ".tmp.func_result"} : (!fir.ref<!fir.type<_QMmTt{x:f32}>>) -> !fir.ref<!fir.type<_QMmTt{x:f32}>>
+  %16 = fir.coordinate_of %15, x : (!fir.ref<!fir.type<_QMmTt{x:f32}>>) -> !fir.ref<f32>
+  %17 = fir.coordinate_of %4, x : (!fir.ref<!fir.type<_QMmTt{x:f32}>>) -> !fir.ref<f32>
+  %18 = fir.load %16 : !fir.ref<f32>
+  fir.store %18 to %17 : !fir.ref<f32>
+  %19 = fir.load %17 : !fir.ref<f32>
+  fir.store %19 to %2 : !fir.ref<f32>
+  return
+}
+
diff --git a/flang/test/Transforms/tbaa-with-dummy-scope2.fir b/flang/test/Transforms/tbaa-with-dummy-scope2.fir
index 249471de458d3..fd711a4d70eb4 100644
--- a/flang/test/Transforms/tbaa-with-dummy-scope2.fir
+++ b/flang/test/Transforms/tbaa-with-dummy-scope2.fir
@@ -84,18 +84,17 @@ func.func @_QPtest2() attributes {noinline} {
   fir.store %c2_i32 to %2 : !fir.ref<i32>
   return
 }
-// CHECK: #[[$ATTR_0:.+]] = #llvm.tbaa_root<id = "Flang function root _QPtest2 - Scope 1">
-// CHECK: #[[$ATTR_1:.+]] = #llvm.tbaa_root<id = "Flang function root _QPtest2">
+// CHECK: #[[$ATTR_0:.+]] = #llvm.tbaa_root<id = "Flang function root _QPtest2">
+// CHECK: #[[$ATTR_1:.+]] = #llvm.tbaa_root<id = "Flang function root _QPtest2 - Scope 1">
 // CHECK: #[[$ATTR_2:.+]] = #llvm.tbaa_type_desc<id = "any access", members = {<#[[$ATTR_0]], 0>}>
 // CHECK: #[[$ATTR_3:.+]] = #llvm.tbaa_type_desc<id = "any access", members = {<#[[$ATTR_1]], 0>}>
 // CHECK: #[[$ATTR_4:.+]] = #llvm.tbaa_type_desc<id = "any data access", members = {<#[[$ATTR_2]], 0>}>
 // CHECK: #[[$ATTR_5:.+]] = #llvm.tbaa_type_desc<id = "any data access", members = {<#[[$ATTR_3]], 0>}>
 // CHECK: #[[$TARGETDATA_0:.+]] = #llvm.tbaa_type_desc<id = "target data", members = {<#[[$ATTR_4]], 0>}>
-// CHECK: #[[$ATTR_6:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data", members = {<#[[$ATTR_4]], 0>}>
-// CHECK: #[[$TARGETDATA_1:.+]] = #llvm.tbaa_type_desc<id = "target data", members = {<#[[$ATTR_5]], 0>}>
+// CHECK: #[[$ATTR_6:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data", members = {<#[[$ATTR_5]], 0>}>
 // CHECK: #[[$LOCAL_ATTR_0:.+]] = #llvm.tbaa_type_desc<id = "allocated data", members = {<#[[$TARGETDATA_0]], 0>}>
 // CHECK: #[[$ATTR_8:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data/_QFtest2FinnerEx", members = {<#[[$ATTR_6]], 0>}>
-// CHECK: #[[$ATTR_7:.+]] = #llvm.tbaa_type_desc<id = "global data", members = {<#[[$TARGETDATA_1]], 0>}>
+// CHECK: #[[$ATTR_7:.+]] = #llvm.tbaa_type_desc<id = "global data", members = {<#[[$TARGETDATA_0]], 0>}>
 // CHECK: #[[$ATTR_10:.+]] = #llvm.tbaa_tag<base_type = #[[$ATTR_8]], access_type = #[[$ATTR_8]], offset = 0>
 // CHECK: #[[$LOCAL_ATTR_1:.+]] = #llvm.tbaa_type_desc<id = "allocated data/_QFtest2FinnerEy", members = {<#[[$LOCAL_ATTR_0]], 0>}>
 // CHECK: #[[$ATTR_9:.+]] = #llvm.tbaa_type_desc<id = "global data/_QMmEglob", members = {<#[[$ATTR_7]], 0>}>

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Slava!
Just one question to better understand the change.

Also, I think that if this change were to pessimize performance, we could also maybe fix the situation by moving the fir.declare of the result before the call. With your explanation, I realize that the situation is likely very specific to function results, because these are the only temps that we pass in a call and for which we emit the declare after the call. It is likely not a correctness issue to mislabel temps created in test after the call to foo as belonging to bar since they cannot be used in bar via the dummy (since the alloca will always be after the call).
Although it is still weird to label such temps from test as belonging to bar, so your change makes more sense to me.

// operation.
fir::DummyScopeOp PassState::getDeclarationScope(fir::DeclareOp declareOp) {
fir::DummyScopeOp
PassState::getDeclarationScope(fir::DeclareOp declareOp) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the usage of this function not replaced by getOutermostScope?
Is it invalid to use getOutermostScope without local tbaa?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving the fir.declare of the result before the call

Right, I got the same idea while writing the commit message :) this will definitely resolve the issue in the test, but I do not want to introduce such a requirement on the placement of fir.declare. So using the outer scope seems to be a more sustainable solution.

I think using getOutermostScope for dummy arguments would be incorrect. E.g. if we create TBAA tags for the dummies of the callee in the same tree as the caller's dummy TBAA tags, that will tell that all the dummies do not conflict. This would be incorrect in many cases.

@vzakhari vzakhari merged commit 90da616 into llvm:main Jun 30, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants