git version: b2b267e
system: Ubuntu 18.04.6 LTS
Description:
I am seeing inconsistent results when executing the same MLIR program with and without --scf-parallel-loop-tiling and --canonicalize.
The output becomes correct when either of these two options is removed, so I'm unsure which optimization contains the bug.
Steps to Reproduce:
1. MLIR Program (a.mlir):
module {
  func.func private @printMemrefI32(tensor<*xi32>)
  func.func private @printMemrefF32(tensor<*xf32>)
  func.func @main() {
    %7 = "tosa.const"() <{values = dense<6220> : tensor<1x6x6xi32>}> : () -> tensor<1x6x6xi32>
    %9 = "tosa.const"() <{values = dense<-298> : tensor<1x6x6xi32>}> : () -> tensor<1x6x6xi32>
    %51 = tosa.bitwise_or %7, %9 : (tensor<1x6x6xi32>, tensor<1x6x6xi32>) -> tensor<1x6x6xi32>
    %cast = tensor.cast %51 : tensor<1x6x6xi32> to tensor<*xi32>
    call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()
    return
  }
}
2. Command to run without optimizations:
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt a.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg))" | /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt \
-tosa-to-arith -one-shot-bufferize="bufferize-function-boundaries" -convert-linalg-to-parallel-loops \
-convert-index-to-llvm -convert-arith-to-llvm -convert-scf-to-cf -convert-arith-to-llvm \
-convert-cf-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -lower-affine -convert-arith-to-llvm \
-reconcile-unrealized-casts | timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main \
-entry-point-result=void --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
3. Output without optimizations:
[[[-290, -290, -290, -290, -290, -290],
[-290, -290, -290, -290, -290, -290],
[-290, -290, -290, -290, -290, -290],
[-290, -290, -290, -290, -290, -290],
[-290, -290, -290, -290, -290, -290],
[-290, -290, -290, -290, -290, -290]]]
4. Command to run with optimizations:
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt a.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg))" | /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt \
-tosa-to-arith -one-shot-bufferize="bufferize-function-boundaries" \
-convert-linalg-to-parallel-loops -convert-index-to-llvm -convert-arith-to-llvm \
--scf-parallel-loop-tiling="parallel-loop-tile-sizes=1,4 no-min-max-bounds=true" \
--canonicalize --scf-parallel-loop-tiling="parallel-loop-tile-sizes=1,4 no-min-max-bounds=true" \
-convert-scf-to-cf -convert-arith-to-llvm -convert-cf-to-llvm -finalize-memref-to-llvm \
-convert-func-to-llvm -lower-affine -convert-arith-to-llvm \
-reconcile-unrealized-casts | timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
5. Output with optimizations:
[[[-290, -290, -290, -290, -290, -290],
[-290, -290, -290, -290, -290, -290],
[-290, -290, -290, -290, -290, -290],
[-290, -290, -290, -290, -290, -290],
[0, 0, 0, 32767, -1553697280, 22067],
[-1553697280, 22067, -1553697264, 22067, 0, 0]]]
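As a sanity check on which output is the correct one, here is a small Python sketch that recomputes the expected element value and locates the mismatching rows. It assumes Python's integer OR on negative numbers matches two's-complement i32 behaviour for these operands, which holds here because both values fit in 32 bits. The names expected, optimized and bad_rows are my own and do not come from MLIR.

```python
# Expected element value: bitwise OR of the two constants in a.mlir.
expected = 6220 | -298

# Output with optimizations, copied from step 5 (leading unit dimension dropped).
optimized = [
    [-290] * 6,
    [-290] * 6,
    [-290] * 6,
    [-290] * 6,
    [0, 0, 0, 32767, -1553697280, 22067],
    [-1553697280, 22067, -1553697264, 22067, 0, 0],
]

# Indices along the second dimension that contain any unexpected element.
bad_rows = [i for i, row in enumerate(optimized)
            if any(v != expected for v in row)]

print(expected)  # -290
print(bad_rows)  # [4, 5]
```

So the output without optimizations matches the expected value everywhere, and the wrong elements in the optimized output are confined to rows 4 and 5 of the second dimension. Those are the two rows left over when a dimension of size 6 is tiled by 4, which may hint at the remainder tile being mishandled, but I have not confirmed that.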