Default flags: enable function outlining and use -O3 for peano's opt (nod-ai#950)

This PR changes two default compiler settings:

1: _enable_ function outlining by default
2: use `-O3` in peano's opt by default 

For one of our benchmarks (bf16 -> f32 matmul, M=4096, N=K=512), I
observe the following times (in milliseconds).

| Optimization Level | Outlining      | No Outlining      |
|---------------------|----------------|--------------------|
| -O2                | 10.7           | 5.4               |
| -O3                | 4.0            | 5.4               |

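The arithmetic behind this comparison can be checked with a short script. This is only a sketch: the timings are copied from the table above, and the old and new defaults are taken to be -O2 without outlining and -O3 with outlining, as changed by this PR.

```python
# Timings in milliseconds, copied from the table above,
# keyed by (peano opt level, outlining enabled).
times_ms = {
    ("-O2", True): 10.7,
    ("-O2", False): 5.4,
    ("-O3", True): 4.0,
    ("-O3", False): 5.4,
}

old_default = times_ms[("-O2", False)]  # defaults before this PR
new_default = times_ms[("-O3", True)]  # defaults after this PR

saved_ms = old_default - new_default
speedup = old_default / new_default
reduction_pct = 100.0 * saved_ms / old_default

print(f"saved {saved_ms:.1f} ms ({speedup:.2f}x, {reduction_pct:.0f}% less time)")
```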

This PR effectively moves from the top-right cell above (-O2, no outlining)
to the bottom-left cell (-O3, outlining). The 1.4 ms speed-up is not in the
margin of noise. I can't explain it yet, still investigating... To verify
locally, you can try

```
./run.py outdir ${IREE} --peano_dir=${PEANO} --target_device=npu1_4col \
 --xrt_lite_n_core_rows=4 --xrt_lite_n_core_cols=4 \
 --tests=vanilla_matmul_benchmark_4096_512_512_bf16_f32 --verbose \
 --aie_compilation_flags="--iree-amdaie-enable-function-outlining=0 --iree-amd-aie-additional-peano-opt-flags=\"-O3\""
```

And vary the value of `--iree-amdaie-enable-function-outlining=.` and
`--iree-amd-aie-additional-peano-opt-flags=.`

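To sweep both flags without retyping the command, the four combinations can be generated. A minimal sketch: `benchmark_command` and the `/path/to/...` directories are hypothetical placeholders, the remaining arguments are copied from the command above, and `shlex.join` takes care of the nested quoting around the `-O` level.

```python
import itertools
import shlex


def benchmark_command(iree_dir, peano_dir, outline, peano_opt_level):
    """Build one run.py invocation as a shell-safe string.

    Hypothetical helper: only the argument values come from the PR text.
    """
    aie_flags = (
        f"--iree-amdaie-enable-function-outlining={int(outline)} "
        f'--iree-amd-aie-additional-peano-opt-flags="-O{peano_opt_level}"'
    )
    argv = [
        "./run.py",
        "outdir",
        iree_dir,
        f"--peano_dir={peano_dir}",
        "--target_device=npu1_4col",
        "--xrt_lite_n_core_rows=4",
        "--xrt_lite_n_core_cols=4",
        "--tests=vanilla_matmul_benchmark_4096_512_512_bf16_f32",
        "--verbose",
        f"--aie_compilation_flags={aie_flags}",
    ]
    return shlex.join(argv)


# One command per (outlining, opt level) combination from the table.
commands = [
    benchmark_command("/path/to/iree", "/path/to/peano", outline, level)
    for outline, level in itertools.product((False, True), (2, 3))
]
for command in commands:
    print(command)
```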
Perf numbers also in CI.
newling authored Dec 6, 2024
1 parent 7c00e86 commit 31b2cdb
Showing 4 changed files with 109 additions and 16 deletions.
110 changes: 102 additions & 8 deletions build_tools/ci/cpu_comparison/run.py
@@ -1439,15 +1439,103 @@ def __init__(self):
)
)

+        performance_tests = [
+            {
+                "M": 512,
+                "N": 512,
+                "K": 4096,
+                "use_ukernel": False,
+                "peano_opt_level": 2,
+                "outline": False,
+            },
+            {
+                "M": 512,
+                "N": 512,
+                "K": 4096,
+                "use_ukernel": False,
+                "peano_opt_level": 2,
+                "outline": True,
+            },
+            {
+                "M": 512,
+                "N": 512,
+                "K": 4096,
+                "use_ukernel": False,
+                "peano_opt_level": 3,
+                "outline": False,
+            },
+            {
+                "M": 512,
+                "N": 512,
+                "K": 4096,
+                "use_ukernel": False,
+                "peano_opt_level": 3,
+                "outline": True,
+            },
+            {
+                "M": 512,
+                "N": 512,
+                "K": 4096,
+                "use_ukernel": True,
+                "peano_opt_level": 3,
+                "outline": True,
+            },
+            {
+                "M": 512,
+                "N": 4096,
+                "K": 512,
+                "use_ukernel": False,
+                "peano_opt_level": 3,
+                "outline": True,
+            },
+            {
+                "M": 512,
+                "N": 4096,
+                "K": 512,
+                "use_ukernel": True,
+                "peano_opt_level": 3,
+                "outline": True,
+            },
+            {
+                "M": 4096,
+                "N": 512,
+                "K": 512,
+                "use_ukernel": False,
+                "peano_opt_level": 3,
+                "outline": True,
+            },
+            {
+                "M": 4096,
+                "N": 512,
+                "K": 512,
+                "use_ukernel": True,
+                "peano_opt_level": 3,
+                "outline": True,
+            },
+        ]
+
         # Some bf16 Performance tests:
-        for M, N, K, use_ukernel in [
-            (512, 512, 4096, False),
-            (512, 512, 4096, True),
-            (512, 4096, 512, False),
-            (512, 4096, 512, True),
-            (4096, 512, 512, False),
-            (4096, 512, 512, True),
-        ]:
+        for test in performance_tests:
+            M = test["M"]
+            N = test["N"]
+            K = test["K"]
+            use_ukernel = test["use_ukernel"]
+            peano_opt_level = test["peano_opt_level"]
+            outline = test["outline"]
+
+            outlining_string = "--iree-amdaie-enable-function-outlining=" + str(
+                int(outline)
+            )
+            peano_opt_level_string = f'"-O{peano_opt_level}"'
+            aie_compilation_flags = [
+                outlining_string,
+                f"--iree-amd-aie-additional-peano-opt-flags={peano_opt_level_string}",
+            ]
+
+            name_suffix = "O" + str(peano_opt_level)
+            if outline:
+                name_suffix += "_outline"
+
             self.register(
                 Matmul(
                     M,
@@ -1457,8 +1545,12 @@ def __init__(self):
                     "f32",
                     use_ukernel=use_ukernel,
                     n_repeats=2,
+                    aie_compilation_flags=aie_compilation_flags,
+                    name_suffix=name_suffix,
+                    additional_labels=["PerformanceCorrectness"],
                 )
             )
+
             self.register(
                 MatmulBenchmark(
                     M,
@@ -1470,6 +1562,8 @@ def __init__(self):
                     use_ukernel=use_ukernel,
                     n_repeats=5,
                     n_kernel_runs=100,
+                    aie_compilation_flags=aie_compilation_flags,
+                    name_suffix=name_suffix,
                 )
             )

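The suffix logic in the new loop gives each flag combination its own name, presumably so that tests with the same shape but different flags register separately. A standalone sketch of that naming, with `name_suffix` written as a hypothetical free function (in the diff it is inline code inside `__init__`):

```python
import itertools


def name_suffix(peano_opt_level, outline):
    """Mirror the inline suffix logic from the diff: "O<level>" plus "_outline"."""
    suffix = "O" + str(peano_opt_level)
    if outline:
        suffix += "_outline"
    return suffix


# All four (opt level, outlining) combinations used for the 512x512x4096 case.
suffixes = [
    name_suffix(level, outline)
    for level, outline in itertools.product((2, 3), (False, True))
]
print(suffixes)  # ['O2', 'O2_outline', 'O3', 'O3_outline']
```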
@@ -56,7 +56,7 @@ struct AMDAIEOptions {
   bool enableVectorizationPasses{true};
   bool enableCoalescingLoops{false};
   bool enableCollapsingUnitDims{false};
-  bool enableFunctionOutlining{false};
+  bool enableFunctionOutlining{true};
   bool insertLoopAroundCoreBlock{false};
   bool matmulElementwiseFusion{false};
   AMDAIEDevice AMDAIETargetDevice{AMDAIEDevice::npu1_4col};
@@ -92,7 +92,7 @@ FailureOr<std::vector<std::string>> makePeanoOptArgs(
       // Extend the max limit of the search depth in BasicAA
       "-basic-aa-max-lookup-search-depth=10",
       //
-      "-O2",
+      "-O3",
       //
       "--inline-threshold=10",
       // missing from libc
@@ -1,14 +1,13 @@
 // This test demonstrates enabling / disabling function outlining in the default
-// pipeline (note that below the pipeline is not specified explicitly with
-// the flag --iree-amdaie-tile-pipeline). We check 3 paths:
+// pipeline. We check 3 paths:
 //
 // 1) Explicitly disabling linalg function outlining with
-// --iree-amdaie-enable-function-outlining=0
+// --iree-amdaie-enable-function-outlining=0
 //
 // 2) Explicitly enabling linalg function outlining with
-// --iree-amdaie-enable-function-outlining=1
+// --iree-amdaie-enable-function-outlining=1
 //
-// 3) Not specifying the flag at all, which should use the default value (0).
+// 3) Not specifying the flag at all, which should use the default value (1).


 // 1) Explicitly disabled:
@@ -36,4 +35,4 @@ func.func @matmul(%lhs: tensor<64x64xbf16>,

 // CHECK-DISABLED-NOT: func.call
 // CHECK-ENABLED: func.call
-// CHECK-DEFAULT-NOT: func.call
+// CHECK-DEFAULT: func.call

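The three paths checked by this test can be modeled in a few lines. This is a toy model, not IREE's actual option parsing: `outlining_enabled` is a hypothetical helper that only understands `=0` and `=1`, and its default is set to the value this PR introduces.

```python
FLAG_PREFIX = "--iree-amdaie-enable-function-outlining="
DEFAULT_OUTLINING = True  # default after this PR (it was False before)


def outlining_enabled(compile_flags, default=DEFAULT_OUTLINING):
    """Return whether outlining is on for a list of compiler flags.

    Toy model: an explicit "=0" or "=1" wins, otherwise the default applies.
    """
    enabled = default
    for flag in compile_flags:
        if flag.startswith(FLAG_PREFIX):
            enabled = flag[len(FLAG_PREFIX):] == "1"
    return enabled


# The three paths from the test; outlining on means a func.call is expected.
for label, flags in [
    ("disabled", [FLAG_PREFIX + "0"]),
    ("enabled", [FLAG_PREFIX + "1"]),
    ("default", []),
]:
    print(label, "-> expect func.call:", outlining_enabled(flags))
```

With the previous default (`default=False`), only the third path changes, which matches the single `CHECK-DEFAULT` line that this diff flips.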