[AIEX] Scheduler improvements #147

Merged
merged 4 commits into from
Aug 9, 2024

gbossu (Collaborator) commented Aug 7, 2024

Pre-RA: Schedule more conservatively when under high register pressure. This is very helpful for SW pipelining. I'll have another PR that makes the MachinePipeliner find more schedules, and this PR helps us avoid spilling.

Post-RA: Change the loop-aware scheduling to have an "expensive convergence" mode, in which we increase the latency safety margin per instruction instead of for all instructions.
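
To illustrate the difference between a shared margin and per-instruction margins, here is a toy Python model. It is not the scheduler's code: the function names and the `required` list (the extra latency each instruction would need for the loop schedule to be valid) are assumptions made up for this sketch.

```python
def converge_uniform(required):
    """Raise one shared margin until every instruction is satisfied.

    `required` is a hypothetical list: the extra latency each instruction
    needs. A real scheduler would detect violations by checking the
    schedule, not by reading such a list.
    """
    margin = 0
    while any(margin < need for need in required):
        margin += 1  # every instruction pays for the worst offender
    return [margin] * len(required)


def converge_per_instruction(required):
    """Raise the margin only for the instructions that still violate."""
    margins = [0] * len(required)
    while True:
        violators = [i for i, need in enumerate(required) if margins[i] < need]
        if not violators:
            return margins
        for i in violators:
            margins[i] += 1  # other instructions keep their smaller margin


required = [0, 3, 0, 1]  # made-up needs for four instructions
uniform = converge_uniform(required)
per_instr = converge_per_instruction(required)
print(uniform, sum(uniform))
print(per_instr, sum(per_instr))
```

In this toy model the shared margin pads every instruction up to the worst case, while the per-instruction variant only pads the instructions that need it.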

QoR results below. Overall it's good. There are some regressions, but we'll get rid of them with less unrolling and more SWP.
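
Each cell in the table below appears to pair a cycle count with its relative change against the Baseline row. A small Python sketch of that arithmetic follows; the helper name `format_cell` is hypothetical, and the sample values are taken from the table itself.

```python
def format_cell(cycles, baseline_cycles):
    """Format a cycle count with its signed percentage change vs. baseline."""
    change = (cycles - baseline_cycles) / baseline_cycles * 100
    return f"{cycles}({change:+.2f}%)"


# Erf_aie2_int8_0_ptr_interface: Baseline 2028, conservative pre-RA 2540
print(format_cell(2540, 2028))
# A cell from the expensive post-RA row: Baseline 363, new value 318
print(format_cell(318, 363))
```

A positive percentage means more cycles than the baseline (a regression), a negative one means fewer.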

| Core_Compute_Cycle_Count               | Erf_aie2_int8_0_ptr_interface | Erf_aie2_int8_0 | SigmoidTemplated_bf16_0 | Conv2D_DW_bf16_1 | Floor_aie2_0  | Hardswish_aie2_1 | HardswishAsHardsigmoid_aie2_1 | HardswishAsHardsigmoid_aie2_0 | Hardswish_aie2_0 | AvgPool2D_aie2_bfloat16_0 | AvgPool2D_aie2_bfloat16_1 | Requantize_0 | Clip_aie2_int8 | Requantize_1 | SiLU_aie2_bf16 | Sub_aie2_bf16_0 | SubBroadcasting_aie2_bf16_0 | AddBf16_aie2_0 | AddAttributeBroadcasting_aie2_bf16 | SubAttributeBroadcasting_aie2_bf16_0 | AddBroadcastingBf16_aie2_0 | Scale_Add_bf16_0 | Scale_Add_bf16_1 | FullyConnect_aie2_int8 | BitwiseNot_aie2_0 | LogicalNot_aie2_0 | DivAttributeBroadcasting_aie2_bf16_0 | DivBroadcasting_aie2_1 | Abs_int8_0   | ElemDiv_aie2_1 | Sign_bf16_0  | Neg_aie2_0   | Conv2D_DW_bf16_0 | Elu_aie2_int8_0 | ReduceSumAxis_1_aie2_int8 | ReduceSumAxis_2_aie2_int8 | ReduceSumAxis_4_aie2_int8 | ArgMin1d_bf16_0 | Log_bf16_0   | Conv2D_Transpose_AIE2_0 | Conv2D_Transpose_AIE2_1 | Sign_bf16_1  | BitShift_AIE2_int8 | SiLU_aie2_int8_1 | SiLU_aie2_int8 | LayerNormC8Part1_aie2_int8_0 | DivAttributeBroadcasting_aie2_int8_0 | ElemDiv_aie2_0 | DivBroadcasting_aie2_0 | ReduceMeanAxis_1_aie2_int8 | ReduceMeanAxis_2_aie2_int8 | ReduceMeanAxis_4_aie2_int8 | Conv2D_DW_1  | Conv2D_bf16_0 | Add2D_bf16_0 | ArgMax1d_bf16_0 | ReduceSumAxis_6_aie2_int8 | ReduceSumAxis_5_aie2_int8 | Conv2D_DW_0  | ReduceSumAxis_3_aie2_int8 | Add2D_bf16_1 | Conv2D_bf16_1 | ArgMin1d_int8_0 | ReduceMeanAxis_6_aie2_int8 | ReduceMeanAxis_3_aie2_int8 | ReduceMeanAxis_5_aie2_int8 | LayerNormC8Part1_aie2_bf16_0 | Softmax_1    | GEMM_int8_1   | ReduceSumAxis_2_aie2_bf16 | Exp_bf16_1   | Mish_aie2_int8 | ReduceSumAxis_3_aie2_bf16 | InstanceNormPart2_aie2_bf16_0 | ReduceSumAxis_7_aie2_bf16 | ReduceSumAxis_6_aie2_bf16 | LayerNorm_0   | Conv2D_11x11s4_1 | Mish_aie2_bfloat16 | ReduceMeanAxis_7_aie2_int8 | LayerNorm_1   | Abs_bf16_0   | ArgMax1d_int8_0 | AvgPool2dVariant_aie2_bf16_0 | AvgPool2dVariant_aie2_bf16_1 | 
AvgPool2dVariant_aie2_int8_0 | AvgPool2dVariant_aie2_int8_1 | BatchNorm1d_aie2_bfloat16 | BatchNorm1d_aie2_int8 | BatchNorm2D_0 | BatchNorm2D_1 | BitwiseAnd_int8_0 | BitwiseOr_int8_0 | BitwiseXor_aie2_int8 | Cast_aie2_bfloat16 | Cast_aie2_bfloat16_1 | Cast_aie2_int8 | Cast_aie2_int8_1 | Ceil_AIE2_bfloat16 | Ceil_AIE2_int8 | Clip_aie2_bf16 | CompareOpsBroadcasting_K_EQ_GE_GT_LE_LT_CMP_GE_int8_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_EQ_int8_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_int8_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_int8_aie2_ptr_interface | DegroupG4_aie2_bf16_0 | DegroupG4_aie2_bf16_1 | DegroupG4_aie2_int8_0 | DegroupG4_aie2_int8_1 | DegroupG8_aie2_bf16_0 | DegroupG8_aie2_bf16_1 | DegroupG8_aie2_int8_0 | DegroupG8_aie2_int8_1 | EleMax_aie2_int8 | EleMin_aie2_int8 | Erf_aie2_bf16_0 | Exp_bf16_0   | Expand_aie2_bfloat16 | Expand_aie2_int8 | Floor_aie2_1 | GELU_0       | GELU_1       | GEMV_0       | GEMV_1       | GeluTemplated_aie2_bf16 | GeluTemplated_aie2_int8 | GroupG4_aie2_bf16_0 | GroupG4_aie2_bf16_1 | GroupG4_aie2_int8_0 | GroupG4_aie2_int8_1 | GroupG8_aie2_bf16_0 | GroupG8_aie2_bf16_1 | GroupG8_aie2_int8_0 | GroupG8_aie2_int8_1 | HardSigmoidTemplated_bf16_0 | HardSigmoidTemplated_int8_0 | InstanceNormPart1_aie2_bf16_0 | InstanceNormPart2_aie2_int8_0 | InterpolateLinear1D_AIE2_bfloat16 | InterpolateLinear1D_AIE2_int8 | LogicalXor_aie2_int8 | MaxPool2D_0  | MaxPool2D_1  | Mul2D_0      | Mul2D_1      | MulAttributeBroadcasting_aie2_bf16_0 | MulBf16_aie2_0 | MulBroadcastingBf16_aie2_0 | Neg_aie2_1   | Pad2D_bf16_0 | Pad3D_AIE2_bfloat16 | Pad3D_AIE2_int8 | PixelShuffle_aie2_bf16 | PixelShuffle_aie2_int8 | PowAttributeBroadcasting_aie2_bf16_0 | PowAttributeBroadcasting_aie2_int8_0 | Pow_int8_0   | Range_bfloat16_aie2_0 | Range_bfloat16_aie2_1 | Range_int8_aie2_0 | Range_int8_aie2_1 | Reciprocal_aie2_0 | Reciprocal_aie2_1 | ReduceMax_bf16_0 | ReduceMax_int8_0 | ReduceMax_int8_1 | ReduceMeanAxis_1_aie2_bf16 | ReduceMeanAxis_2_aie2_bf16 | 
ReduceMeanAxis_3_aie2_bf16 | ReduceMeanAxis_4_aie2_bf16 | ReduceMeanAxis_5_aie2_bf16 | ReduceMeanAxis_6_aie2_bf16 | ReduceMeanNoc8_AIE2_bfloat16 | ReduceMeanNoc8_AIE2_int8 | ReduceMin1D_aie2_bf16 | ReduceMin1D_aie2_int8 | ReduceMin_bf16_0 | ReduceMin_int8_0 | ReduceMin_int8_1 | ReduceSumAxis_7_aie2_int8 | ReduceSum_bf16_0 | ReduceSum_int8_0 | ReduceSum_int8_1 | Rescale_aie2_int8_0 | Round_aie2_0 | Rsqrt_aie2_bf16_0 | Scale_Add_0  | Scale_Add_1  | Select_aie2_bf16 | Select_aie2_int8 | SigmoidTemplated_int8_0 | SigmoidTemplated_int8_1 | Sigmoid_bf16_0 | Sigmoid_bf16_1 | Sign_int8_0  | Sign_int8_1  | Sin_aie2_bf16 | Sin_aie2_int8 | Slice_bfloat16_0 | Slice_int8_0 | Softmax_bf16_1 | Sqrt_bf16_0   | Sqrt_bf16_1  | Sqrt_int8_0   | Sqrt_int8_1   | Squeeze_bfloat16_0 | Squeeze_int8_0 | TanhTemplated_aie2_bfloat16 | Tanh_0       | Tanh_1       | Tile_aie2_bf16_0 | Tile_aie2_int8_1 | Topk1D_bf16_0 | Topk1D_bf16_1 | Topk1D_int8_0 | Topk1D_int8_1 | Topk2D_bf16_0 | Topk2D_bf16_1 | Topk2D_int8_0 | Topk2D_int8_1 | Transpose_aie2_bf16_021 | Transpose_aie2_bf16_021_pad | Transpose_aie2_bf16_102 | Transpose_aie2_bf16_102_pad | Transpose_aie2_bf16_120 | Transpose_aie2_bf16_120_pad | Transpose_aie2_bf16_201 | Transpose_aie2_bf16_201_pad | Transpose_aie2_bf16_210 | Transpose_aie2_bf16_210_pad | Transpose_aie2_int8_021 | Transpose_aie2_int8_021_pad | Transpose_aie2_int8_102 | Transpose_aie2_int8_102_pad | Transpose_aie2_int8_120 | Transpose_aie2_int8_120_pad | Transpose_aie2_int8_201 | Transpose_aie2_int8_201_pad | Transpose_aie2_int8_210 | Transpose_aie2_int8_210_pad | ReduceSumAxis_4_aie2_bf16 | PixelUnshuffle_bf16_0 | PixelUnshuffle_int8_0 | ReduceSumAxis_5_aie2_bf16 | Softmax_bf16_0 | Conv2D_Transpose_bf16_AIE2_1 | Conv2D_Transpose_bf16_AIE2_0 | ReduceMax_bf16_1 | ReduceMeanAxis_7_aie2_bf16 | ReduceMin_bf16_1 | ReduceSumAxis_1_aie2_bf16 | InstanceNormPart1_aie2_int8_0 | Conv2D_1     | ReduceSum_bf16_1 | Conv2D_11x11s4_0 | Elu_aie2_bf16_0 | 
CompareOpsBroadcasting_K_EQ_GE_GT_LE_LT_CMP_GE_bfloat16_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_EQ_bfloat16_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_bfloat16_aie2 | Conv2D_ReLU_int8_0 | Conv2D_2x8_0 | GEMM_bf16_0  | Conv1D_DW_AIE2_bf16_0 | Conv1D_DW_AIE2_bf16_1 | Pow_bf16_0    | DilatedConv2D_1 | TanhTemplated_aie2_int8 | GEMM_int8_0  | Tanh_int8_0  | Tanh_int8_1  | Rsqrt_aie2_int8_0 | Conv2D_LReLU_0 | Conv2D_0     | Conv2D_mixed_batch_1 | Conv2D_ReLU_Standalone_1 | GEMM_bf16_1  | FullyConnect_aie2_bf16 | BilinearInterpolation_1 | Conv2D_ReLU_int8_1 | Conv2D_ReLU_0 | Conv2D_ReLU_Standalone_0 | Conv2D_FC_1  | Mul2d_bf16_0 | Add2D_0      | Add2D_Standalone_0 | LayerNormC8Part2_aie2_bf16_0 | Conv2D_FC_0  | Mul2d_bf16_1 | Add2D_Standalone_1 | Shrink_aie2_1 | Conv2D_SV60  | LayerNormC8Part2_aie2_int8_0 | BilinearInterpolation_0 | SubBroadcasting_aie2_int8_0 | SubBroadcasting_aie2_int8_0_ptr_interface | Group_Conv2D_1 | AddAttributeBroadcasting_aie2_int8 | SubAttributeBroadcasting_aie2_int8_0 | AddBroadcasting_aie2_0 | Conv1D_DW_AIE2_int8_0 | Sub_aie2_int8_0 | Sub_aie2_int8_0_ptr_interface | Add_aie2_0   | Conv2D_LReLU_1 | Group_Conv2D_0 | Conv1D_DW_AIE2_int8_1 | int8         | Conv2D_7x7s2_Layer1_1 | Conv2D_ReLU_1 | Conv2D_mixed_batch_0 | HardSigmoid_bf16_1 | Conv2D_7x7s2_Layer1_0 | HardSigmoid_bf16_0 | Conv2D_11x11s4_Layer1_1 | Conv2D_11x11s4_Layer1_0 | Round_aie2_1 | Sigmoid_int8_1 | bfloat16      | Sigmoid_int8_0 | MulBroadcasting_aie2_0 | MulAttributeBroadcasting_aie2_int8_0 | Mul_aie2_0    | Conv2D_2x8_1  | HardSigmoid_int8_0 | HardSigmoid_int8_1 | Log_int8_0    | EleMax_aie2_bfloat16 | EleMin_aie2_bfloat16 | Shrink_aie2_0 | AvgPool2D_1   | AvgPool2D_aie2_int8_1 | AvgPool2D_0   | AvgPool2D_aie2_int8_0 | Averege diff | Diff stdev | Quantile #1 | Quantile #2 | Quantile #3 | Quantile #4 | Quantile #5 | Quantile #6 | Quantile #7 | Quantile #8 | Quantile #9 |
| -------------------------------------- | ----------------------------- | --------------- | ----------------------- | ---------------- | ------------- | ---------------- | ----------------------------- | ----------------------------- | ---------------- | ------------------------- | ------------------------- | ------------ | -------------- | ------------ | -------------- | --------------- | --------------------------- | -------------- | ---------------------------------- | ------------------------------------ | -------------------------- | ---------------- | ---------------- | ---------------------- | ----------------- | ----------------- | ------------------------------------ | ---------------------- | ------------ | -------------- | ------------ | ------------ | ---------------- | --------------- | ------------------------- | ------------------------- | ------------------------- | --------------- | ------------ | ----------------------- | ----------------------- | ------------ | ------------------ | ---------------- | -------------- | ---------------------------- | ------------------------------------ | -------------- | ---------------------- | -------------------------- | -------------------------- | -------------------------- | ------------ | ------------- | ------------ | --------------- | ------------------------- | ------------------------- | ------------ | ------------------------- | ------------ | ------------- | --------------- | -------------------------- | -------------------------- | -------------------------- | ---------------------------- | ------------ | ------------- | ------------------------- | ------------ | -------------- | ------------------------- | ----------------------------- | ------------------------- | ------------------------- | ------------- | ---------------- | ------------------ | -------------------------- | ------------- | ------------ | --------------- | ---------------------------- | ---------------------------- | 
---------------------------- | ---------------------------- | ------------------------- | --------------------- | ------------- | ------------- | ----------------- | ---------------- | -------------------- | ------------------ | -------------------- | -------------- | ---------------- | ------------------ | -------------- | -------------- | -------------------------------------------------------- | -------------------------------------------- | -------------------------------------------- | ---------------------------------------------------------- | --------------------- | --------------------- | --------------------- | --------------------- | --------------------- | --------------------- | --------------------- | --------------------- | ---------------- | ---------------- | --------------- | ------------ | -------------------- | ---------------- | ------------ | ------------ | ------------ | ------------ | ------------ | ----------------------- | ----------------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- | --------------------------- | --------------------------- | ----------------------------- | ----------------------------- | --------------------------------- | ----------------------------- | -------------------- | ------------ | ------------ | ------------ | ------------ | ------------------------------------ | -------------- | -------------------------- | ------------ | ------------ | ------------------- | --------------- | ---------------------- | ---------------------- | ------------------------------------ | ------------------------------------ | ------------ | --------------------- | --------------------- | ----------------- | ----------------- | ----------------- | ----------------- | ---------------- | ---------------- | ---------------- | -------------------------- | -------------------------- | 
-------------------------- | -------------------------- | -------------------------- | -------------------------- | ---------------------------- | ------------------------ | --------------------- | --------------------- | ---------------- | ---------------- | ---------------- | ------------------------- | ---------------- | ---------------- | ---------------- | ------------------- | ------------ | ----------------- | ------------ | ------------ | ---------------- | ---------------- | ----------------------- | ----------------------- | -------------- | -------------- | ------------ | ------------ | ------------- | ------------- | ---------------- | ------------ | -------------- | ------------- | ------------ | ------------- | ------------- | ------------------ | -------------- | --------------------------- | ------------ | ------------ | ---------------- | ---------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ------------------------- | --------------------- | --------------------- | ------------------------- | -------------- | ---------------------------- | ---------------------------- | ---------------- | -------------------------- | ---------------- | ------------------------- | ----------------------------- | ------------ | ---------------- | ---------------- | --------------- | 
------------------------------------------------------------ | ------------------------------------------------ | ------------------------------------------------ | ------------------ | ------------ | ------------ | --------------------- | --------------------- | ------------- | --------------- | ----------------------- | ------------ | ------------ | ------------ | ----------------- | -------------- | ------------ | -------------------- | ------------------------ | ------------ | ---------------------- | ----------------------- | ------------------ | ------------- | ------------------------ | ------------ | ------------ | ------------ | ------------------ | ---------------------------- | ------------ | ------------ | ------------------ | ------------- | ------------ | ---------------------------- | ----------------------- | --------------------------- | ----------------------------------------- | -------------- | ---------------------------------- | ------------------------------------ | ---------------------- | --------------------- | --------------- | ----------------------------- | ------------ | -------------- | -------------- | --------------------- | ------------ | --------------------- | ------------- | -------------------- | ------------------ | --------------------- | ------------------ | ----------------------- | ----------------------- | ------------ | -------------- | ------------- | -------------- | ---------------------- | ------------------------------------ | ------------- | ------------- | ------------------ | ------------------ | ------------- | -------------------- | -------------------- | ------------- | ------------- | --------------------- | ------------- | --------------------- | ------------ | ---------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
| Baseline                               | 2028(+0.00%)                  | 2049(+0.00%)    | 2881(+0.00%)            | 4398(+0.00%)     | 345(+0.00%)   | 4003(+0.00%)     | 4008(+0.00%)                  | 1683(+0.00%)                  | 1683(+0.00%)     | 2722(+0.00%)              | 1844(+0.00%)              | 1893(+0.00%) | 283(+0.00%)    | 1013(+0.00%) | 4084(+0.00%)   | 782(+0.00%)     | 799(+0.00%)                 | 804(+0.00%)    | 820(+0.00%)                        | 820(+0.00%)                          | 821(+0.00%)                | 1147(+0.00%)     | 1147(+0.00%)     | 673(+0.00%)            | 199(+0.00%)       | 185(+0.00%)       | 5697(+0.00%)                         | 1473(+0.00%)           | 477(+0.00%)  | 1450(+0.00%)   | 1020(+0.00%) | 580(+0.00%)  | 1164(+0.00%)     | 635(+0.00%)     | 8325(+0.00%)              | 8364(+0.00%)              | 8395(+0.00%)              | 390(+0.00%)     | 4854(+0.00%) | 39538(+0.00%)           | 10636(+0.00%)           | 180(+0.00%)  | 1985(+0.00%)       | 4001(+0.00%)     | 4003(+0.00%)   | 7538(+0.00%)                 | 8256(+0.00%)                         | 2093(+0.00%)   | 2113(+0.00%)           | 8517(+0.00%)               | 8558(+0.00%)               | 8507(+0.00%)               | 763(+0.00%)  | 31380(+0.00%) | 166(+0.00%)  | 352(+0.00%)     | 4073(+0.00%)              | 4091(+0.00%)              | 2857(+0.00%) | 4087(+0.00%)              | 226(+0.00%)  | 55120(+0.00%) | 305(+0.00%)     | 4131(+0.00%)               | 4135(+0.00%)               | 4153(+0.00%)               | 8648(+0.00%)                 | 503(+0.00%)  | 37310(+0.00%) | 13231(+0.00%)             | 1493(+0.00%) | 10148(+0.00%)  | 8080(+0.00%)              | 14298(+0.00%)                 | 7315(+0.00%)              | 8062(+0.00%)              | 19802(+0.00%) | 5483(+0.00%)     | 6042(+0.00%)       | 3411(+0.00%)               | 16736(+0.00%) | 410(+0.00%)  | 413(+0.00%)     | 3051(+0.00%)                 | 1810(+0.00%)                 | 2862(+0.00%)   
              | 4089(+0.00%)                 | 499(+0.00%)               | 503(+0.00%)           | 330(+0.00%)   | 518(+0.00%)   | 388(+0.00%)       | 388(+0.00%)      | 484(+0.00%)          | 1906(+0.00%)       | 1906(+0.00%)         | 1238(+0.00%)   | 1238(+0.00%)     | 1667(+0.00%)       | 386(+0.00%)    | 183(+0.00%)    | 979(+0.00%)                                              | 962(+0.00%)                                  | 962(+0.00%)                                  | 962(+0.00%)                                                | 534(+0.00%)           | 990(+0.00%)           | 294(+0.00%)           | 526(+0.00%)           | 678(+0.00%)           | 1117(+0.00%)          | 366(+0.00%)           | 589(+0.00%)           | 224(+0.00%)      | 224(+0.00%)      | 5804(+0.00%)    | 7407(+0.00%) | 2017(+0.00%)         | 1971(+0.00%)     | 912(+0.00%)  | 2827(+0.00%) | 3755(+0.00%) | 483(+0.00%)  | 393(+0.00%)  | 1396(+0.00%)            | 1223(+0.00%)            | 461(+0.00%)         | 1507(+0.00%)        | 261(+0.00%)         | 786(+0.00%)         | 714(+0.00%)         | 1634(+0.00%)        | 386(+0.00%)         | 849(+0.00%)         | 1021(+0.00%)                | 256(+0.00%)                 | 2831(+0.00%)                  | 12762(+0.00%)                 | 15087(+0.00%)                     | 11587(+0.00%)                 | 360(+0.00%)          | 787(+0.00%)  | 543(+0.00%)  | 910(+0.00%)  | 910(+0.00%)  | 1060(+0.00%)                         | 1044(+0.00%)   | 1061(+0.00%)               | 372(+0.00%)  | 6370(+0.00%) | 9379(+0.00%)        | 9840(+0.00%)    | 8672(+0.00%)           | 8672(+0.00%)           | 40847(+0.00%)                        | 4297(+0.00%)                         | 4297(+0.00%) | 3696(+0.00%)          | 2536(+0.00%)          | 1133(+0.00%)      | 1676(+0.00%)      | 1323(+0.00%)      | 2504(+0.00%)      | 17856(+0.00%)    | 35861(+0.00%)    | 24036(+0.00%)    | 13669(+0.00%)              | 13669(+0.00%)              | 8168(+0.00%)               | 
13659(+0.00%)              | 8164(+0.00%)               | 8152(+0.00%)               | 50062(+0.00%)                | 83475(+0.00%)            | 162(+0.00%)           | 137(+0.00%)           | 17856(+0.00%)    | 19512(+0.00%)    | 23790(+0.00%)    | 3361(+0.00%)              | 47492(+0.00%)    | 41024(+0.00%)    | 16926(+0.00%)    | 285(+0.00%)         | 588(+0.00%)  | 3546(+0.00%)      | 355(+0.00%)  | 355(+0.00%)  | 445(+0.00%)      | 273(+0.00%)      | 1351(+0.00%)            | 1351(+0.00%)            | 4055(+0.00%)   | 2615(+0.00%)   | 411(+0.00%)  | 96(+0.00%)   | 3016(+0.00%)  | 840(+0.00%)   | 765(+0.00%)      | 1365(+0.00%) | 1650(+0.00%)   | 29751(+0.00%) | 3767(+0.00%) | 21906(+0.00%) | 21906(+0.00%) | 168(+0.00%)        | 168(+0.00%)    | 2499(+0.00%)                | 2879(+0.00%) | 3823(+0.00%) | 4081(+0.00%)     | 2371(+0.00%)     | 1219(+0.00%)  | 171(+0.00%)   | 846(+0.00%)   | 122(+0.00%)   | 34471(+0.00%) | 305(+0.00%)   | 32520(+0.00%) | 271(+0.00%)   | 1773(+0.00%)            | 2340(+0.00%)                | 1138(+0.00%)            | 1122(+0.00%)                | 1781(+0.00%)            | 1677(+0.00%)                | 1790(+0.00%)            | 1686(+0.00%)                | 1794(+0.00%)            | 1794(+0.00%)                | 2603(+0.00%)            | 3184(+0.00%)                | 1134(+0.00%)            | 1070(+0.00%)                | 2614(+0.00%)            | 2614(+0.00%)                | 2624(+0.00%)            | 2468(+0.00%)                | 2620(+0.00%)            | 2464(+0.00%)                | 13249(+0.00%)             | 17036(+0.00%)         | 17036(+0.00%)         | 8090(+0.00%)              | 7690(+0.00%)   | 6275(+0.00%)                 | 5161(+0.00%)                 | 14790(+0.00%)    | 7370(+0.00%)               | 29334(+0.00%)    | 13227(+0.00%)             | 11625(+0.00%)                 | 2728(+0.00%) | 19126(+0.00%)    | 5485(+0.00%)     | 1465(+0.00%)    | 1857(+0.00%)                                                 | 
1838(+0.00%)                                     | 1838(+0.00%)                                     | 11339(+0.00%)      | 1988(+0.00%) | 3789(+0.00%) | 3532(+0.00%)          | 4100(+0.00%)          | 37134(+0.00%) | 6139(+0.00%)    | 360(+0.00%)             | 3041(+0.00%) | 347(+0.00%)  | 447(+0.00%)  | 2604(+0.00%)      | 2211(+0.00%)   | 8919(+0.00%) | 24916(+0.00%)        | 2496(+0.00%)             | 8206(+0.00%) | 1270(+0.00%)           | 388(+0.00%)             | 986(+0.00%)        | 1310(+0.00%)  | 1310(+0.00%)             | 1259(+0.00%) | 506(+0.00%)  | 424(+0.00%)  | 424(+0.00%)        | 12052(+0.00%)                | 2972(+0.00%) | 282(+0.00%)  | 808(+0.00%)        | 798(+0.00%)   | 920(+0.00%)  | 10802(+0.00%)                | 710(+0.00%)             | 1016(+0.00%)                | 1016(+0.00%)                              | 4766(+0.00%)   | 1039(+0.00%)                       | 1039(+0.00%)                         | 1037(+0.00%)           | 1607(+0.00%)          | 997(+0.00%)     | 997(+0.00%)                   | 1016(+0.00%) | 5495(+0.00%)   | 4221(+0.00%)   | 1863(+0.00%)          | 1037(+0.00%) | 1756(+0.00%)          | 31087(+0.00%) | 11720(+0.00%)        | 829(+0.00%)        | 6310(+0.00%)          | 1261(+0.00%)       | 3274(+0.00%)            | 4688(+0.00%)            | 1332(+0.00%) | 130(+0.00%)    | 1188(+0.00%)  | 111(+0.00%)    | 529(+0.00%)            | 528(+0.00%)                          | 512(+0.00%)   | 4948(+0.00%)  | 363(+0.00%)        | 375(+0.00%)        | 1879(+0.00%)  | 626(+0.00%)          | 626(+0.00%)          | 1028(+0.00%)  | 1023(+0.00%)  | 1023(+0.00%)          | 1511(+0.00%)  | 1511(+0.00%)          | +0.00%       | 0.00       | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      |
| Expensive Post-ra loop scheduling      | 2028(+0.00%)                  | 2049(+0.00%)    | 2881(+0.00%)            | 4398(+0.00%)     | 345(+0.00%)   | 4003(+0.00%)     | 4008(+0.00%)                  | 1683(+0.00%)                  | 1683(+0.00%)     | 2722(+0.00%)              | 1844(+0.00%)              | 1893(+0.00%) | 283(+0.00%)    | 1013(+0.00%) | 4084(+0.00%)   | 782(+0.00%)     | 799(+0.00%)                 | 804(+0.00%)    | 820(+0.00%)                        | 820(+0.00%)                          | 821(+0.00%)                | 1147(+0.00%)     | 1147(+0.00%)     | 673(+0.00%)            | 199(+0.00%)       | 185(+0.00%)       | 5697(+0.00%)                         | 1473(+0.00%)           | 477(+0.00%)  | 1450(+0.00%)   | 1020(+0.00%) | 580(+0.00%)  | 1164(+0.00%)     | 635(+0.00%)     | 8325(+0.00%)              | 8364(+0.00%)              | 8395(+0.00%)              | 390(+0.00%)     | 4854(+0.00%) | 39538(+0.00%)           | 10636(+0.00%)           | 180(+0.00%)  | 1985(+0.00%)       | 4001(+0.00%)     | 4003(+0.00%)   | 7538(+0.00%)                 | 8256(+0.00%)                         | 2093(+0.00%)   | 2113(+0.00%)           | 8517(+0.00%)               | 8558(+0.00%)               | 8507(+0.00%)               | 763(+0.00%)  | 31380(+0.00%) | 166(+0.00%)  | 352(+0.00%)     | 4073(+0.00%)              | 4091(+0.00%)              | 2857(+0.00%) | 4087(+0.00%)              | 226(+0.00%)  | 55120(+0.00%) | 305(+0.00%)     | 4131(+0.00%)               | 4135(+0.00%)               | 4153(+0.00%)               | 8648(+0.00%)                 | 503(+0.00%)  | 37310(+0.00%) | 13231(+0.00%)             | 1493(+0.00%) | 10148(+0.00%)  | 8080(+0.00%)              | 14298(+0.00%)                 | 7315(+0.00%)              | 8062(+0.00%)              | 19802(+0.00%) | 5483(+0.00%)     | 6042(+0.00%)       | 3411(+0.00%)               | 16736(+0.00%) | 410(+0.00%)  | 413(+0.00%)     | 3051(+0.00%)                 | 1810(+0.00%)                 | 2862(+0.00%)   
              | 4089(+0.00%)                 | 499(+0.00%)               | 503(+0.00%)           | 330(+0.00%)   | 518(+0.00%)   | 388(+0.00%)       | 388(+0.00%)      | 484(+0.00%)          | 1906(+0.00%)       | 1906(+0.00%)         | 1238(+0.00%)   | 1238(+0.00%)     | 1667(+0.00%)       | 386(+0.00%)    | 183(+0.00%)    | 979(+0.00%)                                              | 962(+0.00%)                                  | 962(+0.00%)                                  | 962(+0.00%)                                                | 534(+0.00%)           | 990(+0.00%)           | 294(+0.00%)           | 526(+0.00%)           | 678(+0.00%)           | 1117(+0.00%)          | 366(+0.00%)           | 589(+0.00%)           | 224(+0.00%)      | 224(+0.00%)      | 5804(+0.00%)    | 7407(+0.00%) | 2017(+0.00%)         | 1971(+0.00%)     | 912(+0.00%)  | 2827(+0.00%) | 3755(+0.00%) | 483(+0.00%)  | 393(+0.00%)  | 1396(+0.00%)            | 1223(+0.00%)            | 461(+0.00%)         | 1507(+0.00%)        | 261(+0.00%)         | 786(+0.00%)         | 714(+0.00%)         | 1634(+0.00%)        | 386(+0.00%)         | 849(+0.00%)         | 1021(+0.00%)                | 256(+0.00%)                 | 2831(+0.00%)                  | 12762(+0.00%)                 | 15087(+0.00%)                     | 11587(+0.00%)                 | 360(+0.00%)          | 787(+0.00%)  | 543(+0.00%)  | 910(+0.00%)  | 910(+0.00%)  | 1060(+0.00%)                         | 1044(+0.00%)   | 1061(+0.00%)               | 372(+0.00%)  | 6370(+0.00%) | 9379(+0.00%)        | 9840(+0.00%)    | 8672(+0.00%)           | 8672(+0.00%)           | 40847(+0.00%)                        | 4297(+0.00%)                         | 4297(+0.00%) | 3696(+0.00%)          | 2536(+0.00%)          | 1133(+0.00%)      | 1676(+0.00%)      | 1323(+0.00%)      | 2504(+0.00%)      | 17856(+0.00%)    | 35861(+0.00%)    | 24036(+0.00%)    | 13669(+0.00%)              | 13669(+0.00%)              | 8168(+0.00%)               | 
13659(+0.00%)              | 8164(+0.00%)               | 8152(+0.00%)               | 50062(+0.00%)                | 83475(+0.00%)            | 162(+0.00%)           | 137(+0.00%)           | 17856(+0.00%)    | 19512(+0.00%)    | 23790(+0.00%)    | 3361(+0.00%)              | 47492(+0.00%)    | 41024(+0.00%)    | 16926(+0.00%)    | 285(+0.00%)         | 588(+0.00%)  | 3546(+0.00%)      | 355(+0.00%)  | 355(+0.00%)  | 445(+0.00%)      | 273(+0.00%)      | 1351(+0.00%)            | 1351(+0.00%)            | 4055(+0.00%)   | 2615(+0.00%)   | 411(+0.00%)  | 96(+0.00%)   | 3016(+0.00%)  | 840(+0.00%)   | 765(+0.00%)      | 1365(+0.00%) | 1650(+0.00%)   | 29751(+0.00%) | 3767(+0.00%) | 21906(+0.00%) | 21906(+0.00%) | 168(+0.00%)        | 168(+0.00%)    | 2499(+0.00%)                | 2879(+0.00%) | 3823(+0.00%) | 4081(+0.00%)     | 2371(+0.00%)     | 1219(+0.00%)  | 171(+0.00%)   | 846(+0.00%)   | 122(+0.00%)   | 34471(+0.00%) | 305(+0.00%)   | 32520(+0.00%) | 271(+0.00%)   | 1773(+0.00%)            | 2340(+0.00%)                | 1138(+0.00%)            | 1122(+0.00%)                | 1781(+0.00%)            | 1677(+0.00%)                | 1790(+0.00%)            | 1686(+0.00%)                | 1794(+0.00%)            | 1794(+0.00%)                | 2603(+0.00%)            | 3184(+0.00%)                | 1134(+0.00%)            | 1070(+0.00%)                | 2614(+0.00%)            | 2614(+0.00%)                | 2624(+0.00%)            | 2468(+0.00%)                | 2620(+0.00%)            | 2464(+0.00%)                | 13249(+0.00%)             | 17036(+0.00%)         | 17036(+0.00%)         | 8090(+0.00%)              | 7691(+0.01%)   | 6275(+0.00%)                 | 5161(+0.00%)                 | 14790(+0.00%)    | 7370(+0.00%)               | 29334(+0.00%)    | 13227(+0.00%)             | 11625(+0.00%)                 | 2720(-0.29%) | 19126(+0.00%)    | 5485(+0.00%)     | 1465(+0.00%)    | 1857(+0.00%)                                                 | 
1838(+0.00%)                                     | 1838(+0.00%)                                     | 11304(-0.31%)      | 1994(+0.30%) | 3789(+0.00%) | 3388(-4.08%)          | 3932(-4.10%)          | 37134(+0.00%) | 6067(-1.17%)    | 360(+0.00%)             | 3041(+0.00%) | 347(+0.00%)  | 447(+0.00%)  | 2604(+0.00%)      | 2191(-0.90%)   | 8759(-1.79%) | 24436(-1.93%)        | 2448(-1.92%)             | 8206(+0.00%) | 1270(+0.00%)           | 388(+0.00%)             | 970(-1.62%)        | 1290(-1.53%)  | 1290(-1.53%)             | 1227(-2.54%) | 506(+0.00%)  | 424(+0.00%)  | 424(+0.00%)        | 12052(+0.00%)                | 2876(-3.23%) | 282(+0.00%)  | 808(+0.00%)        | 798(+0.00%)   | 900(-2.17%)  | 10802(+0.00%)                | 710(+0.00%)             | 1016(+0.00%)                | 1016(+0.00%)                              | 4574(-4.03%)   | 1039(+0.00%)                       | 1039(+0.00%)                         | 1037(+0.00%)           | 1607(+0.00%)          | 997(+0.00%)     | 997(+0.00%)                   | 1016(+0.00%) | 5239(-4.66%)   | 4029(-4.55%)   | 1863(+0.00%)          | 1037(+0.00%) | 1652(-5.92%)          | 29039(-6.59%) | 10952(-6.55%)        | 766(-7.60%)        | 5842(-7.42%)          | 1162(-7.85%)       | 3018(-7.82%)            | 4304(-8.19%)            | 1332(+0.00%) | 130(+0.00%)    | 1188(+0.00%)  | 111(+0.00%)    | 529(+0.00%)            | 528(+0.00%)                          | 512(+0.00%)   | 4948(+0.00%)  | 318(-12.40%)       | 328(-12.53%)       | 1879(+0.00%)  | 626(+0.00%)          | 626(+0.00%)          | 1028(+0.00%)  | 1023(+0.00%)  | 1023(+0.00%)          | 1511(+0.00%)  | 1511(+0.00%)          | -0.37%       | 1.57       | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      |
| Conservative pre-ra pressure reduction | 2540(+25.25%)                 | 2561(+24.99%)   | 3329(+15.55%)           | 4961(+12.80%)    | 388(+12.46%)  | 4387(+9.59%)     | 4392(+9.58%)                  | 1843(+9.51%)                  | 1843(+9.51%)     | 2913(+7.02%)              | 1969(+6.78%)              | 2021(+6.76%) | 302(+6.71%)    | 1077(+6.32%) | 4340(+6.27%)   | 830(+6.14%)     | 847(+6.01%)                 | 852(+5.97%)    | 868(+5.85%)                        | 868(+5.85%)                          | 869(+5.85%)                | 1208(+5.32%)     | 1208(+5.32%)     | 705(+4.75%)            | 208(+4.52%)       | 193(+4.32%)       | 5891(+3.41%)                         | 1523(+3.39%)           | 493(+3.35%)  | 1498(+3.31%)   | 1051(+3.04%) | 596(+2.76%)  | 1194(+2.58%)     | 650(+2.36%)     | 8513(+2.26%)              | 8552(+2.25%)              | 8583(+2.24%)              | 398(+2.05%)     | 4951(+2.00%) | 40278(+1.87%)           | 10826(+1.79%)           | 183(+1.67%)  | 2017(+1.61%)       | 4065(+1.60%)     | 4067(+1.60%)   | 7602(+0.85%)                 | 8320(+0.78%)                         | 2109(+0.76%)   | 2129(+0.76%)           | 8577(+0.70%)               | 8618(+0.70%)               | 8563(+0.66%)               | 768(+0.66%)  | 31584(+0.65%) | 167(+0.60%)  | 354(+0.57%)     | 4095(+0.54%)              | 4113(+0.54%)              | 2871(+0.49%) | 4107(+0.49%)              | 227(+0.44%)  | 55303(+0.33%) | 306(+0.33%)     | 4143(+0.29%)               | 4147(+0.29%)               | 4165(+0.29%)               | 8668(+0.23%)                 | 504(+0.20%)  | 37375(+0.17%) | 13251(+0.15%)             | 1495(+0.13%) | 10161(+0.13%)  | 8090(+0.12%)              | 14314(+0.11%)                 | 7323(+0.11%)              | 8068(+0.07%)              | 19814(+0.06%) | 5486(+0.05%)     | 6045(+0.05%)       | 3412(+0.03%)               | 16739(+0.02%) | 410(+0.00%)  | 413(+0.00%)     | 3051(+0.00%)                 | 1810(+0.00%)                 | 2862(+0.00%)   
              | 4089(+0.00%)                 | 499(+0.00%)               | 503(+0.00%)           | 330(+0.00%)   | 518(+0.00%)   | 388(+0.00%)       | 388(+0.00%)      | 484(+0.00%)          | 1906(+0.00%)       | 1906(+0.00%)         | 1238(+0.00%)   | 1238(+0.00%)     | 1667(+0.00%)       | 386(+0.00%)    | 183(+0.00%)    | 979(+0.00%)                                              | 962(+0.00%)                                  | 962(+0.00%)                                  | 962(+0.00%)                                                | 534(+0.00%)           | 990(+0.00%)           | 294(+0.00%)           | 526(+0.00%)           | 678(+0.00%)           | 1117(+0.00%)          | 366(+0.00%)           | 589(+0.00%)           | 224(+0.00%)      | 224(+0.00%)      | 5804(+0.00%)    | 7407(+0.00%) | 2017(+0.00%)         | 1971(+0.00%)     | 912(+0.00%)  | 2827(+0.00%) | 3755(+0.00%) | 483(+0.00%)  | 393(+0.00%)  | 1396(+0.00%)            | 1223(+0.00%)            | 461(+0.00%)         | 1507(+0.00%)        | 261(+0.00%)         | 786(+0.00%)         | 714(+0.00%)         | 1634(+0.00%)        | 386(+0.00%)         | 849(+0.00%)         | 1021(+0.00%)                | 256(+0.00%)                 | 2831(+0.00%)                  | 12762(+0.00%)                 | 15087(+0.00%)                     | 11587(+0.00%)                 | 360(+0.00%)          | 787(+0.00%)  | 543(+0.00%)  | 910(+0.00%)  | 910(+0.00%)  | 1060(+0.00%)                         | 1044(+0.00%)   | 1061(+0.00%)               | 372(+0.00%)  | 6370(+0.00%) | 9379(+0.00%)        | 9840(+0.00%)    | 8672(+0.00%)           | 8672(+0.00%)           | 40847(+0.00%)                        | 4297(+0.00%)                         | 4297(+0.00%) | 3696(+0.00%)          | 2536(+0.00%)          | 1133(+0.00%)      | 1676(+0.00%)      | 1323(+0.00%)      | 2504(+0.00%)      | 17856(+0.00%)    | 35861(+0.00%)    | 24036(+0.00%)    | 13669(+0.00%)              | 13669(+0.00%)              | 8168(+0.00%)               | 
13659(+0.00%)              | 8164(+0.00%)               | 8152(+0.00%)               | 50062(+0.00%)                | 83475(+0.00%)            | 162(+0.00%)           | 137(+0.00%)           | 17856(+0.00%)    | 19512(+0.00%)    | 23790(+0.00%)    | 3361(+0.00%)              | 47492(+0.00%)    | 41024(+0.00%)    | 16926(+0.00%)    | 285(+0.00%)         | 588(+0.00%)  | 3546(+0.00%)      | 355(+0.00%)  | 355(+0.00%)  | 445(+0.00%)      | 273(+0.00%)      | 1351(+0.00%)            | 1351(+0.00%)            | 4055(+0.00%)   | 2615(+0.00%)   | 411(+0.00%)  | 96(+0.00%)   | 3016(+0.00%)  | 840(+0.00%)   | 765(+0.00%)      | 1365(+0.00%) | 1650(+0.00%)   | 29751(+0.00%) | 3767(+0.00%) | 21906(+0.00%) | 21906(+0.00%) | 168(+0.00%)        | 168(+0.00%)    | 2499(+0.00%)                | 2879(+0.00%) | 3823(+0.00%) | 4081(+0.00%)     | 2371(+0.00%)     | 1219(+0.00%)  | 171(+0.00%)   | 846(+0.00%)   | 122(+0.00%)   | 34471(+0.00%) | 305(+0.00%)   | 32520(+0.00%) | 271(+0.00%)   | 1773(+0.00%)            | 2340(+0.00%)                | 1138(+0.00%)            | 1122(+0.00%)                | 1781(+0.00%)            | 1677(+0.00%)                | 1790(+0.00%)            | 1686(+0.00%)                | 1794(+0.00%)            | 1794(+0.00%)                | 2603(+0.00%)            | 3184(+0.00%)                | 1134(+0.00%)            | 1070(+0.00%)                | 2614(+0.00%)            | 2614(+0.00%)                | 2624(+0.00%)            | 2468(+0.00%)                | 2620(+0.00%)            | 2464(+0.00%)                | 13246(-0.02%)             | 17032(-0.02%)         | 17032(-0.02%)         | 8088(-0.02%)              | 7688(-0.04%)   | 6271(-0.06%)                 | 5157(-0.08%)                 | 14778(-0.08%)    | 7364(-0.08%)               | 29310(-0.08%)    | 13215(-0.09%)             | 11613(-0.10%)                 | 2725(+0.18%) | 19103(-0.12%)    | 5478(-0.13%)     | 1461(-0.27%)    | 1850(-0.38%)                                                 | 
1831(-0.38%)                                     | 1831(-0.38%)                                     | 11291(-0.12%)      | 1979(-0.75%) | 3741(-1.27%) | 3484(+2.83%)          | 4044(+2.85%)          | 36622(-1.38%) | 6040(-0.45%)    | 354(-1.67%)             | 2990(-1.68%) | 341(-1.73%)  | 439(-1.79%)  | 2556(-1.84%)      | 2170(-0.96%)   | 8734(-0.29%) | 24388(-0.20%)        | 2439(-0.37%)             | 8001(-2.50%) | 1238(-2.52%)           | 378(-2.58%)             | 960(-1.03%)        | 1273(-1.32%)  | 1273(-1.32%)             | 1219(-0.65%) | 487(-3.75%)  | 408(-3.77%)  | 408(-3.77%)        | 11597(-3.78%)                | 2859(-0.59%) | 271(-3.90%)  | 776(-3.96%)        | 766(-4.01%)   | 883(-1.89%)  | 10365(-4.05%)                | 680(-4.23%)             | 970(-4.53%)                 | 970(-4.53%)                               | 4550(-0.52%)   | 991(-4.62%)                        | 991(-4.62%)                          | 989(-4.63%)            | 1531(-4.73%)          | 949(-4.81%)     | 949(-4.81%)                   | 966(-4.92%)  | 5218(-0.40%)   | 4007(-0.55%)   | 1762(-5.42%)          | 973(-6.17%)  | 1643(-0.54%)          | 29014(-0.09%) | 10918(-0.31%)        | 766(+0.00%)        | 5825(-0.29%)          | 1162(+0.00%)       | 3009(-0.30%)            | 4287(-0.39%)            | 1201(-9.83%) | 117(-10.00%)   | 1064(-10.44%) | 98(-11.71%)    | 466(-11.91%)           | 465(-11.93%)                         | 449(-12.30%)  | 4335(-12.39%) | 318(+0.00%)        | 328(+0.00%)        | 1625(-13.52%) | 500(-20.13%)         | 500(-20.13%)         | 730(-28.99%)  | 641(-37.34%)  | 641(-37.34%)          | 929(-38.52%)  | 929(-38.52%)          | -0.47%       | 5.99       | -3.77%      | -0.29%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.29%      | +2.80%      |
| Total diff                             | REGR(+25.25%)                 | REGR(+24.99%)   | REGR(+15.55%)           | REGR(+12.80%)    | REGR(+12.46%) | REGR(+9.59%)     | REGR(+9.58%)                  | REGR(+9.51%)                  | REGR(+9.51%)     | REGR(+7.02%)              | REGR(+6.78%)              | REGR(+6.76%) | REGR(+6.71%)   | REGR(+6.32%) | REGR(+6.27%)   | REGR(+6.14%)    | REGR(+6.01%)                | REGR(+5.97%)   | REGR(+5.85%)                       | REGR(+5.85%)                         | REGR(+5.85%)               | REGR(+5.32%)     | REGR(+5.32%)     | REGR(+4.75%)           | REGR(+4.52%)      | REGR(+4.32%)      | REGR(+3.41%)                         | REGR(+3.39%)           | REGR(+3.35%) | REGR(+3.31%)   | REGR(+3.04%) | REGR(+2.76%) | REGR(+2.58%)     | REGR(+2.36%)    | REGR(+2.26%)              | REGR(+2.25%)              | REGR(+2.24%)              | REGR(+2.05%)    | REGR(+2.00%) | REGR(+1.87%)            | REGR(+1.79%)            | REGR(+1.67%) | REGR(+1.61%)       | REGR(+1.60%)     | REGR(+1.60%)   | REGR(+0.85%)                 | REGR(+0.78%)                         | REGR(+0.76%)   | REGR(+0.76%)           | REGR(+0.70%)               | REGR(+0.70%)               | REGR(+0.66%)               | REGR(+0.66%) | REGR(+0.65%)  | REGR(+0.60%) | REGR(+0.57%)    | REGR(+0.54%)              | REGR(+0.54%)              | REGR(+0.49%) | REGR(+0.49%)              | REGR(+0.44%) | REGR(+0.33%)  | REGR(+0.33%)    | REGR(+0.29%)               | REGR(+0.29%)               | REGR(+0.29%)               | REGR(+0.23%)                 | REGR(+0.20%) | REGR(+0.17%)  | REGR(+0.15%)              | REGR(+0.13%) | REGR(+0.13%)   | REGR(+0.12%)              | REGR(+0.11%)                  | REGR(+0.11%)              | SAME(+0.07%)              | SAME(+0.06%)  | SAME(+0.05%)     | SAME(+0.05%)       | SAME(+0.03%)               | SAME(+0.02%)  | SAME(+0.00%) | SAME(+0.00%)    | SAME(+0.00%)                 | SAME(+0.00%)                 | SAME(+0.00%)   
              | SAME(+0.00%)                 | SAME(+0.00%)              | SAME(+0.00%)          | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)      | SAME(+0.00%)     | SAME(+0.00%)         | SAME(+0.00%)       | SAME(+0.00%)         | SAME(+0.00%)   | SAME(+0.00%)     | SAME(+0.00%)       | SAME(+0.00%)   | SAME(+0.00%)   | SAME(+0.00%)                                             | SAME(+0.00%)                                 | SAME(+0.00%)                                 | SAME(+0.00%)                                               | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)    | SAME(+0.00%) | SAME(+0.00%)         | SAME(+0.00%)     | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)            | SAME(+0.00%)            | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)                | SAME(+0.00%)                | SAME(+0.00%)                  | SAME(+0.00%)                  | SAME(+0.00%)                      | SAME(+0.00%)                  | SAME(+0.00%)         | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)                         | SAME(+0.00%)   | SAME(+0.00%)               | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)        | SAME(+0.00%)    | SAME(+0.00%)           | SAME(+0.00%)           | SAME(+0.00%)                         | SAME(+0.00%)                         | SAME(+0.00%) | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)      | SAME(+0.00%)      | SAME(+0.00%)      | SAME(+0.00%)      | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)               | SAME(+0.00%)               | SAME(+0.00%)               | 
SAME(+0.00%)               | SAME(+0.00%)               | SAME(+0.00%)               | SAME(+0.00%)                 | SAME(+0.00%)             | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)              | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)        | SAME(+0.00%) | SAME(+0.00%)      | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)            | SAME(+0.00%)            | SAME(+0.00%)   | SAME(+0.00%)   | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)     | SAME(+0.00%) | SAME(+0.00%)   | SAME(+0.00%)  | SAME(+0.00%) | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)       | SAME(+0.00%)   | SAME(+0.00%)                | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(-0.02%)              | SAME(-0.02%)          | SAME(-0.02%)          | SAME(-0.02%)              | SAME(-0.03%)   | SAME(-0.06%)                 | SAME(-0.08%)                 | SAME(-0.08%)     | SAME(-0.08%)               | SAME(-0.08%)     | SAME(-0.09%)              | IMPR(-0.10%)                  | IMPR(-0.11%) | IMPR(-0.12%)     | IMPR(-0.13%)     | IMPR(-0.27%)    | IMPR(-0.38%)                                                 | 
IMPR(-0.38%)                                     | IMPR(-0.38%)                                     | IMPR(-0.42%)       | IMPR(-0.45%) | IMPR(-1.27%) | IMPR(-1.36%)          | IMPR(-1.37%)          | IMPR(-1.38%)  | IMPR(-1.61%)    | IMPR(-1.67%)            | IMPR(-1.68%) | IMPR(-1.73%) | IMPR(-1.79%) | IMPR(-1.84%)      | IMPR(-1.85%)   | IMPR(-2.07%) | IMPR(-2.12%)         | IMPR(-2.28%)             | IMPR(-2.50%) | IMPR(-2.52%)           | IMPR(-2.58%)            | IMPR(-2.64%)       | IMPR(-2.82%)  | IMPR(-2.82%)             | IMPR(-3.18%) | IMPR(-3.75%) | IMPR(-3.77%) | IMPR(-3.77%)       | IMPR(-3.78%)                 | IMPR(-3.80%) | IMPR(-3.90%) | IMPR(-3.96%)       | IMPR(-4.01%)  | IMPR(-4.02%) | IMPR(-4.05%)                 | IMPR(-4.23%)            | IMPR(-4.53%)                | IMPR(-4.53%)                              | IMPR(-4.53%)   | IMPR(-4.62%)                       | IMPR(-4.62%)                         | IMPR(-4.63%)           | IMPR(-4.73%)          | IMPR(-4.81%)    | IMPR(-4.81%)                  | IMPR(-4.92%) | IMPR(-5.04%)   | IMPR(-5.07%)   | IMPR(-5.42%)          | IMPR(-6.17%) | IMPR(-6.44%)          | IMPR(-6.67%)  | IMPR(-6.84%)         | IMPR(-7.60%)       | IMPR(-7.69%)          | IMPR(-7.85%)       | IMPR(-8.09%)            | IMPR(-8.55%)            | IMPR(-9.83%) | IMPR(-10.00%)  | IMPR(-10.44%) | IMPR(-11.71%)  | IMPR(-11.91%)          | IMPR(-11.93%)                        | IMPR(-12.30%) | IMPR(-12.39%) | IMPR(-12.40%)      | IMPR(-12.53%)      | IMPR(-13.52%) | IMPR(-20.13%)        | IMPR(-20.13%)        | IMPR(-28.99%) | IMPR(-37.34%) | IMPR(-37.34%)         | IMPR(-38.52%) | IMPR(-38.52%)         | -0.84%       | 6.17       | -4.78%      | -1.67%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.24%      | +2.49%      |

@@ -169,7 +173,8 @@ bool InterBlockScheduling::leaveBlock() {
// If we are very unlucky, we may step both the latency margin and
// the resource margin to the max. Any more indicates failure to converge,
// and we abort to prevent an infinite loop.
- if (BS.FixPoint.NumIters > 2 * HR->getConflictHorizon()) {
+ if (BS.FixPoint.NumIters >
+ 2 * HR->getConflictHorizon() + MaxExpensiveIterations) {
Collaborator

@martien-de-jong martien-de-jong Aug 8, 2024

nit: I think we first do MaxExpensiveIterations, then fall back to the global safety margins. Reverse the two terms here, and perhaps make the comment more precise?

auto Res = BS.FixPoint.PerMILatencyMargin.try_emplace(MINeedsHigherCap, 0);
if (BS.FixPoint.NumIters > MaxExpensiveIterations) {
// Increase the latency margin per instruction, unless we already iterated
// more than MaxExpensiveIterations without converging.
Collaborator

nit: I think the comment should be on the outside of this if. And then perhaps also order the branches accordingly.

# We should see most of the VLDA.UPS instructions move down in the loop
# BB to reduce the reg pressure and avoid spills. They can later be moved back
# up by the post-RA scheduler. This should also make the 4 acc1024 COPY
# instructions coalesce-able.
Collaborator

@martien-de-jong martien-de-jong Aug 8, 2024

Could we have a [presched, RA] example that actually demonstrates reduced spilling?

Collaborator Author

I really tried to manipulate that example, but I can't get it to spill without writing absolutely ugly code. I'd leave it like this if that's fine with you. In a follow-up PR where I change the MachinePipeliner, I'll add an end-to-end IR test that shows Add2D getting nicely SW pipelined.

tryPressure(TryCand.RPDelta.CurrentMax, Cand.RPDelta.CurrentMax, TryCand,
Cand, RegMax, TRI, DAG->MF))
return TryCand.Reason != NoCand;
// Avoid increasing the max pressure of the entire region.
Collaborator

CHECK: isTrackingPressure() is trivially true here.

martien-de-jong
martien-de-jong previously approved these changes Aug 8, 2024
Collaborator

@martien-de-jong martien-de-jong left a comment

Nice! Especially the obvious convergence of the per-instruction latency cap.

// more than MaxExpensiveIterations without converging.
BS.FixPoint.LatencyMargin++;
} else {
++Res.first->second;
Collaborator

Perhaps assert that we don't exceed MaxLatency (or ConflictHorizon).
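For illustration, here is a minimal sketch of the "expensive convergence" step with the suggested assert. It is not the code from this PR: `FixPointState`, `bumpLatencyMargin`, the integer instruction id and the `MaxLatency` parameter are hypothetical stand-ins, and only the overall policy (per-instruction margin first, global margin after `MaxExpensiveIterations`) is taken from the diff.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the scheduler's fix-point state.
struct FixPointState {
  int NumIters = 0;
  int LatencyMargin = 0;                 // global margin, applies to all instructions
  std::map<int, int> PerMILatencyMargin; // keyed by a hypothetical instruction id
};

// Called after a scheduling attempt failed to converge because the
// instruction MIId needs a larger latency safety margin.
void bumpLatencyMargin(FixPointState &FP, int MIId,
                       int MaxExpensiveIterations, int MaxLatency) {
  if (FP.NumIters <= MaxExpensiveIterations) {
    // Expensive mode: only raise the margin of the offending instruction.
    int &Margin = FP.PerMILatencyMargin.try_emplace(MIId, 0).first->second;
    ++Margin;
    // The assert suggested in the review: a per-instruction margin should
    // never need to exceed the maximum latency.
    assert(Margin <= MaxLatency && "per-instruction margin out of range");
  } else {
    // Fallback: raise the margin for every instruction at once.
    ++FP.LatencyMargin;
  }
}
```

With `MaxExpensiveIterations = 25`, the first 25 iterations only touch entries of `PerMILatencyMargin`; later iterations fall back to the global `LatencyMargin`.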

@@ -40,7 +40,7 @@ static cl::opt<bool>
cl::desc("Track reg pressure more accurately and "
"delay some instructions to avoid spills."));
static cl::opt<unsigned> NumCriticalFreeRegs(
- "aie-premisched-near-critical-regs", cl::init(4),
+ "aie-premisched-near-critical-regs", cl::init(2),
Collaborator Author

@gbossu gbossu Aug 8, 2024

Note: I'm reducing the limit here, but it then gets multiplied by the number of pressure units required by the reg class. E.g. the number of free units we try to maintain for W is 2, for X it is 4, and for Y it is 8.
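A small sketch of the arithmetic in that note. The `RegClass` enum and both helper functions are hypothetical, and the per-class unit weights (1, 2, 4) are only inferred from the W/X/Y numbers quoted above, not read from the target description.

```cpp
#include <cassert>

// Hypothetical register classes; only W, X and Y are mentioned in the note.
enum class RegClass { W, X, Y };

// Assumed number of pressure units one register of each class occupies.
constexpr unsigned unitsPerRegister(RegClass RC) {
  switch (RC) {
  case RegClass::W: return 1;
  case RegClass::X: return 2;
  case RegClass::Y: return 4;
  }
  return 0;
}

// Free pressure units the pre-RA scheduler tries to keep available.
// NumCriticalFreeRegs mirrors the new default of the
// aie-premisched-near-critical-regs option.
constexpr unsigned freeUnitsToMaintain(RegClass RC,
                                       unsigned NumCriticalFreeRegs = 2) {
  return NumCriticalFreeRegs * unitsPerRegister(RC);
}

static_assert(freeUnitsToMaintain(RegClass::W) == 2, "W keeps 2 units free");
static_assert(freeUnitsToMaintain(RegClass::X) == 4, "X keeps 4 units free");
static_assert(freeUnitsToMaintain(RegClass::Y) == 8, "Y keeps 8 units free");
```

Under these assumed weights, the previous default of 4 would have reserved 16 units for Y.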

gbossu added 2 commits August 8, 2024 15:21
We want to increase the safety margin for one instruction at a time
here, instead of doing it for all instructions at once.
@@ -37,6 +37,10 @@ static cl::opt<bool> LoopEpilogueAnalysis(
"aie-loop-epilogue-analysis", cl::init(true),
cl::desc("[AIE] Perform Loop/Epilogue analysis with loop scheduling"));

static cl::opt<int> MaxExpensiveIterations(
"aie-loop-aware-expensive-iterations", cl::init(25),
Collaborator

Is there a rationale behind this number?

Collaborator Author

Yes and no. I feel anything over 50 is too much, and anything below 10 is not enough if we need to move a couple of instructions up by 2-3 cycles. So 25 felt like a good compromise. And this works well for loops with an II between 5 and 10 cycles, which is the territory of the PreRA pipeliner for us.

@@ -169,7 +173,8 @@ bool InterBlockScheduling::leaveBlock() {
// If we are very unlucky, we may step both the latency margin and
// the resource margin to the max. Any more indicates failure to converge,
// and we abort to prevent an infinite loop.
- if (BS.FixPoint.NumIters > 2 * HR->getConflictHorizon()) {
+ if (BS.FixPoint.NumIters >
Collaborator

Considering your change, does this error become more common without this increase?

Collaborator Author

Oh it never triggered, I just changed the condition to account for the extra iterations, otherwise we would fail thinking we are in an infinite loop.
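To make the adjusted bound concrete, here is a sketch of the iteration budget it implies. The function names are hypothetical, and the conflict horizon of 8 used in the example is an assumed value, not one taken from the AIE hazard recognizer.

```cpp
#include <cassert>

// Upper bound on fix-point iterations before the scheduler gives up:
// first the per-instruction ("expensive") iterations, then up to one step
// per cycle of conflict horizon for each of the two global margins
// (latency and resource).
constexpr int maxFixPointIterations(int ConflictHorizon,
                                    int MaxExpensiveIterations) {
  return MaxExpensiveIterations + 2 * ConflictHorizon;
}

// Mirrors the abort condition in the diff: exceeding the budget is treated
// as a failure to converge.
constexpr bool shouldAbort(int NumIters, int ConflictHorizon,
                           int MaxExpensiveIterations) {
  return NumIters >
         maxFixPointIterations(ConflictHorizon, MaxExpensiveIterations);
}

// Example with an assumed conflict horizon of 8 and the default of 25
// expensive iterations: the budget is 25 + 2 * 8 = 41 iterations.
static_assert(maxFixPointIterations(8, 25) == 41, "budget is 41");
static_assert(!shouldAbort(41, 8, 25), "41 iterations still allowed");
static_assert(shouldAbort(42, 8, 25), "42 iterations aborts");
```

Without the extra term, the same example would abort after 16 iterations, well before the 25 expensive iterations had been used up.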

PSetThresholds.clear();
for (unsigned PSet = 0, EndPSet = RegionMaxPressure.size(); PSet < EndPSet;
++PSet) {
unsigned MaxPressure = RegionMaxPressure[PSet];
Collaborator

@andcarminati andcarminati Aug 8, 2024

nit: const unsigned MaxPressure

andcarminati
andcarminati previously approved these changes Aug 8, 2024
Collaborator

@andcarminati andcarminati left a comment

LGTM. A nice piece of work!

In a follow-up commit, the premisched will re-order the instructions to
reduce the pressure and avoid spills during RA.
@gbossu gbossu dismissed stale reviews from andcarminati and martien-de-jong via 8d997c4 August 8, 2024 16:32
@gbossu gbossu force-pushed the gaetan.improve.scheds branch from 9a38355 to 8d997c4 Compare August 8, 2024 16:32
 - Reserve a certain number of registers, not regunits
 - Be extra careful when the region max pressure exceeds limits
@gbossu gbossu force-pushed the gaetan.improve.scheds branch from 8d997c4 to 71614b9 Compare August 9, 2024 07:19
@gbossu gbossu merged commit 9a7a198 into aie-public Aug 9, 2024
8 checks passed
@gbossu gbossu deleted the gaetan.improve.scheds branch August 9, 2024 09:01
@gbossu gbossu mentioned this pull request Aug 26, 2024