[AIEX] Scheduler improvements #147

gbossu · 2024-08-07T13:57:00Z

Pre-RA: Schedule more conservatively when under high register pressure. This is very helpful for SW pipelining: I'll have another PR which makes the MachinePipeliner find more schedules, and this PR helps us avoid spilling.

Post-RA: Change the loop-aware scheduling to have an "expensive convergence" mode, in which we increase the latency safety margin per instruction instead of for all instructions at once.
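
To illustrate the difference between a global and a per-instruction safety margin, here is a toy model in Python. It is only a sketch and not the code in this PR: the single-issue bottom-up `schedule` function, the `required` distances, and the instruction names are all hypothetical, and the model ignores intra-block dependencies and the extra compile time that the per-instruction mode may cost.

```python
def schedule(margins):
    """Toy bottom-up scheduler (hypothetical): one instruction per cycle.

    margins[name] is the minimum number of cycles that must separate the
    instruction from the last cycle of the loop body. Returns the distance
    from the end of the body for each instruction (0 = last cycle).
    """
    dist = {}
    next_free = 0
    # Place the least constrained instructions closest to the bottom.
    for name in sorted(margins, key=lambda n: margins[n]):
        d = max(next_free, margins[name])
        dist[name] = d
        next_free = d + 1
    return dist


def converge(required, per_instruction):
    """Grow safety margins until no loop-carried latency is violated.

    required[name] is the distance from the loop end that the instruction
    really needs; the scheduler only learns about it through violations.
    Returns (loop body length in cycles, final margins).
    """
    margins = {name: 0 for name in required}
    while True:
        dist = schedule(margins)
        violators = [n for n in required if dist[n] < required[n]]
        if not violators:
            return max(dist.values()) + 1, margins
        # Global mode bumps every margin; per-instruction mode bumps
        # only the instructions that actually violated their latency.
        targets = violators if per_instruction else list(margins)
        for name in targets:
            margins[name] += 1


# Hypothetical loop body: only "load" has a loop-carried latency to cover.
required = {"load": 3, "mul": 0, "add": 0, "store": 0}
global_len, _ = converge(required, per_instruction=False)
local_len, _ = converge(required, per_instruction=True)
print(global_len, local_len)
```

In this toy, the global margin pushes every instruction away from the loop end and pads the body, while the per-instruction margin only moves the offending instruction and lets the others fill the bottom slots.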

QoR results below. Overall it's good. There are some regressions, but we'll get rid of them with less unrolling and more SWP.

| Core_Compute_Cycle_Count               | Erf_aie2_int8_0_ptr_interface | Erf_aie2_int8_0 | SigmoidTemplated_bf16_0 | Conv2D_DW_bf16_1 | Floor_aie2_0  | Hardswish_aie2_1 | HardswishAsHardsigmoid_aie2_1 | HardswishAsHardsigmoid_aie2_0 | Hardswish_aie2_0 | AvgPool2D_aie2_bfloat16_0 | AvgPool2D_aie2_bfloat16_1 | Requantize_0 | Clip_aie2_int8 | Requantize_1 | SiLU_aie2_bf16 | Sub_aie2_bf16_0 | SubBroadcasting_aie2_bf16_0 | AddBf16_aie2_0 | AddAttributeBroadcasting_aie2_bf16 | SubAttributeBroadcasting_aie2_bf16_0 | AddBroadcastingBf16_aie2_0 | Scale_Add_bf16_0 | Scale_Add_bf16_1 | FullyConnect_aie2_int8 | BitwiseNot_aie2_0 | LogicalNot_aie2_0 | DivAttributeBroadcasting_aie2_bf16_0 | DivBroadcasting_aie2_1 | Abs_int8_0   | ElemDiv_aie2_1 | Sign_bf16_0  | Neg_aie2_0   | Conv2D_DW_bf16_0 | Elu_aie2_int8_0 | ReduceSumAxis_1_aie2_int8 | ReduceSumAxis_2_aie2_int8 | ReduceSumAxis_4_aie2_int8 | ArgMin1d_bf16_0 | Log_bf16_0   | Conv2D_Transpose_AIE2_0 | Conv2D_Transpose_AIE2_1 | Sign_bf16_1  | BitShift_AIE2_int8 | SiLU_aie2_int8_1 | SiLU_aie2_int8 | LayerNormC8Part1_aie2_int8_0 | DivAttributeBroadcasting_aie2_int8_0 | ElemDiv_aie2_0 | DivBroadcasting_aie2_0 | ReduceMeanAxis_1_aie2_int8 | ReduceMeanAxis_2_aie2_int8 | ReduceMeanAxis_4_aie2_int8 | Conv2D_DW_1  | Conv2D_bf16_0 | Add2D_bf16_0 | ArgMax1d_bf16_0 | ReduceSumAxis_6_aie2_int8 | ReduceSumAxis_5_aie2_int8 | Conv2D_DW_0  | ReduceSumAxis_3_aie2_int8 | Add2D_bf16_1 | Conv2D_bf16_1 | ArgMin1d_int8_0 | ReduceMeanAxis_6_aie2_int8 | ReduceMeanAxis_3_aie2_int8 | ReduceMeanAxis_5_aie2_int8 | LayerNormC8Part1_aie2_bf16_0 | Softmax_1    | GEMM_int8_1   | ReduceSumAxis_2_aie2_bf16 | Exp_bf16_1   | Mish_aie2_int8 | ReduceSumAxis_3_aie2_bf16 | InstanceNormPart2_aie2_bf16_0 | ReduceSumAxis_7_aie2_bf16 | ReduceSumAxis_6_aie2_bf16 | LayerNorm_0   | Conv2D_11x11s4_1 | Mish_aie2_bfloat16 | ReduceMeanAxis_7_aie2_int8 | LayerNorm_1   | Abs_bf16_0   | ArgMax1d_int8_0 | AvgPool2dVariant_aie2_bf16_0 | AvgPool2dVariant_aie2_bf16_1 | 
AvgPool2dVariant_aie2_int8_0 | AvgPool2dVariant_aie2_int8_1 | BatchNorm1d_aie2_bfloat16 | BatchNorm1d_aie2_int8 | BatchNorm2D_0 | BatchNorm2D_1 | BitwiseAnd_int8_0 | BitwiseOr_int8_0 | BitwiseXor_aie2_int8 | Cast_aie2_bfloat16 | Cast_aie2_bfloat16_1 | Cast_aie2_int8 | Cast_aie2_int8_1 | Ceil_AIE2_bfloat16 | Ceil_AIE2_int8 | Clip_aie2_bf16 | CompareOpsBroadcasting_K_EQ_GE_GT_LE_LT_CMP_GE_int8_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_EQ_int8_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_int8_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_int8_aie2_ptr_interface | DegroupG4_aie2_bf16_0 | DegroupG4_aie2_bf16_1 | DegroupG4_aie2_int8_0 | DegroupG4_aie2_int8_1 | DegroupG8_aie2_bf16_0 | DegroupG8_aie2_bf16_1 | DegroupG8_aie2_int8_0 | DegroupG8_aie2_int8_1 | EleMax_aie2_int8 | EleMin_aie2_int8 | Erf_aie2_bf16_0 | Exp_bf16_0   | Expand_aie2_bfloat16 | Expand_aie2_int8 | Floor_aie2_1 | GELU_0       | GELU_1       | GEMV_0       | GEMV_1       | GeluTemplated_aie2_bf16 | GeluTemplated_aie2_int8 | GroupG4_aie2_bf16_0 | GroupG4_aie2_bf16_1 | GroupG4_aie2_int8_0 | GroupG4_aie2_int8_1 | GroupG8_aie2_bf16_0 | GroupG8_aie2_bf16_1 | GroupG8_aie2_int8_0 | GroupG8_aie2_int8_1 | HardSigmoidTemplated_bf16_0 | HardSigmoidTemplated_int8_0 | InstanceNormPart1_aie2_bf16_0 | InstanceNormPart2_aie2_int8_0 | InterpolateLinear1D_AIE2_bfloat16 | InterpolateLinear1D_AIE2_int8 | LogicalXor_aie2_int8 | MaxPool2D_0  | MaxPool2D_1  | Mul2D_0      | Mul2D_1      | MulAttributeBroadcasting_aie2_bf16_0 | MulBf16_aie2_0 | MulBroadcastingBf16_aie2_0 | Neg_aie2_1   | Pad2D_bf16_0 | Pad3D_AIE2_bfloat16 | Pad3D_AIE2_int8 | PixelShuffle_aie2_bf16 | PixelShuffle_aie2_int8 | PowAttributeBroadcasting_aie2_bf16_0 | PowAttributeBroadcasting_aie2_int8_0 | Pow_int8_0   | Range_bfloat16_aie2_0 | Range_bfloat16_aie2_1 | Range_int8_aie2_0 | Range_int8_aie2_1 | Reciprocal_aie2_0 | Reciprocal_aie2_1 | ReduceMax_bf16_0 | ReduceMax_int8_0 | ReduceMax_int8_1 | ReduceMeanAxis_1_aie2_bf16 | ReduceMeanAxis_2_aie2_bf16 | 
ReduceMeanAxis_3_aie2_bf16 | ReduceMeanAxis_4_aie2_bf16 | ReduceMeanAxis_5_aie2_bf16 | ReduceMeanAxis_6_aie2_bf16 | ReduceMeanNoc8_AIE2_bfloat16 | ReduceMeanNoc8_AIE2_int8 | ReduceMin1D_aie2_bf16 | ReduceMin1D_aie2_int8 | ReduceMin_bf16_0 | ReduceMin_int8_0 | ReduceMin_int8_1 | ReduceSumAxis_7_aie2_int8 | ReduceSum_bf16_0 | ReduceSum_int8_0 | ReduceSum_int8_1 | Rescale_aie2_int8_0 | Round_aie2_0 | Rsqrt_aie2_bf16_0 | Scale_Add_0  | Scale_Add_1  | Select_aie2_bf16 | Select_aie2_int8 | SigmoidTemplated_int8_0 | SigmoidTemplated_int8_1 | Sigmoid_bf16_0 | Sigmoid_bf16_1 | Sign_int8_0  | Sign_int8_1  | Sin_aie2_bf16 | Sin_aie2_int8 | Slice_bfloat16_0 | Slice_int8_0 | Softmax_bf16_1 | Sqrt_bf16_0   | Sqrt_bf16_1  | Sqrt_int8_0   | Sqrt_int8_1   | Squeeze_bfloat16_0 | Squeeze_int8_0 | TanhTemplated_aie2_bfloat16 | Tanh_0       | Tanh_1       | Tile_aie2_bf16_0 | Tile_aie2_int8_1 | Topk1D_bf16_0 | Topk1D_bf16_1 | Topk1D_int8_0 | Topk1D_int8_1 | Topk2D_bf16_0 | Topk2D_bf16_1 | Topk2D_int8_0 | Topk2D_int8_1 | Transpose_aie2_bf16_021 | Transpose_aie2_bf16_021_pad | Transpose_aie2_bf16_102 | Transpose_aie2_bf16_102_pad | Transpose_aie2_bf16_120 | Transpose_aie2_bf16_120_pad | Transpose_aie2_bf16_201 | Transpose_aie2_bf16_201_pad | Transpose_aie2_bf16_210 | Transpose_aie2_bf16_210_pad | Transpose_aie2_int8_021 | Transpose_aie2_int8_021_pad | Transpose_aie2_int8_102 | Transpose_aie2_int8_102_pad | Transpose_aie2_int8_120 | Transpose_aie2_int8_120_pad | Transpose_aie2_int8_201 | Transpose_aie2_int8_201_pad | Transpose_aie2_int8_210 | Transpose_aie2_int8_210_pad | ReduceSumAxis_4_aie2_bf16 | PixelUnshuffle_bf16_0 | PixelUnshuffle_int8_0 | ReduceSumAxis_5_aie2_bf16 | Softmax_bf16_0 | Conv2D_Transpose_bf16_AIE2_1 | Conv2D_Transpose_bf16_AIE2_0 | ReduceMax_bf16_1 | ReduceMeanAxis_7_aie2_bf16 | ReduceMin_bf16_1 | ReduceSumAxis_1_aie2_bf16 | InstanceNormPart1_aie2_int8_0 | Conv2D_1     | ReduceSum_bf16_1 | Conv2D_11x11s4_0 | Elu_aie2_bf16_0 | 
CompareOpsBroadcasting_K_EQ_GE_GT_LE_LT_CMP_GE_bfloat16_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_EQ_bfloat16_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_bfloat16_aie2 | Conv2D_ReLU_int8_0 | Conv2D_2x8_0 | GEMM_bf16_0  | Conv1D_DW_AIE2_bf16_0 | Conv1D_DW_AIE2_bf16_1 | Pow_bf16_0    | DilatedConv2D_1 | TanhTemplated_aie2_int8 | GEMM_int8_0  | Tanh_int8_0  | Tanh_int8_1  | Rsqrt_aie2_int8_0 | Conv2D_LReLU_0 | Conv2D_0     | Conv2D_mixed_batch_1 | Conv2D_ReLU_Standalone_1 | GEMM_bf16_1  | FullyConnect_aie2_bf16 | BilinearInterpolation_1 | Conv2D_ReLU_int8_1 | Conv2D_ReLU_0 | Conv2D_ReLU_Standalone_0 | Conv2D_FC_1  | Mul2d_bf16_0 | Add2D_0      | Add2D_Standalone_0 | LayerNormC8Part2_aie2_bf16_0 | Conv2D_FC_0  | Mul2d_bf16_1 | Add2D_Standalone_1 | Shrink_aie2_1 | Conv2D_SV60  | LayerNormC8Part2_aie2_int8_0 | BilinearInterpolation_0 | SubBroadcasting_aie2_int8_0 | SubBroadcasting_aie2_int8_0_ptr_interface | Group_Conv2D_1 | AddAttributeBroadcasting_aie2_int8 | SubAttributeBroadcasting_aie2_int8_0 | AddBroadcasting_aie2_0 | Conv1D_DW_AIE2_int8_0 | Sub_aie2_int8_0 | Sub_aie2_int8_0_ptr_interface | Add_aie2_0   | Conv2D_LReLU_1 | Group_Conv2D_0 | Conv1D_DW_AIE2_int8_1 | int8         | Conv2D_7x7s2_Layer1_1 | Conv2D_ReLU_1 | Conv2D_mixed_batch_0 | HardSigmoid_bf16_1 | Conv2D_7x7s2_Layer1_0 | HardSigmoid_bf16_0 | Conv2D_11x11s4_Layer1_1 | Conv2D_11x11s4_Layer1_0 | Round_aie2_1 | Sigmoid_int8_1 | bfloat16      | Sigmoid_int8_0 | MulBroadcasting_aie2_0 | MulAttributeBroadcasting_aie2_int8_0 | Mul_aie2_0    | Conv2D_2x8_1  | HardSigmoid_int8_0 | HardSigmoid_int8_1 | Log_int8_0    | EleMax_aie2_bfloat16 | EleMin_aie2_bfloat16 | Shrink_aie2_0 | AvgPool2D_1   | AvgPool2D_aie2_int8_1 | AvgPool2D_0   | AvgPool2D_aie2_int8_0 | Averege diff | Diff stdev | Quantile #1 | Quantile #2 | Quantile #3 | Quantile #4 | Quantile #5 | Quantile #6 | Quantile #7 | Quantile #8 | Quantile #9 |
| -------------------------------------- | ----------------------------- | --------------- | ----------------------- | ---------------- | ------------- | ---------------- | ----------------------------- | ----------------------------- | ---------------- | ------------------------- | ------------------------- | ------------ | -------------- | ------------ | -------------- | --------------- | --------------------------- | -------------- | ---------------------------------- | ------------------------------------ | -------------------------- | ---------------- | ---------------- | ---------------------- | ----------------- | ----------------- | ------------------------------------ | ---------------------- | ------------ | -------------- | ------------ | ------------ | ---------------- | --------------- | ------------------------- | ------------------------- | ------------------------- | --------------- | ------------ | ----------------------- | ----------------------- | ------------ | ------------------ | ---------------- | -------------- | ---------------------------- | ------------------------------------ | -------------- | ---------------------- | -------------------------- | -------------------------- | -------------------------- | ------------ | ------------- | ------------ | --------------- | ------------------------- | ------------------------- | ------------ | ------------------------- | ------------ | ------------- | --------------- | -------------------------- | -------------------------- | -------------------------- | ---------------------------- | ------------ | ------------- | ------------------------- | ------------ | -------------- | ------------------------- | ----------------------------- | ------------------------- | ------------------------- | ------------- | ---------------- | ------------------ | -------------------------- | ------------- | ------------ | --------------- | ---------------------------- | ---------------------------- | 
---------------------------- | ---------------------------- | ------------------------- | --------------------- | ------------- | ------------- | ----------------- | ---------------- | -------------------- | ------------------ | -------------------- | -------------- | ---------------- | ------------------ | -------------- | -------------- | -------------------------------------------------------- | -------------------------------------------- | -------------------------------------------- | ---------------------------------------------------------- | --------------------- | --------------------- | --------------------- | --------------------- | --------------------- | --------------------- | --------------------- | --------------------- | ---------------- | ---------------- | --------------- | ------------ | -------------------- | ---------------- | ------------ | ------------ | ------------ | ------------ | ------------ | ----------------------- | ----------------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- | --------------------------- | --------------------------- | ----------------------------- | ----------------------------- | --------------------------------- | ----------------------------- | -------------------- | ------------ | ------------ | ------------ | ------------ | ------------------------------------ | -------------- | -------------------------- | ------------ | ------------ | ------------------- | --------------- | ---------------------- | ---------------------- | ------------------------------------ | ------------------------------------ | ------------ | --------------------- | --------------------- | ----------------- | ----------------- | ----------------- | ----------------- | ---------------- | ---------------- | ---------------- | -------------------------- | -------------------------- | 
-------------------------- | -------------------------- | -------------------------- | -------------------------- | ---------------------------- | ------------------------ | --------------------- | --------------------- | ---------------- | ---------------- | ---------------- | ------------------------- | ---------------- | ---------------- | ---------------- | ------------------- | ------------ | ----------------- | ------------ | ------------ | ---------------- | ---------------- | ----------------------- | ----------------------- | -------------- | -------------- | ------------ | ------------ | ------------- | ------------- | ---------------- | ------------ | -------------- | ------------- | ------------ | ------------- | ------------- | ------------------ | -------------- | --------------------------- | ------------ | ------------ | ---------------- | ---------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ----------------------- | --------------------------- | ------------------------- | --------------------- | --------------------- | ------------------------- | -------------- | ---------------------------- | ---------------------------- | ---------------- | -------------------------- | ---------------- | ------------------------- | ----------------------------- | ------------ | ---------------- | ---------------- | --------------- | 
------------------------------------------------------------ | ------------------------------------------------ | ------------------------------------------------ | ------------------ | ------------ | ------------ | --------------------- | --------------------- | ------------- | --------------- | ----------------------- | ------------ | ------------ | ------------ | ----------------- | -------------- | ------------ | -------------------- | ------------------------ | ------------ | ---------------------- | ----------------------- | ------------------ | ------------- | ------------------------ | ------------ | ------------ | ------------ | ------------------ | ---------------------------- | ------------ | ------------ | ------------------ | ------------- | ------------ | ---------------------------- | ----------------------- | --------------------------- | ----------------------------------------- | -------------- | ---------------------------------- | ------------------------------------ | ---------------------- | --------------------- | --------------- | ----------------------------- | ------------ | -------------- | -------------- | --------------------- | ------------ | --------------------- | ------------- | -------------------- | ------------------ | --------------------- | ------------------ | ----------------------- | ----------------------- | ------------ | -------------- | ------------- | -------------- | ---------------------- | ------------------------------------ | ------------- | ------------- | ------------------ | ------------------ | ------------- | -------------------- | -------------------- | ------------- | ------------- | --------------------- | ------------- | --------------------- | ------------ | ---------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
| Baseline                               | 2028(+0.00%)                  | 2049(+0.00%)    | 2881(+0.00%)            | 4398(+0.00%)     | 345(+0.00%)   | 4003(+0.00%)     | 4008(+0.00%)                  | 1683(+0.00%)                  | 1683(+0.00%)     | 2722(+0.00%)              | 1844(+0.00%)              | 1893(+0.00%) | 283(+0.00%)    | 1013(+0.00%) | 4084(+0.00%)   | 782(+0.00%)     | 799(+0.00%)                 | 804(+0.00%)    | 820(+0.00%)                        | 820(+0.00%)                          | 821(+0.00%)                | 1147(+0.00%)     | 1147(+0.00%)     | 673(+0.00%)            | 199(+0.00%)       | 185(+0.00%)       | 5697(+0.00%)                         | 1473(+0.00%)           | 477(+0.00%)  | 1450(+0.00%)   | 1020(+0.00%) | 580(+0.00%)  | 1164(+0.00%)     | 635(+0.00%)     | 8325(+0.00%)              | 8364(+0.00%)              | 8395(+0.00%)              | 390(+0.00%)     | 4854(+0.00%) | 39538(+0.00%)           | 10636(+0.00%)           | 180(+0.00%)  | 1985(+0.00%)       | 4001(+0.00%)     | 4003(+0.00%)   | 7538(+0.00%)                 | 8256(+0.00%)                         | 2093(+0.00%)   | 2113(+0.00%)           | 8517(+0.00%)               | 8558(+0.00%)               | 8507(+0.00%)               | 763(+0.00%)  | 31380(+0.00%) | 166(+0.00%)  | 352(+0.00%)     | 4073(+0.00%)              | 4091(+0.00%)              | 2857(+0.00%) | 4087(+0.00%)              | 226(+0.00%)  | 55120(+0.00%) | 305(+0.00%)     | 4131(+0.00%)               | 4135(+0.00%)               | 4153(+0.00%)               | 8648(+0.00%)                 | 503(+0.00%)  | 37310(+0.00%) | 13231(+0.00%)             | 1493(+0.00%) | 10148(+0.00%)  | 8080(+0.00%)              | 14298(+0.00%)                 | 7315(+0.00%)              | 8062(+0.00%)              | 19802(+0.00%) | 5483(+0.00%)     | 6042(+0.00%)       | 3411(+0.00%)               | 16736(+0.00%) | 410(+0.00%)  | 413(+0.00%)     | 3051(+0.00%)                 | 1810(+0.00%)                 | 2862(+0.00%)   
              | 4089(+0.00%)                 | 499(+0.00%)               | 503(+0.00%)           | 330(+0.00%)   | 518(+0.00%)   | 388(+0.00%)       | 388(+0.00%)      | 484(+0.00%)          | 1906(+0.00%)       | 1906(+0.00%)         | 1238(+0.00%)   | 1238(+0.00%)     | 1667(+0.00%)       | 386(+0.00%)    | 183(+0.00%)    | 979(+0.00%)                                              | 962(+0.00%)                                  | 962(+0.00%)                                  | 962(+0.00%)                                                | 534(+0.00%)           | 990(+0.00%)           | 294(+0.00%)           | 526(+0.00%)           | 678(+0.00%)           | 1117(+0.00%)          | 366(+0.00%)           | 589(+0.00%)           | 224(+0.00%)      | 224(+0.00%)      | 5804(+0.00%)    | 7407(+0.00%) | 2017(+0.00%)         | 1971(+0.00%)     | 912(+0.00%)  | 2827(+0.00%) | 3755(+0.00%) | 483(+0.00%)  | 393(+0.00%)  | 1396(+0.00%)            | 1223(+0.00%)            | 461(+0.00%)         | 1507(+0.00%)        | 261(+0.00%)         | 786(+0.00%)         | 714(+0.00%)         | 1634(+0.00%)        | 386(+0.00%)         | 849(+0.00%)         | 1021(+0.00%)                | 256(+0.00%)                 | 2831(+0.00%)                  | 12762(+0.00%)                 | 15087(+0.00%)                     | 11587(+0.00%)                 | 360(+0.00%)          | 787(+0.00%)  | 543(+0.00%)  | 910(+0.00%)  | 910(+0.00%)  | 1060(+0.00%)                         | 1044(+0.00%)   | 1061(+0.00%)               | 372(+0.00%)  | 6370(+0.00%) | 9379(+0.00%)        | 9840(+0.00%)    | 8672(+0.00%)           | 8672(+0.00%)           | 40847(+0.00%)                        | 4297(+0.00%)                         | 4297(+0.00%) | 3696(+0.00%)          | 2536(+0.00%)          | 1133(+0.00%)      | 1676(+0.00%)      | 1323(+0.00%)      | 2504(+0.00%)      | 17856(+0.00%)    | 35861(+0.00%)    | 24036(+0.00%)    | 13669(+0.00%)              | 13669(+0.00%)              | 8168(+0.00%)               | 
13659(+0.00%)              | 8164(+0.00%)               | 8152(+0.00%)               | 50062(+0.00%)                | 83475(+0.00%)            | 162(+0.00%)           | 137(+0.00%)           | 17856(+0.00%)    | 19512(+0.00%)    | 23790(+0.00%)    | 3361(+0.00%)              | 47492(+0.00%)    | 41024(+0.00%)    | 16926(+0.00%)    | 285(+0.00%)         | 588(+0.00%)  | 3546(+0.00%)      | 355(+0.00%)  | 355(+0.00%)  | 445(+0.00%)      | 273(+0.00%)      | 1351(+0.00%)            | 1351(+0.00%)            | 4055(+0.00%)   | 2615(+0.00%)   | 411(+0.00%)  | 96(+0.00%)   | 3016(+0.00%)  | 840(+0.00%)   | 765(+0.00%)      | 1365(+0.00%) | 1650(+0.00%)   | 29751(+0.00%) | 3767(+0.00%) | 21906(+0.00%) | 21906(+0.00%) | 168(+0.00%)        | 168(+0.00%)    | 2499(+0.00%)                | 2879(+0.00%) | 3823(+0.00%) | 4081(+0.00%)     | 2371(+0.00%)     | 1219(+0.00%)  | 171(+0.00%)   | 846(+0.00%)   | 122(+0.00%)   | 34471(+0.00%) | 305(+0.00%)   | 32520(+0.00%) | 271(+0.00%)   | 1773(+0.00%)            | 2340(+0.00%)                | 1138(+0.00%)            | 1122(+0.00%)                | 1781(+0.00%)            | 1677(+0.00%)                | 1790(+0.00%)            | 1686(+0.00%)                | 1794(+0.00%)            | 1794(+0.00%)                | 2603(+0.00%)            | 3184(+0.00%)                | 1134(+0.00%)            | 1070(+0.00%)                | 2614(+0.00%)            | 2614(+0.00%)                | 2624(+0.00%)            | 2468(+0.00%)                | 2620(+0.00%)            | 2464(+0.00%)                | 13249(+0.00%)             | 17036(+0.00%)         | 17036(+0.00%)         | 8090(+0.00%)              | 7690(+0.00%)   | 6275(+0.00%)                 | 5161(+0.00%)                 | 14790(+0.00%)    | 7370(+0.00%)               | 29334(+0.00%)    | 13227(+0.00%)             | 11625(+0.00%)                 | 2728(+0.00%) | 19126(+0.00%)    | 5485(+0.00%)     | 1465(+0.00%)    | 1857(+0.00%)                                                 | 
1838(+0.00%)                                     | 1838(+0.00%)                                     | 11339(+0.00%)      | 1988(+0.00%) | 3789(+0.00%) | 3532(+0.00%)          | 4100(+0.00%)          | 37134(+0.00%) | 6139(+0.00%)    | 360(+0.00%)             | 3041(+0.00%) | 347(+0.00%)  | 447(+0.00%)  | 2604(+0.00%)      | 2211(+0.00%)   | 8919(+0.00%) | 24916(+0.00%)        | 2496(+0.00%)             | 8206(+0.00%) | 1270(+0.00%)           | 388(+0.00%)             | 986(+0.00%)        | 1310(+0.00%)  | 1310(+0.00%)             | 1259(+0.00%) | 506(+0.00%)  | 424(+0.00%)  | 424(+0.00%)        | 12052(+0.00%)                | 2972(+0.00%) | 282(+0.00%)  | 808(+0.00%)        | 798(+0.00%)   | 920(+0.00%)  | 10802(+0.00%)                | 710(+0.00%)             | 1016(+0.00%)                | 1016(+0.00%)                              | 4766(+0.00%)   | 1039(+0.00%)                       | 1039(+0.00%)                         | 1037(+0.00%)           | 1607(+0.00%)          | 997(+0.00%)     | 997(+0.00%)                   | 1016(+0.00%) | 5495(+0.00%)   | 4221(+0.00%)   | 1863(+0.00%)          | 1037(+0.00%) | 1756(+0.00%)          | 31087(+0.00%) | 11720(+0.00%)        | 829(+0.00%)        | 6310(+0.00%)          | 1261(+0.00%)       | 3274(+0.00%)            | 4688(+0.00%)            | 1332(+0.00%) | 130(+0.00%)    | 1188(+0.00%)  | 111(+0.00%)    | 529(+0.00%)            | 528(+0.00%)                          | 512(+0.00%)   | 4948(+0.00%)  | 363(+0.00%)        | 375(+0.00%)        | 1879(+0.00%)  | 626(+0.00%)          | 626(+0.00%)          | 1028(+0.00%)  | 1023(+0.00%)  | 1023(+0.00%)          | 1511(+0.00%)  | 1511(+0.00%)          | +0.00%       | 0.00       | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      |
| Expensive Post-ra loop scheduling      | 2028(+0.00%)                  | 2049(+0.00%)    | 2881(+0.00%)            | 4398(+0.00%)     | 345(+0.00%)   | 4003(+0.00%)     | 4008(+0.00%)                  | 1683(+0.00%)                  | 1683(+0.00%)     | 2722(+0.00%)              | 1844(+0.00%)              | 1893(+0.00%) | 283(+0.00%)    | 1013(+0.00%) | 4084(+0.00%)   | 782(+0.00%)     | 799(+0.00%)                 | 804(+0.00%)    | 820(+0.00%)                        | 820(+0.00%)                          | 821(+0.00%)                | 1147(+0.00%)     | 1147(+0.00%)     | 673(+0.00%)            | 199(+0.00%)       | 185(+0.00%)       | 5697(+0.00%)                         | 1473(+0.00%)           | 477(+0.00%)  | 1450(+0.00%)   | 1020(+0.00%) | 580(+0.00%)  | 1164(+0.00%)     | 635(+0.00%)     | 8325(+0.00%)              | 8364(+0.00%)              | 8395(+0.00%)              | 390(+0.00%)     | 4854(+0.00%) | 39538(+0.00%)           | 10636(+0.00%)           | 180(+0.00%)  | 1985(+0.00%)       | 4001(+0.00%)     | 4003(+0.00%)   | 7538(+0.00%)                 | 8256(+0.00%)                         | 2093(+0.00%)   | 2113(+0.00%)           | 8517(+0.00%)               | 8558(+0.00%)               | 8507(+0.00%)               | 763(+0.00%)  | 31380(+0.00%) | 166(+0.00%)  | 352(+0.00%)     | 4073(+0.00%)              | 4091(+0.00%)              | 2857(+0.00%) | 4087(+0.00%)              | 226(+0.00%)  | 55120(+0.00%) | 305(+0.00%)     | 4131(+0.00%)               | 4135(+0.00%)               | 4153(+0.00%)               | 8648(+0.00%)                 | 503(+0.00%)  | 37310(+0.00%) | 13231(+0.00%)             | 1493(+0.00%) | 10148(+0.00%)  | 8080(+0.00%)              | 14298(+0.00%)                 | 7315(+0.00%)              | 8062(+0.00%)              | 19802(+0.00%) | 5483(+0.00%)     | 6042(+0.00%)       | 3411(+0.00%)               | 16736(+0.00%) | 410(+0.00%)  | 413(+0.00%)     | 3051(+0.00%)                 | 1810(+0.00%)                 | 2862(+0.00%)   
              | 4089(+0.00%)                 | 499(+0.00%)               | 503(+0.00%)           | 330(+0.00%)   | 518(+0.00%)   | 388(+0.00%)       | 388(+0.00%)      | 484(+0.00%)          | 1906(+0.00%)       | 1906(+0.00%)         | 1238(+0.00%)   | 1238(+0.00%)     | 1667(+0.00%)       | 386(+0.00%)    | 183(+0.00%)    | 979(+0.00%)                                              | 962(+0.00%)                                  | 962(+0.00%)                                  | 962(+0.00%)                                                | 534(+0.00%)           | 990(+0.00%)           | 294(+0.00%)           | 526(+0.00%)           | 678(+0.00%)           | 1117(+0.00%)          | 366(+0.00%)           | 589(+0.00%)           | 224(+0.00%)      | 224(+0.00%)      | 5804(+0.00%)    | 7407(+0.00%) | 2017(+0.00%)         | 1971(+0.00%)     | 912(+0.00%)  | 2827(+0.00%) | 3755(+0.00%) | 483(+0.00%)  | 393(+0.00%)  | 1396(+0.00%)            | 1223(+0.00%)            | 461(+0.00%)         | 1507(+0.00%)        | 261(+0.00%)         | 786(+0.00%)         | 714(+0.00%)         | 1634(+0.00%)        | 386(+0.00%)         | 849(+0.00%)         | 1021(+0.00%)                | 256(+0.00%)                 | 2831(+0.00%)                  | 12762(+0.00%)                 | 15087(+0.00%)                     | 11587(+0.00%)                 | 360(+0.00%)          | 787(+0.00%)  | 543(+0.00%)  | 910(+0.00%)  | 910(+0.00%)  | 1060(+0.00%)                         | 1044(+0.00%)   | 1061(+0.00%)               | 372(+0.00%)  | 6370(+0.00%) | 9379(+0.00%)        | 9840(+0.00%)    | 8672(+0.00%)           | 8672(+0.00%)           | 40847(+0.00%)                        | 4297(+0.00%)                         | 4297(+0.00%) | 3696(+0.00%)          | 2536(+0.00%)          | 1133(+0.00%)      | 1676(+0.00%)      | 1323(+0.00%)      | 2504(+0.00%)      | 17856(+0.00%)    | 35861(+0.00%)    | 24036(+0.00%)    | 13669(+0.00%)              | 13669(+0.00%)              | 8168(+0.00%)               | 
13659(+0.00%)              | 8164(+0.00%)               | 8152(+0.00%)               | 50062(+0.00%)                | 83475(+0.00%)            | 162(+0.00%)           | 137(+0.00%)           | 17856(+0.00%)    | 19512(+0.00%)    | 23790(+0.00%)    | 3361(+0.00%)              | 47492(+0.00%)    | 41024(+0.00%)    | 16926(+0.00%)    | 285(+0.00%)         | 588(+0.00%)  | 3546(+0.00%)      | 355(+0.00%)  | 355(+0.00%)  | 445(+0.00%)      | 273(+0.00%)      | 1351(+0.00%)            | 1351(+0.00%)            | 4055(+0.00%)   | 2615(+0.00%)   | 411(+0.00%)  | 96(+0.00%)   | 3016(+0.00%)  | 840(+0.00%)   | 765(+0.00%)      | 1365(+0.00%) | 1650(+0.00%)   | 29751(+0.00%) | 3767(+0.00%) | 21906(+0.00%) | 21906(+0.00%) | 168(+0.00%)        | 168(+0.00%)    | 2499(+0.00%)                | 2879(+0.00%) | 3823(+0.00%) | 4081(+0.00%)     | 2371(+0.00%)     | 1219(+0.00%)  | 171(+0.00%)   | 846(+0.00%)   | 122(+0.00%)   | 34471(+0.00%) | 305(+0.00%)   | 32520(+0.00%) | 271(+0.00%)   | 1773(+0.00%)            | 2340(+0.00%)                | 1138(+0.00%)            | 1122(+0.00%)                | 1781(+0.00%)            | 1677(+0.00%)                | 1790(+0.00%)            | 1686(+0.00%)                | 1794(+0.00%)            | 1794(+0.00%)                | 2603(+0.00%)            | 3184(+0.00%)                | 1134(+0.00%)            | 1070(+0.00%)                | 2614(+0.00%)            | 2614(+0.00%)                | 2624(+0.00%)            | 2468(+0.00%)                | 2620(+0.00%)            | 2464(+0.00%)                | 13249(+0.00%)             | 17036(+0.00%)         | 17036(+0.00%)         | 8090(+0.00%)              | 7691(+0.01%)   | 6275(+0.00%)                 | 5161(+0.00%)                 | 14790(+0.00%)    | 7370(+0.00%)               | 29334(+0.00%)    | 13227(+0.00%)             | 11625(+0.00%)                 | 2720(-0.29%) | 19126(+0.00%)    | 5485(+0.00%)     | 1465(+0.00%)    | 1857(+0.00%)                                                 | 
1838(+0.00%)                                     | 1838(+0.00%)                                     | 11304(-0.31%)      | 1994(+0.30%) | 3789(+0.00%) | 3388(-4.08%)          | 3932(-4.10%)          | 37134(+0.00%) | 6067(-1.17%)    | 360(+0.00%)             | 3041(+0.00%) | 347(+0.00%)  | 447(+0.00%)  | 2604(+0.00%)      | 2191(-0.90%)   | 8759(-1.79%) | 24436(-1.93%)        | 2448(-1.92%)             | 8206(+0.00%) | 1270(+0.00%)           | 388(+0.00%)             | 970(-1.62%)        | 1290(-1.53%)  | 1290(-1.53%)             | 1227(-2.54%) | 506(+0.00%)  | 424(+0.00%)  | 424(+0.00%)        | 12052(+0.00%)                | 2876(-3.23%) | 282(+0.00%)  | 808(+0.00%)        | 798(+0.00%)   | 900(-2.17%)  | 10802(+0.00%)                | 710(+0.00%)             | 1016(+0.00%)                | 1016(+0.00%)                              | 4574(-4.03%)   | 1039(+0.00%)                       | 1039(+0.00%)                         | 1037(+0.00%)           | 1607(+0.00%)          | 997(+0.00%)     | 997(+0.00%)                   | 1016(+0.00%) | 5239(-4.66%)   | 4029(-4.55%)   | 1863(+0.00%)          | 1037(+0.00%) | 1652(-5.92%)          | 29039(-6.59%) | 10952(-6.55%)        | 766(-7.60%)        | 5842(-7.42%)          | 1162(-7.85%)       | 3018(-7.82%)            | 4304(-8.19%)            | 1332(+0.00%) | 130(+0.00%)    | 1188(+0.00%)  | 111(+0.00%)    | 529(+0.00%)            | 528(+0.00%)                          | 512(+0.00%)   | 4948(+0.00%)  | 318(-12.40%)       | 328(-12.53%)       | 1879(+0.00%)  | 626(+0.00%)          | 626(+0.00%)          | 1028(+0.00%)  | 1023(+0.00%)  | 1023(+0.00%)          | 1511(+0.00%)  | 1511(+0.00%)          | -0.37%       | 1.57       | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      |
| Conservative pre-ra pressure reduction | 2540(+25.25%)                 | 2561(+24.99%)   | 3329(+15.55%)           | 4961(+12.80%)    | 388(+12.46%)  | 4387(+9.59%)     | 4392(+9.58%)                  | 1843(+9.51%)                  | 1843(+9.51%)     | 2913(+7.02%)              | 1969(+6.78%)              | 2021(+6.76%) | 302(+6.71%)    | 1077(+6.32%) | 4340(+6.27%)   | 830(+6.14%)     | 847(+6.01%)                 | 852(+5.97%)    | 868(+5.85%)                        | 868(+5.85%)                          | 869(+5.85%)                | 1208(+5.32%)     | 1208(+5.32%)     | 705(+4.75%)            | 208(+4.52%)       | 193(+4.32%)       | 5891(+3.41%)                         | 1523(+3.39%)           | 493(+3.35%)  | 1498(+3.31%)   | 1051(+3.04%) | 596(+2.76%)  | 1194(+2.58%)     | 650(+2.36%)     | 8513(+2.26%)              | 8552(+2.25%)              | 8583(+2.24%)              | 398(+2.05%)     | 4951(+2.00%) | 40278(+1.87%)           | 10826(+1.79%)           | 183(+1.67%)  | 2017(+1.61%)       | 4065(+1.60%)     | 4067(+1.60%)   | 7602(+0.85%)                 | 8320(+0.78%)                         | 2109(+0.76%)   | 2129(+0.76%)           | 8577(+0.70%)               | 8618(+0.70%)               | 8563(+0.66%)               | 768(+0.66%)  | 31584(+0.65%) | 167(+0.60%)  | 354(+0.57%)     | 4095(+0.54%)              | 4113(+0.54%)              | 2871(+0.49%) | 4107(+0.49%)              | 227(+0.44%)  | 55303(+0.33%) | 306(+0.33%)     | 4143(+0.29%)               | 4147(+0.29%)               | 4165(+0.29%)               | 8668(+0.23%)                 | 504(+0.20%)  | 37375(+0.17%) | 13251(+0.15%)             | 1495(+0.13%) | 10161(+0.13%)  | 8090(+0.12%)              | 14314(+0.11%)                 | 7323(+0.11%)              | 8068(+0.07%)              | 19814(+0.06%) | 5486(+0.05%)     | 6045(+0.05%)       | 3412(+0.03%)               | 16739(+0.02%) | 410(+0.00%)  | 413(+0.00%)     | 3051(+0.00%)                 | 1810(+0.00%)                 | 2862(+0.00%)   
              | 4089(+0.00%)                 | 499(+0.00%)               | 503(+0.00%)           | 330(+0.00%)   | 518(+0.00%)   | 388(+0.00%)       | 388(+0.00%)      | 484(+0.00%)          | 1906(+0.00%)       | 1906(+0.00%)         | 1238(+0.00%)   | 1238(+0.00%)     | 1667(+0.00%)       | 386(+0.00%)    | 183(+0.00%)    | 979(+0.00%)                                              | 962(+0.00%)                                  | 962(+0.00%)                                  | 962(+0.00%)                                                | 534(+0.00%)           | 990(+0.00%)           | 294(+0.00%)           | 526(+0.00%)           | 678(+0.00%)           | 1117(+0.00%)          | 366(+0.00%)           | 589(+0.00%)           | 224(+0.00%)      | 224(+0.00%)      | 5804(+0.00%)    | 7407(+0.00%) | 2017(+0.00%)         | 1971(+0.00%)     | 912(+0.00%)  | 2827(+0.00%) | 3755(+0.00%) | 483(+0.00%)  | 393(+0.00%)  | 1396(+0.00%)            | 1223(+0.00%)            | 461(+0.00%)         | 1507(+0.00%)        | 261(+0.00%)         | 786(+0.00%)         | 714(+0.00%)         | 1634(+0.00%)        | 386(+0.00%)         | 849(+0.00%)         | 1021(+0.00%)                | 256(+0.00%)                 | 2831(+0.00%)                  | 12762(+0.00%)                 | 15087(+0.00%)                     | 11587(+0.00%)                 | 360(+0.00%)          | 787(+0.00%)  | 543(+0.00%)  | 910(+0.00%)  | 910(+0.00%)  | 1060(+0.00%)                         | 1044(+0.00%)   | 1061(+0.00%)               | 372(+0.00%)  | 6370(+0.00%) | 9379(+0.00%)        | 9840(+0.00%)    | 8672(+0.00%)           | 8672(+0.00%)           | 40847(+0.00%)                        | 4297(+0.00%)                         | 4297(+0.00%) | 3696(+0.00%)          | 2536(+0.00%)          | 1133(+0.00%)      | 1676(+0.00%)      | 1323(+0.00%)      | 2504(+0.00%)      | 17856(+0.00%)    | 35861(+0.00%)    | 24036(+0.00%)    | 13669(+0.00%)              | 13669(+0.00%)              | 8168(+0.00%)               | 
13659(+0.00%)              | 8164(+0.00%)               | 8152(+0.00%)               | 50062(+0.00%)                | 83475(+0.00%)            | 162(+0.00%)           | 137(+0.00%)           | 17856(+0.00%)    | 19512(+0.00%)    | 23790(+0.00%)    | 3361(+0.00%)              | 47492(+0.00%)    | 41024(+0.00%)    | 16926(+0.00%)    | 285(+0.00%)         | 588(+0.00%)  | 3546(+0.00%)      | 355(+0.00%)  | 355(+0.00%)  | 445(+0.00%)      | 273(+0.00%)      | 1351(+0.00%)            | 1351(+0.00%)            | 4055(+0.00%)   | 2615(+0.00%)   | 411(+0.00%)  | 96(+0.00%)   | 3016(+0.00%)  | 840(+0.00%)   | 765(+0.00%)      | 1365(+0.00%) | 1650(+0.00%)   | 29751(+0.00%) | 3767(+0.00%) | 21906(+0.00%) | 21906(+0.00%) | 168(+0.00%)        | 168(+0.00%)    | 2499(+0.00%)                | 2879(+0.00%) | 3823(+0.00%) | 4081(+0.00%)     | 2371(+0.00%)     | 1219(+0.00%)  | 171(+0.00%)   | 846(+0.00%)   | 122(+0.00%)   | 34471(+0.00%) | 305(+0.00%)   | 32520(+0.00%) | 271(+0.00%)   | 1773(+0.00%)            | 2340(+0.00%)                | 1138(+0.00%)            | 1122(+0.00%)                | 1781(+0.00%)            | 1677(+0.00%)                | 1790(+0.00%)            | 1686(+0.00%)                | 1794(+0.00%)            | 1794(+0.00%)                | 2603(+0.00%)            | 3184(+0.00%)                | 1134(+0.00%)            | 1070(+0.00%)                | 2614(+0.00%)            | 2614(+0.00%)                | 2624(+0.00%)            | 2468(+0.00%)                | 2620(+0.00%)            | 2464(+0.00%)                | 13246(-0.02%)             | 17032(-0.02%)         | 17032(-0.02%)         | 8088(-0.02%)              | 7688(-0.04%)   | 6271(-0.06%)                 | 5157(-0.08%)                 | 14778(-0.08%)    | 7364(-0.08%)               | 29310(-0.08%)    | 13215(-0.09%)             | 11613(-0.10%)                 | 2725(+0.18%) | 19103(-0.12%)    | 5478(-0.13%)     | 1461(-0.27%)    | 1850(-0.38%)                                                 | 
1831(-0.38%)                                     | 1831(-0.38%)                                     | 11291(-0.12%)      | 1979(-0.75%) | 3741(-1.27%) | 3484(+2.83%)          | 4044(+2.85%)          | 36622(-1.38%) | 6040(-0.45%)    | 354(-1.67%)             | 2990(-1.68%) | 341(-1.73%)  | 439(-1.79%)  | 2556(-1.84%)      | 2170(-0.96%)   | 8734(-0.29%) | 24388(-0.20%)        | 2439(-0.37%)             | 8001(-2.50%) | 1238(-2.52%)           | 378(-2.58%)             | 960(-1.03%)        | 1273(-1.32%)  | 1273(-1.32%)             | 1219(-0.65%) | 487(-3.75%)  | 408(-3.77%)  | 408(-3.77%)        | 11597(-3.78%)                | 2859(-0.59%) | 271(-3.90%)  | 776(-3.96%)        | 766(-4.01%)   | 883(-1.89%)  | 10365(-4.05%)                | 680(-4.23%)             | 970(-4.53%)                 | 970(-4.53%)                               | 4550(-0.52%)   | 991(-4.62%)                        | 991(-4.62%)                          | 989(-4.63%)            | 1531(-4.73%)          | 949(-4.81%)     | 949(-4.81%)                   | 966(-4.92%)  | 5218(-0.40%)   | 4007(-0.55%)   | 1762(-5.42%)          | 973(-6.17%)  | 1643(-0.54%)          | 29014(-0.09%) | 10918(-0.31%)        | 766(+0.00%)        | 5825(-0.29%)          | 1162(+0.00%)       | 3009(-0.30%)            | 4287(-0.39%)            | 1201(-9.83%) | 117(-10.00%)   | 1064(-10.44%) | 98(-11.71%)    | 466(-11.91%)           | 465(-11.93%)                         | 449(-12.30%)  | 4335(-12.39%) | 318(+0.00%)        | 328(+0.00%)        | 1625(-13.52%) | 500(-20.13%)         | 500(-20.13%)         | 730(-28.99%)  | 641(-37.34%)  | 641(-37.34%)          | 929(-38.52%)  | 929(-38.52%)          | -0.47%       | 5.99       | -3.77%      | -0.29%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.29%      | +2.80%      |
| Total diff                             | REGR(+25.25%)                 | REGR(+24.99%)   | REGR(+15.55%)           | REGR(+12.80%)    | REGR(+12.46%) | REGR(+9.59%)     | REGR(+9.58%)                  | REGR(+9.51%)                  | REGR(+9.51%)     | REGR(+7.02%)              | REGR(+6.78%)              | REGR(+6.76%) | REGR(+6.71%)   | REGR(+6.32%) | REGR(+6.27%)   | REGR(+6.14%)    | REGR(+6.01%)                | REGR(+5.97%)   | REGR(+5.85%)                       | REGR(+5.85%)                         | REGR(+5.85%)               | REGR(+5.32%)     | REGR(+5.32%)     | REGR(+4.75%)           | REGR(+4.52%)      | REGR(+4.32%)      | REGR(+3.41%)                         | REGR(+3.39%)           | REGR(+3.35%) | REGR(+3.31%)   | REGR(+3.04%) | REGR(+2.76%) | REGR(+2.58%)     | REGR(+2.36%)    | REGR(+2.26%)              | REGR(+2.25%)              | REGR(+2.24%)              | REGR(+2.05%)    | REGR(+2.00%) | REGR(+1.87%)            | REGR(+1.79%)            | REGR(+1.67%) | REGR(+1.61%)       | REGR(+1.60%)     | REGR(+1.60%)   | REGR(+0.85%)                 | REGR(+0.78%)                         | REGR(+0.76%)   | REGR(+0.76%)           | REGR(+0.70%)               | REGR(+0.70%)               | REGR(+0.66%)               | REGR(+0.66%) | REGR(+0.65%)  | REGR(+0.60%) | REGR(+0.57%)    | REGR(+0.54%)              | REGR(+0.54%)              | REGR(+0.49%) | REGR(+0.49%)              | REGR(+0.44%) | REGR(+0.33%)  | REGR(+0.33%)    | REGR(+0.29%)               | REGR(+0.29%)               | REGR(+0.29%)               | REGR(+0.23%)                 | REGR(+0.20%) | REGR(+0.17%)  | REGR(+0.15%)              | REGR(+0.13%) | REGR(+0.13%)   | REGR(+0.12%)              | REGR(+0.11%)                  | REGR(+0.11%)              | SAME(+0.07%)              | SAME(+0.06%)  | SAME(+0.05%)     | SAME(+0.05%)       | SAME(+0.03%)               | SAME(+0.02%)  | SAME(+0.00%) | SAME(+0.00%)    | SAME(+0.00%)                 | SAME(+0.00%)                 | SAME(+0.00%)   
              | SAME(+0.00%)                 | SAME(+0.00%)              | SAME(+0.00%)          | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)      | SAME(+0.00%)     | SAME(+0.00%)         | SAME(+0.00%)       | SAME(+0.00%)         | SAME(+0.00%)   | SAME(+0.00%)     | SAME(+0.00%)       | SAME(+0.00%)   | SAME(+0.00%)   | SAME(+0.00%)                                             | SAME(+0.00%)                                 | SAME(+0.00%)                                 | SAME(+0.00%)                                               | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)    | SAME(+0.00%) | SAME(+0.00%)         | SAME(+0.00%)     | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)            | SAME(+0.00%)            | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)        | SAME(+0.00%)                | SAME(+0.00%)                | SAME(+0.00%)                  | SAME(+0.00%)                  | SAME(+0.00%)                      | SAME(+0.00%)                  | SAME(+0.00%)         | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)                         | SAME(+0.00%)   | SAME(+0.00%)               | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)        | SAME(+0.00%)    | SAME(+0.00%)           | SAME(+0.00%)           | SAME(+0.00%)                         | SAME(+0.00%)                         | SAME(+0.00%) | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)      | SAME(+0.00%)      | SAME(+0.00%)      | SAME(+0.00%)      | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)               | SAME(+0.00%)               | SAME(+0.00%)               | 
SAME(+0.00%)               | SAME(+0.00%)               | SAME(+0.00%)               | SAME(+0.00%)                 | SAME(+0.00%)             | SAME(+0.00%)          | SAME(+0.00%)          | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)              | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)        | SAME(+0.00%) | SAME(+0.00%)      | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)            | SAME(+0.00%)            | SAME(+0.00%)   | SAME(+0.00%)   | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)     | SAME(+0.00%) | SAME(+0.00%)   | SAME(+0.00%)  | SAME(+0.00%) | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)       | SAME(+0.00%)   | SAME(+0.00%)                | SAME(+0.00%) | SAME(+0.00%) | SAME(+0.00%)     | SAME(+0.00%)     | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)  | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(+0.00%)            | SAME(+0.00%)                | SAME(-0.02%)              | SAME(-0.02%)          | SAME(-0.02%)          | SAME(-0.02%)              | SAME(-0.03%)   | SAME(-0.06%)                 | SAME(-0.08%)                 | SAME(-0.08%)     | SAME(-0.08%)               | SAME(-0.08%)     | SAME(-0.09%)              | IMPR(-0.10%)                  | IMPR(-0.11%) | IMPR(-0.12%)     | IMPR(-0.13%)     | IMPR(-0.27%)    | IMPR(-0.38%)                                                 | 
IMPR(-0.38%)                                     | IMPR(-0.38%)                                     | IMPR(-0.42%)       | IMPR(-0.45%) | IMPR(-1.27%) | IMPR(-1.36%)          | IMPR(-1.37%)          | IMPR(-1.38%)  | IMPR(-1.61%)    | IMPR(-1.67%)            | IMPR(-1.68%) | IMPR(-1.73%) | IMPR(-1.79%) | IMPR(-1.84%)      | IMPR(-1.85%)   | IMPR(-2.07%) | IMPR(-2.12%)         | IMPR(-2.28%)             | IMPR(-2.50%) | IMPR(-2.52%)           | IMPR(-2.58%)            | IMPR(-2.64%)       | IMPR(-2.82%)  | IMPR(-2.82%)             | IMPR(-3.18%) | IMPR(-3.75%) | IMPR(-3.77%) | IMPR(-3.77%)       | IMPR(-3.78%)                 | IMPR(-3.80%) | IMPR(-3.90%) | IMPR(-3.96%)       | IMPR(-4.01%)  | IMPR(-4.02%) | IMPR(-4.05%)                 | IMPR(-4.23%)            | IMPR(-4.53%)                | IMPR(-4.53%)                              | IMPR(-4.53%)   | IMPR(-4.62%)                       | IMPR(-4.62%)                         | IMPR(-4.63%)           | IMPR(-4.73%)          | IMPR(-4.81%)    | IMPR(-4.81%)                  | IMPR(-4.92%) | IMPR(-5.04%)   | IMPR(-5.07%)   | IMPR(-5.42%)          | IMPR(-6.17%) | IMPR(-6.44%)          | IMPR(-6.67%)  | IMPR(-6.84%)         | IMPR(-7.60%)       | IMPR(-7.69%)          | IMPR(-7.85%)       | IMPR(-8.09%)            | IMPR(-8.55%)            | IMPR(-9.83%) | IMPR(-10.00%)  | IMPR(-10.44%) | IMPR(-11.71%)  | IMPR(-11.91%)          | IMPR(-11.93%)                        | IMPR(-12.30%) | IMPR(-12.39%) | IMPR(-12.40%)      | IMPR(-12.53%)      | IMPR(-13.52%) | IMPR(-20.13%)        | IMPR(-20.13%)        | IMPR(-28.99%) | IMPR(-37.34%) | IMPR(-37.34%)         | IMPR(-38.52%) | IMPR(-38.52%)         | -0.84%       | 6.17       | -4.78%      | -1.67%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.00%      | +0.24%      | +2.49%      |

martien-de-jong · 2024-08-08T13:18:05Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

@@ -169,7 +173,8 @@ bool InterBlockScheduling::leaveBlock() {
    // If we are very unlucky, we may step both the latency margin and
    // the resource margin to the max. Any more indicates failure to converge,
    // and we abort to prevent an infinite loop.
-    if (BS.FixPoint.NumIters > 2 * HR->getConflictHorizon()) {
+    if (BS.FixPoint.NumIters >
+        2 * HR->getConflictHorizon() + MaxExpensiveIterations) {


nit: I think we first do MaxExpensiveIterations, then fall back to the global safety margins. Reverse the two terms here, and perhaps make the comment more precise?

martien-de-jong · 2024-08-08T13:29:30Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

+    auto Res = BS.FixPoint.PerMILatencyMargin.try_emplace(MINeedsHigherCap, 0);
+    if (BS.FixPoint.NumIters > MaxExpensiveIterations) {
+      // Increase the latency margin per instruction, unless we already iterated
+      // more than MaxExpensiveIterations without converging.


nit: I think the comment should be on the outside of this if. And then perhaps also order the branches accordingly.

martien-de-jong · 2024-08-08T13:44:32Z

llvm/test/CodeGen/AIE/aie2/schedule/pre_ra/add2d_inner.mir

+# We should see most of the VLDA.UPS instructions move down in the loop
+# BB to reduce the reg pressure and avoid spills. They can later be moved back
+# up by the post-RA scheduler. This should also make the 4 acc1024 COPY
+# instructions coalesce-able.


Could we have a [presched, RA] example that actually demonstrates reduced spilling?

I really tried to manipulate that example, but I can't get it to spill without writing absolutely ugly code. I'd leave it like this if that's fine with you. In a follow-up PR where I change the MachinePipeliner, I'll add an end-to-end IR test that shows Add2D getting nicely SW pipelined.

martien-de-jong · 2024-08-08T13:52:00Z

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

-      tryPressure(TryCand.RPDelta.CurrentMax, Cand.RPDelta.CurrentMax, TryCand,
-                  Cand, RegMax, TRI, DAG->MF))
-    return TryCand.Reason != NoCand;
+    // Avoid increasing the max pressure of the entire region.


CHECK: isTrackingPressure() is trivially true here.

martien-de-jong

Nice! Especially the obvious convergence of the per-instruction latency cap.

martien-de-jong · 2024-08-08T14:13:10Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

+      // more than MaxExpensiveIterations without converging.
+      BS.FixPoint.LatencyMargin++;
+    } else {
+      ++Res.first->second;


Perhaps assert that we don't exceed MaxLatency (or ConflictHorizon).
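
The two-phase escalation discussed in this thread can be sketched as below. This is a minimal standalone illustration, not the code from the PR: `FixPointState`, `bumpLatencyMargin`, the integer instruction id, and the `MaxLatency` parameter are all hypothetical stand-ins, and only the field names `NumIters`, `LatencyMargin`, and `PerMILatencyMargin` are taken from the diff. The per-instruction branch comes first, as suggested in the review, and it carries the suggested assertion.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the fix-point state. Field names follow the
// diff; the types and the integer instruction id are assumptions.
struct FixPointState {
  int NumIters = 0;
  int LatencyMargin = 0;                 // global margin for all instructions
  std::map<int, int> PerMILatencyMargin; // margin per (made-up) instruction id
};

// Raise a safety margin after a failed convergence check.
// InstrId names the instruction that needs a higher cap, and MaxLatency is
// an assumed upper bound for any per-instruction margin.
inline void bumpLatencyMargin(FixPointState &FP, int InstrId,
                              int MaxExpensiveIterations, int MaxLatency) {
  if (FP.NumIters <= MaxExpensiveIterations) {
    // Expensive mode: only the offending instruction gets more margin.
    int &Margin = FP.PerMILatencyMargin.try_emplace(InstrId, 0).first->second;
    ++Margin;
    assert(Margin <= MaxLatency && "per-instruction margin out of range");
  } else {
    // Budget exhausted: fall back to raising the margin for every instruction.
    ++FP.LatencyMargin;
  }
}
```

Calling `bumpLatencyMargin` twice for the same instruction while `NumIters` is within the budget leaves the global margin untouched and raises only that instruction's entry to 2.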

gbossu · 2024-08-08T14:16:38Z

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

@@ -40,7 +40,7 @@ static cl::opt<bool>
                          cl::desc("Track reg pressure more accurately and "
                                   "delay some instructions to avoid spills."));
 static cl::opt<unsigned> NumCriticalFreeRegs(
-    "aie-premisched-near-critical-regs", cl::init(4),
+    "aie-premisched-near-critical-regs", cl::init(2),


Note: I'm reducing the limit here, but it then gets multiplied by the number of pressure units required by the reg class. E.g. the number of free units we try to maintain for W is 2, for X it is 4, and for Y it is 8.
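
As a quick check of the arithmetic in the note above, here is a small sketch. `freeUnitsTarget` is a hypothetical helper, and the per-register unit counts (1 for W, 2 for X, 4 for Y) are inferred from the numbers quoted in the note rather than read from the target description.

```cpp
#include <cassert>

// Pressure units to keep free for a register class: the flag value
// (number of registers) scaled by the units one such register occupies.
constexpr unsigned freeUnitsTarget(unsigned NumCriticalFreeRegs,
                                   unsigned UnitsPerReg) {
  return NumCriticalFreeRegs * UnitsPerReg;
}

// Assumed unit counts per register, inferred from the note: W=1, X=2, Y=4.
static_assert(freeUnitsTarget(2, 1) == 2, "W keeps 2 units free");
static_assert(freeUnitsTarget(2, 2) == 4, "X keeps 4 units free");
static_assert(freeUnitsTarget(2, 4) == 8, "Y keeps 8 units free");
```

Under the same assumed unit counts, the previous default of 4 would have asked for 16 free units for Y if the same scaling had been applied.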

andcarminati · 2024-08-08T15:33:40Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

@@ -37,6 +37,10 @@ static cl::opt<bool> LoopEpilogueAnalysis(
    "aie-loop-epilogue-analysis", cl::init(true),
    cl::desc("[AIE] Perform Loop/Epilogue analysis with loop scheduling"));

+static cl::opt<int> MaxExpensiveIterations(
+    "aie-loop-aware-expensive-iterations", cl::init(25),


Is there a rationale behind this number?

Yes and no. I feel anything over 50 is too much, and anything below 10 is not enough if we need to move a couple of instructions up by 2-3 cycles. So 25 felt like a good compromise. And this works well for loops with an II between 5 and 10 cycles, which is the territory of the PreRA pipeliner for us.
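
One back-of-the-envelope way to read this answer, assuming each expensive iteration raises the margin of exactly one instruction by one cycle (which matches the `++Res.first->second` in the diff, but is otherwise my assumption): moving k instructions up by c cycles each costs about k * c iterations. `expensiveIterationsNeeded` is a hypothetical helper for that estimate.

```cpp
#include <cassert>

// Rough iteration count under the stated assumption: one instruction gains
// one cycle of margin per expensive iteration.
constexpr int expensiveIterationsNeeded(int NumInstrs, int CyclesEach) {
  return NumInstrs * CyclesEach;
}

// Three instructions moved by three cycles already uses 9 iterations, so a
// budget of 10 leaves almost no slack, while 25 leaves room for more.
static_assert(expensiveIterationsNeeded(3, 3) == 9, "close to a budget of 10");
static_assert(expensiveIterationsNeeded(4, 3) <= 25, "fits the default budget");
```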

andcarminati · 2024-08-08T15:36:40Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

@@ -169,7 +173,8 @@ bool InterBlockScheduling::leaveBlock() {
    // If we are very unlucky, we may step both the latency margin and
    // the resource margin to the max. Any more indicates failure to converge,
    // and we abort to prevent an infinite loop.
-    if (BS.FixPoint.NumIters > 2 * HR->getConflictHorizon()) {
+    if (BS.FixPoint.NumIters >


Considering your change, does this error become more common without this increase?

Oh, it never triggered. I just changed the condition to account for the extra iterations; otherwise we would fail, thinking we are in an infinite loop.
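
To illustrate why the bound has to grow, here is a small sketch of the abort condition. `exceededIterationBudget` is a hypothetical free function standing in for the check in `leaveBlock()`, and the conflict horizon of 8 used in the usage note is a made-up value. The terms are written in the order suggested earlier in the review, with the expensive iterations first.

```cpp
#include <cassert>

// True when the fix-point loop should give up: the expensive
// per-instruction phase plus the worst case of stepping both global
// margins to their maximum has been used up.
constexpr bool exceededIterationBudget(int NumIters, int MaxExpensiveIterations,
                                       int ConflictHorizon) {
  return NumIters > MaxExpensiveIterations + 2 * ConflictHorizon;
}

// With no expensive phase this reduces to the old bound of 2 * horizon.
static_assert(exceededIterationBudget(17, 0, 8), "old bound trips at 17");
```

With the default of 25 expensive iterations and the assumed horizon of 8, iteration 17 is still inside the budget and the loop only aborts after iteration 41.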

andcarminati · 2024-08-08T15:43:01Z

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

+  PSetThresholds.clear();
+  for (unsigned PSet = 0, EndPSet = RegionMaxPressure.size(); PSet < EndPSet;
+       ++PSet) {
+    unsigned MaxPressure = RegionMaxPressure[PSet];


nit: const unsigned MaxPressure

andcarminati

LGTM. A nice piece of work!

gbossu requested review from abhinay-anubola, abnikant, andcarminati, khallouh, konstantinschwarz, martien-de-jong, SagarMaheshwari99 and stephenneuendorffer as code owners August 7, 2024 13:57

martien-de-jong reviewed Aug 8, 2024

llvm/lib/Target/AIE/AIEMachineScheduler.cpp (resolved)

martien-de-jong reviewed Aug 8, 2024

martien-de-jong previously approved these changes Aug 8, 2024

martien-de-jong reviewed Aug 8, 2024

gbossu commented Aug 8, 2024

gbossu added 2 commits August 8, 2024 15:21

[AIE2] NFC: Add baseline test for complex loop-aware sched convergence

8193c57

We want to increase the safety margin for one instruction at a time here, instead of doing it for all instructions at once.

[AIEX] Loop-aware sched: Increase latency margin per instruction

8dc4a1b

andcarminati reviewed Aug 8, 2024

andcarminati previously approved these changes Aug 8, 2024

[AIE2] NFC: Add baseline test with critical CM reg pressure

73f1cd4

In a follow-up commit, the premisched will re-order the instructions to reduce the pressure and avoid spills during RA.

gbossu dismissed stale reviews from andcarminati and martien-de-jong via 8d997c4 August 8, 2024 16:32

gbossu force-pushed the gaetan.improve.scheds branch from 9a38355 to 8d997c4 Compare August 8, 2024 16:32

konstantinschwarz reviewed Aug 8, 2024

llvm/test/CodeGen/AIE/aie2/ra/tie-subregs-flow-3d.mir (outdated, resolved)

llvm/test/CodeGen/AIE/aie2/ra/tie-subregs-flow-tmp.mir (outdated, resolved)

[AIEX] Premisched: more conservative reg pressure reduction

71614b9

- Reserve a certain number of registers, not regunits
- Be extra careful when the region max pressure exceeds limits

gbossu force-pushed the gaetan.improve.scheds branch from 8d997c4 to 71614b9 Compare August 9, 2024 07:19

andcarminati approved these changes Aug 9, 2024

martien-de-jong approved these changes Aug 9, 2024

gbossu merged commit 9a7a198 into aie-public Aug 9, 2024
8 checks passed

gbossu deleted the gaetan.improve.scheds branch August 9, 2024 09:01

gbossu mentioned this pull request Aug 26, 2024

[AIEX] More SW pipelining #170

Merged
