-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIEX] Scheduler improvements #147
Conversation
@@ -169,7 +173,8 @@ bool InterBlockScheduling::leaveBlock() { | |||
// If we are very unlucky, we may step both the latency margin and | |||
// the resource margin to the max. Any more indicates failure to converge, | |||
// and we abort to prevent an infinite loop. | |||
if (BS.FixPoint.NumIters > 2 * HR->getConflictHorizon()) { | |||
if (BS.FixPoint.NumIters > | |||
2 * HR->getConflictHorizon() + MaxExpensiveIterations) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we first do MaxExpensiveIterations, then fall back to the global safety margins. Reverse the two terms here, and perhaps make the comment more precise?
auto Res = BS.FixPoint.PerMILatencyMargin.try_emplace(MINeedsHigherCap, 0); | ||
if (BS.FixPoint.NumIters > MaxExpensiveIterations) { | ||
// Increase the latency margin per instruction, unless we already iterated | ||
// more than MaxExpensiveIterations without converging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the comment should be on the outside of this if. And then perhaps also order the branches accordingly.
# We should see most of the VLDA.UPS instructions move down in the loop | ||
# BB to reduce the reg pressure and avoid spills. They can later be moved back | ||
# up by the post-RA scheduler. This should also make the 4 acc1024 COPY | ||
# instructions coalesce-able. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a [presched, RA] example that actually demonstrates reduced spilling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really tried to manipulate that example, but can't get it to spill without writing absolutely ugly code. I'd leave it like this if that's fine to you. In a follow-up PR where I change the MachinePipeliner, i'll add an end-to-end IR test that shows Add2D getting nicely SW pipelined.
tryPressure(TryCand.RPDelta.CurrentMax, Cand.RPDelta.CurrentMax, TryCand, | ||
Cand, RegMax, TRI, DAG->MF)) | ||
return TryCand.Reason != NoCand; | ||
// Avoid increasing the max pressure of the entire region. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK: isTrackingPressure() is trivially true here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Especially the obvious convergence of the per-instruction latency cap.
// more than MaxExpensiveIterations without converging. | ||
BS.FixPoint.LatencyMargin++; | ||
} else { | ||
++Res.first->second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps assert that we don't exceed MaxLatency. (or ConflictHorizon)
@@ -40,7 +40,7 @@ static cl::opt<bool> | |||
cl::desc("Track reg pressure more accurately and " | |||
"delay some instructions to avoid spills.")); | |||
static cl::opt<unsigned> NumCriticalFreeRegs( | |||
"aie-premisched-near-critical-regs", cl::init(4), | |||
"aie-premisched-near-critical-regs", cl::init(2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I'm reducing the limit here, but then this get multiplied by the number of pressure units required by the reg class. E.g. the number of free units we try to maintain for W is 2, for X it is 4, and for Y it is 8.
We want to increase the safety margin for one instruciton at a time here, instead of doing it for all instructions at once.
@@ -37,6 +37,10 @@ static cl::opt<bool> LoopEpilogueAnalysis( | |||
"aie-loop-epilogue-analysis", cl::init(true), | |||
cl::desc("[AIE] Perform Loop/Epilogue analysis with loop scheduling")); | |||
|
|||
static cl::opt<int> MaxExpensiveIterations( | |||
"aie-loop-aware-expensive-iterations", cl::init(25), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a rationale behind this number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. I feel anything over 50 is too much, and anything below 10 is not enough if we need to move a couple of instructions up by 2-3 cycles. So 25 felt like a good compromise. And this works well for loops with an II between 5 and 10 cycles, which is the territory of the PreRA pipeliner for us.
@@ -169,7 +173,8 @@ bool InterBlockScheduling::leaveBlock() { | |||
// If we are very unlucky, we may step both the latency margin and | |||
// the resource margin to the max. Any more indicates failure to converge, | |||
// and we abort to prevent an infinite loop. | |||
if (BS.FixPoint.NumIters > 2 * HR->getConflictHorizon()) { | |||
if (BS.FixPoint.NumIters > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering your change, does this error become more common without this increase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it never triggered, I just changed the condition to account for the extra iterations, otherwise we would fail thinking we are in an infinite loop.
PSetThresholds.clear(); | ||
for (unsigned PSet = 0, EndPSet = RegionMaxPressure.size(); PSet < EndPSet; | ||
++PSet) { | ||
unsigned MaxPressure = RegionMaxPressure[PSet]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const unsigned MaxPressure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A nice piece of work!
In a follow-up commit, the premisched will re-order the instructions to reduce the pressure and avoid spills during RA.
8d997c4
9a38355
to
8d997c4
Compare
- Reserve a certain number of registers, not regunits - Be extra careful when the region max pressure exceeds limits
8d997c4
to
71614b9
Compare
Pre-RA: More conservative scheduling when under high register pressure. This is very helpful for SW pipelining. I'll have another PR which makes the MachinePipeliner find more schedules, and this PR here helps us not spill
Post-RA: Change the loop-aware scheduling to have an "expensive convergence" mode, when we increase the latency safety margin per instruciton, instead of for all instructions.
QoR results below. Overall it's good. There are some regressions, but we'll get rid of them with less unrolling and more SWP.