-
Notifications
You must be signed in to change notification settings - Fork 29
[AIEX] Scheduler improvements #147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we first do MaxExpensiveIterations, then fall back to the global safety margins. Reverse the two terms here, and perhaps make the comment more precise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the comment should be on the outside of this if. And then perhaps also order the branches accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a [presched, RA] example that actually demonstrates reduced spilling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really tried to manipulate that example, but can't get it to spill without writing absolutely ugly code. I'd leave it like this if that's fine to you. In a follow-up PR where I change the MachinePipeliner, i'll add an end-to-end IR test that shows Add2D getting nicely SW pipelined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK: isTrackingPressure() is trivially true here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Especially the obvious convergence of the per-instruction latency cap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps assert that we don't exceed MaxLatency. (or ConflictHorizon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I'm reducing the limit here, but then this get multiplied by the number of pressure units required by the reg class. E.g. the number of free units we try to maintain for W is 2, for X it is 4, and for Y it is 8.
We want to increase the safety margin for one instruciton at a time here, instead of doing it for all instructions at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a rationale behind this number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. I feel anything over 50 is too much, and anything below 10 is not enough if we need to move a couple of instructions up by 2-3 cycles. So 25 felt like a good compromise. And this works well for loops with an II between 5 and 10 cycles, which is the territory of the PreRA pipeliner for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering your change, does this error become more common without this increase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it never triggered, I just changed the condition to account for the extra iterations, otherwise we would fail thinking we are in an infinite loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const unsigned MaxPressure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A nice piece of work!
In a follow-up commit, the premisched will re-order the instructions to reduce the pressure and avoid spills during RA.
8d997c4
9a38355 to
8d997c4
Compare
- Reserve a certain number of registers, not regunits - Be extra careful when the region max pressure exceeds limits
8d997c4 to
71614b9
Compare
Pre-RA: More conservative scheduling when under high register pressure. This is very helpful for SW pipelining. I'll have another PR which makes the MachinePipeliner find more schedules, and this PR here helps us not spill
Post-RA: Change the loop-aware scheduling to have an "expensive convergence" mode, when we increase the latency safety margin per instruciton, instead of for all instructions.
QoR results below. Overall it's good. There are some regressions, but we'll get rid of them with less unrolling and more SWP.