[AIEX] Iterative feedback-driven post-pipeliner #359
165bb63
to
6af25cb
Compare
@@ -59,6 +59,9 @@ class AIE2Subtarget : public AIE2GenSubtargetInfo, public AIEBaseSubtarget {
                StringRef FS, StringRef ABIName, const TargetMachine &TM);

  bool enableMachineScheduler() const override { return true; }
  bool enableMachinePipeliner() const override {
    return AIEBaseSubtarget::enableMachinePipeliner();
  }
CHECK: we just disable the pre-pipeliner, not the prescheduler. And 'forcing' assumes infinite willingness on the part of the postpipeliner.
... but the prescheduler follows the pre-pipeliner
Correct, I'll add a small comment
auto [It, Inserted] = UniqueAncestors.insert(P);
if (Inserted) {
  Slots += Pred.Slots;
  Count++;
Nit: Count could be replaced with UniqueAncestors.size()
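A minimal sketch of what this nit suggests, assuming a simplified `Node` type and a set keyed on a node id (both hypothetical, not the PR's actual data structures). The set already records how many distinct ancestors were seen, so a separate counter is redundant:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a scheduling node; not the PR's actual type.
struct Node {
  int Id;
  int Slots;
};

struct AncestorSummary {
  int Slots = 0;
  int Count = 0;
};

// Sums Slots over distinct predecessors. A node may be listed more than
// once, so the set filters out the repeats.
AncestorSummary summarizeUniqueAncestors(const std::vector<Node> &Preds) {
  std::set<int> UniqueAncestors;
  AncestorSummary Summary;
  for (const Node &Pred : Preds) {
    // insert() reports whether the id was new; only then add its slots.
    if (UniqueAncestors.insert(Pred.Id).second)
      Summary.Slots += Pred.Slots;
  }
  // No running counter needed: the set size is the number of unique ancestors.
  Summary.Count = static_cast<int>(UniqueAncestors.size());
  assert(Summary.Count <= static_cast<int>(Preds.size()));
  return Summary;
}
```

With predecessors `{1, 2}`, `{2, 3}`, `{1, 2}` (id, slots), the duplicate entry for id 1 is skipped.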
@@ -633,6 +633,16 @@ bool PostPipeliner::scheduleOtherIterations(PostPipelinerStrategy &Strategy) {
  return true;
}

int getMinOutputLat(ArrayRef<SDep> Nodes) {
nit: More descriptive name for Nodes. SuccDeps?
I mean, this is a general helper that just returns the minimum output latency out of the given edges. So I think it makes sense to keep the parameter generic as well. I can maybe rename to Edges?
Ok, makes sense, Edges or Deps would probably not have triggered my comment.
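For readers following the naming discussion, here is a rough, self-contained sketch of such a generic helper with the `Edges` name. The `Edge` struct and `DepKind` enum are hypothetical stand-ins rather than `llvm::SDep`, and returning `INT_MAX` when no output edge exists is an assumption, not something the PR confirms:

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// Hypothetical, simplified dependency edge; the PR operates on llvm::SDep.
enum class DepKind { Data, Anti, Output, Order };

struct Edge {
  DepKind Kind;
  int Latency;
};

// Returns the smallest latency among the output-dependency edges in Edges.
// Assumption: INT_MAX signals that there is no output edge at all.
int getMinOutputLat(const std::vector<Edge> &Edges) {
  int MinLat = INT_MAX;
  for (const Edge &E : Edges)
    if (E.Kind == DepKind::Output)
      MinLat = std::min(MinLat, E.Latency);
  return MinLat;
}
```

Because the helper only looks at the edges it is handed, the same function works for predecessor and successor lists alike, which is the argument for a neutral parameter name.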
{1, true, false, HeuristicRuns, {Prio::Critical, Prio::LCDLatest}},
{1, true, false, HeuristicRuns, {Prio::Liveness, Prio::Latest}},
{1, true, false, HeuristicRuns, {Prio::Latest, Prio::Liveness}},
// Bottom-up strategies
{0, false, false, 2, {Prio::Critical, Prio::LCDLatest}},
{1, false, false, 2, {Prio::Critical, Prio::LCDLatest}},
TODO: we should probably weed and sort with respect to effectiveness at some point.
In particular, I hope that just NodeNum would be one of the less effective ones, and should be moved down. Also, Critical + Latest might cover all of just Critical.
Yes, one of my plans would be to add "optimization remarks" for whatever strategy was picked. Then we can derive what works better.
I could also maybe have a mode that runs all of the heuristics for a given II, even after one has succeeded. The point would be to find the one that converges faster.
(In a future PR, maybe 😄)
Yeah, the latter one would be nice. It would list all heuristics that found the best II. Totalling that over a number of representative benchmarks would give a good score.
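The scoring idea from this thread could be sketched roughly as follows. Everything here is hypothetical: the strategy names are just string labels, and the per-benchmark results are assumed to come from some future "run all heuristics" mode that does not exist in this PR:

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// One benchmark's results: the II each strategy reached, or nullopt if the
// strategy failed to produce a schedule. Strategy names are free-form labels.
using StrategyResults = std::map<std::string, std::optional<int>>;

// For every benchmark, credit each strategy that reached that benchmark's
// best (lowest) II. The totals give a rough effectiveness score.
std::map<std::string, int>
scoreStrategies(const std::vector<StrategyResults> &Benchmarks) {
  std::map<std::string, int> Score;
  for (const StrategyResults &Results : Benchmarks) {
    // Find the lowest II any strategy achieved on this benchmark.
    std::optional<int> BestII;
    for (const auto &[Name, II] : Results)
      if (II && (!BestII || *II < *BestII))
        BestII = II;
    if (!BestII)
      continue; // No strategy succeeded; nothing to credit.
    // Credit every strategy that matched the best II.
    for (const auto &[Name, II] : Results)
      if (II == BestII)
        ++Score[Name];
  }
  return Score;
}
```

Strategies that never appear in the returned map never matched the best II, which would make them candidates for weeding; the ordering of the remaining ones could follow their totals.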
I see no real problems. Shout when you have finalized, for a formal approval.
AIELoopUtils::getPipelinerDisabled(*Block);
if (!Block)
  return false;
bool PrePipelinerDisabled =
nit: const bool PrePipelinerDisabled
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 1, !"wchar_size", i32 4}
!4 = !{!5, !6, i64 4}
!5 = !{!"_ZTS13BfToBfpParams", !6, i64 0, !6, i64 4, !6, i64 8, !6, i64 12}
nit: update function name
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 1, !"wchar_size", i32 4}
!4 = !{!5}
!5 = distinct !{!5, !6, !"_Z14conv2d_genericILh1EL5act_t0ELb0ELb0EEvPu6__bf16S1_S1_S1_R27conv2d_bf16_internal_params10out_mode_t: %input"}
nit: update function name
# derived from conv2d_bf16_0
# Same allocation |
CHECK: the difference between conv2d_f16.mir and this file seems to be that the WAW dependencies now have "renamed" and "killed" attributes. Thus we don't have to cycle through our pointers, correct?
Maybe add this to the comment
I think this example represents whatever the LRU register re-allocator gave us. I'll update the comment. I don't even think we can reach the optimal II here.
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: vlda.ups.s32.s8 cm0, s0, [p0], #32
; CHECK-NEXT: vlda.ups.s32.s8 cm1, s0, [p0], #32
; CHECK-NEXT: nop
; CHECK-NEXT: add.nc lc, r0, #-4
; CHECK-NEXT: add.nc lc, r0, #-5
CHECK: we reduced pipeline stages here, correct?
We now have one more stage, and I am still trying to chase down why that happens
6af25cb
to
584bec1
Compare
@@ -59,6 +59,9 @@ class AIE2Subtarget : public AIE2GenSubtargetInfo, public AIEBaseSubtarget {
                StringRef FS, StringRef ABIName, const TargetMachine &TM);

  bool enableMachineScheduler() const override { return true; }
  bool enableMachinePipeliner() const override {
Can we keep just the base implementation?
Unfortunately, AIEBaseSubtarget isn't actually a base class of AIE2Subtarget
@@ -196,44 +196,76 @@ int PostPipeliner::fit(MachineInstr *MI, int First, int Last, int II) {
  return -1;
}

void PostPipeliner::biasForLocalResourceContention(NodeInfo &NI,
It could be nice to have a description here, as the post pipeliner is getting bigger.
These are bf16/bfp16 variants for AIE2 or AIE2p
This is to give full access to the Info array and its associated parameters.
Co-authored-by: Martien de Jong <[email protected]>
Dump intervals in ascii art
An SU can appear multiple times in the list of preds/succs.
When an iteration does not converge, a problematic instruction will be identified, and its [Earliest,Latest) range will be tightened.
584bec1
to
d10386e
Compare
Yes, let's get experience with this.
This uses feedback from the previous iteration of a strategy to tweak/tighten the [Earliest, Latest) range of instructions until a solution is found. This is needed to reach the optimal II for Conv2D_bfp16 in AIE2p.
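To illustrate the general shape of such a feedback loop, here is a toy sketch. It is not the PR's implementation: `Range`, `Attempt`, and `scheduleWithFeedback` are hypothetical names, the scheduling attempt is abstracted into a callback, and tightening by raising `Earliest` one cycle at a time is an assumption made for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Half-open window [Earliest, Latest) of cycles an instruction may occupy.
struct Range {
  int Earliest;
  int Latest;
  bool empty() const { return Earliest >= Latest; }
};

// One scheduling attempt over the current ranges. Returns nullopt on success,
// otherwise the index of the instruction that blocked the attempt.
using Attempt =
    std::function<std::optional<std::size_t>(const std::vector<Range> &)>;

// Re-runs Try, narrowing the offending instruction's range after each failed
// run. Returns true if some run succeeds within MaxRuns.
bool scheduleWithFeedback(std::vector<Range> &Ranges, const Attempt &Try,
                          int MaxRuns) {
  for (int Run = 0; Run < MaxRuns; ++Run) {
    std::optional<std::size_t> Offender = Try(Ranges);
    if (!Offender)
      return true; // Converged: every instruction was placed.
    assert(*Offender < Ranges.size() && "attempt reported unknown instruction");
    Range &R = Ranges[*Offender];
    ++R.Earliest; // Assumption: tighten from below by one cycle.
    if (R.empty())
      return false; // No legal cycle left for this instruction.
  }
  return false; // Ran out of runs without converging.
}
```

A bounded run count plays a role similar to the `HeuristicRuns` field in the strategy table: it caps how long one strategy may iterate before the next strategy, or the next II, is tried.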