Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIEX] Iterative feedback-driven post-pipeliner #359

Merged
merged 9 commits into from
Feb 20, 2025

Conversation

gbossu
Copy link
Collaborator

@gbossu gbossu commented Feb 17, 2025

This uses feedback from previous iteration of a strategy to tweak/tighten the [Earliest, Latest) range of instructions until a solution is found.

This is needed to reach the optimal II for Conv2D_bfp16 in AIE2p

@gbossu gbossu force-pushed the gaetan.postpipeliner.iterative branch from 165bb63 to 6af25cb Compare February 17, 2025 14:19
@gbossu gbossu changed the title [AIEX] Iterative feedback-drive post-pipeliner [AIEX] Iterative feedback-driven post-pipeliner Feb 18, 2025
@@ -59,6 +59,9 @@ class AIE2Subtarget : public AIE2GenSubtargetInfo, public AIEBaseSubtarget {
StringRef FS, StringRef ABIName, const TargetMachine &TM);

bool enableMachineScheduler() const override { return true; }
bool enableMachinePipeliner() const override {
return AIEBaseSubtarget::enableMachinePipeliner();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHECK: we just disable the pre-pipeliner, not the prescheduler. And 'forcing' assumes infinite willingness on the part of the postpipeliner.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but the prescheduler follows the pre-pipeliner

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I'll add a small comment

auto [It, Inserted] = UniqueAncestors.insert(P);
if (Inserted) {
Slots += Pred.Slots;
Count++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Count could be replaced with UniqueAncestors.size()

@@ -633,6 +633,16 @@ bool PostPipeliner::scheduleOtherIterations(PostPipelinerStrategy &Strategy) {
return true;
}

int getMinOutputLat(ArrayRef<SDep> Nodes) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: More descriptive name for Nodes. SuccDeps?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, this is a general helper that just returns the minimum output latency out of the given edges. So I think it makes sense to keep the parameter generic as well. I can maybe rename to Edges?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense, Edges or Deps would probably not have triggered my comment.

{1, true, false, HeuristicRuns, {Prio::Critical, Prio::LCDLatest}},
{1, true, false, HeuristicRuns, {Prio::Liveness, Prio::Latest}},
{1, true, false, HeuristicRuns, {Prio::Latest, Prio::Liveness}},
// Bottom-up strategies
{0, false, false, 2, {Prio::Critical, Prio::LCDLatest}},
{1, false, false, 2, {Prio::Critical, Prio::LCDLatest}},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: we should probably weed and sort with respect to effectiveness at some point.
In particular, I hope that just NodeNum would be one of the lesser effective ones, and should be moved down. Also Critical + Latest might cover all of just Critical.

Copy link
Collaborator Author

@gbossu gbossu Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, one of my plans would be to add "optimization remarks" for whatever strategy was picked. Then we can derive what works better.

I could also maybe have a mode that runs all of the heuristics for a given II, even after one has succeeded. The point would be to find the one that converges faster.

(In a future PR, maybe 😄)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the latter one would be nice. it would list all heuristics that found the best II. Totalling that over a number of represesentative benchmarks would give a good score.

Copy link
Collaborator

@martien-de-jong martien-de-jong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no real problems. Shout if have finalized for a formal approval.

AIELoopUtils::getPipelinerDisabled(*Block);
if (!Block)
return false;
bool PrePipelinerDisabled =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const bool PrePipelinerDisabled

!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 1, !"wchar_size", i32 4}
!4 = !{!5, !6, i64 4}
!5 = !{!"_ZTS13BfToBfpParams", !6, i64 0, !6, i64 4, !6, i64 8, !6, i64 12}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update function name

!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 1, !"wchar_size", i32 4}
!4 = !{!5}
!5 = distinct !{!5, !6, !"_Z14conv2d_genericILh1EL5act_t0ELb0ELb0EEvPu6__bf16S1_S1_S1_R27conv2d_bf16_internal_params10out_mode_t: %input"}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update function name



# derived from conv2d_bf16_0
# Same allocation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHECK: the difference between conv2d_f16.mir and this file seems to be that the WAW dependencies now have "renamed" and "killed" attributes. Thus we don't have to cycle through our pointers, correct?
Maybe add this to the comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this example represents whatever the LRU register re-allocator gave us. I'll update the comment. I don't even think we can reach the optimal II here.

; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: vlda.ups.s32.s8 cm0, s0, [p0], #32
; CHECK-NEXT: vlda.ups.s32.s8 cm1, s0, [p0], #32
; CHECK-NEXT: nop
; CHECK-NEXT: add.nc lc, r0, #-4
; CHECK-NEXT: add.nc lc, r0, #-5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check: we reduced pipeline stages here correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have one more stage, and I am still trying to chase down why that happens

@gbossu gbossu force-pushed the gaetan.postpipeliner.iterative branch from 6af25cb to 584bec1 Compare February 19, 2025 11:01
@@ -59,6 +59,9 @@ class AIE2Subtarget : public AIE2GenSubtargetInfo, public AIEBaseSubtarget {
StringRef FS, StringRef ABIName, const TargetMachine &TM);

bool enableMachineScheduler() const override { return true; }
bool enableMachinePipeliner() const override {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep just the base implementation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately AIEBaseSubTarget isn't actually a base class of AIE2Subtarget

@@ -196,44 +196,76 @@ int PostPipeliner::fit(MachineInstr *MI, int First, int Last, int II) {
return -1;
}

void PostPipeliner::biasForLocalResourceContention(NodeInfo &NI,
Copy link
Collaborator

@andcarminati andcarminati Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be nice to have a description here, as the post pipeliner is getting bigger.

gbossu and others added 8 commits February 19, 2025 15:51
These are bf16/bfp16 variants for AIE2 or AIE2p
This is to give full access to the Info array and it's associated parameters

Co-authored-by: Martien de Jong <[email protected]>
An SU can appear multiple times in the list of preds/succs.
When an iteration does not converge, a problematic instruciton will be
identified, and its [Earliest,Latest) range will be tightened.
@gbossu gbossu force-pushed the gaetan.postpipeliner.iterative branch from 584bec1 to d10386e Compare February 19, 2025 16:22
Copy link
Collaborator

@martien-de-jong martien-de-jong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's get experience with this.

@gbossu gbossu merged commit 0f7b86b into aie-public Feb 20, 2025
6 checks passed
@gbossu gbossu deleted the gaetan.postpipeliner.iterative branch February 20, 2025 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants