-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIE2] Optimize 2D/3D memory operations #145
Conversation
const TargetInstrInfo &TII = *MBB.getParent()->getSubtarget().getInstrInfo(); | ||
bool Changed = false; | ||
|
||
// TODO: LoopUtils? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can postpone this refactoring for now to avoid conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have llvm/lib/Target/AIE/Utils/AIELoopUtils.h
. Maybe we should move the helper there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this lambda as a function to AIELoopUtils.
6063b35
to
56e87c3
Compare
Some QoR results: With
With unrolling enabled:
|
for (auto &RegMILoad : LoadUses) { | ||
Register Reg = RegMILoad.first; | ||
if (!UsedByStore.contains(Reg)) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think checking for uses by both load and stores is a good enough heuristic, but I'm wondering if we could not be a bit more precise and detect the case where the iterators will be duplicated inside the loop.
I guess we have something like
loop:
%0:edc = PHI [entry, ...], [loop, %4]
%10:edc = PHI [entry, ...], [loop, %14]
%1:ed = reg_seq %100, %101, %102, %0
%2:vec256, %3:ep, %4:edc = VLD_2D %200:ep %1:ed
%11:ed = reg_seq %100, %101, %102, %10
%13:ep, %14:edc = VST_2D %200:ep %11:ed, ...
We could notice that the %100, %101, %102
components are used in iterators whose counter comes from different PHI nodes. I think this means the regcoalescer/twoaddresselimination will fail and will need to insert 3 copies to materializxe the %11
reg_sequence.
Food for thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @gbossu, this case is already covered and this pseudo code looks like the one that we have in the test ;-). We filter off for duplication registers that are defined inside the MBB. Maybe with the refactoring this cases will be a bit more clear.
56e87c3
to
b9722fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! As mentioned in #145 (comment), I think checking for uses of iterators by both load and stores is a simple and good enough heuristic. We can always improve it if needed in the future.
I left some nits, address up to taste
b9722fa
to
8a1f34e
Compare
Thank you very much for the suggestions. I addressed all of them. |
8a1f34e
to
1d783a1
Compare
1d783a1
to
83a4e79
Compare
83a4e79
to
e37f896
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some nits, but I think it's good to go.
|
||
bool Changed = false; | ||
for (auto &RegMILoad : LoadUses) { | ||
Register Reg = RegMILoad.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const Reg
|
||
// Used by both loads and stores. We will duplicate for | ||
// load use. | ||
MachineInstr *DefReg = MRI.getUniqueVRegDef(Reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit const DefReg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This const is not possible because we need a non-const iterator.
if (UseMI->mayLoad()) | ||
LoadUses[Reg].insert(&MI); | ||
else | ||
UsedByStore.insert(Reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, the use could be in the value operand of the store, or not a store at all. If this is still the right action, the name of the set might be more generically NonLoadUses.
I can also imagine that we collect MI and NonLoadUses in one map indexed by Reg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the set to NonLoadUses, because if can hold some padd cases, if we fail to combine.
93e9aeb
e37f896
to
93e9aeb
Compare
In AIEBaseRegisterInfo
This for loop blocks with shared virtual addressing regs between loads and stores.
93e9aeb
to
a54bbe8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good to me :)
For loop blocks with a small number of memory operations to avoid spills. This PR clear a bit inner loops to help PostRA scheduling optimizations.
QoR information:
fno-unroll-loops
is used, we can observe changes like (Reduce*):To: