-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delay stack size allocation until after core code generated #1104
base: main
Are you sure you want to change the base?
Conversation
@@ -484,7 +484,7 @@ def AIE_BufferOp: AIE_Op<"buffer", [ | |||
let arguments = ( | |||
ins Index:$tile, | |||
OptionalAttr<StrAttr>:$sym_name, | |||
OptionalAttr<I32Attr>:$address, | |||
OptionalAttr<I32Attr>:$stack_relative_address, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this assuming that the stack is located at address 0 and use offsets relative to that? If so, I don't think we should make that assumption. To avoid bank conflicts, I will soon want to be able to assign buffers strategically across the memory banks at specific addresses and an address relative to the stack makes that impossible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So previously if we had stackSize = 1024, address = 1040
Now we have stack_relative_address = 16
and the stack size is set later. We can set stack size to be any value larger than what the final code actually needs (largest offset from stack pointer in object dump).
So we stack size to 1024, the program is completely unchanged from before this PR. The stack is at address 0, I don't see any reason to change that. If you mean you'd like some additional memory left as a gap between the stack and the first buffer, that is easy to insert.
These are the lines where the addresses of buffers are set:
So before address
was assigned incrementally starting at stackSize
. Now, stack_relative_address
is assigned incrementally starting at 0.
Can you provide more information of what you have in mind? Also, do you see this PR as moving further from your goal than what we currently have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with assigning buffers relative to the stack address is that we won't be able to control the exact locations of other buffers, for example:
stack size is 4096, buffer A0 (ping) has size 8192, buffer A1 (pong) has size 8192, buffer B0 has size 8192 etc
Addresses assigned will be:
Stack: 0 (bank 1)
A0: 4096 -> 12228 (bank 1)
A1: 12288 -> 20480 (bank 1 and 2)
B0: 20480 -> 28672 (bank 2)
B1: 28672 -> 36864 (bank 2 and 3)
Now, this will result in two DMAs operating on the same bank 2 at some points + the vector processor as well and this will result in bank conflicts.
What I would like to achieve is something like:
A0: 0 -> 8192 (bank 1)
A1: 8192 -> 16384 (bank 1)
B0: 16384 -> 24576 (bank 2)
B1: 24576 -> 32768 (bank 2)
This assumes the stack is located somewhere else, but even if you would assume the stack is located at address 0 and we would like to use banks 2/3 for the above buffers, that can't be achieved with an address relative to the stack:
stack: 0 (bank 1)
A0: 16384 -> 24576 (bank 2)
A1: 24576 -> 32768 (bank 2)
B0: 32768 -> 40960 (bank 3)
B1: 40960 -> 49152 (bank 3)
I know this is the case as well with the current assignment as the stack is located at address 0 and buffers are appended after that and we will need better assignment strategies to avoid that. But we can't avoid these bank conflicts without granular control of where to locate buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I think placing the stack somewhere other than at address 0 will be a hard task, it will be easier (for now) to start assigning buffers from the end of the memory (64K?). And the we start having banks too. Let me play around with this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we would place all buffers after we know all sizes, but that doesn't seem easy right now as the buffer addresses are provided as an input to peano. Alternatively, we can put an upper bound on the stack size and error out if that would be exceeded, which would be an improvement already. Taking that further, we could place the stack buffer at the largest unused data memory block available and error out if that was not enough. This would work as well with more intelligent buffer placement strategies.
address
tostack_relative_address
in amdaie.buffer and aie.bufferThis sort of works, for some reason one of the i32 batch matmul tests doesn't like this change though. Also I've hard-coded all chess and ukernel tests to stack-size = 2K, I think it should be determine the stack sizes in these cases too