Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay stack size allocation until after core code generated #1104

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

newling
Copy link
Contributor

@newling newling commented Feb 13, 2025

  1. remove stack_size attribute from aie.core
  2. rename address to stack_relative_address in amdaie.buffer and aie.buffer
  3. compute stack size from .o / .s file.
  4. propagate changes as needed.

This sort of works, for some reason one of the i32 batch matmul tests doesn't like this change though. Also I've hard-coded all chess and ukernel tests to stack-size = 2K, I think it should be determine the stack sizes in these cases too

@@ -484,7 +484,7 @@ def AIE_BufferOp: AIE_Op<"buffer", [
let arguments = (
ins Index:$tile,
OptionalAttr<StrAttr>:$sym_name,
OptionalAttr<I32Attr>:$address,
OptionalAttr<I32Attr>:$stack_relative_address,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assuming that the stack is located at address 0 and use offsets relative to that? If so, I don't think we should make that assumption. To avoid bank conflicts, I will soon want to be able to assign buffers strategically across the memory banks at specific addresses and an address relative to the stack makes that impossible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So previously if we had stackSize = 1024, address = 1040 Now we have stack_relative_address = 16 and the stack size is set later. We can set stack size to be any value larger than what the final code actually needs (largest offset from stack pointer in object dump).

So we stack size to 1024, the program is completely unchanged from before this PR. The stack is at address 0, I don't see any reason to change that. If you mean you'd like some additional memory left as a gap between the stack and the first buffer, that is easy to insert.

These are the lines where the addresses of buffers are set:

https://github.com/nod-ai/iree-amd-aie/pull/1104/files#diff-5a1e140ca056a489d6cf9766431fb5529180ec410b726425dc8a4208f8af477eL57

So before address was assigned incrementally starting at stackSize. Now, stack_relative_address is assigned incrementally starting at 0.

Can you provide more information of what you have in mind? Also, do you see this PR as moving further from your goal than what we currently have?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with assigning buffers relative to the stack address is that we won't be able to control the exact locations of other buffers, for example:

stack size is 4096, buffer A0 (ping) has size 8192, buffer A1 (pong) has size 8192, buffer B0 has size 8192 etc

Addresses assigned will be:

Stack: 0 (bank 1)
A0: 4096 -> 12228 (bank 1)
A1: 12288 -> 20480 (bank 1 and 2)
B0: 20480 -> 28672 (bank 2)
B1: 28672 -> 36864 (bank 2 and 3)

Now, this will result in two DMAs operating on the same bank 2 at some points + the vector processor as well and this will result in bank conflicts.

What I would like to achieve is something like:

A0: 0 -> 8192 (bank 1)
A1: 8192 -> 16384 (bank 1)
B0: 16384 -> 24576 (bank 2)
B1: 24576 -> 32768 (bank 2)

This assumes the stack is located somewhere else, but even if you would assume the stack is located at address 0 and we would like to use banks 2/3 for the above buffers, that can't be achieved with an address relative to the stack:

stack: 0 (bank 1)
A0: 16384 -> 24576 (bank 2)
A1: 24576 -> 32768 (bank 2)
B0: 32768 -> 40960 (bank 3)
B1: 40960 -> 49152 (bank 3)

I know this is the case as well with the current assignment as the stack is located at address 0 and buffers are appended after that and we will need better assignment strategies to avoid that. But we can't avoid these bank conflicts without granular control of where to locate buffers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I think placing the stack somewhere other than at address 0 will be a hard task, it will be easier (for now) to start assigning buffers from the end of the memory (64K?). And the we start having banks too. Let me play around with this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we would place all buffers after we know all sizes, but that doesn't seem easy right now as the buffer addresses are provided as an input to peano. Alternatively, we can put an upper bound on the stack size and error out if that would be exceeded, which would be an improvement already. Taking that further, we could place the stack buffer at the largest unused data memory block available and error out if that was not enough. This would work as well with more intelligent buffer placement strategies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants