Delay stack size allocation until after core code generated #1104

Draft
wants to merge 9 commits into
base: main
7 changes: 5 additions & 2 deletions compiler/plugins/target/AMD-AIE/aie/AIEDialect.cpp
@@ -558,8 +558,11 @@ LogicalResult DMABDOp::verify() {

if (deviceModel.isMemTile(parentTileOp.getCol(), parentTileOp.getRow()) ||
deviceModel.isCoreTile(parentTileOp.getCol(), parentTileOp.getRow())) {
- if (auto baseAddr = buffer.getAddress(); baseAddr.has_value()) {
- int offsetInBytes = *baseAddr + getOffsetInBytes(*this);
+ std::optional<uint32_t> maybeStackRelativeAddress =
+ buffer.getStackRelativeAddress();
+ if (maybeStackRelativeAddress.has_value()) {
+ uint32_t baseAddr = maybeStackRelativeAddress.value();
+ int offsetInBytes = baseAddr + getOffsetInBytes(*this);
if (offsetInBytes % 4) {
return emitOpError(
"bd address must be 4 byte (32b) aligned; got "
4 changes: 2 additions & 2 deletions compiler/plugins/target/AMD-AIE/aie/AIEOps.td
@@ -125,7 +125,7 @@ def AIE_CoreOp: AIE_Op<"core", [
]>, Results<(outs Index)> {
let arguments = (
ins Index:$tile,
- DefaultValuedAttr<I32Attr, "0x400">:$stack_size,
+ // DefaultValuedAttr<I32Attr, "0x400">:$stack_size,
OptionalAttr<StrAttr>:$link_with,
OptionalAttr<StrAttr>:$elf_file
);
@@ -484,7 +484,7 @@ def AIE_BufferOp: AIE_Op<"buffer", [
let arguments = (
ins Index:$tile,
OptionalAttr<StrAttr>:$sym_name,
- OptionalAttr<I32Attr>:$address,
+ OptionalAttr<I32Attr>:$stack_relative_address,
Collaborator

Is this assuming that the stack is located at address 0 and using offsets relative to that? If so, I don't think we should make that assumption. To avoid bank conflicts, I will soon want to assign buffers strategically across the memory banks at specific addresses, and an address relative to the stack makes that impossible.

Contributor Author

Previously, we might have had stackSize = 1024 and address = 1040. Now we have stack_relative_address = 16, and the stack size is set later. We can set the stack size to any value larger than what the final code actually needs (the largest offset from the stack pointer in the object dump).

So if we set the stack size to 1024, the program is completely unchanged from before this PR. The stack is at address 0, and I don't see any reason to change that. If you mean you'd like some additional memory left as a gap between the stack and the first buffer, that is easy to insert.

These are the lines where the addresses of buffers are set:

https://github.com/nod-ai/iree-amd-aie/pull/1104/files#diff-5a1e140ca056a489d6cf9766431fb5529180ec410b726425dc8a4208f8af477eL57

So before, address was assigned incrementally starting at stackSize. Now, stack_relative_address is assigned incrementally starting at 0.

Can you provide more information about what you have in mind? Also, do you see this PR as moving further from your goal than what we currently have?
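
The relationship between the old and new attribute can be sketched in a few lines of C++. This is an illustration only: `toAbsoluteAddress` and `toStackRelativeAddress` are hypothetical helpers, not functions from this PR, and the sketch assumes the stack occupies the range starting at address 0 so that the first buffer begins at `stackSize`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not part of the PR). Assumption: the stack occupies
// [0, stackSize), so the first buffer starts at absolute address `stackSize`.
inline uint32_t toAbsoluteAddress(uint32_t stackSize,
                                  uint32_t stackRelativeAddress) {
  return stackSize + stackRelativeAddress;
}

inline uint32_t toStackRelativeAddress(uint32_t stackSize,
                                       uint32_t absoluteAddress) {
  // A buffer below `stackSize` would overlap the stack.
  assert(absoluteAddress >= stackSize && "buffer would overlap the stack");
  return absoluteAddress - stackSize;
}
```

With a stack size of 1024, a stack-relative address of 16 maps back to the absolute address 1040 from the example. Choosing a different stack size later shifts every buffer on the tile by the same amount.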

Collaborator

The problem with assigning buffers relative to the stack address is that we won't be able to control the exact locations of other buffers, for example:

stack size is 4096, buffer A0 (ping) has size 8192, buffer A1 (pong) has size 8192, buffer B0 has size 8192 etc

Addresses assigned will be:

Stack: 0 (bank 1)
A0: 4096 -> 12288 (bank 1)
A1: 12288 -> 20480 (bank 1 and 2)
B0: 20480 -> 28672 (bank 2)
B1: 28672 -> 36864 (bank 2 and 3)

At some points this will result in two DMAs, plus the vector processor, operating on bank 2 at the same time, which causes bank conflicts.

What I would like to achieve is something like:

A0: 0 -> 8192 (bank 1)
A1: 8192 -> 16384 (bank 1)
B0: 16384 -> 24576 (bank 2)
B1: 24576 -> 32768 (bank 2)

This assumes the stack is located somewhere else, but even if you would assume the stack is located at address 0 and we would like to use banks 2/3 for the above buffers, that can't be achieved with an address relative to the stack:

stack: 0 (bank 1)
A0: 16384 -> 24576 (bank 2)
A1: 24576 -> 32768 (bank 2)
B0: 32768 -> 40960 (bank 3)
B1: 40960 -> 49152 (bank 3)

I know this is also the case with the current assignment, since the stack is located at address 0 and buffers are appended after it, and we will need better assignment strategies to avoid that. But we can't avoid these bank conflicts without granular control over where buffers are located.
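
The bank assignments in the layouts above can be checked with a small sketch. It assumes equally sized 16 KiB banks numbered from 1, which is inferred from the listed layouts (bank 1 ends at 16384) and may not match every device. `kBankSize` and `banksTouched` are hypothetical names, not part of the codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumption: equally sized 16 KiB banks, numbered from 1 as in the layouts
// above. Real bank sizes depend on the device.
constexpr uint32_t kBankSize = 16384;

// Returns the first and last bank (1-based) touched by the half-open byte
// range [begin, begin + size).
inline std::pair<uint32_t, uint32_t> banksTouched(uint32_t begin,
                                                  uint32_t size) {
  assert(size > 0 && "empty buffers touch no bank");
  uint32_t firstBank = begin / kBankSize + 1;
  uint32_t lastBank = (begin + size - 1) / kBankSize + 1;
  return {firstBank, lastBank};
}
```

Under these assumptions, an 8192-byte buffer starting at 12288 (A1 in the first layout) spans banks 1 and 2, while the same buffer starting at 8192 stays entirely inside bank 1.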

Contributor Author

Got it. I think placing the stack somewhere other than at address 0 will be a hard task; it will be easier (for now) to start assigning buffers from the end of the memory (64K?). And then we start having banks too. Let me play around with this.
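
One way to picture assigning buffers from the end of memory is the sketch below. It is not code from this PR: `assignFromTop` is a hypothetical helper, the memory size is passed in by the caller (64 KiB is only the guess made above), and alignment requirements are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not the PR's code): place buffers downward from the
// top of data memory so the region just above address 0 stays free for the
// stack. On success, `starts[i]` holds the start address of buffer i.
// Returns false if the buffers alone do not fit. Alignment is ignored.
inline bool assignFromTop(const std::vector<uint32_t> &sizes,
                          uint32_t memorySize,
                          std::vector<uint32_t> &starts) {
  starts.clear();
  uint32_t top = memorySize;
  for (uint32_t size : sizes) {
    if (size > top) return false;  // ran out of memory
    top -= size;
    starts.push_back(top);
  }
  assert(starts.size() == sizes.size());
  return true;
}
```

With two 8192-byte buffers and a 65536-byte memory, the buffers start at 57344 and 49152, leaving everything below 49152 available to the stack regardless of the size chosen later.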

Collaborator

Ideally, we would place all buffers after we know all sizes, but that doesn't seem easy right now as the buffer addresses are provided as an input to peano. Alternatively, we can put an upper bound on the stack size and error out if that would be exceeded, which would be an improvement already. Taking that further, we could place the stack buffer at the largest unused data memory block available and error out if that was not enough. This would work as well with more intelligent buffer placement strategies.
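
The "largest unused block" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation from the codebase: `Range`, `largestFreeBlock`, and `stackFits` are hypothetical names, buffers are assumed not to overlap, and the memory size is supplied by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Range {
  uint32_t begin;  // inclusive
  uint32_t end;    // exclusive
};

// Hypothetical sketch: find the largest gap in [0, memorySize) that is not
// covered by any buffer. On ties, the lowest-addressed gap wins.
inline Range largestFreeBlock(std::vector<Range> buffers,
                              uint32_t memorySize) {
  std::sort(buffers.begin(), buffers.end(),
            [](const Range &a, const Range &b) { return a.begin < b.begin; });
  Range best{0, 0};
  uint32_t cursor = 0;  // end of the memory covered so far
  for (const Range &b : buffers) {
    assert(b.begin <= b.end && "malformed buffer range");
    if (b.begin > cursor && b.begin - cursor > best.end - best.begin)
      best = {cursor, b.begin};
    cursor = std::max(cursor, b.end);
  }
  if (memorySize > cursor && memorySize - cursor > best.end - best.begin)
    best = {cursor, memorySize};
  return best;
}

// True if a stack of `requiredStackSize` bytes fits in `block`; a caller
// would emit an error when this returns false.
inline bool stackFits(const Range &block, uint32_t requiredStackSize) {
  return block.end - block.begin >= requiredStackSize;
}
```

For the layout that places A0 through B1 in 16384 -> 49152, and assuming a 65536-byte memory, the sketch returns the block 0 -> 16384, which fits a 4096-byte stack.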

OptionalAttr<I32Attr>:$mem_bank
);

@@ -7,8 +7,6 @@
#include "AIEDialect.h"
#include "Passes.h"
#include "iree-amd-aie/aie_runtime/iree_aie_runtime.h"
- #include "llvm/ADT/Twine.h"
- #include "mlir/IR/Attributes.h"
#include "mlir/Pass/Pass.h"

#define DEBUG_TYPE "amdaie-assign-buffers-basic"
@@ -54,12 +52,10 @@ struct AMDAIEAssignBufferAddressesPassBasic : mlir::OperationPass<DeviceOp> {
static_cast<AMDAIEDevice>(device.getDevice()));
for (auto [tile, buffers] : tileToBuffers) {
// Leave room at the bottom of the address range for stack
- int64_t address = 0;
- if (auto core = getCoreOp(tile)) address += core.getStackSize();
-
+ int64_t stackRelativeAddress = 0;
for (auto buffer : buffers) {
- buffer.setAddress(address);
- address += getAllocationSize(buffer);
+ buffer.setStackRelativeAddress(stackRelativeAddress);
+ stackRelativeAddress += getAllocationSize(buffer);
}

int maxDataMemorySize;
@@ -69,10 +65,11 @@ struct AMDAIEAssignBufferAddressesPassBasic : mlir::OperationPass<DeviceOp> {
else
maxDataMemorySize =
deviceModel.getLocalMemorySize(tile.getCol(), tile.getRow());
- if (address > maxDataMemorySize) {
+ if (stackRelativeAddress > maxDataMemorySize) {
InFlightDiagnostic error =
tile.emitOpError("allocated buffers exceeded available memory (")
- << address << ">" << maxDataMemorySize << ")\n";
+ << stackRelativeAddress << ">" << maxDataMemorySize
+ << ") even before taking into account the stack!\n";
return signalPassFailure();
}
}
@@ -93,6 +93,7 @@ static void bufferToStd(ModuleOp module, BufferOp buffer,
rewriter.setInsertionPointToStart(module.getBody());
StringRef symName = name(buffer).getValue();
MemRefType type = llvm::cast<MemRefType>(buffer.getType());
+
// Don't emit initialization for cores that don't "own" the buffer (to
// prevent duplication in the data section of the elf/object file)
rewriter.create<memref::GlobalOp>(
8 changes: 4 additions & 4 deletions compiler/plugins/target/AMD-AIE/aie/test/basic.mlir
@@ -3,8 +3,8 @@

// CHECK-LABEL: aie.device(xcve2302) {
// CHECK: %[[TILE_2_1:.*]] = aie.tile(2, 1)
- // CHECK: %[[IN:.*]] = aie.buffer(%[[TILE_2_1]]) {address = 8192 : i32, sym_name = "in"} : memref<16xi32>
- // CHECK: %[[OUT:.*]] = aie.buffer(%[[TILE_2_1]]) {address = 1824 : i32, sym_name = "out"} : memref<16xi32>
+ // CHECK: %[[IN:.*]] = aie.buffer(%[[TILE_2_1]]) {stack_relative_address = 8192 : i32, sym_name = "in"} : memref<16xi32>
+ // CHECK: %[[OUT:.*]] = aie.buffer(%[[TILE_2_1]]) {stack_relative_address = 1824 : i32, sym_name = "out"} : memref<16xi32>
// CHECK: %[[LOCK_2_1:.*]] = aie.lock(%[[TILE_2_1]], 0) {init = 1 : i8}
// CHECK: %[[LOCK_2_1_0:.*]] = aie.lock(%[[TILE_2_1]], 1)
// CHECK: %[[LOCK_2_1_1:.*]] = aie.lock(%[[TILE_2_1]], 2) {init = 1 : i8}
@@ -45,8 +45,8 @@
module @aie_module {
aie.device(xcve2302) {
%t01 = aie.tile(2, 1)
- %buf01_0 = aie.buffer(%t01) { address = 8192 : i32, sym_name = "in" } : memref<16xi32>
- %buf01_1 = aie.buffer(%t01) { address = 1824 : i32, sym_name = "out" } : memref<16xi32>
+ %buf01_0 = aie.buffer(%t01) { stack_relative_address = 8192 : i32, sym_name = "in" } : memref<16xi32>
+ %buf01_1 = aie.buffer(%t01) { stack_relative_address = 1824 : i32, sym_name = "out" } : memref<16xi32>
%l01_0 = aie.lock(%t01, 0) { init = 1 : i8 }
%l01_1 = aie.lock(%t01, 1)
%l01_2 = aie.lock(%t01, 2) { init = 1 : i8 }
@@ -3,7 +3,7 @@

// CHECK-LABEL: aie.device(xcve2302) {
// CHECK: %[[TILE_3_1:.*]] = aie.tile(3, 1)
- // CHECK: %[[A:.*]] = aie.buffer(%[[TILE_3_1]]) {address = 0 : i32, sym_name = "a"} : memref<65536xi32>
+ // CHECK: %[[A:.*]] = aie.buffer(%[[TILE_3_1]]) {stack_relative_address = 0 : i32, sym_name = "a"} : memref<65536xi32>
// CHECK: %[[MEMTILE_DMA_3_1:.*]] = aie.memtile_dma(%[[TILE_3_1]]) {
// CHECK: aie.end
// CHECK: }
@@ -3,11 +3,11 @@

// CHECK-LABEL: aie.device(xcvc1902) {
// CHECK: %[[TILE_3_3:.*]] = aie.tile(3, 3)
- // CHECK: %[[A:.*]] = aie.buffer(%[[TILE_3_3]]) {address = 1024 : i32, sym_name = "a"} : memref<16xi8>
- // CHECK: %[[B:.*]] = aie.buffer(%[[TILE_3_3]]) {address = 1040 : i32, sym_name = "b"} : memref<512xi32>
- // CHECK: %[[C:.*]] = aie.buffer(%[[TILE_3_3]]) {address = 3088 : i32, sym_name = "c"} : memref<16xi16>
+ // CHECK: %[[A:.*]] = aie.buffer(%[[TILE_3_3]]) {stack_relative_address = 0 : i32, sym_name = "a"} : memref<16xi8>
+ // CHECK: %[[B:.*]] = aie.buffer(%[[TILE_3_3]]) {stack_relative_address = 16 : i32, sym_name = "b"} : memref<512xi32>
+ // CHECK: %[[C:.*]] = aie.buffer(%[[TILE_3_3]]) {stack_relative_address = 2064 : i32, sym_name = "c"} : memref<16xi16>
// CHECK: %[[TILE_4_4:.*]] = aie.tile(4, 4)
- // CHECK: %[[VAL_0:.*]] = aie.buffer(%[[TILE_4_4]]) {address = 1024 : i32, sym_name = "_anonymous0"} : memref<500xi32>
+ // CHECK: %[[VAL_0:.*]] = aie.buffer(%[[TILE_4_4]]) {stack_relative_address = 0 : i32, sym_name = "_anonymous0"} : memref<500xi32>
// CHECK: %[[CORE_3_3:.*]] = aie.core(%[[TILE_3_3]]) {
// CHECK: aie.end
// CHECK: }
8 changes: 4 additions & 4 deletions compiler/plugins/target/AMD-AIE/aie/test/user_assigned.mlir
@@ -3,8 +3,8 @@

// CHECK-LABEL: aie.device(xcve2302) {
// CHECK: %[[TILE_2_1:.*]] = aie.tile(2, 1)
- // CHECK: %[[IN:.*]] = aie.buffer(%[[TILE_2_1]]) {address = 8192 : i32, sym_name = "in"} : memref<16xi32>
- // CHECK: %[[OUT:.*]] = aie.buffer(%[[TILE_2_1]]) {address = 1824 : i32, sym_name = "out"} : memref<16xi32>
+ // CHECK: %[[IN:.*]] = aie.buffer(%[[TILE_2_1]]) {stack_relative_address = 8192 : i32, sym_name = "in"} : memref<16xi32>
+ // CHECK: %[[OUT:.*]] = aie.buffer(%[[TILE_2_1]]) {stack_relative_address = 1824 : i32, sym_name = "out"} : memref<16xi32>
// CHECK: %[[LOCK_2_1:.*]] = aie.lock(%[[TILE_2_1]], 0) {init = 1 : i8}
// CHECK: %[[LOCK_2_1_0:.*]] = aie.lock(%[[TILE_2_1]], 1)
// CHECK: %[[LOCK_2_1_1:.*]] = aie.lock(%[[TILE_2_1]], 2) {init = 1 : i8}
@@ -45,8 +45,8 @@
module @aie_module {
aie.device(xcve2302) {
%t01 = aie.tile(2, 1)
- %buf01_0 = aie.buffer(%t01) { address = 8192 : i32, sym_name = "in" } : memref<16xi32>
- %buf01_1 = aie.buffer(%t01) { address = 1824 : i32, sym_name = "out" } : memref<16xi32>
+ %buf01_0 = aie.buffer(%t01) { stack_relative_address = 8192 : i32, sym_name = "in" } : memref<16xi32>
+ %buf01_1 = aie.buffer(%t01) { stack_relative_address = 1824 : i32, sym_name = "out" } : memref<16xi32>
%l01_0 = aie.lock(%t01, 0) { init = 1 : i8 }
%l01_1 = aie.lock(%t01, 1)
%l01_2 = aie.lock(%t01, 2) { init = 1 : i8 }
@@ -223,7 +223,8 @@ def AMDAIE_BufferOp: AMDAIE_Op<"buffer", [
let summary = "Represents a buffer on an AIE tile.";
let description = [{
This operation represents a buffer on an AIE tile. The buffer can have an
- optional address, indicating the location of the buffer on the tile.
+ optional address, indicating the location of the buffer on the tile relative
+ to the end of the stack.

Example:

@@ -235,13 +236,13 @@ def AMDAIE_BufferOp: AMDAIE_Op<"buffer", [

let arguments = (
ins Index:$tile,
- OptionalAttr<UI32Attr>:$address
+ OptionalAttr<UI32Attr>:$stack_relative_address
);

let results = (outs AnyMemRef:$buffer);

let assemblyFormat = [{
- `(` $tile (`,` $address^)? `)` attr-dict `:` type($buffer)
+ `(` $tile (`,` $stack_relative_address^)? `)` attr-dict `:` type($buffer)
}];
}

@@ -136,7 +136,8 @@ Lock::Action toLock(LockAction l) {
}

LogicalResult configureLocksAndBd(Block &block, const TileLoc &tileLoc,
- const AMDAIEDeviceModel &deviceModel) {
+ const AMDAIEDeviceModel &deviceModel,
+ int stackSize) {
FailureOr<XAie_DmaDesc> dmaTileBd = initDMADesc(deviceModel, tileLoc);
if (failed(dmaTileBd)) return failure();
std::optional<int> acqValue, relValue, acqLockId, relLockId;
@@ -201,8 +202,13 @@ LogicalResult configureLocksAndBd(Block &block, const TileLoc &tileLoc,
}

BufferOp bufferOp = cast<BufferOp>(bdOp.getBuffer().getDefiningOp());
- if (!bufferOp.getAddress())
- return bufferOp.emitError("buffer must have address assigned");
+ if (!bufferOp.getStackRelativeAddress().has_value()) {
+ return bufferOp.emitOpError(
+ "does not have an address relative to the end of stack assigned, "
+ "required at this point.");
+ }
+ auto addressRelativeToStack = bufferOp.getStackRelativeAddress().value();
+
// Convert `xilinx::AIE::BDDimLayoutAttr` to
// `mlir::iree_compiler::AMDAIE::BDDimLayout`.
std::optional<std::vector<BDDimLayout>> maybeDims;
@@ -231,11 +237,12 @@ LogicalResult configureLocksAndBd(Block &block, const TileLoc &tileLoc,
? std::optional<uint8_t>{static_cast<uint8_t>(*bdOp.getNextBdId())}
: std::nullopt;
std::optional<BDIterLayout> maybeIter = std::nullopt;
+
if (failed(configureDMABD(deviceModel, dmaTileBd.value(), tileLoc, validBd,
static_cast<uint8_t>(*bdOp.getBdId()), enableNextBd,
nextBdId, enablePacket, packetType, packetID,
- *bufferOp.getAddress(), getLenInBytes(bdOp),
- getOffsetInBytes(bdOp),
+ stackSize + addressRelativeToStack,
+ getLenInBytes(bdOp), getOffsetInBytes(bdOp),
getBufferElementTypeWidthInBytes(bdOp), maybeDims,
maybePadDims, maybeIter))) {
return failure();
@@ -244,7 +251,7 @@ LogicalResult configureLocksAndBd(Block &block, const TileLoc &tileLoc,
}

LogicalResult addInitConfig(const AMDAIEDeviceModel &deviceModel,
- DeviceOp &device) {
+ DeviceOp &device, int stackSize) {
// Reset and unreset all cores.
for (auto tileOp : device.getOps<TileOp>()) {
TileLoc tileLoc = {tileOp.getCol(), tileOp.getRow()};
@@ -285,8 +292,9 @@ LogicalResult addInitConfig(const AMDAIEDeviceModel &deviceModel,
// Handle DMA ops separately.
for (Block &block : memOp->getRegion(0)) {
if (block.getOps<DMABDOp>().empty()) continue;
- if (failed(configureLocksAndBd(block, tileLoc, deviceModel)))
+ if (failed(configureLocksAndBd(block, tileLoc, deviceModel, stackSize))) {
return failure();
+ }
}

for (Block &block : memOp->getRegion(0)) {
@@ -31,7 +31,7 @@ LogicalResult addAllCoreEnable(const AMDAIEDeviceModel &deviceModel,
/// Utility function to reset all cores, initialize hardware locks,
/// and configure all switchboxes.
LogicalResult addInitConfig(const AMDAIEDeviceModel &deviceModel,
- xilinx::AIE::DeviceOp &device);
+ xilinx::AIE::DeviceOp &device, int stackSize);

} // namespace mlir::iree_compiler::AMDAIE

@@ -17,7 +17,7 @@ std::string utohexstr(uint32_t u) { return "0x" + llvm::utohexstr(u); }
namespace mlir::iree_compiler::AMDAIE {

LogicalResult AIETranslateToBCF(DeviceOp deviceOp, raw_ostream &output,
- int tileCol, int tileRow) {
+ int tileCol, int tileRow, int stackSize) {
DenseMap<TileLoc, Operation *> tiles;
DenseMap<Operation *, SmallVector<BufferOp, 4>> buffers;

@@ -51,11 +51,9 @@ LogicalResult AIETranslateToBCF(DeviceOp deviceOp, raw_ostream &output,
output << "_reserved DMb 0x00000 " << initReserved
<< " // Don't put data in code memory\n";

- int stacksize = 0;
- if (auto core = getCoreOp(tile)) stacksize = core.getStackSize();
output << "_stack DM_stack "
<< utohexstr(deviceModel.getMemInternalBaseAddress()) << " "
- << utohexstr(stacksize) << " // stack for core\n";
+ << utohexstr(stackSize) << " // stack for core\n";

auto doBuffer = [&](std::optional<TileLoc> tile, int offset,
const std::string &dir) {
@@ -74,7 +72,9 @@ LogicalResult AIETranslateToBCF(DeviceOp deviceOp, raw_ostream &output,
if (tiles.count(TileLoc(*tile))) {
for (auto buf : buffers[tiles[TileLoc(*tile)]]) {
std::string bufName(name(buf).getValue());
- int bufferBaseAddr = buf.getAddress().value();
+ int bufferBaseAddr =
+ buf.getStackRelativeAddress().value() + stackSize;
+
int numBytes = getAllocationSize(buf);
output << "_symbol " << bufName << " "
<< utohexstr(offset + bufferBaseAddr) << " " << numBytes
@@ -27,18 +27,18 @@ using Path = std::filesystem::path;
namespace mlir::iree_compiler::AMDAIE {
LogicalResult generateCDOBinariesSeparately(
const AMDAIEDeviceModel &deviceModel, const Path &workDirPath,
- DeviceOp &device, bool aieSim, bool enableCores) {
+ DeviceOp &device, bool aieSim, bool enableCores, int stackSize) {
if (failed(generateCDOBinary(workDirPath / "aie_cdo_elfs.bin",
[&deviceModel, &device, &workDirPath, &aieSim] {
return addAllAieElfs(deviceModel, device,
workDirPath, aieSim);
})))
return failure();

- if (failed(generateCDOBinary(workDirPath / "aie_cdo_init.bin",
- [&deviceModel, &device] {
- return addInitConfig(deviceModel, device);
- })))
+ if (failed(generateCDOBinary(
+ workDirPath / "aie_cdo_init.bin", [&deviceModel, &device, stackSize] {
+ return addInitConfig(deviceModel, device, stackSize);
+ })))
return failure();

if (enableCores && !device.getOps<CoreOp>().empty() &&
@@ -53,15 +53,15 @@ LogicalResult generateCDOBinariesSeparately(

LogicalResult AIETranslateToCDODirect(xilinx::AIE::DeviceOp device,
llvm::StringRef workDirPath,
- bool bigEndian, bool emitUnified,
- bool cdoDebug, bool aieSim,
- bool enableCores) {
+ int stackSize, bool bigEndian,
+ bool emitUnified, bool cdoDebug,
+ bool aieSim, bool enableCores) {
AMDAIEDeviceModel deviceModel = getDeviceModel(device.getDevice());
byte_ordering endianness =
bigEndian ? byte_ordering::Big_Endian : byte_ordering::Little_Endian;
DEBUG_WITH_TYPE("aie-cdo-driver-debug", cdoDebug = true);
initializeCDOGenerator(endianness, cdoDebug);
return generateCDOBinariesSeparately(deviceModel, Path(workDirPath.str()),
- device, aieSim, enableCores);
+ device, aieSim, enableCores, stackSize);
}
} // namespace mlir::iree_compiler::AMDAIE
@@ -16,15 +16,16 @@ using namespace xilinx::AIE;
// are accessed from.
static void writeLDScriptMap(raw_ostream &output, BufferOp buf, int offset) {
std::string bufName(name(buf).getValue());
- int bufferBaseAddr = buf.getAddress().value();
+ int bufferBaseAddr = buf.getStackRelativeAddress().value();
int numBytes = getAllocationSize(buf);
output << ". = 0x" << llvm::utohexstr(offset + bufferBaseAddr) << ";\n";
output << bufName << " = .;\n";
output << ". += 0x" << llvm::utohexstr(numBytes) << ";\n";
}

LogicalResult mlir::iree_compiler::AMDAIE::AIETranslateToLdScript(
- DeviceOp deviceOp, raw_ostream &output, int tileCol, int tileRow) {
+ DeviceOp deviceOp, raw_ostream &output, int tileCol, int tileRow,
+ int stackSize) {
DenseMap<TileLoc, Operation *> tiles;
DenseMap<Operation *, SmallVector<BufferOp, 4>> buffers;

@@ -37,10 +38,9 @@ LogicalResult mlir::iree_compiler::AMDAIE::AIETranslateToLdScript(
TileLoc srcCoord = {tile.getCol(), tile.getRow()};

// Figure out how much memory we have left for random allocations
- auto core = getCoreOp(tile);
- int max = core.getStackSize();
+ int max = stackSize;
for (auto buf : buffers[tiles[srcCoord]]) {
- int bufferBaseAddr = buf.getAddress().value();
+ int bufferBaseAddr = buf.getStackRelativeAddress().value() + stackSize;
int numBytes = getAllocationSize(buf);
max = std::max(max, bufferBaseAddr + numBytes);
}
@@ -92,12 +92,13 @@ SECTIONS
*(.chesstypeannotationtab)
}
)THESCRIPT";
+
auto doBuffer = [&](std::optional<TileLoc> tile, int offset,
const std::string &dir) {
if (tile) {
if (tiles.count({tile->col, tile->row}))
for (auto buf : buffers[tiles[{tile->col, tile->row}]])
- writeLDScriptMap(output, buf, offset);
+ writeLDScriptMap(output, buf, offset + stackSize);
} else {
output << "/* No tile with memory exists to the " << dir << ". */\n";
output << ". = 0x" << llvm::utohexstr(offset) << ";\n";
@@ -113,8 +114,7 @@ SECTIONS
output << "_sp_start_value_DM_stack = .;\n";

if (auto core = getCoreOp(tile))
- output << ". += 0x" << llvm::utohexstr(core.getStackSize())
- << "; /* stack */\n";
+ output << ". += 0x" << llvm::utohexstr(stackSize) << "; /* stack */\n";
else
output << "/* no stack allocated */\n";
