[SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique producers/consumers ObjFifos #1060

Open; wants to merge 20 commits into base: main

Abhishek-Varma
Contributor

@Abhishek-Varma Abhishek-Varma commented Jan 27, 2025

Previously, the split factor was inferred solely from the number of available columns.
As a result, for a 4x8 array with the new pipeline, the L2 buffers were being split as follows:

  1. LHS - 8 times.
  2. RHS - 8 times.
  3. OUT - 8 times.

Consequently, the tiles being assigned were:

  1. LHS : (0,0) -> (7,0)
  2. RHS : (0,0) -> (7,0)
  3. OUT : (0,0) -> (7,0)

This exhausts the DMA channels. Refer to this thread for the discussion.

The L2 buffer split should instead be:

  1. LHS : (0,0) -> (3,0)
  2. RHS : (0,0) -> (7,0)
  3. OUT : (0,0) -> (7,0)

This way, when the tiles are assigned later on, the expected number of tile assignments for LHS/RHS/OUT matches the number of corresponding L2 buffers.
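To make the tile ranges above concrete, here is a minimal sketch. It assumes the ranges are written as `(col, row)` and that a buffer split `n` times occupies `n` consecutive columns of row 0 starting at column 0; the split factor of 4 for LHS is inferred from the range `(0,0) -> (3,0)`. `tile_range` is a hypothetical helper for illustration only, not a function of the pass.

```python
def tile_range(split_factor: int, row: int = 0):
    """Return the first and last (col, row) tile for a buffer split `split_factor` times.

    Assumption: split pieces land on consecutive columns starting at column 0.
    """
    if split_factor < 1:
        raise ValueError("split factor must be at least 1")
    return (0, row), (split_factor - 1, row)


# Split factors per L2 buffer: the old behaviour (8 everywhere, taken from the
# number of columns) versus the split the description asks for.
old_split = {"LHS": 8, "RHS": 8, "OUT": 8}
new_split = {"LHS": 4, "RHS": 8, "OUT": 8}  # LHS value inferred from (0,0) -> (3,0)

old_tiles = {name: tile_range(factor) for name, factor in old_split.items()}
new_tiles = {name: tile_range(factor) for name, factor in new_split.items()}

for name in ("LHS", "RHS", "OUT"):
    print(name, "old:", old_tiles[name], "new:", new_tiles[name])
```

Only the LHS range changes between the two configurations, which is the mismatch this PR is meant to remove.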

This PR analyses the number of unique producer/consumer ObjFifos for the ObjFifo being split.

e2e CI tests for Matmul, both with and without ukernel, via the pack-peel-4-level-tiling pipeline targeting a 4x8 array on Strix have been added.

Signed-off-by: Abhishek Varma [email protected]

@Abhishek-Varma Abhishek-Varma changed the title [DO NOT REVIEW] Fix split-logicalobjfifo pass to analyse unique L2<->L1 DMAs [SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique L2<->L1 DMAs Jan 27, 2025
@Abhishek-Varma Abhishek-Varma marked this pull request as ready for review January 27, 2025 17:37
@Abhishek-Varma Abhishek-Varma force-pushed the avarma_fix_split_lof_for_new_pipeline branch from def3a83 to 0904a94 Compare January 29, 2025 07:01
@Abhishek-Varma Abhishek-Varma requested a review from jtuyls January 29, 2025 11:06
@Abhishek-Varma Abhishek-Varma force-pushed the avarma_fix_split_lof_for_new_pipeline branch from 26ef401 to 1703271 Compare January 29, 2025 11:33
@Abhishek-Varma Abhishek-Varma requested a review from jtuyls January 29, 2025 14:48
@Abhishek-Varma Abhishek-Varma force-pushed the avarma_fix_split_lof_for_new_pipeline branch from 531ee50 to da1c62d Compare January 29, 2025 14:49
@Abhishek-Varma Abhishek-Varma force-pushed the avarma_fix_split_lof_for_new_pipeline branch from 2e57be9 to 8c3f762 Compare January 29, 2025 15:54
@Abhishek-Varma Abhishek-Varma requested a review from jtuyls January 29, 2025 15:54
Contributor

@yzhang93 yzhang93 left a comment

This revision is much more concise and cleaner. I have more comments on the test.

@Abhishek-Varma Abhishek-Varma force-pushed the avarma_fix_split_lof_for_new_pipeline branch from 3594447 to 42868e6 Compare January 30, 2025 06:30
@Abhishek-Varma Abhishek-Varma requested a review from jtuyls January 30, 2025 10:23
Collaborator

@jtuyls jtuyls left a comment

LGTM, just one nit and also update the PR title to producers/consumers instead of L1/L2.

@Abhishek-Varma Abhishek-Varma changed the title [SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique L2<->L1 DMAs [SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique producers/consumers DMAs Jan 30, 2025
@Abhishek-Varma Abhishek-Varma changed the title [SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique producers/consumers DMAs [SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique producers/consumers ObjFifos Jan 30, 2025
Contributor

@yzhang93 yzhang93 left a comment

Thanks for the changes! I have some final comments.

Comment on lines +501 to +504
// Although we have 8 columns, L2 LHS buffers needs to be split into only 1, L2 RHS into 2 and
// L2 OUT into 1.
// This is because we decide the split factor for the L2 ObjectFifo depending on :-
// GCD(unique producer/consumer for the respective ObjectFifos being split, number of columns)
Contributor

The description is much better. But I think if the split factor is 1, the buffer is not being split at all. Besides, you could emphasize the purpose of the test. Something like:

This test demonstrates the case when the factor is not simply decided by the number of columns but the number of unique producers/consumers. In the example, although we are using 8 AIE columns, L2 LHS and output buffers are not split because there's only one producer/consumer, while L2 RHS buffer is split into 2 because there are 2 producers/consumers.
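As a rough illustration of the rule quoted in the test comment, GCD(number of unique producers/consumers, number of columns), the sketch below reproduces the numbers from this test case. The function name `split_factor` and the dictionary layout are assumptions made for the example; the actual pass is not written in Python and may handle more cases.

```python
from math import gcd


def split_factor(num_unique_endpoints: int, num_cols: int) -> int:
    """Split factor as described in the test comment:
    GCD(unique producers/consumers of the ObjectFifo, number of columns)."""
    return gcd(num_unique_endpoints, num_cols)


num_cols = 8  # the test targets 8 AIE columns

# Unique producer/consumer counts as described in the review comment.
unique_endpoints = {"LHS": 1, "RHS": 2, "OUT": 1}

factors = {name: split_factor(n, num_cols) for name, n in unique_endpoints.items()}

# A factor of 1 means the buffer is left unsplit.
not_split = sorted(name for name, factor in factors.items() if factor == 1)

print(factors)
print("not split:", not_split)
```

With these inputs only the RHS buffer is actually split (into 2); LHS and OUT get a factor of 1 and stay as they are.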

// CHECK: amdaie.dma_cpy_nd(%{{.*}}[0, 0] [256, 512] [4096, 1], %[[LOF_OUT_L2]][0, 0, 0, 0] [8, 32, 16, 32] [1024, 32, 8192, 1]) :
// CHECK: }
#executable_target_amdaie_pdi_fb = #hal.executable.target<"amd-aie", "amdaie-pdi-fb", {num_cols = 8 : i32, num_rows = 4 : i32, target_device = "npu4", ukernels = "none"}>
#translation = #iree_codegen.translation_info<pipeline = Custom>
Contributor

Nit: #translation can be removed.

3 participants