[shortfin] Implement async alloc/dealloc of buffers. #507

Merged
merged 1 commit into from
Feb 19, 2025

stellaraccident
Contributor

@stellaraccident stellaraccident commented Nov 14, 2024

  • Device allocations are now async: alloc/dealloc are queue ordered.
  • Program invocations asynchronously deallocate function call results when they can. When an invocation cannot, a small tracy zone, SyncImportTimelineResource, is emitted for each result that cannot be async deallocated.
  • Adds a ProgramInvocation.assume_no_alias instance boolean to disable the assumption that allows async deallocation to work.
  • Adds a global ProgramInvocation.global_no_alias property to control this process-wide.
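
To illustrate the idea behind the points above, here is a minimal toy model in plain Python. It is only a sketch and does not use shortfin or IREE: `ToyQueue` and `release_result` are hypothetical names invented for this example, and the real implementation orders work with device timelines rather than a Python list. It shows why a no-alias assumption lets a result's deallocation be enqueued behind the work that uses it, and what the blocking fallback looks like otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical in-order device queue (not a shortfin or IREE type)."""

    ops: list = field(default_factory=list)
    live: set = field(default_factory=set)

    def enqueue_alloc(self, name):
        self.ops.append(("alloca", name, ()))

    def enqueue_exec(self, name, uses):
        self.ops.append(("exec", name, tuple(uses)))

    def enqueue_dealloc(self, name):
        self.ops.append(("dealloca", name, ()))

    def drain(self):
        """Run queued ops in order and return a log of what ran."""
        log = []
        for kind, name, uses in self.ops:
            if kind == "alloca":
                self.live.add(name)
            elif kind == "dealloca":
                self.live.discard(name)
            else:
                # Work must only touch buffers that are still allocated.
                missing = [b for b in uses if b not in self.live]
                if missing:
                    raise RuntimeError(f"{name} uses freed buffers: {missing}")
            log.append(f"{kind}:{name}")
        self.ops.clear()
        return log


def release_result(queue, buffer, assume_no_alias):
    """Release a result buffer; return 'async' or 'sync' (toy policy)."""
    if assume_no_alias:
        # Nothing else refers to the buffer, so the dealloc can simply be
        # queue ordered behind the work already enqueued.
        queue.enqueue_dealloc(buffer)
        return "async"
    # Fallback: the host waits for all queued work, then frees the buffer.
    queue.drain()
    queue.live.discard(buffer)
    return "sync"
```

With the assumption enabled, the dealloc becomes just another queued operation behind the kernel that uses the buffer. With it disabled, the host has to drain the queue first, which corresponds loosely to the synchronous path that the PR marks with a tracy zone.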

This is a very fiddly optimization which requires (especially in multi-device cases) a number of things to line up. Tested on amdgpu and CPU with a number of sample workloads (with logging enabled, and the results visually confirmed).

See #980 for detailed analysis and further work required.

@stellaraccident stellaraccident force-pushed the shortfin_async_alloc branch 2 times, most recently from f79a1ae to 172f914 Compare February 19, 2025 05:16
commit 3902491d7789cd7a6f0ef2bb1572abefff1073cf
Author: Stella Laurenzo <[email protected]>
Date:   Tue Feb 18 20:56:02 2025 -0800

    Flip flag off.

commit 1880f4962b14071c7897a1c18770c246243784b7
Author: Stella Laurenzo <[email protected]>
Date:   Tue Feb 18 20:48:51 2025 -0800

    Disable logging.

commit b29b3c5
Author: Stella Laurenzo <[email protected]>
Date:   Tue Feb 18 20:30:40 2025 -0800

    Allow no alias override.

commit e5514c3
Author: Stella Laurenzo <[email protected]>
Date:   Tue Feb 18 19:04:14 2025 -0800

    Do whole inv result tracking.

commit 878ed0c
Author: Stella Laurenzo <[email protected]>
Date:   Tue Feb 18 12:41:43 2025 -0800

    Fix header

commit 4f2f6b8
Merge: c527839 8b806bf
Author: Stella Laurenzo <[email protected]>
Date:   Tue Feb 18 12:35:35 2025 -0800

    Merge branch 'main' of github.com:nod-ai/sharktank into shortfin_async_alloc

commit c527839
Merge: e3f1eac 51cf2f4
Author: Stella Laurenzo <[email protected]>
Date:   Wed Nov 13 17:41:51 2024 -0800

    Merge branch 'main' into shortfin_async_alloc

commit e3f1eac
Author: Stella Laurenzo <[email protected]>
Date:   Wed Nov 13 17:09:43 2024 -0800

    [shortfin] Implement async alloc/dealloc of buffers.

    This has been a todo since day one. For device buffers, this now properly stream orders the alloc/dealloc.

Fix npe

Enable logging

Disable async device alloc.

Fix dealloc barrier.

Put dealloca on timeline

Properly queue order dealloca.
@stellaraccident stellaraccident merged commit b299af3 into main Feb 19, 2025
37 of 40 checks passed
@stellaraccident stellaraccident deleted the shortfin_async_alloc branch February 19, 2025 17:38
2 participants