opt: Replace COO with CSR sparse tensors for bfloat16/float16 support & faster speedups [IN PROGRESS] #557

Open
wz-ml wants to merge 20 commits into decoderesearch:main from wz-ml:willz/smm-optimization
wz-ml (Contributor) commented Oct 5, 2025

Description

Work in progress; further optimization experiments are ongoing. What works so far:

  • CSR format
    • Supports bfloat16 & float16 on Ampere
    • Speedups are significant on Ampere
    • Supports bfloat16 & float16 on Hopper
    • Speedups are significant on Hopper
  • Gather-reduce implementation (experiment)
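
For readers unfamiliar with the two layouts, here is a minimal pure-Python sketch of how COO triplets map to CSR and how a CSR row can be multiplied against a dense matrix by gathering dense rows and reducing them. It is an illustration only, not the code in this PR: the function names `coo_to_csr` and `csr_matmul` are hypothetical, and reading "gather-reduce" as "gather the dense rows named by the column indices, then take their weighted sum" is an assumption about what the experiment does.

```python
from typing import List, Tuple


def coo_to_csr(
    rows: List[int], cols: List[int], vals: List[float], n_rows: int
) -> Tuple[List[int], List[int], List[float]]:
    """Convert COO triplets into CSR arrays (hypothetical helper).

    COO stores one (row, col, value) triplet per non-zero.
    CSR keeps the column indices and values, sorted by row, and replaces
    the per-entry row index with n_rows + 1 row pointers.
    """
    # Sort entries by (row, col) so each row's entries are contiguous.
    order = sorted(range(len(rows)), key=lambda i: (rows[i], cols[i]))

    # Count non-zeros per row, then prefix-sum into row pointers.
    crow = [0] * (n_rows + 1)
    for r in rows:
        crow[r + 1] += 1
    for i in range(n_rows):
        crow[i + 1] += crow[i]

    col_idx = [cols[i] for i in order]
    values = [vals[i] for i in order]
    return crow, col_idx, values


def csr_matmul(
    crow: List[int],
    col_idx: List[int],
    values: List[float],
    dense: List[List[float]],
) -> List[List[float]]:
    """Compute sparse (CSR) @ dense, one output row at a time."""
    n_out_cols = len(dense[0])
    out = []
    for r in range(len(crow) - 1):
        acc = [0.0] * n_out_cols
        # Entries of row r live in the slice crow[r]:crow[r + 1].
        for p in range(crow[r], crow[r + 1]):
            weight = values[p]
            gathered = dense[col_idx[p]]  # gather one dense row
            for j in range(n_out_cols):
                acc[j] += weight * gathered[j]  # reduce into the output row
        out.append(acc)
    return out


# 3x3 sparse matrix with 4 non-zeros, given as unsorted COO triplets.
crow, col_idx, values = coo_to_csr(
    rows=[0, 2, 0, 1], cols=[1, 0, 2, 1], vals=[2.0, 5.0, 3.0, 4.0], n_rows=3
)
dense = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = csr_matmul(crow, col_idx, values, dense)
```

Because each row's entries are contiguous in CSR, a kernel can hand one output row to one worker without the scattered writes that unsorted COO entries require. Whether that is the source of the speedups reported here is not stated in this description.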

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

You have tested formatting, typing and tests

  • I have run make check-ci to check format and linting. (you can run make format to format code if needed.)

2 participants