v0.9.0

@awni released this 28 Mar 23:19
d8cb312

Highlights:

  • Fast partial RoPE (used by Phi-2)
  • Fast gradients for RoPE, RMSNorm, and LayerNorm
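For readers unfamiliar with the term, "partial" RoPE applies the rotary position embedding to only the first few feature dimensions and passes the rest through unchanged. The following is a minimal pure-Python reference sketch of that idea, not the fast kernel shipped in this release. The function name `partial_rope`, the pairing of adjacent features, and the default `base` of 10000 are assumptions made for illustration.

```python
import math


def partial_rope(x, position, dims, base=10000.0):
    """Rotate only the first `dims` features of `x`; leave the rest untouched.

    Hypothetical reference helper: adjacent features (0, 1), (2, 3), ... are
    treated as 2-D pairs and each pair is rotated by a position-dependent angle.
    """
    if dims % 2 != 0 or dims > len(x):
        raise ValueError("dims must be even and at most len(x)")
    out = list(x)
    for i in range(dims // 2):
        # Angle for the i-th pair: lower-index pairs rotate faster.
        theta = position * base ** (-2.0 * i / dims)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = partial_rope(x, position=3, dims=4)  # last two features pass through
```

Because each pair undergoes a plain 2-D rotation, the vector's norm is preserved, and position 0 leaves the input unchanged.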

Core

  • More overhead reductions
  • Fast partial RoPE (speeds up Phi-2)
  • Better buffer donation for copy
  • Type hierarchy and issubdtype
  • Fast VJPs for RoPE, RMSNorm, and LayerNorm
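As a rough illustration of what a type hierarchy with an issubdtype check provides, the toy sketch below models dtypes and categories as a parent map and walks up it. It is pure Python and does not call the library. The category names are modeled on NumPy's hierarchy and are an assumption here, not taken from these notes.

```python
# Toy hierarchy (hypothetical): each dtype or category maps to its parent.
PARENT = {
    "float16": "floating",
    "float32": "floating",
    "int32": "signedinteger",
    "uint8": "unsignedinteger",
    "floating": "inexact",
    "signedinteger": "integer",
    "unsignedinteger": "integer",
    "inexact": "number",
    "integer": "number",
    "number": "generic",
}


def issubdtype(dtype, category):
    """Return True if `dtype` is `category` or sits below it in the toy tree."""
    node = dtype
    while node is not None:
        if node == category:
            return True
        node = PARENT.get(node)  # None once we walk past the root
    return False


is_float = issubdtype("float16", "floating")
is_int_float = issubdtype("int32", "floating")
```

A check like this lets code branch on "any floating-point type" rather than enumerating every concrete dtype.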

NN

  • Module.set_dtype
  • Chaining in nn.Module (model.freeze().update(…))
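The chaining pattern mentioned above generally works by having mutating methods return the object itself. The sketch below shows that design with a hypothetical `TinyModule` class. It is not the library's `nn.Module`, and its `freeze` and `update` bodies are simplified stand-ins.

```python
class TinyModule:
    """Hypothetical stand-in for a module whose mutators return `self`."""

    def __init__(self, **params):
        self.params = dict(params)
        self.frozen = set()

    def freeze(self):
        # Mark every current parameter as frozen, then hand back the module.
        self.frozen = set(self.params)
        return self

    def update(self, new_params):
        # Overwrite parameter values in place, then hand back the module.
        self.params.update(new_params)
        return self


# Because each call returns the same object, calls compose left to right.
model = TinyModule(w=1.0, b=0.0).freeze().update({"w": 2.0})
```

Returning `self` keeps the mutation in place while letting several configuration steps be written as one expression.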

Bugfixes

  • Fix set item bugs
  • Fix scatter vjp
  • Check shape integer overflow on array construction
  • Fix bug with module attributes
  • Fix two bugs for odd-shaped QMV
  • Fix GPU sort for large sizes
  • Fix bug in negative padding for convolutions
  • Fix bug in multi-stream race condition for graph evaluation
  • Fix random normal generation for half precision