LMDeploy Release V0.1.0a2
What's Changed
💥 Improvements
- Unify prefill & decode passes by @lzhangzz in #775
- Add CUDA 12.1 build check CI by @irexyc in #782
- Automatically upload the CUDA 12.1 Python package to the release when a new tag is created by @irexyc in #784
- Report inference benchmarks for models of different sizes by @lvhan028 in #794
- Add chat template for Yi by @AllentDan in #779
🐞 Bug fixes
- Fix early-exit condition in attention kernel by @lzhangzz in #788
- Fix missing arguments when benchmarking static inference performance by @lvhan028 in #787
- Fix extra colon in InternLMChat7B template by @C1rN09 in #796
- Fix local KV head number by @lvhan028 in #806
📚 Documentation
🌐 Other
New Contributors
Full Changelog: v0.1.0a1...v0.1.0a2