LMDeploy Release V0.0.9
Highlight
- Support InternLM 20B, including FP16, W4A16, and W4KV8
What's Changed
🚀 Features
💥 Improvements
- Reduce GIL switching by @irexyc in #407
- Profile token generation with more settings by @AllentDan in #364
🐞 Bug fixes
- Fix disk space limit for building docker image by @RunningLeon in #404
- More general PyPI CI by @irexyc in #412
- Fix build.md by @pangsg in #411
- Fix memory leak by @irexyc in #415
- Fix token count bug by @AllentDan in #416
- [Fix] Support actual seqlen in flash-attention2 by @grimoire in #418
- [Fix] Handle output[-1] access when output is empty by @wangruohui in #405
🌐 Other
- Rename readthedocs config file by @RunningLeon in #429
- Bump version to v0.0.9 by @lvhan028 in #428
New Contributors
Full Changelog: v0.0.8...v0.0.9