# LMDeploy Release V0.0.4

## Highlight
- Support 4-bit LLM quantization and inference. See this guide for detailed instructions.
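The W4A16 scheme behind this release keeps activations in fp16 while storing weights as 4-bit integers with per-group fp16 scales. A minimal NumPy sketch of the idea follows; the group size, rounding scheme, and function names are illustrative assumptions, not LMDeploy's actual kernels:

```python
import numpy as np

# Illustrative W4A16 group quantization (NOT LMDeploy's implementation):
# weights are rounded to int4 [-8, 7] with one fp16 scale per group,
# and dequantized back to fp16 at inference time.
def quantize_w4(w, group_size=128):
    """Quantize an fp16 weight vector to 4-bit ints with per-group scales."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max magnitude to 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_w4(q, scale):
    """Reconstruct fp16 weights from int4 values and per-group scales."""
    return (q.astype(np.float16) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float16)
q, s = quantize_w4(w)
w_hat = dequantize_w4(q, s)
err = np.abs(w.astype(np.float32) - w_hat.astype(np.float32)).max()
```

The per-group scale is what lets 4-bit storage track the local dynamic range of the weights; AWQ additionally picks scales to protect activation-salient channels.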
## What's Changed

### 🚀 Features
- Blazing fast W4A16 inference by @lzhangzz in #202
- Support AWQ by @pppppM in #108 and @AllentDan in #228
### 💥 Improvements
- Add release note template by @lvhan028 in #211
- feat(quantization): use asymmetric quantization for the kv cache by @tpoisonooo in #218
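Asymmetric quantization, as in the kv-cache change above, adds a zero-point so the observed min..max range maps onto the full integer range, instead of wasting codes on a skewed distribution the way a symmetric scheme does. A small NumPy sketch of the general technique (int8 here for clarity; the exact bit width and calibration in LMDeploy may differ):

```python
import numpy as np

# Illustrative asymmetric int8 quantization (a sketch, not LMDeploy's kernel):
# scale and zero_point together map [min, max] onto [0, 255].
def quantize_asym(x):
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0
    zero_point = np.round(-lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_asym(q, scale, zero_point):
    """Invert the affine mapping back to float."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(1)
# kv-cache values are often skewed rather than centered on zero
kv = rng.uniform(-0.5, 2.0, size=4096).astype(np.float32)
q, s, z = quantize_asym(kv)
kv_hat = dequantize_asym(q, s, z)
err = np.abs(kv - kv_hat).max()
```

For the skewed range above, a symmetric scheme would have to cover [-2, 2] and leave nearly half its codes unused; the zero-point avoids that.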
### 🐞 Bug fixes

### 📚 Documentation
- Update W4A16 News by @pppppM in #227
- Check-in user guide for w4a16 LLM deployment by @lvhan028 in #224
**Full Changelog**: v0.0.3...v0.0.4