# LMDeploy Release V0.0.4

## Highlight
- Support 4-bit LLM quantization and inference. See this guide for detailed instructions.
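The W4A16 scheme behind this release keeps activations in fp16 while storing weights as 4-bit integers with per-group fp16 scales. A minimal NumPy sketch of the idea follows; the group size, rounding scheme, and function names are illustrative assumptions, not LMDeploy's actual kernels:

```python
import numpy as np

# Illustrative W4A16 group quantization (NOT LMDeploy's implementation):
# weights are rounded to int4 [-8, 7] with one fp16 scale per group,
# and dequantized back to fp16 at inference time.
def quantize_w4(w, group_size=128):
    """Quantize an fp16 weight vector to 4-bit ints with per-group scales."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max magnitude to 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_w4(q, scale):
    """Reconstruct fp16 weights from int4 values and per-group scales."""
    return (q.astype(np.float16) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float16)
q, s = quantize_w4(w)
w_hat = dequantize_w4(q, s)
err = np.abs(w.astype(np.float32) - w_hat.astype(np.float32)).max()
```

The per-group scale is what lets 4-bit storage track the local dynamic range of the weights; AWQ additionally picks scales to protect activation-salient channels.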
## What's Changed

### 🚀 Features
- Blazing fast W4A16 inference by @lzhangzz in #202
- Support AWQ by @pppppM in #108 and @AllentDan in #228
### 💥 Improvements
- Add release note template by @lvhan028 in #211
- feat(quantization): use asymmetric quantization for the kv cache by @tpoisonooo in #218
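Asymmetric quantization, as in the kv-cache change above, adds a zero-point so the observed min..max range maps onto the full integer range, instead of wasting codes on a skewed distribution the way a symmetric scheme does. A small NumPy sketch of the general technique (int8 here for clarity; the exact bit width and calibration in LMDeploy may differ):

```python
import numpy as np

# Illustrative asymmetric int8 quantization (a sketch, not LMDeploy's kernel):
# scale and zero_point together map [min, max] onto [0, 255].
def quantize_asym(x):
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0
    zero_point = np.round(-lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_asym(q, scale, zero_point):
    """Invert the affine mapping back to float."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(1)
# kv-cache values are often skewed rather than centered on zero
kv = rng.uniform(-0.5, 2.0, size=4096).astype(np.float32)
q, s, z = quantize_asym(kv)
kv_hat = dequantize_asym(q, s, z)
err = np.abs(kv - kv_hat).max()
```

For the skewed range above, a symmetric scheme would have to cover [-2, 2] and leave nearly half its codes unused; the zero-point avoids that.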
### 🐞 Bug fixes

### 📚 Documentation
- Update W4A16 News by @pppppM in #227
- Check-in user guide for w4a16 LLM deployment by @lvhan028 in #224
**Full Changelog**: v0.0.3...v0.0.4