Releases · InternLM/lmdeploy
LMDeploy Release V0.0.5
LMDeploy Release V0.0.4
Highlight
- Support 4-bit LLM quantization and inference. Check this guide for detailed information.
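As a rough illustration of the 4-bit idea (a sketch, not LMDeploy's actual W4A16 kernels), here is minimal group-wise weight quantization: each group of weights shares one floating-point scale, and every value is stored as a signed 4-bit integer in [-8, 7].

```python
# Hypothetical sketch of group-wise 4-bit weight quantization.
# Function names and the group size are illustrative only.

def quantize_4bit(weights, group_size=4):
    """Quantize a flat list of floats to int4 codes plus one scale per group."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid zero scale
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales

def dequantize_4bit(codes, scales, group_size=4):
    """Recover approximate fp weights from int4 codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

w = [0.1, -0.7, 0.35, 0.02, 1.4, -1.4, 0.0, 0.7]
codes, scales = quantize_4bit(w)
w_hat = dequantize_4bit(codes, scales)
```

Each reconstructed weight differs from the original by at most half a quantization step of its group, which is the accuracy/size trade-off W4A16 exploits.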
What's Changed
🚀 Features
- Blazing fast W4A16 inference by @lzhangzz in #202
- Support AWQ by @pppppM in #108 and @AllentDan in #228
💥 Improvements
- Add release note template by @lvhan028 in #211
- feat(quantization): use asymmetric quantization for the KV cache by @tpoisonooo in #218
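To illustrate what "asymmetric" means here (a sketch of the general technique, not LMDeploy's implementation), asymmetric quantization maps the full [min, max] span of the values onto [0, 255] via a scale and a zero point, rather than forcing a range symmetric around zero; this wastes fewer codes when KV-cache activations are skewed.

```python
# Hypothetical sketch of asymmetric (zero-point) 8-bit quantization.

def asym_quantize(values):
    """Map values onto [0, 255] with a per-tensor scale and zero point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # avoid zero scale for constant input
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def asym_dequantize(q, scale, zero_point):
    """Recover approximate floats from codes, scale and zero point."""
    return [(x - zero_point) * scale for x in q]

kv = [-0.2, 0.0, 0.5, 1.0]  # skewed values, as KV activations often are
q, scale, zp = asym_quantize(kv)
kv_hat = asym_dequantize(q, scale, zp)
```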
📚 Documentations
- Update W4A16 News by @pppppM in #227
- Check-in user guide for w4a16 LLM deployment by @lvhan028 in #224
Full Changelog: v0.0.3...v0.0.4
LMDeploy Release V0.0.3
What's Changed
🚀 Features
- Support tensor parallelism without offline splitting model weights by @grimoire in #158
- Add a script to split a HuggingFace model into the smallest sharded checkpoints by @LZHgrla in #199
- Add a non-stream inference API for the chatbot by @lvhan028 in #200
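To make the tensor-parallelism item above concrete (a toy sketch, not LMDeploy's loader), splitting weights for tensor parallelism means partitioning a weight matrix across ranks, e.g. column-wise, so each rank holds only its shard; doing this at load time removes the need for an offline split step.

```python
# Hypothetical sketch of column-wise weight sharding for tensor parallelism.
# Assumes the column count divides evenly by the world size.

def shard_columns(matrix, world_size, rank):
    """Return the block of columns of `matrix` owned by `rank`."""
    cols = len(matrix[0])
    per_rank = cols // world_size
    start = rank * per_rank
    return [row[start:start + per_rank] for row in matrix]

w = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
shard0 = shard_columns(w, world_size=2, rank=0)  # columns 0-1
shard1 = shard_columns(w, world_size=2, rank=1)  # columns 2-3
```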
💥 Improvements
- Add issue/pr templates by @lvhan028 in #184
- Remove unused code to reduce binary size by @lzhangzz in #181
- Support serving with gradio without communicating with TIS by @AllentDan in #162
- Improve postprocessing in TIS serving by applying incremental de-tokenization by @lvhan028 in #197
- Support multi-session chat by @wangruohui in #178
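The incremental de-tokenization idea can be sketched as follows (a toy illustration with a stand-in tokenizer, not LMDeploy's code): instead of decoding each new token in isolation, which can mangle tokens that only form valid text together, decode the whole sequence each step and emit only the newly added suffix.

```python
# Hypothetical sketch of incremental de-tokenizing during streaming.
# VOCAB and `decode` stand in for a real tokenizer's decoder.

VOCAB = {0: "Hel", 1: "lo", 2: ",", 3: " wor", 4: "ld"}

def decode(token_ids):
    return "".join(VOCAB[t] for t in token_ids)

def stream_detokenize(token_ids):
    """Yield only the text each new token adds to the full decoded string."""
    emitted = ""
    for n in range(1, len(token_ids) + 1):
        full = decode(token_ids[:n])
        yield full[len(emitted):]
        emitted = full

pieces = list(stream_detokenize([0, 1, 2, 3, 4]))
```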
🐞 Bug fixes
- Fix build test error and move turbomind csrc test cases to `tests/csrc` by @lvhan028 in #188
- Fix launching client error by moving lmdeploy/turbomind/utils.py to lmdeploy/utils.py by @lvhan028 in #191
📚 Documentations
- Update README.md by @tpoisonooo in #187
- Translate turbomind.md by @xin-li-67 in #173
Full Changelog: v0.0.2...v0.0.3
LMDeploy Release V0.0.2
What's Changed
🚀 Features
- Add lmdeploy python package build scripts and CI workflow by @irexyc in #163, #164, #170
- Support Llama-2 with GQA by @lzhangzz in #147 and @grimoire in #160
- Add Llama-2 chat template by @grimoire in #140
- Add decode-only forward pass by @lzhangzz in #153
- Support tensor parallelism in turbomind's python API by @grimoire in #82
- Support w pack qkv by @tpoisonooo in #83
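The Llama-2 chat template item above refers to the prompt layout the model was trained on; a minimal sketch following the published Llama-2 convention (not LMDeploy's exact template code) looks like this:

```python
# Hypothetical sketch of the Llama-2 chat prompt format: the system prompt
# is wrapped in <<SYS>> tags inside the first [INST] block.

def llama2_prompt(user_msg, system_msg=None):
    """Build a single-turn Llama-2 chat prompt string."""
    if system_msg:
        user_msg = f"<<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg}"
    return f"<s>[INST] {user_msg} [/INST]"

prompt = llama2_prompt("Hi there", system_msg="You are helpful.")
```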
💥 Improvements
- Refactor the chat template of supported models using factory pattern by @lvhan028 in #144 and @streamsunshine in #174
- Add profile throughput benchmark by @grimoire in #146
- Remove response slicing and add a resume API by @streamsunshine in #154
- Support DeepSpeed with autoTP and kernel injection by @KevinNuNu and @wangruohui in #138
- Add github action for publishing docker image by @RunningLeon in #148
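The factory-pattern refactor of chat templates mentioned above can be sketched like this (class and function names are illustrative, not LMDeploy's actual API): each model family registers its template under a name, and callers construct the right template from the model name alone.

```python
# Hypothetical sketch of a factory/registry pattern for chat templates.

_TEMPLATES = {}

def register(name):
    """Class decorator that records a template under a model name."""
    def wrap(cls):
        _TEMPLATES[name] = cls
        return cls
    return wrap

class BaseTemplate:
    def build(self, prompt):
        return prompt  # plain completion models: no wrapping

@register("vicuna")
class VicunaTemplate(BaseTemplate):
    def build(self, prompt):
        return f"USER: {prompt} ASSISTANT:"

@register("llama2")
class Llama2Template(BaseTemplate):
    def build(self, prompt):
        return f"[INST] {prompt} [/INST]"

def get_template(model_name):
    """Factory: look up the registered class, fall back to the base."""
    return _TEMPLATES.get(model_name, BaseTemplate)()

p = get_template("vicuna").build("Hi")
```

Adding a new model then means registering one class, with no changes to the code that consumes templates.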
🐞 Bug fixes
- Fix getting package root path error in python3.9 by @lvhan028 in #157
- Return carriage caused overwriting at the same line by @wangruohui in #143
- Fix the offset during streaming chat by @lvhan028 in #142
- Fix concatenate bug in benchmark serving script by @rollroll90 in #134
- Fix attempted_relative_import by @KevinNuNu in #125
📚 Documentations
- Translate `en/quantization.md` into Chinese by @xin-li-67 in #166
- Check-in benchmark on real conversation data by @lvhan028 in #156
- Fix typos and missing dependent packages in README and requirements.txt by @vansin in #123, @APX103 in #109, @AllentDan in #119 and @del-zhenwu in #124
- Add turbomind's architecture documentation by @lzhangzz in #101
New Contributors
@streamsunshine, @del-zhenwu, @APX103, @xin-li-67, @KevinNuNu, @rollroll90