LMDeploy Release V0.0.2
What's Changed
🚀 Features
- Add lmdeploy python package built scripts and CI workflow by @irexyc in #163, #164, #170
- Support Llama-2 with GQA by @lzhangzz in #147 and @grimoire in #160
- Add Llama-2 chat template by @grimoire in #140
- Add decode-only forward pass by @lzhangzz in #153
- Support tensor parallelism in turbomind's python API by @grimoire in #82
- Support packed QKV weights (`w_pack`) by @tpoisonooo in #83
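Grouped-query attention (GQA), added above for Llama-2, shares one key/value head across a group of query heads to shrink the KV cache. A minimal NumPy sketch of the idea (shapes and function names here are illustrative, not LMDeploy's actual implementation):

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Toy GQA: q has n_q_heads heads, k/v have n_kv_heads heads;
    each KV head serves n_q_heads // n_kv_heads query heads."""
    n_q_heads, seq, d = q.shape
    group = n_q_heads // n_kv_heads
    # Repeat each KV head so it is shared by its whole group of query heads.
    k = np.repeat(k, group, axis=0)           # (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v                        # (n_q_heads, seq, d)

# 8 query heads sharing 2 KV heads (group size 4), as in GQA.
q = np.random.randn(8, 4, 16)
k = np.random.randn(2, 4, 16)
v = np.random.randn(2, 4, 16)
out = grouped_query_attention(q, k, v, n_kv_heads=2)
print(out.shape)  # (8, 4, 16)
```

With 2 KV heads instead of 8, the KV cache is a quarter of the multi-head-attention size while the query side keeps its full head count.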
💥 Improvements
- Refactor the chat template of supported models using factory pattern by @lvhan028 in #144 and @streamsunshine in #174
- Add profile throughput benchmark by @grimoire in #146
- Remove response slicing and add a resume API by @streamsunshine in #154
- Support DeepSpeed AutoTP and kernel injection by @KevinNuNu and @wangruohui in #138
- Add a GitHub Action for publishing the Docker image by @RunningLeon in #148
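The chat-template refactor above uses a factory pattern: each model's template registers itself under its model names, and callers look templates up by name instead of branching on model type. A minimal hypothetical sketch (class names, method names, and the prompt format are illustrative, not LMDeploy's actual API):

```python
class ChatTemplate:
    """Base class holding a registry of templates keyed by model name."""
    _registry = {}

    @classmethod
    def register(cls, *names):
        """Class decorator: register a subclass under one or more names."""
        def wrap(subcls):
            for name in names:
                cls._registry[name] = subcls
            return subcls
        return wrap

    @classmethod
    def from_model(cls, name):
        """Factory method: instantiate the template registered for `name`."""
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f'unknown model: {name}') from None

    def wrap_prompt(self, prompt):
        raise NotImplementedError


@ChatTemplate.register('llama-2', 'llama2')
class Llama2Template(ChatTemplate):
    # Llama-2 chat wraps user turns in [INST] ... [/INST] markers.
    def wrap_prompt(self, prompt):
        return f'[INST] {prompt} [/INST]'


template = ChatTemplate.from_model('llama-2')
print(template.wrap_prompt('hello'))  # [INST] hello [/INST]
```

Adding support for a new model is then a single registered subclass, with no changes to the lookup code.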
🐞 Bug fixes
- Fix an error getting the package root path in Python 3.9 by @lvhan028 in #157
- Fix carriage return causing output to be overwritten on the same line by @wangruohui in #143
- Fix the offset during streaming chat by @lvhan028 in #142
- Fix concatenation bug in the benchmark serving script by @rollroll90 in #134
- Fix "attempted relative import" error by @KevinNuNu in #125
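The carriage-return fix above concerns `\r` in streamed output: a terminal moves the cursor back to the start of the line, so later text overwrites earlier text. A small illustration of the effect and one way to neutralize it (this is a generic sketch, not LMDeploy's actual fix):

```python
# '\r' returns the cursor to line start, so the second message
# overwrites the first on the same terminal line.
streamed = 'progress: 10%\rprogress: 99%'

# What the terminal effectively shows here (the two segments have
# equal length, so the overwrite is complete):
visible = streamed.rsplit('\r', 1)[-1]
print(visible)  # progress: 99%

# Neutralize the overwrite by translating '\r' to '\n' before
# appending the chunk to a transcript, so earlier output survives.
fixed = streamed.replace('\r', '\n')
print(fixed)
```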
📚 Documentations
- Translate `en/quantization.md` into Chinese by @xin-li-67 in #166
- Check-in benchmark on real conversation data by @lvhan028 in #156
- Fix typos and missing dependent packages in README and requirements.txt by @vansin in #123, @APX103 in #109, @AllentDan in #119 and @del-zhenwu in #124
- Add turbomind's architecture documentation by @lzhangzz in #101
New Contributors
@streamsunshine, @del-zhenwu, @APX103, @xin-li-67, @KevinNuNu, @rollroll90