LMDeploy Release V0.2.5
What's Changed
🚀 Features
- Support mistral and sliding window attention by @grimoire in #1075
- torch engine support chatglm3 by @grimoire in #1159
- Support qwen1.5 in pytorch engine by @grimoire in #1160
- Support mixtral for pytorch engine by @RunningLeon in #1133
- Support torch deepseek moe by @grimoire in #1163
- Support gemma model in pytorch engine by @grimoire in #1184
- Auto backend for pipeline and serve when backend is not set to pytorch explicitly by @RunningLeon in #1211
💥 Improvements
- Fix argument error by @ispobock in #1193
- Use LifoQueue for turbomind async_stream_infer by @AllentDan in #1179
- Update interactive output len strategy and response by @AllentDan in #1164
- Support `min_new_tokens` generation config in pytorch engine by @grimoire in #1096
- Batched sampling by @grimoire in #1197
- refactor the logic of getting `model_name` by @AllentDan in #1188
- Add parameter `max_prefill_token_num` by @lvhan028 in #1203
- optimize baichuan in pytorch engine by @grimoire in #1223
- check model required transformers version by @grimoire in #1220
- torch optimize chatglm3 by @grimoire in #1215
- Async torch engine by @grimoire in #1206
- remove unused kernel in pytorch engine by @grimoire in #1237
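
For readers unfamiliar with the `min_new_tokens` generation option mentioned above, the general idea is to keep the end-of-sequence token from being selected until a minimum number of new tokens has been produced. The following is a minimal, library-independent sketch of that idea, not the code from #1096; the function names, the toy logits, and the choice of token id 1 as EOS are all assumptions made for illustration.

```python
import math


def mask_eos_until_min_new_tokens(logits, eos_token_id, num_generated, min_new_tokens):
    """Return a copy of `logits` with the EOS entry set to -inf while
    fewer than `min_new_tokens` tokens have been generated.

    Hypothetical helper for illustration only; not an LMDeploy API.
    """
    masked = list(logits)
    if num_generated < min_new_tokens:
        masked[eos_token_id] = -math.inf
    return masked


def greedy_pick(logits):
    """Return the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy vocabulary of three tokens; token 1 plays the role of EOS (assumption).
EOS = 1
logits = [0.1, 2.5, 0.3]

# With 0 tokens generated and min_new_tokens=2, EOS is masked out.
first = greedy_pick(mask_eos_until_min_new_tokens(logits, EOS, 0, 2))
# Once 2 tokens have been generated, EOS may be chosen again.
later = greedy_pick(mask_eos_until_min_new_tokens(logits, EOS, 2, 2))
```

In this sketch, `first` is a non-EOS token even though EOS has the highest raw logit, while `later` is free to select EOS. A real engine would apply the same masking to batched tensors before sampling.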
🐞 Bug fixes
- Fix session length for profile generation by @ispobock in #1181
- fix torch engine infer by @RunningLeon in #1185
- fix module map by @grimoire in #1205
- [Fix] Correct session length warning by @AllentDan in #1207
- Fix all devices occupation when applying tp to torch engine by updating device map by @grimoire in #1172
- Fix falcon chatglm2 template by @grimoire in #1168
- [Fix] Avoid AsyncEngine running the same session id by @AllentDan in #1219
- Fix `None` session_len by @lvhan028 in #1230
- fix multinomial sampling by @grimoire in #1228
- fix returning logits in prefill phase of pytorch engine by @grimoire in #1209
- optimize pytorch engine inference with falcon model by @grimoire in #1234
- fix bf16 multinomial sampling by @grimoire in #1239
- reduce torchengine prefill mem usage by @grimoire in #1240
📚 Documentation
- auto generate pipeline api for readthedocs by @RunningLeon in #1186
- Added tutorial document for deploying lmdeploy on Jetson series boards by @BestAnHongjun in #1192
- update doc index by @zhyncs in #1241
🌐 Other
- Add PR test workflow and check-in more testcases by @zhulinJulia24 in #1208
- fix pytest version by @zhulinJulia24 in #1236
- bump version to v0.2.5 by @lvhan028 in #1235
New Contributors
- @ispobock made their first contribution in #1181
- @BestAnHongjun made their first contribution in #1192
Full Changelog: v0.2.4...v0.2.5