Releases · InternLM/lmdeploy
LMDeploy Release V0.1.0a0
What's Changed
🚀 Features
- Add extra_requires to reduce dependencies by @RunningLeon in #580
- TurboMind 2 by @lzhangzz in #590
- Support loading hf model directly by @irexyc in #685
💥 Improvements
- Fix Tokenizer encode by @AllentDan in #645
- Optimize for throughput by @lzhangzz in #701
- Replace mmengine with mmengine-lite by @zhouzaida in #715
🐞 Bug fixes
- Fix init of batch state by @lzhangzz in #682
- fix turbomind stream canceling by @grimoire in #686
- [Fix] Fix load_checkpoint_in_model bug by @HIT-cwh in #690
- Fix wrong eos_id and bos_id obtained through grpc api by @lvhan028 in #644
- Fix cache/output length calculation by @lzhangzz in #738
- [Fix] Skip empty batch by @lzhangzz in #747
📚 Documentations
- [Docs] Update Supported Matrix by @pppppM in #679
- [Docs] Update KV8 Docs by @pppppM in #681
- [Doc] Update restful api doc by @AllentDan in #662
- Check-in user guide about turbomind config by @lvhan028 in #680
New Contributors
- @zhouzaida made their first contribution in #715
Full Changelog: v0.0.14...v0.1.0a0
LMDeploy Release V0.0.14
What's Changed
💥 Improvements
- Improve api_server and webui usage by @AllentDan in #544
- fix: gradio gr.Button.update deprecated after 4.0.0 by @hscspring in #637
- add cli to list the supported model names by @RunningLeon in #639
- Refactor model conversion by @irexyc in #296
- [Enhance] internlm message to prompt by @Harold-lkk in #499
- update turbomind session_len with model.session_len by @AllentDan in #634
- Manage session id using random int for gradio local mode by @aisensiy in #553
- Add UltraCM and WizardLM chat templates by @AllentDan in #599
- Add check env sub command by @RunningLeon in #654
🐞 Bug fixes
- [Fix] Qwen's quantization results are abnormal & Baichuan cannot be quantized by @pppppM in #605
- FIX: fix stop_session func bug by @yunzhongyan0 in #578
- fix benchmark serving computation mistake by @AllentDan in #630
- fix Tokenizer load error when the path of the model being converted is not writable by @irexyc in #669
- fix tokenizer_info when convert the model by @irexyc in #661
New Contributors
- @hscspring made their first contribution in #637
- @yunzhongyan0 made their first contribution in #578
Full Changelog: v0.0.13...v0.0.14
LMDeploy Release V0.0.13
What's Changed
🚀 Features
- Add more user-friendly CLI by @RunningLeon in #541
💥 Improvements
- support inference of a batch of prompts by @AllentDan in #467
Full Changelog: v0.0.12...v0.0.13
LMDeploy Release V0.0.12
What's Changed
🚀 Features
- add solar chat template by @AllentDan in #576 and #587
💥 Improvements
- change `model_format` to `qwen` when `model_name` starts with `qwen` by @lvhan028 in #575
- robust incremental decode for leading space by @AllentDan in #581
🐞 Bug fixes
- avoid splitting Chinese characters during decoding by @AllentDan in #566 (see the byte-level sketch after this list)
- Revert "[Docs] Simplify
build.md
" by @pppppM in #586 - Fix crash and remove
sys_instruct
fromchat.py
andclient.py
by @irexyc in #591
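For context on #566: a CJK character occupies three UTF-8 bytes, and a byte-level BPE tokenizer may split those bytes across two tokens, so decoding token-by-token can surface a bare replacement character. A minimal illustration of the failure mode in plain Python (not LMDeploy's code):

```python
# '你' is three UTF-8 bytes (e4 bd a0); if a token boundary falls inside
# them, decoding the first chunk alone yields U+FFFD, not the character.
partial = b"\xe4\xbd".decode("utf-8", errors="replace")
assert partial == "\ufffd"  # incomplete character -> replacement char
# The fix: hold text back until the decoded output no longer ends in U+FFFD.
```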
Full Changelog: v0.0.11...v0.0.12
LMDeploy Release V0.0.11
What's Changed
💥 Improvements
- make IPv6 compatible, safe run for coroutine interrupting by @AllentDan in #487
- support deploy qwen-14b-chat by @irexyc in #482
- add tp hint for deployment by @irexyc in #555
- Move `tokenizer.py` to the folder of lmdeploy by @grimoire in #543
🐞 Bug fixes
- Change `shared_instance` type from `weak_ptr` to `shared_ptr` by @lvhan028 in #507
- [Fix] Set the default value of `step` to 0 by @lvhan028 in #532
- [bug] fix mismatched shape for decoder output tensor by @akhoroshev in #517
- Fix typing of openai protocol. by @mokeyish in #554
📚 Documentations
- Fix typo in `docs/en/pytorch.md` by @shahrukhx01 in #539
- [Doc] update huggingface internlm-chat-7b model url by @AllentDan in #546
- [doc] Update benchmark command in w4a16.md by @del-zhenwu in #500
New Contributors
- @shahrukhx01 made their first contribution in #539
- @mokeyish made their first contribution in #554
Full Changelog: v0.0.10...v0.0.11
LMDeploy Release V0.0.10
What's Changed
💥 Improvements
- [feature] Graceful termination of background threads in LlamaV2 by @akhoroshev in #458
- expose stop words and filter eoa by @AllentDan in #352
🐞 Bug fixes
- Fix side effect brought by supporting codellama: `sequence_start` is always true when calling `model.get_prompt` by @lvhan028 in #466
- Miss meta instruction of internlm-chat model by @lvhan028 in #470
- [bug] Fix race condition by @akhoroshev in #460
- Fix compatibility issues with Pydantic 2 by @aisensiy in #465
- fix benchmark serving cannot use Qwen tokenizer by @AllentDan in #443
- Fix memory leak by @lvhan028 in #488
📚 Documentations
- Fix typo in README.md by @eltociear in #462
New Contributors
- @eltociear made their first contribution in #462
- @akhoroshev made their first contribution in #458
- @aisensiy made their first contribution in #465
Full Changelog: v0.0.9...v0.0.10
LMDeploy Release V0.0.9
Highlight
- Support InternLM 20B, including FP16, W4A16, and W4KV8
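As a rough note on the modes above (my arithmetic, not from the release notes): W4A16 stores weights in 4-bit while keeping activations in FP16, and W4KV8 additionally quantizes the KV cache to 8-bit. The weight-memory saving for a 20B model is easy to estimate:

```python
# Back-of-the-envelope weight memory for a 20B-parameter model:
# bytes = params * bits / 8
fp16_gib  = 20e9 * 16 / 8 / 2**30   # ~37 GiB at FP16
w4a16_gib = 20e9 * 4 / 8 / 2**30    # ~9 GiB with 4-bit weights
print(f"FP16 ~ {fp16_gib:.0f} GiB, W4A16 ~ {w4a16_gib:.0f} GiB")
```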
What's Changed
💥 Improvements
- Reduce gil switching by @irexyc in #407
- Profile token generation with more settings by @AllentDan in #364
🐞 Bug fixes
- Fix disk space limit for building docker image by @RunningLeon in #404
- more general pypi ci by @irexyc in #412
- Fix build.md by @pangsg in #411
- Fix memory leak by @irexyc in #415
- Fix token count bug by @AllentDan in #416
- [Fix] Support actual seqlen in flash-attention2 by @grimoire in #418
- [Fix] output[-1] when output is empty by @wangruohui in #405
🌐 Other
- rename readthedocs config file by @RunningLeon in #429
- bump version to v0.0.9 by @lvhan028 in #428
Full Changelog: v0.0.8...v0.0.9
LMDeploy Release V0.0.8
Highlight
- Support Baichuan2-7B-Base and Baichuan2-7B-Chat
- Support all features of Code Llama: code completion, infilling, chat / instruct, and python specialist
What's Changed
🚀 Features
- Support baichuan2-chat chat template by @wangruohui in #378
- Support codellama by @lvhan028 in #359
🐞 Bug fixes
- [Fix] when `stream` is False, continuous batching doesn't work by @sleepwalker2017 in #346
- [Fix] Set max dynamic smem size for decoder MHA to support context length > 8k by @lvhan028 in #377
- Fix exceed session len core dump for chat and generate by @AllentDan in #366
- [Fix] update puyu model by @Harold-lkk in #399
📚 Documentations
- [Docs] Fix quantization docs link by @LZHgrla in #367
- [Docs] Simplify `build.md` by @pppppM in #370
- [Docs] Update lmdeploy logo by @lvhan028 in #372
New Contributors
- @sleepwalker2017 made their first contribution in #346
Full Changelog: v0.0.7...v0.0.8
LMDeploy Release V0.0.7
Highlights
- Flash attention 2 is supported, boosting context decoding speed by approximately 45%
- Token_id decoding has been optimized for better efficiency
- The gemm-tuned script has been packed into the PyPI package
What's Changed
💥 Improvements
- add llama_gemm to wheel by @irexyc in #320
- Decode generated token_ids incrementally by @AllentDan in #309
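For background on #309: rather than re-emitting the full decoded text at every step, the streamer tracks how much text has already been surfaced and yields only the new suffix, skipping steps where the last token ends mid-character. A simplified sketch of the idea using a stand-in Hugging Face tokenizer (the real implementation also avoids re-decoding the whole prefix by tracking read offsets):

```python
from transformers import AutoTokenizer  # any HF tokenizer works as a stand-in

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def stream_decode(token_ids):
    """Yield only the newly produced text for each additional token id."""
    emitted = 0
    for end in range(1, len(token_ids) + 1):
        text = tokenizer.decode(token_ids[:end], skip_special_tokens=True)
        # A trailing U+FFFD means the last id ends mid-character; wait for more.
        if text.endswith("\ufffd"):
            continue
        yield text[emitted:]
        emitted = len(text)
```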
🐞 Bug fixes
- Fix turbomind import error on windows by @irexyc in #316
- Fix profile_serving hung issue by @lvhan028 in #344
📚 Documentations
- Fix readthedocs building by @RunningLeon in #321
- fix(kvint8): update doc by @tpoisonooo in #315
- Update FAQ for restful api by @AllentDan in #319
Full Changelog: v0.0.6...v0.0.7
LMDeploy Release V0.0.6
Highlights
- Support Qwen-7B with dynamic NTK scaling and logN scaling in turbomind (sketched after this list)
- Support tensor parallelism for W4A16
- Add OpenAI-like RESTful API
- Support Llama-2 70B 4-bit quantization
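For reference, the two long-context tricks in the first highlight can be written down compactly. This is a hedged sketch following the published Qwen-7B recipe, not LMDeploy's kernel code; `train_len`, `dim`, and the simple alpha schedule are assumptions:

```python
import math

def dynamic_ntk_base(base, seq_len, train_len=2048, dim=128):
    """Grow the RoPE base once the context exceeds the training length."""
    if seq_len <= train_len:
        return base
    alpha = seq_len / train_len  # simplest dynamic schedule; Qwen rounds it up
    return base * alpha ** (dim / (dim - 2))

def logn_scale(pos, train_len=2048):
    """logN scaling: scale the query at position `pos` by log_train_len(pos)."""
    return max(1.0, math.log(pos) / math.log(train_len))
```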
What's Changed
🚀 Features
- Profiling tool for huggingface and deepspeed models by @wangruohui in #161
- Support windows platform by @irexyc in #209
- Qwen-7B, dynamic NTK scaling and logN scaling support in turbomind by @lzhangzz in #230
- Add Restful API by @AllentDan in #223 (see the example call after this list)
- Support context decoding with DP in pytorch by @wangruohui in #193
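A hedged example of calling the new RESTful API from #223, as referenced in the feature list above. The route and payload assume an OpenAI-style chat-completions endpoint on the default port 23333; the exact paths in this early version may differ:

```python
import requests

resp = requests.post(
    "http://localhost:23333/v1/chat/completions",  # assumed OpenAI-style route
    json={
        "model": "internlm-chat-7b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(resp.json())
```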
💥 Improvements
- Support TP for W4A16 by @lzhangzz in #262
- Pass chat template args including meta_prompt to model (7785142) by @AllentDan in #225
- Enable the Gradio server to call inference services through the RESTful API by @AllentDan in #287
🐞 Bug fixes
- Adjust dependency of gradio server by @AllentDan in #236
- Implement `movmatrix` using warp shuffling for CUDA < 11.8 by @lzhangzz in #267
- Add 'accelerate' to requirement list by @lvhan028 in #261
- Fix building with CUDA 11.3 by @lzhangzz in #280
- Pad tok_embedding and output weights to make their shape divisible by TP by @lvhan028 in #285
- Fix llama2 70b & qwen quantization error by @pppppM in #273
- Import turbomind in gradio server only when it is needed by @AllentDan in #303
📚 Documentations
- Remove specified version in user guide by @lvhan028 in #241
- docs(quantization): update description by @tpoisonooo in #253 and #272
- Check-in FAQ by @lvhan028 in #256
- add readthedocs by @RunningLeon in #208
🌐 Other
- Update workflow for building docker image by @RunningLeon in #282
- Change to github-hosted runner for building docker image by @RunningLeon in #291
Known issues
- 4-bit Qwen-7b model inference failed. #307 is addressing this issue.
Full Changelog: v0.0.5...v0.0.6