
Releases: InternLM/lmdeploy

LMDeploy Release V0.1.0a0

23 Nov 13:05
a7c5007

What's Changed

🚀 Features

💥 Improvements

🐞 Bug fixes

📚 Documentations

🌐 Other

New Contributors

Full Changelog: v0.0.14...v0.1.0a0

LMDeploy Release V0.0.14

09 Nov 12:13
7b20cfd

What's Changed

💥 Improvements

🐞 Bug fixes

  • [Fix] Qwen's quantization results are abnormal & Baichuan cannot be quantized by @pppppM in #605
  • FIX: fix stop_session func bug by @yunzhongyan0 in #578
  • fix benchmark serving computation mistake by @AllentDan in #630
  • fix tokenizer load error when the model being converted resides in a non-writable path by @irexyc in #669
  • fix tokenizer_info when converting the model by @irexyc in #661

🌐 Other

New Contributors

Full Changelog: v0.0.13...v0.0.14

LMDeploy Release V0.0.13

30 Oct 06:35
56942c4

What's Changed

🚀 Features

💥 Improvements

📚 Documentations

🌐 Other

Full Changelog: v0.0.12...v0.0.13

LMDeploy Release V0.0.12

24 Oct 04:23
96f1b8e

What's Changed

🚀 Features

💥 Improvements

  • change model_format to qwen when model_name starts with qwen by @lvhan028 in #575
  • robust incremental decode for leading space by @AllentDan in #581

🐞 Bug fixes

  • avoid splitting chinese characters during decoding by @AllentDan in #566
  • Revert "[Docs] Simplify build.md" by @pppppM in #586
  • Fix crash and remove sys_instruct from chat.py and client.py by @irexyc in #591
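The decoding fixes in this release follow a common streaming-detokenization pattern: decode the full token sequence each step, emit only the text that extends the previously emitted prefix, and hold output back while the tail is an incomplete multi-byte character (e.g. a Chinese character split across tokens). A minimal sketch with a toy byte-level detokenizer, not LMDeploy's actual implementation:

```python
REPLACEMENT = "\ufffd"  # produced when UTF-8 bytes are truncated

def decode_bytes(token_bytes: list) -> str:
    # Toy detokenizer: tokens are raw UTF-8 byte chunks.
    return b"".join(token_bytes).decode("utf-8", errors="replace")

def incremental_decode(token_bytes, emitted_len):
    """Return (new_text, new_emitted_len), deferring output while the
    decoded tail is an incomplete multi-byte character."""
    text = decode_bytes(token_bytes)
    if text.endswith(REPLACEMENT):
        # The last token ended mid-character; wait for the next token.
        return "", emitted_len
    return text[emitted_len:], len(text)

# "你" is 3 bytes (e4 bd a0), split across two tokens here.
tokens = [b"hi ", b"\xe4\xbd", b"\xa0"]
out, n = incremental_decode(tokens[:2], 0)   # incomplete: emits nothing
out2, n2 = incremental_decode(tokens, n)     # complete: emits "hi 你"
```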

🌐 Other

Full Changelog: v0.0.11...v0.0.12

LMDeploy Release V0.0.11

17 Oct 06:19
bb3cce9

What's Changed

🚀 Features

💥 Improvements

🐞 Bug fixes

  • Change shared_instance type from weakptr to shared_ptr by @lvhan028 in #507
  • [Fix] Set the default value of step being 0 by @lvhan028 in #532
  • [bug] fix mismatched shape for decoder output tensor by @akhoroshev in #517
  • Fix typing of openai protocol. by @mokeyish in #554

📚 Documentations

🌐 Other

New Contributors

Full Changelog: v0.0.10...v0.0.11

LMDeploy Release V0.0.10

26 Sep 12:52
b58a9df

What's Changed

💥 Improvements

🐞 Bug fixes

  • Fix side effect brought by supporting codellama: sequence_start is always true when calling model.get_prompt by @lvhan028 in #466
  • Fix missing meta instruction of the internlm-chat model by @lvhan028 in #470
  • [bug] Fix race condition by @akhoroshev in #460
  • Fix compatibility issues with Pydantic 2 by @aisensiy in #465
  • fix benchmark serving cannot use Qwen tokenizer by @AllentDan in #443
  • Fix memory leak by @lvhan028 in #488

📚 Documentations

🌐 Other

New Contributors

Full Changelog: v0.0.9...v0.0.10

LMDeploy Release V0.0.9

20 Sep 08:10
0be9e7a

Highlight

  • Support InternLM 20B, including FP16, W4A16, and W4KV8
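For context, W4A16 stores weights as 4-bit integers while computing with FP16 activations, and W4KV8 additionally quantizes the KV cache to 8 bits. A minimal sketch of symmetric per-channel 4-bit weight quantization, purely illustrative and not LMDeploy's kernels:

```python
import numpy as np

def quantize_w4(w: np.ndarray):
    """Symmetric per-output-channel 4-bit quantization.
    Returns int4-range codes (stored in int8) and FP16 scales."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range: [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_w4(q, scale):
    # Reconstruct FP16 weights on the fly; activations stay FP16 (the "A16").
    return q.astype(np.float16) * scale

np.random.seed(0)
w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_w4(w)
w_hat = dequantize_w4(q, s)  # per-channel error bounded by ~scale / 2
```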

What's Changed

🚀 Features

💥 Improvements

🐞 Bug fixes

🌐 Other

New Contributors

Full Changelog: v0.0.8...v0.0.9

LMDeploy Release V0.0.8

11 Sep 15:34
450757b

Highlights

  • Support Baichuan2-7B-Base and Baichuan2-7B-Chat
  • Support all features of Code Llama: code completion, infilling, chat / instruct, and python specialist
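Infilling means the model fills in code between a given prefix and suffix. The Code Llama paper describes a prefix-suffix-middle prompt layout for this; the sketch below builds such a prompt by hand (the literal token spellings here are an assumption for illustration — in practice the serving stack handles these sentinels for you):

```python
def infill_prompt(prefix: str, suffix: str) -> str:
    # Prefix-suffix-middle (PSM) layout from the Code Llama paper:
    # the model generates the missing middle after the <MID> sentinel.
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

p = infill_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```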

What's Changed

🚀 Features

🐞 Bug fixes

  • [Fix] continuous batching doesn't work when stream is False by @sleepwalker2017 in #346
  • [Fix] Set max dynamic smem size for decoder MHA to support context length > 8k by @lvhan028 in #377
  • Fix exceed session len core dump for chat and generate by @AllentDan in #366
  • [Fix] update puyu model by @Harold-lkk in #399

📚 Documentations

New Contributors

Full Changelog: v0.0.7...v0.0.8

LMDeploy Release V0.0.7

04 Sep 06:39
d065f3e

Highlights

  • Flash attention 2 is supported, boosting context decoding speed by approximately 45%
  • Token_id decoding has been optimized for better efficiency
  • The gemm-tuned script is now bundled in the PyPI package

What's Changed

🚀 Features

💥 Improvements

🐞 Bug fixes

📚 Documentations

Full Changelog: v0.0.6...v0.0.7

LMDeploy Release V0.0.6

25 Aug 13:30
cfabbbd

Highlights

  • Support Qwen-7B with dynamic NTK scaling and logN scaling in turbomind
  • Support tensor parallelism for W4A16
  • Add OpenAI-like RESTful API
  • Support Llama-2 70B 4-bit quantization
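An OpenAI-like RESTful API means clients send the familiar chat-completions request shape. A minimal sketch of building such a request body — the field names follow the OpenAI chat schema, and whether this exact schema matches the server shipped in v0.0.6 is an assumption:

```python
import json

def chat_request(model: str, user_msg: str, stream: bool = False) -> str:
    """Build an OpenAI-style chat-completions request body as JSON."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "stream": stream,
    }
    return json.dumps(body)

payload = chat_request("internlm-chat-7b", "Hello!")
```

POST a body like this to the server's chat-completions endpoint with any HTTP client; with `stream=True`, OpenAI-style servers return incremental chunks instead of one response.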

What's Changed

🚀 Features

💥 Improvements

🐞 Bug fixes

  • Adjust dependency of gradio server by @AllentDan in #236
  • Implement movmatrix using warp shuffling for CUDA < 11.8 by @lzhangzz in #267
  • Add 'accelerate' to requirement list by @lvhan028 in #261
  • Fix building with CUDA 11.3 by @lzhangzz in #280
  • Pad tok_embedding and output weights to make their shape divisible by TP by @lvhan028 in #285
  • Fix llama2 70b & qwen quantization error by @pppppM in #273
  • Import turbomind in gradio server only when it is needed by @AllentDan in #303

📚 Documentations

🌐 Other

Known issues

  • Inference with the 4-bit Qwen-7B model fails. #307 is addressing this issue.

Full Changelog: v0.0.5...v0.0.6