# LMDeploy Release v0.6.2

## Highlights
- PyTorch engine supports graph mode on the Ascend platform, doubling the inference speed
- Support llama3.2-vision models in the PyTorch engine
- Support Mixtral in the TurboMind engine, achieving 20+ RPS on the ShareGPT dataset with 2 A100-80G GPUs (see the usage sketch after this list)
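
A minimal sketch of trying the three highlights through the `pipeline` API. The model paths, image URL, and `tp` value are illustrative assumptions, and graph mode is assumed to be the default on Ascend unless eager mode is requested:

```python
from lmdeploy import pipeline, PytorchEngineConfig, TurbomindEngineConfig
from lmdeploy.vl import load_image

# PyTorch engine on Ascend; graph mode is assumed to be on by default
# (eager_mode=True would fall back to eager execution).
pipe = pipeline('internlm/internlm2_5-7b-chat',
                backend_config=PytorchEngineConfig(device_type='ascend'))
print(pipe(['Hi, please introduce yourself']))

# llama3.2-vision (mllama) with the PyTorch engine.
vl_pipe = pipeline('meta-llama/Llama-3.2-11B-Vision-Instruct',
                   backend_config=PytorchEngineConfig())
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
print(vl_pipe(('Describe this image', image)))

# Mixtral with the TurboMind engine, tensor-parallel across 2 GPUs.
moe_pipe = pipeline('mistralai/Mixtral-8x7B-Instruct-v0.1',
                    backend_config=TurbomindEngineConfig(tp=2))
print(moe_pipe(['Hi, please introduce yourself']))
```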
## What's Changed
### 🚀 Features
- support downloading models from openmind_hub by @cookieyyds in #2563
- Support pytorch engine kv int4/int8 quantization by @AllentDan in #2438 (see the sketch after this list)
- feat(ascend): support w4a16 by @yao-fengchen in #2587
- [maca] add maca backend support. by @Reinerzhou in #2636
- Support mllama for pytorch engine by @AllentDan in #2605
- add --eager-mode to cli by @RunningLeon in #2645
- [ascend] add ascend graph mode by @CyCle1024 in #2647
- MoE support for turbomind by @lzhangzz in #2621
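
For the new PyTorch-engine kv cache quantization, a minimal sketch assuming the engine reuses the `quant_policy` convention (4 for kv int4, 8 for kv int8) and that the `eager_mode` field mirrors the new `--eager-mode` CLI switch; the model path is illustrative:

```python
from lmdeploy import pipeline, PytorchEngineConfig

# quant_policy=8 requests an int8 kv cache; quant_policy=4 requests int4.
# eager_mode stays False to keep graph mode; the new --eager-mode CLI flag flips it.
pipe = pipeline('internlm/internlm2_5-7b-chat',
                backend_config=PytorchEngineConfig(quant_policy=8,
                                                   eager_mode=False))
print(pipe(['Hi, please introduce yourself']))
```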
### 💥 Improvements
- [Feature] Add argument to disable FastAPI docs by @mouweng in #2540
- add check for device with cap 7.x by @grimoire in #2535
- Add tool role for langchain usage by @AllentDan in #2558
- Fix llama3.2-1b inference error by handling tie_word_embedding by @grimoire in #2568
- Add a workaround for saving internvl2 with latest transformers by @AllentDan in #2583
- optimize paged attention on triton3 by @grimoire in #2553
- refactor for multi backends in dlinfer by @CyCle1024 in #2619
- Copy sglang/bench_serving.py to lmdeploy as serving benchmark script by @lvhan028 in #2620
- Add barrier to prevent TP nccl kernel waiting. by @grimoire in #2607
- [ascend] refactor fused_moe on ascend platform by @yao-fengchen in #2613
- [ascend] support paged_prefill_attn when batch > 1 by @yao-fengchen in #2612
- Raise an error for the wrong chat template by @AllentDan in #2618
- refine pre-post-process by @jinminxi104 in #2632
- small block_m for sm7.x by @grimoire in #2626
- update check for triton by @grimoire in #2641
- Support llama3.2 LLM models in turbomind engine by @lvhan028 in #2596
- Check whether device support bfloat16 by @lvhan028 in #2653
- Add warning message about `do_sample` to alert the BC change by @lvhan028 in #2654 (see the sketch after this list)
- update ascend dockerfile by @CyCle1024 in #2661
- fix supported model list in ascend graph mode by @jinminxi104 in #2669
- remove dlinfer version by @CyCle1024 in #2672
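
The `do_sample` warning concerns generation now defaulting to greedy decoding unless sampling is explicitly requested. A minimal sketch, assuming `GenerationConfig` exposes `do_sample` alongside the usual sampling parameters (model path illustrative):

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2_5-7b-chat')
# Opt in to sampling explicitly; without do_sample=True decoding is greedy.
gen_config = GenerationConfig(do_sample=True, temperature=0.8, top_p=0.95)
print(pipe(['Hi, please introduce yourself'], gen_config=gen_config))
```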
### 🐞 Bug fixes
- set outlines<0.1.0 by @AllentDan in #2559
- fix: make exit_flag verification for ascend more general by @CyCle1024 in #2588
- set capture mode thread_local by @grimoire in #2560
- Add distributed context in pytorch engine to support torchrun by @grimoire in #2615
- Fix error in python3.8. by @Reinerzhou in #2646
- Align UT with triton fill_kv_cache_quant kernel by @AllentDan in #2644
- Fix missing device_type when checking is_bf16_supported on ascend platform by @lvhan028 in #2663
- fix syntax in Dockerfile_aarch64_ascend by @CyCle1024 in #2664
- Set history_cross_kv_seqlens to 0 by default by @AllentDan in #2666
- fix build error in ascend dockerfile by @CyCle1024 in #2667
- bugfix: llava-hf/llava-interleave-qwen-7b-hf (#2497) by @deepindeed2022 in #2657
- fix inference mode error for qwen2-vl by @irexyc in #2668
### 📚 Documentations
- Add instruction for downloading models from openmind hub by @cookieyyds in #2577
- Fix spacing in ascend user guide by @Superskyyy in #2601
- Update get_started tutorial about deploying on ascend platform by @jinminxi104 in #2655
- Update ascend get_started tutorial about installing nnal by @jinminxi104 in #2662
### 🌐 Other
- [ci] add oc infer test in stable test by @zhulinJulia24 in #2523
- update copyright by @lvhan028 in #2579
- [Doc]: Lock sphinx version by @RunningLeon in #2594
- [ci] use local requirements for test workflow by @zhulinJulia24 in #2569
- [ci] add pytorch kvint testcase into function regression by @zhulinJulia24 in #2584
- [ci] Refactor dailytest workflow by @zhulinJulia24 in #2617
- [ci] fix restful script by @zhulinJulia24 in #2635
- [ci] add internlm2_5_7b_batch_1 into evaluation testcase by @zhulinJulia24 in #2631
- match torch and torch_vision version by @grimoire in #2649
- Bump version to v0.6.2 by @lvhan028 in #2659
## New Contributors
- @mouweng made their first contribution in #2540
- @cookieyyds made their first contribution in #2563
- @Superskyyy made their first contribution in #2601
- @Reinerzhou made their first contribution in #2636
- @deepindeed2022 made their first contribution in #2657
**Full Changelog**: v0.6.1...v0.6.2