LMDeploy Release V0.5.1
What's Changed
🚀 Features
- Support phi3-vision by @RunningLeon in #1845
- Support internvl2 chat template by @AllentDan in #1911
- support gemma2 in pytorch engine by @grimoire in #1924
- Add tools to api_server for InternLM2 model by @AllentDan in #1763
- support internvl2-1b by @RunningLeon in #1983
- feat: support llama2 and internlm2 on 910B by @yao-fengchen in #2011
- Support glm 4v by @RunningLeon in #1947
- support internlm-xcomposer2d5-7b by @irexyc in #1932
- add chat template for codegeex4 by @RunningLeon in #2013
💥 Improvements
- misc: rm unnecessary files by @zhyncs in #1875
- drop stop words by @grimoire in #1823
- Add usage in stream response by @fbzhong in #1876
- Optimize sampling on pytorch engine. by @grimoire in #1853
- Remove deprecated chat cli and vl examples by @lvhan028 in #1899
- vision model use tp number of gpu by @irexyc in #1854
- misc: add default api_server_url for api_client by @zhyncs in #1922
- misc: add transformers version check for TurboMind Tokenizer by @zhyncs in #1917
- fix: append _stats when size > 0 by @zhyncs in #1809
- refactor: update awq linear and rm legacy by @zhyncs in #1940
- feat: add gpu topo for check_env by @zhyncs in #1944
- fix transformers version check for InternVL2 by @zhyncs in #1952
- Upgrade gradio by @AllentDan in #1930
- refactor sampling layer setup by @irexyc in #1912
- Add exception handler to image encoder by @irexyc in #2010
- Avoid the same session id for openai endpoint by @AllentDan in #1995
🐞 Bug fixes
- Fix erroneous link reference by @zihaomu in #1881
- Fix internlm-xcomposer2-vl awq search scale by @AllentDan in #1890
- fix SamplingDecodeTest and SamplingDecodeTest2 unittest failure by @zhyncs in #1874
- Fix smem size for fused split-kv reduction by @lzhangzz in #1909
- fix llama3 chat template by @AllentDan in #1956
- fix: set PYTHONIOENCODING to UTF-8 before start tritonserver by @zhyncs in #1971
- Fix internvl2-40b model export by @irexyc in #1979
- fix logprobs by @irexyc in #1968
- fix unexpected argument error when deploying "cogvlm-chat-hf" by @AllentDan in #1982
- fix mixtral and mistral cache_position by @zhyncs in #1941
- Fix the session_len assignment logic by @lvhan028 in #2007
- Fix logprobs openai api by @irexyc in #1985
- Fix internvl2-40b awq inference by @AllentDan in #2023
- Fix side effect of #1995 by @AllentDan in #2033
📚 Documentations
- docs: update faq for turbomind so not found by @zhyncs in #1877
- [Doc]: Change to sphinx-book-theme in readthedocs by @RunningLeon in #1880
- docs: update compatibility section in README by @zhyncs in #1946
- docs: update kv quant doc by @zhyncs in #1977
- docs: sync the core features in README to index.rst by @zhyncs in #1988
- Fix table rendering for readthedocs by @RunningLeon in #1998
- docs: fix Ada compatibility by @zhyncs in #2016
- update xcomposer2d5 docs by @irexyc in #2037
🌐 Other
- [ci] add internlm2.5 models into testcase by @zhulinJulia24 in #1928
- bump version to v0.5.1 by @lvhan028 in #2022
Full Changelog: v0.5.0...v0.5.1