LMDeploy Release V0.5.1
What's Changed
🚀 Features
- Support phi3-vision by @RunningLeon in #1845
- Support internvl2 chat template by @AllentDan in #1911
- support gemma2 in pytorch engine by @grimoire in #1924
- Add tools to api_server for InternLM2 model by @AllentDan in #1763
- support internvl2-1b by @RunningLeon in #1983
- feat: support llama2 and internlm2 on 910B by @yao-fengchen in #2011
- Support glm 4v by @RunningLeon in #1947
- support internlm-xcomposer2d5-7b by @irexyc in #1932
- add chat template for codegeex4 by @RunningLeon in #2013
💥 Improvements
- misc: rm unnecessary files by @zhyncs in #1875
- drop stop words by @grimoire in #1823
- Add usage in stream response by @fbzhong in #1876
- Optimize sampling on pytorch engine. by @grimoire in #1853
- Remove deprecated chat cli and vl examples by @lvhan028 in #1899
- vision model use tp number of gpu by @irexyc in #1854
- misc: add default api_server_url for api_client by @zhyncs in #1922
- misc: add transformers version check for TurboMind Tokenizer by @zhyncs in #1917
- fix: append _stats when size > 0 by @zhyncs in #1809
- refactor: update awq linear and rm legacy by @zhyncs in #1940
- feat: add gpu topo for check_env by @zhyncs in #1944
- fix transformers version check for InternVL2 by @zhyncs in #1952
- Upgrade gradio by @AllentDan in #1930
- refactor sampling layer setup by @irexyc in #1912
- Add exception handler to image encoder by @irexyc in #2010
- Avoid the same session id for openai endpoint by @AllentDan in #1995
🐞 Bug fixes
- Fix erroneous link reference by @zihaomu in #1881
- Fix internlm-xcomposer2-vl awq search scale by @AllentDan in #1890
- fix SamplingDecodeTest and SamplingDecodeTest2 unittest failure by @zhyncs in #1874
- Fix smem size for fused split-kv reduction by @lzhangzz in #1909
- fix llama3 chat template by @AllentDan in #1956
- fix: set PYTHONIOENCODING to UTF-8 before start tritonserver by @zhyncs in #1971
- Fix internvl2-40b model export by @irexyc in #1979
- fix logprobs by @irexyc in #1968
- fix unexpected argument error when deploying "cogvlm-chat-hf" by @AllentDan in #1982
- fix mixtral and mistral cache_position by @zhyncs in #1941
- Fix the session_len assignment logic by @lvhan028 in #2007
- Fix logprobs openai api by @irexyc in #1985
- Fix internvl2-40b awq inference by @AllentDan in #2023
- Fix side effect of #1995 by @AllentDan in #2033
📚 Documentations
- docs: update faq for turbomind so not found by @zhyncs in #1877
- [Doc]: Change to sphinx-book-theme in readthedocs by @RunningLeon in #1880
- docs: update compatibility section in README by @zhyncs in #1946
- docs: update kv quant doc by @zhyncs in #1977
- docs: sync the core features in README to index.rst by @zhyncs in #1988
- Fix table rendering for readthedocs by @RunningLeon in #1998
- docs: fix Ada compatibility by @zhyncs in #2016
- update xcomposer2d5 docs by @irexyc in #2037
🌐 Other
- [ci] add internlm2.5 models into testcase by @zhulinJulia24 in #1928
- bump version to v0.5.1 by @lvhan028 in #2022
Full Changelog: v0.5.0...v0.5.1