LMDeploy Release V0.5.0
What's Changed
🚀 Features
- support MiniCPM-Llama3-V 2.5 by @irexyc in #1708
- [Feature]: Support llava for pytorch engine by @RunningLeon in #1641
- Device dispatcher by @grimoire in #1775
- Add GLM-4-9B-Chat by @lzhangzz in #1724
- Torch deepseek v2 by @grimoire in #1621
- Support internvl-chat for pytorch engine by @RunningLeon in #1797 (see the VLM sketch after this list)
- Add interfaces to the pipeline to obtain logits and ppl by @irexyc in #1652 (see the logits/ppl sketch after this list)
- [Feature]: Support cogvlm-chat by @RunningLeon in #1502
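Several of these features bring vision-language models to the PyTorch engine. Below is a minimal sketch of driving one through the pipeline; the model id and image URL are illustrative placeholders, and selecting the backend via `PytorchEngineConfig` is an assumption based on the usual pipeline API, not something stated in these notes:

```python
# Illustrative sketch: running a VLM on the PyTorch engine via the pipeline.
# The model id and image URL are placeholders, not part of the release notes.
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5',
                backend_config=PytorchEngineConfig())
image = load_image('https://example.com/tiger.jpeg')  # placeholder URL
response = pipe(('describe this image', image))
print(response)
```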
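The new logits/ppl interfaces (#1652) might be exercised as follows. The method names `get_logits`/`get_ppl`, the `pipe.tokenizer` attribute, and the token-id input format are assumptions inferred from the PR title rather than a confirmed API; check the pipeline docs for the exact signatures:

```python
# Hypothetical sketch of the logits/ppl interfaces added in #1652; method
# names and argument shapes are assumptions, not confirmed by these notes.
from lmdeploy import pipeline

pipe = pipeline('internlm/internlm2-chat-7b')
input_ids = pipe.tokenizer.encode('LMDeploy is a toolkit for LLM serving.')

ppl = pipe.get_ppl([input_ids])        # perplexity per input sequence
logits = pipe.get_logits([input_ids])  # raw logits per input sequence
print(ppl, logits[0].shape)
```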
💥 Improvements
- support mistral and llava_mistral in turbomind by @lvhan028 in #1579
- Add health endpoint by @AllentDan in #1679 (see the probe example after this list)
- upgrade the version of the dependency package peft by @grimoire in #1687
- Follow the conventional model_name by @AllentDan in #1677
- API Image URL fetch timeout by @vody-am in #1684
- Support internlm-xcomposer2-4khd-7b awq by @AllentDan in #1666
- update dockerfile and docs by @RunningLeon in #1715
- lazy import VLAsyncEngine to avoid bringing in VLM dependencies when deploying LLMs by @lvhan028 in #1714
- feat: align with OpenAI temperature range by @zhyncs in #1733 (see the sampling sketch after this list)
- feat: align with OpenAI temperature range in api server by @zhyncs in #1734
- Refactor converter: get_input_model_registered_name and get_output_model_registered_name_and_config by @lvhan028 in #1702
- Refine max_new_tokens logic to improve user experience by @AllentDan in #1705
- Refactor loading weights by @grimoire in #1603
- refactor config by @grimoire in #1751
- Add anomaly handler by @lzhangzz in #1780
- Encode raw image file to base64 by @irexyc in #1773
- skip inference for oversized inputs by @grimoire in #1769
- fix: prevent numpy breakage by @zhyncs in #1791
- More accurate time logging for ImageEncoder and fix concurrent image processing corruption by @irexyc in #1765
- Optimize kernel launch for triton2.2.0 and triton2.3.0 by @grimoire in #1499
- feat: auto set awq model_format from hf by @zhyncs in #1799 (see the sketch after this list)
- check driver mismatch by @grimoire in #1811
- PyTorchEngine adapts to the latest internlm2 modeling. by @grimoire in #1798
- AsyncEngine: create cancel task on exception by @grimoire in #1807
- compat internlm2 for pytorch engine by @RunningLeon in #1825
- Add model revision & download_dir to cli by @irexyc in #1814
- fix image encoder request queue by @irexyc in #1837
- Harden stream callback by @lzhangzz in #1838
- Support Qwen2-1.5b awq by @AllentDan in #1793
- remove chat template config in turbomind engine by @irexyc in #1161
- misc: align PyTorch Engine temperature with TurboMind by @zhyncs in #1850
- docs: update cache-max-entry-count help message by @zhyncs in #1892
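Two of the serving improvements are easy to check from a client. First, the health endpoint (#1679); a minimal probe, assuming the server's default port 23333 and a `/health` route (the route is inferred from the PR title):

```python
# Liveness probe for api_server; 23333 is LMDeploy's default port, and the
# /health route is assumed from the PR title.
import requests

resp = requests.get('http://localhost:23333/health', timeout=5)
print(resp.status_code)  # expect 200 when the server is healthy
```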
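Second, the OpenAI-aligned temperature range (#1733, #1734) means values up to 2.0 should now be accepted. A sketch, assuming GenerationConfig validates against the OpenAI range [0, 2]:

```python
# Sampling with a temperature above 1.0, valid under the OpenAI range [0, 2].
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2-chat-7b')
gen_config = GenerationConfig(temperature=1.5, top_p=0.9, max_new_tokens=64)
print(pipe(['Write a haiku about GPUs.'], gen_config=gen_config))
```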
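Finally, with the awq model_format auto-detected from the HF config (#1799), a pre-quantized AWQ checkpoint should load without an explicit model_format. A sketch; the checkpoint name is illustrative and the auto-detection behavior is an assumption drawn from the PR title:

```python
# With #1799, the engine is expected to read model_format from the HF config,
# so no explicit TurbomindEngineConfig(model_format='awq') should be needed
# for a pre-quantized AWQ checkpoint (assumption; checkpoint name illustrative).
from lmdeploy import pipeline

pipe = pipeline('internlm/internlm2-chat-7b-4bit')
print(pipe(['Hello!']))
```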
🐞 Bug fixes
- fix typos by @irexyc in #1690
- [Bugfix] fix internvl-1.5-chat vision model preprocess and freeze weights by @DefTruth in #1741
- lock setuptools version in dockerfile by @RunningLeon in #1770
- Fix openai package can not use proxy stream mode by @AllentDan in #1692
- Fix finish_reason by @AllentDan in #1768
- fix uncached stop words by @grimoire in #1754
- [side-effect] Fix param `--cache-max-entry-count` is not taking effect (#1758) by @QwertyJack in #1778
- support qwen2 1.5b by @lvhan028 in #1782
- fix falcon attention by @grimoire in #1761
- Refine AsyncEngine exception handler by @AllentDan in #1789
- [side-effect] fix weight_type caused by PR #1702 by @lvhan028 in #1795
- fix best_match_model by @irexyc in #1812
- Fix Request completed log by @irexyc in #1821
- fix qwen-vl-chat hung by @irexyc in #1824
- Detokenize with prompt token ids by @AllentDan in #1753
- Update engine.py to fix small typos by @WANGSSSSSSS in #1829
- [side-effect] bring back "--cap" argument in chat cli by @lvhan028 in #1859
- Fix vl session-len by @AllentDan in #1860
- fix gradio vl "stop_words" by @irexyc in #1873
- fix qwen2 cache_position for PyTorch Engine when transformers>4.41.2 by @zhyncs in #1886
- fix model name matching for internvl by @RunningLeon in #1867
📚 Documentations
- docs: add BentoLMDeploy in README by @zhyncs in #1736
- [Doc]: Update docs for internlm2.5 by @RunningLeon in #1887
🌐 Other
- add longtext generation benchmark by @zhulinJulia24 in #1694
- add qwen2 model into testcase by @zhulinJulia24 in #1772
- fix pr test for newest internlm2 model by @zhulinJulia24 in #1806
- react test evaluation config by @zhulinJulia24 in #1861
- bump version to v0.5.0 by @lvhan028 in #1852
New Contributors
- @DefTruth made their first contribution in #1741
- @QwertyJack made their first contribution in #1778
- @WANGSSSSSSS made their first contribution in #1829
Full Changelog: v0.4.2...v0.5.0