LMDeploy Release V0.2.0
What's Changed
🚀 Features
- Support internlm2 by @lvhan028 in #963
- [Feature] Add params config for api server web_ui by @amulil in #735
- [Feature] Merge `lmdeploy lite calibrate` and `lmdeploy lite auto_awq` by @pppppM in #849
- Compute cross entropy loss given a list of input tokens by @lvhan028 in #830
- Support QoS in api_server by @sallyjunjun in #877
- Refactor torch inference engine by @lvhan028 in #871
- add image chat demo by @irexyc in #874
- check-in generation config by @lvhan028 in #902
- check-in ModelConfig by @AllentDan in #907
- pytorch engine config by @grimoire in #908
- Check-in turbomind engine config by @irexyc in #909
- S-LoRA support by @grimoire in #894
- add init in adapters by @grimoire in #923
- Refactor LLM inference pipeline API by @AllentDan in #916 (see the usage sketch after this list)
- Refactor gradio and api_server by @AllentDan in #918
- Add request distributor server by @AllentDan in #903
- Upgrade lmdeploy cli by @RunningLeon in #922
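
Several of the items above land together as one user-facing change: engine and sampling parameters move into dedicated config objects (#902, #908, #909), and the high-level pipeline API is rebuilt around them (#916), with `tp` moving off the pipeline argument list and into the engine config (#947). Below is a minimal sketch of how these pieces fit together; the class and argument names (`pipeline`, `TurbomindEngineConfig`, `GenerationConfig`, `backend_config`, `gen_config`) follow the current LMDeploy documentation and should be verified against the v0.2.0 API reference.

```python
# Minimal sketch of the refactored pipeline API; names assumed from the
# current LMDeploy docs, verify against the v0.2.0 API reference.
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

# Engine-level settings now live in a config object instead of loose kwargs;
# tp (tensor parallelism) belongs here rather than on pipeline() (#947).
engine_config = TurbomindEngineConfig(tp=1)

pipe = pipeline('internlm/internlm2-chat-7b', backend_config=engine_config)

# Sampling parameters are likewise grouped into GenerationConfig (#902).
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.7)

# The pipeline returns response dataclasses rather than raw strings (#952).
responses = pipe(['Hi, please introduce yourself'], gen_config=gen_config)
print(responses[0].text)
```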
💥 Improvements
- add top_k value for /v1/completions and update the documents by @AllentDan in #870
- export "num_tokens_per_iter", "max_prefill_iters", etc. when converting a model by @lvhan028 in #845
- Move `api_server` dependencies from serve.txt to runtime.txt by @lvhan028 in #879
- Refactor benchmark bash script by @lvhan028 in #884
- Add test case for function regression by @zhulinJulia24 in #844
- Update test triton CI by @RunningLeon in #893
- Update dockerfile by @RunningLeon in #891
- Perform fuzzy matching on chat template according to model path by @AllentDan in #839
- support accessing the lmdeploy version via `lmdeploy.version_info` by @lvhan028 in #910 (see the sketch after this list)
- Remove `flash-attn` dependency of lmdeploy lite module by @lvhan028 in #917
- Improve setup by removing pycuda dependency and adding cuda runtime and cublas to RPATH by @irexyc in #912
- remove unused settings in turbomind engine config by @irexyc in #921
- Cleanup fixed attributes in turbomind engine config by @irexyc in #928
- fix get_gpu_mem by @grimoire in #934
- remove instance_num argument by @AllentDan in #931
- Fix matching results of several chat templates like llama2, solar, yi and so on by @AllentDan in #925
- add pytorch random sampling by @grimoire in #930
- suppress turbomind chat warning by @irexyc in #937
- modify type hint of api to avoid import _turbomind by @AllentDan in #936
- accelerate pytorch benchmark by @grimoire in #946
- Remove `tp` from pipeline argument list by @lvhan028 in #947
- set gradio default value the same as chat.py by @AllentDan in #949
- print help for cli in case of failure by @RunningLeon in #955
- return dataclass for pipeline by @AllentDan in #952
- set random seed when it is None by @AllentDan in #958
- avoid run get_logger when import lmdeploy by @RunningLeon in #956
- support mlp s-lora by @grimoire in #957
- skip resume logic for pytorch backend by @AllentDan in #968
- Add ci for ut by @RunningLeon in #966
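
For the version-access improvement above (#910), the sketch below shows the assumed shape of `lmdeploy.version_info`: a tuple of version components alongside the usual `__version__` string, which makes feature gating in downstream code straightforward. Treat the exact shape as an assumption to verify against the release.

```python
# Hedged sketch: version_info is assumed to be a tuple of version
# components exposed next to the usual __version__ string.
import lmdeploy

print(lmdeploy.__version__)   # e.g. '0.2.0'
print(lmdeploy.version_info)  # e.g. (0, 2, 0)

# Tuple comparison gives simple feature gating in downstream code:
if lmdeploy.version_info >= (0, 2, 0):
    from lmdeploy import pipeline  # refactored pipeline API
```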
🐞 Bug fixes
- add tritonclient req by @RunningLeon in #872
- Fix uninitialized parameter by @lvhan028 in #875
- Fix overflow by @irexyc in #897
- Fix data offset by @AllentDan in #900
- Fix context decoding stuck issue when tp > 1 by @irexyc in #904
- [Fix] set scaling_factor 1 forcefully when sequence length is less than max_pos_emb by @lvhan028 in #911
- fix pytorch llama2 with new transformers by @grimoire in #914
- fix local variable 'output_ids' referenced before assignment by @irexyc in #919
- fix pipeline stop_words type error by @AllentDan in #929
- pass stop words to openai api by @AllentDan in #887
- fix profile generation multiprocessing error by @AllentDan in #933
- Add missing `__init__.py` in modeling folder by @lvhan028 in #951
- fix cli with special arg names by @RunningLeon in #959
- fix logger in tokenizer by @RunningLeon in #960
📚 Documentations
- Improve user guide by @lvhan028 in #899
- Add user guide about pytorch engine by @grimoire in #915
- Update supported models and add quick start section in README by @lvhan028 in #926
- Fix scripts in benchmark doc by @panli889 in #941
- Update get_started and w4a16 tutorials by @lvhan028 in #945
- Add more docstring to api_server and proxy_server by @AllentDan in #965
🌐 Other
- Stabilize api_server benchmark results with a non-zero await by @AllentDan in #885
- Fix pytorch backend failing to stop properly by @AllentDan in #962
- [Fix] Fix `calibrate` bug when `transformers>4.36` by @pppppM in #967
New Contributors
- @amulil made their first contribution in #735
- @zhulinJulia24 made their first contribution in #844
- @sallyjunjun made their first contribution in #877
- @panli889 made their first contribution in #941
Full Changelog: v0.1.0...v0.2.0