Releases: ModelTC/lightllm
3.0.1.bak
This tag is a backup of the last merge before the codebase reconstruction.
What's Changed
- [Feature] PD Mode Support by @hiworldwzj in #607
- Refactor quantization: support torchao quant and vllm w8a8 (int/fp), and mixed quantization by @shihaobai in #596
- Deepseek2 Support PD mode by @hiworldwzj in #614
- add vllm pynccl for cuda graph compatibility by @WANDY666 in #615
- Optimize MLA decode attention by @shihaobai in #616
- upgrade deepseek kv copy & fix test/model_infer.py by @shihaobai in #617
- complete all_reduce and test by @WANDY666 in #619
- LIGHTLLM_PYNCCL_ENABLE default False by @WANDY666 in #620
- [model] support Qwen2.5-RM by @sufubao in #621
- add DP framework support. by @hiworldwzj in #622
- fix deepseekv2-lite tp>1 by @shihaobai in #623
- Deepseek fix for moe fp8 by @shihaobai in #624
- fix shm conflict error by @hiworldwzj in #625
- Refactor reduce by @shihaobai in #626
- fix mem alloc by @hiworldwzj in #627
- Update PD master mode timeout to 30s by @hiworldwzj in #629
- fix set quantization by @shihaobai in #631
- feat: add cc, acc method for deepseek2 by @blueswhen in #618
- fix fp8 weight quant need contiguous tensor by @hiworldwzj in #632
- add qwen backend for internvl by @shihaobai in #635
- complete test module by @WANDY666 in #634
- fix test by @hiworldwzj in #636
- Fix batch test by @WANDY666 in #637
- Fix acc by @shihaobai in #638
- fix typo by @shihaobai in #639
- better pickle mode by @hiworldwzj in #641
- fix log by @hiworldwzj in #642
- fix rpyc tcp delay by @hiworldwzj in #645
- fix rpyc tcp delay by @hiworldwzj in #646
- add init.py by @hiworldwzj in #648
- fix format by @hiworldwzj in #649
- bug fix for max len prefill check error by @hiworldwzj in #650
- Fix OpenAI chat bug by @SangChengC in #651
- rpyc and zmq use unix socket. by @hiworldwzj in #653
- fix reduce stuck on h100 with graph by @shihaobai in #654
- [misc] Support deepseek splitfuse mode. by @sufubao in #652
- refactor reduce by @shihaobai in #655
- update transformers by @WANDY666 in #656
- [kernel] Remove splitfuse kernel by @sufubao in #657
- opt: refactor some code for acc by @blueswhen in #643
- Compatible with lower versions of torch by @shihaobai in #658
- independent of vllm by @WANDY666 in #647
- Static quant for vllm-w8a8 by @shihaobai in #659
- fix api_start.py by @hiworldwzj in #660
- [BugFix] Fix silu kernel by @sufubao in #661
- [Feature] improve p d mode performance. by @hiworldwzj in #664
- [Feature] MoE ETP done, without group greedy (Charlotteroes main) by @WANDY666 in #665
- [Improve] PD mode: prefill and decode nodes handle batched requests in parallel by @hiworldwzj in #666
- pd mode. batch kv trans. by @hiworldwzj in #667
- gqa attention add Tuning code by @hiworldwzj in #668
- Fix deepseek & update v1/completions by @shihaobai in #671
- Deepseek rope fix by @shihaobai in #672
- fix mistral13b tp by @shihaobai in #673
- fix init by @shihaobai in #674
- support for dp + tp end2end by @shihaobai in #670
- Fix bug in requirements.txt by @hiworldwzj in #677
- add generation cfg parse by @shihaobai in #675
- misc: update python requirements by @WuSiYu in #680
- Vit triton (fp + quant / tp + dp), custom image pre_process by @shihaobai in #663
- add kernel config tuning way to get better performance. by @hiworldwzj in #681
- add H800 grouped moe kernel configs. by @hiworldwzj in #682
- Fix tuning code by @hiworldwzj in #683
- fix kernel config json load. by @hiworldwzj in #684
- fix config search type error. by @hiworldwzj in #685
- add A800 grouped moe kernel json configs. by @hiworldwzj in #686
- add H800 and A800 mla decode configs. by @hiworldwzj in #687
- remove vllm fuse_moe kernel and add moe_sum_reduce, moe_silu_and_mul kernel. by @hiworldwzj in #688
- update configs. by @hiworldwzj in #689
- update health check by @shihaobai in #690
- pd mode use p2p triton kernel to manage kv trans && refactor deepseekv2 code by @hiworldwzj in #691
- add custom allgather(into tensor) by @shihaobai in #692
- better cuda graph mla decode kernel. by @hiworldwzj in #694
- update config and fix dp router error. by @hiworldwzj in #695
- Update MLA decode attention by @hiworldwzj in #696
- overlap post sample. by @hiworldwzj in #697
- add sample_param logs and env: HEALTH_TIMEOUT by @shihaobai in #698
- [debug]: add some debug log by @shihaobai in #699
- feat: add _context_attention_kernel_with_CC in deepseek2 by @blueswhen in #693
- refactor quantization for static quantized weight loading and add deepseek_v3 by @shihaobai in #702
- fix: vit config by @shihaobai in #704
- Add a switch to custom_allgather import by @shihaobai in #705
- fix bug for waiting queue. by @hiworldwzj in #706
- refine log by @shihaobai in #707
- feat add bmm_scaled_fp8 by @blueswhen in #703
- fix vit quantcfg init error. by @hiworldwzj in #709
- fix openai stream & update chat template by @shihaobai in #710
- docs: add blog link by @PannenetsF in #713
- docs: redesign the arch by @PannenetsF in #714
- parse stop_sequences by @shihaobai in #715
- Vsm llama pr by @PannenetsF in #700
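
Several changes above are gated behind environment variables named in the PR titles (`LIGHTLLM_PYNCCL_ENABLE` in #620, `HEALTH_TIMEOUT` in #698). A minimal sketch of toggling them before launch; the value `30` for `HEALTH_TIMEOUT` is illustrative, not a documented default:

```shell
# LIGHTLLM_PYNCCL_ENABLE defaults to False (#620); opt in explicitly.
export LIGHTLLM_PYNCCL_ENABLE=True
# HEALTH_TIMEOUT (#698) bounds the server health check; 30 is an
# illustrative value, not a documented default.
export HEALTH_TIMEOUT=30
echo "pynccl=${LIGHTLLM_PYNCCL_ENABLE} health_timeout=${HEALTH_TIMEOUT}"
```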
New Contributors
- @blueswhen made their first contribution in #618
Full Changelog: 3.0.0...3.0.1.bak
3.0.0
Version 3.0.0
Updates
- Support for CUDA Graph
- Improved Memory Management
- Startup Optimization
- Use `--mem_fraction`
- Automatically read `eos_id` and `data_type` information from `config.json`
- Refactor of Multimodal Code
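
The `--mem_fraction` flag above lets the server size its KV cache from a GPU memory fraction. A hedged launch sketch, assuming the usual `lightllm.server.api_server` entry point; the model path and `0.8` are placeholders, not recommended values:

```shell
# Sketch: start the server with KV cache sized from a GPU memory
# fraction. /path/to/model and 0.8 are placeholders.
python -m lightllm.server.api_server \
    --model_dir /path/to/model \
    --mem_fraction 0.8
```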