Releases: ModelTC/lightllm
3.0.1.bak
This tag is a backup of the last merge before the codebase reconstruction.
What's Changed
- [Feature] PD Mode Support by @hiworldwzj in #607
- Refactor quantization: support torchao quant and vllm w8a8 (int/fp), and mixed quantization by @shihaobai in #596
- Deepseek2 Support PD mode by @hiworldwzj in #614
- add vllm pynccl for cuda graph compatibility by @WANDY666 in #615
- Optimize MLA decode attention by @shihaobai in #616
- upgrade deepseek kv copy & fix test/model_infer.py by @shihaobai in #617
- complete all_reduce and test by @WANDY666 in #619
- LIGHTLLM_PYNCCL_ENABLE default False by @WANDY666 in #620
- [model] support Qwen2.5-RM by @sufubao in #621
- add DP framework support. by @hiworldwzj in #622
- fix deepseekv2-lite tp>1 by @shihaobai in #623
- Deepseek fix for moe fp8 by @shihaobai in #624
- fix shm conflict error by @hiworldwzj in #625
- Refactor reduce by @shihaobai in #626
- fix mem alloc by @hiworldwzj in #627
- Update PD master mode timeout to 30s by @hiworldwzj in #629
- fix set quantization by @shihaobai in #631
- feat: add cc, acc method for deepseek2 by @blueswhen in #618
- fix fp8 weight quant need contiguous tensor by @hiworldwzj in #632
- add qwen backend for internvl by @shihaobai in #635
- complete test module by @WANDY666 in #634
- fix test by @hiworldwzj in #636
- Fix batch test by @WANDY666 in #637
- Fix acc by @shihaobai in #638
- fix typo by @shihaobai in #639
- better pickle mode by @hiworldwzj in #641
- fix log by @hiworldwzj in #642
- fix rpyc tcp delay by @hiworldwzj in #645
- fix rpyc tcp delay by @hiworldwzj in #646
- add init.py by @hiworldwzj in #648
- fix format by @hiworldwzj in #649
- bug fix for max len prefill check error by @hiworldwzj in #650
- Fix OpenAI chat bug by @SangChengC in #651
- rpyc and zmq use unix socket. by @hiworldwzj in #653
- fix reduce stuck on h100 with graph by @shihaobai in #654
- [misc] Support deepseek splitfuse mode. by @sufubao in #652
- refactor reduce by @shihaobai in #655
- update transformers by @WANDY666 in #656
- [kernel] Remove splitfuse kernel by @sufubao in #657
- opt: refactor some code for acc by @blueswhen in #643
- Compatible with lower versions of torch by @shihaobai in #658
- independent of vllm by @WANDY666 in #647
- Static quant for vllm-w8a8 by @shihaobai in #659
- fix api_start.py by @hiworldwzj in #660
- [BugFix] Fix silu kernel by @sufubao in #661
- [Feature] improve p d mode performance. by @hiworldwzj in #664
- [Feature] MoE ETP done, without group greedy (Charlotteroes main) by @WANDY666 in #665
- [Improve] PD mode: prefill and decode nodes handle batched requests in parallel by @hiworldwzj in #666
- pd mode. batch kv trans. by @hiworldwzj in #667
- gqa attention add Tuning code by @hiworldwzj in #668
- Fix deepseek & update v1/completions by @shihaobai in #671
- Deepseek rope fix by @shihaobai in #672
- fix mistral13b tp by @shihaobai in #673
- fix init by @shihaobai in #674
- support for dp + tp end2end by @shihaobai in #670
- Fix bug in requirements.txt by @hiworldwzj in #677
- add generation cfg parse by @shihaobai in #675
- misc: update python requirements by @WuSiYu in #680
- Vit triton (fp + quant / tp + dp), custom image pre_process by @shihaobai in #663
- add kernel config tuning way to get better performance. by @hiworldwzj in #681
- add H800 grouped moe kernel configs. by @hiworldwzj in #682
- Fix tuning code by @hiworldwzj in #683
- fix kernel config json load. by @hiworldwzj in #684
- fix config search type error. by @hiworldwzj in #685
- add A800 grouped moe kernel json configs. by @hiworldwzj in #686
- add H800 and A800 mla decode configs. by @hiworldwzj in #687
- remove vllm fuse_moe kernel and add moe_sum_reduce, moe_silu_and_mul kernel. by @hiworldwzj in #688
- update configs. by @hiworldwzj in #689
- update health check by @shihaobai in #690
- pd mode use p2p triton kernel to manage kv trans && refactor deepseekv2 code by @hiworldwzj in #691
- add custom allgather(into tensor) by @shihaobai in #692
- better cuda graph mla decode kernel. by @hiworldwzj in #694
- update config and fix dp router error. by @hiworldwzj in #695
- Update MLA decode attention by @hiworldwzj in #696
- overlap post sample. by @hiworldwzj in #697
- add sample_param logs and env: HEALTH_TIMEOUT by @shihaobai in #698
- [debug]: add some debug log by @shihaobai in #699
- feat: add _context_attention_kernel_with_CC in deepseek2 by @blueswhen in #693
- refactor quantization for static quantized weight loading and add deepseek_v3 by @shihaobai in #702
- fix: vit config by @shihaobai in #704
- Add a switch to custom_allgather import by @shihaobai in #705
- fix bug for waiting queue. by @hiworldwzj in #706
- refine log by @shihaobai in #707
- feat add bmm_scaled_fp8 by @blueswhen in #703
- fix vit quantcfg init error. by @hiworldwzj in #709
- fix openai stream & update chat template by @shihaobai in #710
- docs: add blog link by @PannenetsF in #713
- docs: redesign the arch by @PannenetsF in #714
- parse stop_sequences by @shihaobai in #715
- Vsm llama pr by @PannenetsF in #700
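
Several changes above are gated behind environment variables named in the PR titles (`LIGHTLLM_PYNCCL_ENABLE` in #620, `HEALTH_TIMEOUT` in #698). A minimal sketch of toggling them before launch; the value `30` for `HEALTH_TIMEOUT` is illustrative, not a documented default:

```shell
# LIGHTLLM_PYNCCL_ENABLE defaults to False (#620); opt in explicitly.
export LIGHTLLM_PYNCCL_ENABLE=True
# HEALTH_TIMEOUT (#698) bounds the server health check; 30 is an
# illustrative value, not a documented default.
export HEALTH_TIMEOUT=30
echo "pynccl=${LIGHTLLM_PYNCCL_ENABLE} health_timeout=${HEALTH_TIMEOUT}"
```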
New Contributors
- @blueswhen made their first contribution in #618
Full Changelog: 3.0.0...3.0.1.bak
3.0.0
Version 3.0.0
Updates
- Support for CUDA Graph
- Improved Memory Management
- Startup Optimization
- Use `--mem_fraction`
- Automatically read `eos_id` and `data_type` information from `config.json`
- Refactor of Multimodal Code
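
The `--mem_fraction` flag above lets the server size its KV cache from a GPU memory fraction. A hedged launch sketch, assuming the usual `lightllm.server.api_server` entry point; the model path and `0.8` are placeholders, not recommended values:

```shell
# Sketch: start the server with KV cache sized from a GPU memory
# fraction. /path/to/model and 0.8 are placeholders.
python -m lightllm.server.api_server \
    --model_dir /path/to/model \
    --mem_fraction 0.8
```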