- Support Qwen3 VL
- Integrate FlashMLA
- Integrate DeepGEMM
- Async scheduling
- Use the FlashInfer sampling kernel
- Release gLLM to PyPI
- Optimize input data creation (Optimize Input data creation #24; Optimize input_data creation #75; Optimize get_slot_mapping #76; see the slot-mapping sketch after this list)
- CUDA graph: [1/N] create input buffer for model runner #134; [2/N] output buffer #135; [3/N] delay memory manager/KV cache init #136; [4/N] add profile run #137; [5/N] support CUDA graph #141; related: Separate sample operation #139, Simplify PP data transmission #140 (see the capture sketch after this list)
- Profile run at startup ([4/N] CUDA graph: add profile run #137)
- Integrate Token Throttling into vLLM ([RFC][PP] vllm-project/vllm#20298)
- Support VLM (Support for qwen2_5_vl #108)
- Support Deepseek V2/3 (Support Deepseek V3 #130; Support mla for deepseek V2/3 #104; Add support for Deepseek V2/3 #103)
- Quantization (Support fp8 moe #128; Fix fp8 moe #129; can support fp8 model? #91; Add support for quantization method fp8 (qwen3) #94)
- EP (Support Expert Parallelism #79)
- Support MoE models (Support MoE models #52)
- Preempt sequences when the KV cache is exhausted (see the preemption sketch after this list)
- Chunked prefill (Implement Chunked prefill 🙌 #23; see the chunked-prefill sketch after this list)
- TP (Support TP #72)
- PP (Introduce PP to gLLM #15; Change PP communication #16; Optimize PP #17)
- Add multi-node support (Add Multi node support #40)
- Tune MoE kernel configuration (Support TP #72)
- Abort requests (Support aborting requests #68)
- Upgrade PyTorch version (Upgrade torch to 2.7.0 and flashattention #71)
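
For the slot-mapping item above, here is a minimal sketch of what a vectorized `get_slot_mapping` can look like in a paged KV cache: each logical token position is mapped to a physical cache slot through the sequence's block table. The function name matches the issue title, but the signature and tensor layout are assumptions for illustration, not gLLM's actual code.

```python
import torch

def get_slot_mapping(block_table: torch.Tensor,
                     positions: torch.Tensor,
                     block_size: int) -> torch.Tensor:
    """Map logical token positions to physical KV-cache slots.

    block_table: (num_blocks,) physical block ids for one sequence.
    positions:   (num_tokens,) logical positions of the new tokens.
    Returns (num_tokens,) flat slot indices, computed as
    block_table[pos // block_size] * block_size + pos % block_size.
    """
    block_idx = positions // block_size   # which logical block
    block_off = positions % block_size    # offset inside that block
    return block_table[block_idx] * block_size + block_off

if __name__ == "__main__":
    # A sequence occupying physical blocks 7 and 3, block_size = 4.
    table = torch.tensor([7, 3])
    pos = torch.arange(6)                      # tokens 0..5
    print(get_slot_mapping(table, pos, 4))     # tensor([28, 29, 30, 31, 12, 13])
```

Doing this as one batched tensor expression, rather than a Python loop over tokens, is the usual way such a function gets optimized.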
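For the CUDA graph series above, a minimal PyTorch sketch of the capture pattern the [1/N] to [5/N] steps describe: static input/output buffers, warm-up ("profile") runs before capture, then capture and replay. The `model` here is a hypothetical stand-in for gLLM's model runner; the real decode step is what would be captured. Requires a CUDA device.

```python
import torch

# Hypothetical stand-in for the model runner's forward pass.
model = torch.nn.Linear(512, 512).cuda()

# [1/N]-[2/N]: a static input buffer the graph reads; the output buffer
# is whatever the captured forward produces.
static_in = torch.zeros(8, 512, device="cuda")

# [4/N]: warm-up runs on a side stream so allocations settle before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_in)
torch.cuda.current_stream().wait_stream(s)

# [5/N]: capture one step into a CUDA graph.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_out = model(static_in)

# Replay: copy fresh inputs into the static buffer, then launch the whole
# captured kernel sequence with a single call.
static_in.copy_(torch.randn(8, 512, device="cuda"))
graph.replay()
print(static_out[0, :4])  # results land in the captured output buffer
```

The reason for [3/N] (delaying memory manager/KV cache init) is visible here: buffer addresses are baked into the graph at capture time, so everything the graph touches must be allocated, and stay fixed, before capture.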
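For the preemption item above, a toy sketch of recompute-style preemption: when the KV cache has no free blocks for a new allocation, a running sequence is evicted, its blocks are freed, and it is re-queued to be prefilled again later. Class and method names are illustrative assumptions, not gLLM's actual scheduler API.

```python
from collections import deque

class Scheduler:
    """Toy scheduler illustrating preemption when the KV cache is full."""

    def __init__(self, num_free_blocks: int):
        self.num_free_blocks = num_free_blocks
        self.running: list[str] = []         # seq ids currently decoding
        self.waiting: deque[str] = deque()   # seq ids to (re)prefill
        self.blocks_of: dict[str, int] = {}  # blocks held per sequence

    def allocate(self, seq_id: str, n_blocks: int) -> None:
        # Preempt until the request fits or nothing is left to evict.
        while self.num_free_blocks < n_blocks and self.running:
            victim = self.running.pop()          # newest running sequence
            self.num_free_blocks += self.blocks_of.pop(victim)
            self.waiting.appendleft(victim)      # recompute it later
        if self.num_free_blocks < n_blocks:
            raise RuntimeError("request larger than the whole KV cache")
        self.num_free_blocks -= n_blocks
        self.blocks_of[seq_id] = self.blocks_of.get(seq_id, 0) + n_blocks
        if seq_id not in self.running:
            self.running.append(seq_id)

sched = Scheduler(num_free_blocks=4)
sched.allocate("a", 3)
sched.allocate("b", 3)                       # forces preemption of "a"
print(sched.running, list(sched.waiting))    # ['b'] ['a']
```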
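And for the chunked-prefill item, the core idea in a few lines: split a long prompt into fixed-size chunks so each scheduler step prefills at most a token budget's worth, letting prefill interleave with ongoing decodes instead of monopolizing an iteration. The helper below is hypothetical, not gLLM's API.

```python
def chunk_prefill(prompt_len: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split a prompt into (start, end) chunks of at most chunk_size tokens,
    one chunk per scheduler step."""
    return [(s, min(s + chunk_size, prompt_len))
            for s in range(0, prompt_len, chunk_size)]

# A 10-token prompt under a 4-token budget prefills over three steps.
print(chunk_prefill(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```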