Make SGLang go brrr
The examples in this repo focus primarily on H100 and B200 GPUs. All benchmarks are proudly powered by genai-bench.
In production serving, features such as EAGLE 3 speculative decoding, function calling, JSON mode, a reasoning parser, metrics, and crash dumps are usually enabled to improve performance, reliability, monitoring, and troubleshooting. The EAGLE 3 training pipeline is proudly powered by SpecForge. For Kubernetes deployment, we recommend OME, a production-ready, scalable orchestration framework designed for LLM serving.
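As a rough illustration of how these production features might be switched on together, here is a minimal sketch that assembles a launch command for an SGLang server. The helper `build_launch_command` is hypothetical, the model paths are placeholders, and the parser names (`qwen25`, `qwen3`) are example values only. The flag names are written from memory of the SGLang CLI and should be treated as assumptions; check them against `python -m sglang.launch_server --help` for your installed version.

```python
import shlex


def build_launch_command(
    model_path,
    draft_model_path,
    tool_call_parser,
    reasoning_parser,
    crash_dump_folder,
    port=30000,
):
    """Assemble an argv list for an SGLang server with production features on.

    Hypothetical helper: all flag names below are assumptions and should be
    verified against `python -m sglang.launch_server --help`.
    """
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--port", str(port),
        # EAGLE 3 speculative decoding with a separately trained draft model.
        "--speculative-algorithm", "EAGLE3",
        "--speculative-draft-model-path", draft_model_path,
        # Function calling: parser that extracts tool calls from model output.
        "--tool-call-parser", tool_call_parser,
        # Reasoning parser: separates reasoning text from the final answer.
        "--reasoning-parser", reasoning_parser,
        # Metrics endpoint for monitoring.
        "--enable-metrics",
        # Directory where request dumps are written if the server crashes.
        "--crash-dump-folder", crash_dump_folder,
    ]


# Placeholder paths and example parser names; substitute your own.
cmd = build_launch_command(
    model_path="path/to/target-model",
    draft_model_path="path/to/eagle3-draft-model",
    tool_call_parser="qwen25",
    reasoning_parser="qwen3",
    crash_dump_folder="/tmp/sglang-crash-dumps",
)
print(shlex.join(cmd))
```

JSON mode does not appear in the command because, as far as we know, it is requested per call through the `response_format` field of the OpenAI-compatible API rather than at launch. Building the command as a list keeps each flag next to the feature it enables and avoids shell-quoting mistakes when the same configuration is reused across deployment scripts.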