
@smarterclayton (Contributor)

No description provided.

@netlify netlify bot commented Oct 10, 2025

Deploy Preview for elaborate-kangaroo-25e1ee ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | abe26b4 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/elaborate-kangaroo-25e1ee/deploys/68e983f6f7f34b00082844e3 |
| 😎 Deploy Preview | https://deploy-preview-105--elaborate-kangaroo-25e1ee.netlify.app |

To edit notification comments on pull requests, go to your Netlify project configuration.

@smarterclayton smarterclayton force-pushed the blog branch 2 times, most recently from 1b2442e to 2cf16af Compare October 10, 2025 18:17

# llm-d 0.3: Wider Well-Lit Paths for Scalable Inference

In our [0.2 release](https://llm-d.ai/blog/llm-d-v0.2-our-first-well-lit-paths), we introduced the first *well-lit paths*, tested blueprints for scaling inference on Kubernetes. With 0.3 release, we double down on the mission: to provide a fast path to deploying high performance, hardware-agnostic, easy to operationalize, at scale inference.

These are definitely nits/opinions, but:

Suggested change

```diff
- In our [0.2 release](https://llm-d.ai/blog/llm-d-v0.2-our-first-well-lit-paths), we introduced the first *well-lit paths*, tested blueprints for scaling inference on Kubernetes. With 0.3 release, we double down on the mission: to provide a fast path to deploying high performance, hardware-agnostic, easy to operationalize, at scale inference.
+ In our [0.2 release](https://llm-d.ai/blog/llm-d-v0.2-our-first-well-lit-paths), we introduced the first *well-lit paths*, tested blueprints for scaling inference on Kubernetes. With 0.3 release, we double down on the [mission](#commit-to-the-mission): to provide a fast path to deploying high performance, hardware-agnostic, easy to operationalize, at-scale inference.
```


Serving LLMs is complex - our documentation and configuration should be simple. Quickstarts have been streamlined and renamed to guides, with fewer options and more context around the key decisions you need to make. They are now located in the main repository and treated as living documents alongside our growing documentation for common scenarios. Since llm-d is about exposing the key tradeoffs and useful patterns, we’ve split out the key prerequisites for each guide - cluster configuration, client setup, and gateway choice - into their own sections, and replaced our all-in-one installer scripts with better step-by-step instructions.
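The prerequisite split is easiest to see as code. Below is a minimal preflight sketch of the three checks a guide walks through: client setup (kubectl on the PATH), gateway choice (Gateway API CRDs installed), and cluster configuration (node access). The script is illustrative only, not llm-d's actual tooling, and the specific checks are assumptions.

```python
#!/usr/bin/env python3
"""Illustrative guide preflight checks; hypothetical, not llm-d's actual tooling."""
import shutil
import subprocess
import sys


def kubectl_ok(*args: str) -> bool:
    """Return True if `kubectl <args>` succeeds against the current context."""
    result = subprocess.run(["kubectl", *args], capture_output=True, text=True)
    return result.returncode == 0


def main() -> None:
    # Client setup: the guides assume kubectl and a working kubeconfig.
    if shutil.which("kubectl") is None:
        sys.exit("kubectl not found: see the client setup prerequisite")

    # Gateway choice: every inference gateway option needs the Gateway API CRDs.
    if not kubectl_ok("get", "crd/gateways.gateway.networking.k8s.io"):
        sys.exit("Gateway API CRDs missing: see the gateway prerequisite")

    # Cluster configuration: confirm the current credentials can see nodes.
    if not kubectl_ok("get", "nodes"):
        sys.exit("cannot list nodes: check cluster configuration and RBAC")

    print("prerequisites satisfied; continue with the guide step by step")


if __name__ == "__main__":
    main()
```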

As more cluster providers integrate into llm-d we’ve expanded the documentation for infrastructure with provider specific troubleshooting, configuration, and testing. This release adds documentation and steps for CoreWeave, Digital Ocean, Google Kubernetes Engine, and OpenShift.

Suggested change

```diff
- As more cluster providers integrate into llm-d we’ve expanded the documentation for infrastructure with provider specific troubleshooting, configuration, and testing. This release adds documentation and steps for CoreWeave, Digital Ocean, Google Kubernetes Engine, and OpenShift.
+ To support more cluster providers integrating into llm-d, we’ve expanded the documentation for infrastructure with provider specific troubleshooting, configuration, and testing. This release adds documentation and steps for CoreWeave, Digital Ocean, Google Kubernetes Engine, and OpenShift.
```

Guides now include curated Inference Gateway installs and static manifests for clarity, with overlays available for benchmarking sweeps. RBAC patterns were refactored toward namespace scope for smoother multi-tenancy.
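As a concrete sketch of the namespace-scoped RBAC pattern, the snippet below creates a Role and RoleBinding confined to a single tenant namespace with the official Kubernetes Python client. The namespace, object names, and rule set here are hypothetical stand-ins, not llm-d's actual manifests.

```python
"""Minimal namespace-scoped RBAC sketch; names and rules are hypothetical."""
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig
rbac = client.RbacAuthorizationV1Api()

NAMESPACE = "llm-d-tenant-a"  # hypothetical tenant namespace

# A Role (not a ClusterRole) keeps the grant confined to one namespace.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "inference-reader", "namespace": NAMESPACE},
    "rules": [
        {
            "apiGroups": [""],
            "resources": ["pods", "services", "configmaps"],
            "verbs": ["get", "list", "watch"],
        }
    ],
}

# A RoleBinding ties the Role to a service account in the same namespace.
binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "inference-reader", "namespace": NAMESPACE},
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "Role",
        "name": "inference-reader",
    },
    "subjects": [
        {"kind": "ServiceAccount", "name": "llm-d-controller", "namespace": NAMESPACE}
    ],
}

rbac.create_namespaced_role(NAMESPACE, role)
rbac.create_namespaced_role_binding(NAMESPACE, binding)
```

Because both objects are namespaced, a grant in one tenant's namespace never implies cluster-wide visibility, which is what makes the multi-tenant story smoother.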

**Why it matters:** With 0.3, experimenting with intelligent scheduling or disaggregation is as simple as running a documented guide. The control plane is more transparent, reproducible, and extensible, independently of the platform you are running.

Suggested change

```diff
- **Why it matters:** With 0.3, experimenting with intelligent scheduling or disaggregation is as simple as running a documented guide. The control plane is more transparent, reproducible, and extensible, independently of the platform you are running.
+ **Why it matters:** With 0.3, experimenting with intelligent scheduling or disaggregation is as simple as running a documented guide. The control plane is more transparent, reproducible, and extensible; independent of the platform you are running.
```

@chcost chcost self-assigned this Oct 10, 2025
@chcost commented Oct 10, 2025

/lgtm


### **Wide-EP Performance**

The wide-EP path, which parallelizes across experts to maximize throughput, has reached **2.7k tokens/s per GPU** in community benchmarks on H200 clusters.
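For intuition on what "parallelizes across experts" means, here is a toy routing sketch, not llm-d's (or vLLM's) implementation: experts are sharded across GPUs, and routed tokens are grouped by the rank that owns their expert, so each GPU computes only the experts it hosts. The expert and GPU counts are made up for illustration.

```python
"""Toy expert-parallel (wide-EP) dispatch; illustrative only."""
from collections import defaultdict

NUM_EXPERTS = 16  # hypothetical MoE width
NUM_GPUS = 4      # expert e is hosted on rank e % NUM_GPUS


def dispatch(token_expert_ids: list[int]) -> dict[int, list[int]]:
    """Group token indices by the GPU rank owning their routed expert.

    In a real system this grouping drives an all-to-all exchange so that
    each rank runs only its local experts, instead of every rank holding
    (and computing) all experts.
    """
    per_rank: dict[int, list[int]] = defaultdict(list)
    for token_idx, expert in enumerate(token_expert_ids):
        per_rank[expert % NUM_GPUS].append(token_idx)
    return per_rank


# Example: the router assigns each of 8 decode tokens a top-1 expert.
routed = dispatch([3, 7, 3, 12, 0, 15, 7, 5])
for rank, tokens in sorted(routed.items()):
    print(f"rank {rank} decodes tokens {tokens}")
```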

Two inline review comments on the throughput figure: "2.2k" and "decode tokens per second", suggesting the number be corrected to 2.2k and qualified as decode throughput.

@robertgshaw2-redhat

LGTM, pending nit on the topline performance number

@petecheslock (Collaborator)

Added a truncation comment, but feel free to adjust the location; this just ensures the main /blog landing page isn't the entire post but a list of them.

@petecheslock (Collaborator)

The link-check failure in the build can be ignored; that's just the link for this blog post and will be "fixed" when it goes live.

Signed-off-by: Clayton Coleman <[email protected]>
@smarterclayton smarterclayton merged commit 6a18cef into llm-d:main Oct 10, 2025
5 of 6 checks passed
