Release blog for 0.3 #105
Conversation
✅ Deploy Preview for elaborate-kangaroo-25e1ee ready!
Force-pushed from 1b2442e to 2cf16af.
> # llm-d 0.3: Wider Well-Lit Paths for Scalable Inference
>
> In our [0.2 release](https://llm-d.ai/blog/llm-d-v0.2-our-first-well-lit-paths), we introduced the first *well-lit paths*, tested blueprints for scaling inference on Kubernetes. With 0.3 release, we double down on the mission: to provide a fast path to deploying high performance, hardware-agnostic, easy to operationalize, at scale inference.
These are definitely nits/opinions, but:
Suggested change:

- In our [0.2 release](https://llm-d.ai/blog/llm-d-v0.2-our-first-well-lit-paths), we introduced the first *well-lit paths*, tested blueprints for scaling inference on Kubernetes. With 0.3 release, we double down on the mission: to provide a fast path to deploying high performance, hardware-agnostic, easy to operationalize, at scale inference.
+ In our [0.2 release](https://llm-d.ai/blog/llm-d-v0.2-our-first-well-lit-paths), we introduced the first *well-lit paths*, tested blueprints for scaling inference on Kubernetes. With 0.3 release, we double down on the [mission](#commit-to-the-mission): to provide a fast path to deploying high performance, hardware-agnostic, easy to operationalize, at-scale inference.
> Serving LLMs is complex - our documentation and configuration should be simple. Quickstarts have been streamlined and renamed to guides, with fewer options and more context around the key decisions you need to make. They are now located in the main repository and treated as living documents alongside our growing documentation for common scenarios. Since llm-d is about exposing the key tradeoffs and surfacing useful patterns, we’ve split out the key prerequisites for each guide - cluster configuration, client setup, and gateway choice - into their own sections, and replaced our all-in-one installer scripts with better step-by-step instructions.
> As more cluster providers integrate into llm-d we’ve expanded the documentation for infrastructure with provider specific troubleshooting, configuration, and testing. This release adds documentation and steps for CoreWeave, Digital Ocean, Google Kubernetes Engine, and OpenShift.
Suggested change:

- As more cluster providers integrate into llm-d we’ve expanded the documentation for infrastructure with provider specific troubleshooting, configuration, and testing. This release adds documentation and steps for CoreWeave, Digital Ocean, Google Kubernetes Engine, and OpenShift.
+ To support more cluster providers integrating into llm-d, we’ve expanded the documentation for infrastructure with provider specific troubleshooting, configuration, and testing. This release adds documentation and steps for CoreWeave, Digital Ocean, Google Kubernetes Engine, and OpenShift.
> Guides now include curated Inference Gateway installs and static manifests for clarity, with overlays available for benchmarking sweeps. RBAC patterns were refactored toward namespace scope for smoother multi-tenancy.
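To make the namespace-scoped RBAC pattern concrete, here is a minimal sketch; the resource names, namespace, and rule lists are assumptions for illustration, not llm-d's actual manifests:

```yaml
# Hypothetical namespace-scoped Role for an inference workload.
# All names here are illustrative assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: llm-d-operator        # assumed name
  namespace: llm-d-demo       # each tenant gets its own namespace
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: llm-d-operator-binding
  namespace: llm-d-demo
subjects:
  - kind: ServiceAccount
    name: llm-d-controller    # assumed service account
    namespace: llm-d-demo
roleRef:
  kind: Role
  name: llm-d-operator
  apiGroup: rbac.authorization.k8s.io
```

Scoping to a Role rather than a ClusterRole keeps each tenant's permissions confined to its own namespace, which is what makes the multi-tenancy smoother.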
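Similarly, a benchmarking-sweep overlay could be layered on a guide's static manifests with kustomize; the file layout, workload name, and replica count below are assumptions, not the actual guide contents:

```yaml
# kustomization.yaml - hypothetical benchmarking overlay.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base              # the guide's static manifests (assumed path)
patches:
  - target:
      kind: Deployment
      name: llm-d-decode    # assumed workload name
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 8            # scale up for a benchmarking sweep
```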
> **Why it matters:** With 0.3, experimenting with intelligent scheduling or disaggregation is as simple as running a documented guide. The control plane is more transparent, reproducible, and extensible, independently of the platform you are running.
Suggested change:

- **Why it matters:** With 0.3, experimenting with intelligent scheduling or disaggregation is as simple as running a documented guide. The control plane is more transparent, reproducible, and extensible, independently of the platform you are running.
+ **Why it matters:** With 0.3, experimenting with intelligent scheduling or disaggregation is as simple as running a documented guide. The control plane is more transparent, reproducible, and extensible; independent of the platform you are running.
/lgtm
> ### **Wide-EP Performance**
>
> The wide-EP path, which parallelizes across experts to maximize throughput, has reached **2.7k tokens/s per GPU** in community benchmarks on H200 clusters.
Review comment: 2.2k

Review comment: decode tokens per second
LGTM, pending nit on the topline performance number.
Added a truncation comment, but feel free to adjust the location; this just ensures the main /blog landing page isn't the entire blog but a list of them.

The link-failure build can be ignored; that's just the link for this blog and will be "fixed" when it goes live.
Signed-off-by: Clayton Coleman <[email protected]>
Link to release

Signed-off-by: Clayton Coleman <[email protected]>