Release blog for 0.3 #105
Conversation
✅ Deploy Preview for elaborate-kangaroo-25e1ee ready!
Force-pushed from 1b2442e to 2cf16af.
> # llm-d 0.3: Wider Well-Lit Paths for Scalable Inference
>
> In our [0.2 release](https://llm-d.ai/blog/llm-d-v0.2-our-first-well-lit-paths), we introduced the first *well-lit paths*, tested blueprints for scaling inference on Kubernetes. With 0.3 release, we double down on the mission: to provide a fast path to deploying high performance, hardware-agnostic, easy to operationalize, at scale inference.
These are definitely nits/opinions, but:
Suggested change:

- In our [0.2 release](https://llm-d.ai/blog/llm-d-v0.2-our-first-well-lit-paths), we introduced the first *well-lit paths*, tested blueprints for scaling inference on Kubernetes. With 0.3 release, we double down on the mission: to provide a fast path to deploying high performance, hardware-agnostic, easy to operationalize, at scale inference.
+ In our [0.2 release](https://llm-d.ai/blog/llm-d-v0.2-our-first-well-lit-paths), we introduced the first *well-lit paths*, tested blueprints for scaling inference on Kubernetes. With 0.3 release, we double down on the [mission](#commit-to-the-mission): to provide a fast path to deploying high performance, hardware-agnostic, easy to operationalize, at-scale inference.
> Serving LLMs is complex - our documentation and configuration should be simple. Quickstarts have been streamlined and renamed to guides, with fewer options and more context around the key decisions you need to make. They are now located in the main repository and treated as living documents alongside our growing documentation for common scenarios. Since llm-d is about exposing the key tradeoffs and surfacing useful patterns, we’ve split out the key prerequisites for each guide - cluster configuration, client setup, and gateway choice - into their own sections, and replaced our all-in-one installer scripts with better step-by-step instructions.
> As more cluster providers integrate into llm-d we’ve expanded the documentation for infrastructure with provider specific troubleshooting, configuration, and testing. This release adds documentation and steps for CoreWeave, Digital Ocean, Google Kubernetes Engine, and OpenShift.
Suggested change:

- As more cluster providers integrate into llm-d we’ve expanded the documentation for infrastructure with provider specific troubleshooting, configuration, and testing. This release adds documentation and steps for CoreWeave, Digital Ocean, Google Kubernetes Engine, and OpenShift.
+ To support more cluster providers integrating into llm-d, we’ve expanded the documentation for infrastructure with provider specific troubleshooting, configuration, and testing. This release adds documentation and steps for CoreWeave, Digital Ocean, Google Kubernetes Engine, and OpenShift.
> Guides now include curated Inference Gateway installs and static manifests for clarity, with overlays available for benchmarking sweeps. RBAC patterns were refactored toward namespace scope for smoother multi-tenancy.
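To make the namespace-scoped RBAC pattern concrete, here is a minimal sketch; the resource names, namespace, and rule lists are assumptions for illustration, not llm-d's actual manifests:

```yaml
# Hypothetical namespace-scoped Role for an inference workload.
# All names here are illustrative assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: llm-d-operator        # assumed name
  namespace: llm-d-demo       # each tenant gets its own namespace
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: llm-d-operator-binding
  namespace: llm-d-demo
subjects:
  - kind: ServiceAccount
    name: llm-d-controller    # assumed service account
    namespace: llm-d-demo
roleRef:
  kind: Role
  name: llm-d-operator
  apiGroup: rbac.authorization.k8s.io
```

Scoping to a Role rather than a ClusterRole keeps each tenant's permissions confined to its own namespace, which is what makes the multi-tenancy smoother.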
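Similarly, a benchmarking-sweep overlay could be layered on a guide's static manifests with kustomize; the file layout, workload name, and replica count below are assumptions, not the actual guide contents:

```yaml
# kustomization.yaml - hypothetical benchmarking overlay.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base              # the guide's static manifests (assumed path)
patches:
  - target:
      kind: Deployment
      name: llm-d-decode    # assumed workload name
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 8            # scale up for a benchmarking sweep
```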
> **Why it matters:** With 0.3, experimenting with intelligent scheduling or disaggregation is as simple as running a documented guide. The control plane is more transparent, reproducible, and extensible, independently of the platform you are running.
Suggested change:

- **Why it matters:** With 0.3, experimenting with intelligent scheduling or disaggregation is as simple as running a documented guide. The control plane is more transparent, reproducible, and extensible, independently of the platform you are running.
+ **Why it matters:** With 0.3, experimenting with intelligent scheduling or disaggregation is as simple as running a documented guide. The control plane is more transparent, reproducible, and extensible; independent of the platform you are running.
/lgtm
> ### **Wide-EP Performance**
>
> The wide-EP path, which parallelizes across experts to maximize throughput, has reached **2.7k tokens/s per GPU** in community benchmarks on H200 clusters.
Review comment: 2.2k

Review comment: decode tokens per second
LGTM, pending nit on the topline performance number.
Added a truncation comment, but feel free to adjust the location; this just ensures the main /blog landing page isn't the entire blog but a list of them.

The link-failure build can be ignored; that's just the link for this blog and will be "fixed" when it goes live.
Signed-off-by: Clayton Coleman <[email protected]>
Link to release

Signed-off-by: Clayton Coleman <[email protected]>