
Implement upstream inference gateway integration with separated vLLM components (fixes #312) #321


Open · wants to merge 1 commit into main from feature/upstream-inference-gateway-integration

Conversation

jeremyeder (Member)

Summary

This PR implements a modular architecture that leverages upstream inference gateway charts while maintaining existing llm-d patterns, fully addressing issue #312.

Key Changes

🆕 New Charts:

  • llm-d-vllm: Dedicated chart for vLLM model serving components
  • llm-d-umbrella: Orchestration chart combining upstream inferencepool with vLLM
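
For illustration, the dependency wiring between these charts could look like the sketch below. The chart names follow this PR, while the versions and the `file://` path are assumptions, not the PR's actual files:

```yaml
# charts/llm-d-umbrella/Chart.yaml: illustrative sketch, not the exact file from this PR
apiVersion: v2
name: llm-d-umbrella
description: Orchestrates the upstream InferencePool chart together with llm-d-vllm
version: 0.1.0
dependencies:
  # Upstream Gateway API Inference Extension chart, resolved from its OCI registry
  - name: inferencepool
    version: "v0.3.0"   # assumed version; pin to a real upstream release
    repository: oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts
  # Local vLLM model-serving chart introduced by this PR
  - name: llm-d-vllm
    version: "0.1.0"
    repository: file://../llm-d-vllm
```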

🏗️ Architecture Benefits:

  • Modular Design: Clean separation between inference gateway and model serving
  • Upstream Integration: Leverages official Gateway API Inference Extension charts from kubernetes-sigs/gateway-api-inference-extension
  • Backward Compatibility: Maintains existing deployment patterns and CRDs
  • Enhanced Routing: Intelligent load balancing and endpoint selection via InferencePool
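
To make the routing piece concrete: an InferencePool selects a set of model-server Pods and delegates endpoint selection to an extension (the EPP). The resource below is a sketch against the upstream v1alpha2 CRD; the names, labels, and port are assumptions:

```yaml
# Illustrative InferencePool; selector labels and extension name are assumptions
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llm-d-vllm-pool
spec:
  targetPortNumber: 8000    # port the vLLM servers listen on
  selector:
    app: llm-d-vllm         # matches the vLLM serving Pods
  extensionRef:
    name: llm-d-vllm-epp    # endpoint picker that performs the intelligent routing
```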

🧪 Testing & Validation:

Comprehensive Test Suite:

  • 4 test templates across both charts with proper Helm test annotations (one is sketched after this list)
  • YAML syntax validation for both charts
  • Template rendering validation with variable substitution
  • ModelService functionality testing for vLLM components
  • Integration testing for umbrella chart orchestration
  • Helper function validation (all required functions present)
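
A minimal sketch of one such test template, assuming a `llm-d-vllm.fullname` helper and a vLLM health endpoint on port 8000 (both assumptions, not verbatim from the PR):

```yaml
# templates/tests/test-connection.yaml: illustrative Helm test; names are assumptions
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "llm-d-vllm.fullname" . }}-test-connection
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-weight": "1"    # controls execution ordering across tests
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  restartPolicy: Never
  containers:
    - name: probe
      image: curlimages/curl:8.8.0
      command: ["curl", "-fsS", "http://{{ include "llm-d-vllm.fullname" . }}:8000/health"]
```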

Test Results:

  • All test templates have valid YAML syntax and Pod structure
  • All tests include required helm.sh/hook annotations with proper execution ordering
  • Both charts have complete helper function libraries
  • Charts are deployment-ready with helm install and helm test support (see the commands after this list)
  • Full compliance with Helm best practices
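
With release and namespace names of your choosing, the standard flow is:

```bash
# Resolve chart dependencies, install the umbrella chart, then run the bundled Helm tests
helm dependency build charts/llm-d-umbrella
helm install llm-d charts/llm-d-umbrella --namespace llm-d --create-namespace
helm test llm-d --namespace llm-d
```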

Files Added

  • charts/llm-d-vllm/ - Complete vLLM model serving chart (9 files)
  • charts/llm-d-umbrella/ - Umbrella orchestration chart (10 files)
  • charts/IMPLEMENTATION_SUMMARY.md - Complete architecture documentation

Test Plan

  • YAML syntax validation passes (reproducible with the commands after this list)
  • Template rendering validation passes
  • Helper functions work correctly
  • Helm test annotations are proper
  • Charts follow existing style patterns
  • Integration with upstream inferencepool chart validated
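
The first two items can be reproduced locally with stock Helm tooling (chart paths are from this PR):

```bash
# Static validation: lint both charts and render their templates with default values
helm lint charts/llm-d-vllm charts/llm-d-umbrella
helm template render-check charts/llm-d-vllm > /dev/null
helm template render-check charts/llm-d-umbrella > /dev/null
```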

Migration Path

The implementation provides a clear migration path from the monolithic llm-d chart to the new modular architecture while maintaining full backward compatibility.
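
As an illustration of what that migration could look like, values that previously sat at the top level of the monolithic llm-d chart would move under the umbrella chart's subchart keys. Every key name below is an assumption, not the PR's actual schema:

```yaml
# llm-d-umbrella values.yaml: hypothetical mapping of monolithic values to subchart keys
llm-d-vllm:               # forwarded to the vLLM serving subchart
  image:
    repository: vllm/vllm-openai
    tag: "latest"
inferencepool:            # forwarded to the upstream InferencePool chart
  inferencePool:
    modelServers:
      matchLabels:
        app: llm-d-vllm   # must match the labels on the vLLM Pods
```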

Implementation Details

See charts/IMPLEMENTATION_SUMMARY.md for complete architectural overview, benefits achieved, and future enhancement opportunities.

Closes #312

Implement upstream inference gateway integration with separated vLLM components

Addresses issue llm-d#312 by creating a modular architecture that leverages upstream
inference gateway charts while maintaining existing llm-d patterns.

## New Charts:
- **llm-d-vllm**: Dedicated vLLM model serving components
- **llm-d-umbrella**: Orchestration chart using upstream inferencepool

## Key Benefits:
- True upstream integration with kubernetes-sigs/gateway-api-inference-extension
- Modular design with clean separation of concerns
- Intelligent load balancing and endpoint selection via InferencePool
- Maintains backward compatibility with existing deployments

## Validation:
- Comprehensive test suite with 4 test templates
- Helm dependency build and lint pass successfully
- Deployment-ready charts following existing patterns

Uses correct OCI registry: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts

Fixes vLLM capitalization throughout codebase
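
That registry can be exercised directly through Helm's OCI support. The chart name `inferencepool` matches the upstream repository layout; the version shown is an assumption:

```bash
# Pull the upstream InferencePool chart from the staging OCI registry (version illustrative)
helm pull oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool \
  --version v0.3.0
```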
jeremyeder force-pushed the feature/upstream-inference-gateway-integration branch from 39b6b4e to 963d9fb on June 14, 2025 at 00:02

ahg-g left a comment:


sorry, posted a message on the wrong PR :)


### 2. `llm-d-umbrella` Chart

**Purpose**: Combines upstream InferencePool with vLLM chart

I am not totally against an llm-d umbrella chart; we could have that. But I believe it is key to have instructions for deploying the two core components of vllm-d independently:

  1. A Helm chart to deploy the vLLM server (with the sidecar, set up with the right flags)
  2. Instructions to deploy an inference gateway (InferencePool resource + vllm-d EPP image) via the upstream chart [1] that points to the vLLM deployment above.

[1] https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/inferencepool

This allows composing with customers' existing infrastructure (most already have a gateway deployed, for example) and composes much better with the IGW.
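
A sketch of step 2 using the upstream chart over OCI. The release name, version, and label value are illustrative, and the `inferencePool.modelServers.matchLabels` value path follows the upstream chart's quickstart (verify against that chart's values.yaml):

```bash
# Install an InferencePool (plus EPP) pointed at an existing vLLM Deployment
helm install vllm-gateway \
  oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool \
  --version v0.3.0 \
  --set inferencePool.modelServers.matchLabels.app=llm-d-vllm
```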

Linked issue: Use the upstream inference gateway helm charts (#312)