
Implement upstream inference gateway integration with separated vLLM components (fixes #312) #321


Open · wants to merge 1 commit into main from feature/upstream-inference-gateway-integration

Conversation

jeremyeder (Member)

Summary

This PR implements a modular architecture that leverages upstream inference gateway charts while maintaining existing llm-d patterns, fully addressing issue #312.

Key Changes

🆕 New Charts:

  • llm-d-vllm: Dedicated chart for vLLM model serving components
  • llm-d-umbrella: Orchestration chart combining upstream inferencepool with vLLM
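
For illustration, the dependency wiring between these charts could look like the sketch below. The chart names follow this PR, while the versions and the `file://` path are assumptions, not the PR's actual files:

```yaml
# charts/llm-d-umbrella/Chart.yaml: illustrative sketch, not the exact file from this PR
apiVersion: v2
name: llm-d-umbrella
description: Orchestrates the upstream InferencePool chart together with llm-d-vllm
version: 0.1.0
dependencies:
  # Upstream Gateway API Inference Extension chart, resolved from its OCI registry
  - name: inferencepool
    version: "v0.3.0"   # assumed version; pin to a real upstream release
    repository: oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts
  # Local vLLM model-serving chart introduced by this PR
  - name: llm-d-vllm
    version: "0.1.0"
    repository: file://../llm-d-vllm
```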

🏗️ Architecture Benefits:

  • Modular Design: Clean separation between inference gateway and model serving
  • Upstream Integration: Leverages official Gateway API Inference Extension charts from kubernetes-sigs/gateway-api-inference-extension
  • Backward Compatibility: Maintains existing deployment patterns and CRDs
  • Enhanced Routing: Intelligent load balancing and endpoint selection via InferencePool
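
To make the routing piece concrete: an InferencePool selects a set of model-server Pods and delegates endpoint selection to an extension (the EPP). The resource below is a sketch against the upstream v1alpha2 CRD; the names, labels, and port are assumptions:

```yaml
# Illustrative InferencePool; selector labels and extension name are assumptions
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llm-d-vllm-pool
spec:
  targetPortNumber: 8000    # port the vLLM servers listen on
  selector:
    app: llm-d-vllm         # matches the vLLM serving Pods
  extensionRef:
    name: llm-d-vllm-epp    # endpoint picker that performs the intelligent routing
```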

🧪 Testing & Validation:

Comprehensive Test Suite:

  • 4 test templates across both charts with proper Helm test annotations (one is sketched after this list)
  • YAML syntax validation for both charts
  • Template rendering validation with variable substitution
  • ModelService functionality testing for vLLM components
  • Integration testing for umbrella chart orchestration
  • Helper function validation (all required functions present)
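
A minimal sketch of one such test template, assuming a `llm-d-vllm.fullname` helper and a vLLM health endpoint on port 8000 (both assumptions, not verbatim from the PR):

```yaml
# templates/tests/test-connection.yaml: illustrative Helm test; names are assumptions
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "llm-d-vllm.fullname" . }}-test-connection
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-weight": "1"    # controls execution ordering across tests
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  restartPolicy: Never
  containers:
    - name: probe
      image: curlimages/curl:8.8.0
      command: ["curl", "-fsS", "http://{{ include "llm-d-vllm.fullname" . }}:8000/health"]
```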

Test Results:

  • All test templates have valid YAML syntax and Pod structure
  • All tests include required helm.sh/hook annotations with proper execution ordering
  • Both charts have complete helper function libraries
  • Charts are deployment-ready with helm install and helm test support (see the commands after this list)
  • Full compliance with Helm best practices
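
With release and namespace names of your choosing, the standard flow is:

```bash
# Resolve chart dependencies, install the umbrella chart, then run the bundled Helm tests
helm dependency build charts/llm-d-umbrella
helm install llm-d charts/llm-d-umbrella --namespace llm-d --create-namespace
helm test llm-d --namespace llm-d
```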

Files Added

  • charts/llm-d-vllm/ - Complete vLLM model serving chart (9 files)
  • charts/llm-d-umbrella/ - Umbrella orchestration chart (10 files)
  • charts/IMPLEMENTATION_SUMMARY.md - Complete architecture documentation

Test Plan

  • YAML syntax validation passes (reproducible with the commands after this list)
  • Template rendering validation passes
  • Helper functions work correctly
  • Helm test annotations are proper
  • Charts follow existing style patterns
  • Integration with upstream inferencepool chart validated
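
The first two items can be reproduced locally with stock Helm tooling (chart paths are from this PR):

```bash
# Static validation: lint both charts and render their templates with default values
helm lint charts/llm-d-vllm charts/llm-d-umbrella
helm template render-check charts/llm-d-vllm > /dev/null
helm template render-check charts/llm-d-umbrella > /dev/null
```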

Migration Path

The implementation provides a clear migration path from the monolithic llm-d chart to the new modular architecture while maintaining full backward compatibility.
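
As an illustration of what that migration could look like, values that previously sat at the top level of the monolithic llm-d chart would move under the umbrella chart's subchart keys. Every key name below is an assumption, not the PR's actual schema:

```yaml
# llm-d-umbrella values.yaml: hypothetical mapping of monolithic values to subchart keys
llm-d-vllm:               # forwarded to the vLLM serving subchart
  image:
    repository: vllm/vllm-openai
    tag: "latest"
inferencepool:            # forwarded to the upstream InferencePool chart
  inferencePool:
    modelServers:
      matchLabels:
        app: llm-d-vllm   # must match the labels on the vLLM Pods
```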

Implementation Details

See charts/IMPLEMENTATION_SUMMARY.md for complete architectural overview, benefits achieved, and future enhancement opportunities.

Closes #312

Implement upstream inference gateway integration with separated vLLM components

Addresses issue llm-d#312 by creating a modular architecture that leverages upstream
inference gateway charts while maintaining existing llm-d patterns.

## New Charts:
- **llm-d-vllm**: Dedicated vLLM model serving components
- **llm-d-umbrella**: Orchestration chart using upstream inferencepool

## Key Benefits:
- True upstream integration with kubernetes-sigs/gateway-api-inference-extension
- Modular design with clean separation of concerns
- Intelligent load balancing and endpoint selection via InferencePool
- Maintains backward compatibility with existing deployments

## Validation:
- Comprehensive test suite with 4 test templates
- Helm dependency build and lint pass successfully
- Deployment-ready charts following existing patterns

Uses correct OCI registry: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts

Fixes vLLM capitalization throughout codebase
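
That registry can be exercised directly through Helm's OCI support. The chart name `inferencepool` matches the upstream repository layout; the version shown is an assumption:

```bash
# Pull the upstream InferencePool chart from the staging OCI registry (version illustrative)
helm pull oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool \
  --version v0.3.0
```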
jeremyeder force-pushed the feature/upstream-inference-gateway-integration branch from 39b6b4e to 963d9fb on June 14, 2025 at 00:02

ahg-g left a comment:


sorry, posted a message on the wrong PR :)


### 2. `llm-d-umbrella` Chart

**Purpose**: Combines upstream InferencePool with vLLM chart

I am not totally against an llm-d umbrella chart; we could have that. But I believe it is key to have instructions for deploying the two core components of vllm-d independently:

  1. A Helm chart to deploy the vLLM server (with the sidecar, set up with the right flags)
  2. Instructions to deploy an inference gateway (InferencePool resource + vllm-d EPP image) via the upstream chart [1] that points to the vLLM deployment above.

[1] https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/inferencepool

This allows composing with customers' existing infrastructure (most already have a gateway deployed, for example) and composes much better with the IGW.
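
A sketch of step 2 using the upstream chart over OCI. The release name, version, and label value are illustrative, and the `inferencePool.modelServers.matchLabels` value path follows the upstream chart's quickstart (verify against that chart's values.yaml):

```bash
# Install an InferencePool (plus EPP) pointed at an existing vLLM Deployment
helm install vllm-gateway \
  oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool \
  --version v0.3.0 \
  --set inferencePool.modelServers.matchLabels.app=llm-d-vllm
```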

Linked issue: Use the upstream inference gateway helm charts (#312)