Added Gateway API Inference Extension (GIE) installation for quickstart docs #332


Open · wants to merge 3 commits into main

Conversation

nekomeowww

Summary

Currently, if the Gateway API Inference Extension (GIE) is not installed in a fresh test cluster, the following error is thrown when following the Quickstart guide, and for the Helm-installed release, the created llm-d-modelservice fails with a timeout while waiting for the CRD backing InferencePool.inference.networking.x-k8s.io:

{
  "level": "error",
  "ts": "2025-06-20T11:02:53.466245232Z",
  "logger": "controller-runtime.source.EventHandler",
  "caller": "source/kind.go:71",
  "msg": "if kind is a CRD, it should be installed before calling Start",
  "kind": "InferencePool.inference.networking.x-k8s.io",
  "error": "no matches for kind \"InferencePool\" in version \"inference.networking.x-k8s.io/v1alpha2\"",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:71\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33\nsigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:64"
}

While I did discover that #321 is currently pending merge to add the Inference Extension (GIE) as a subchart, for anyone trying out llm-d right now, the missing GIE pieces will clearly result in errors.

This pull request temporarily adds a new section asking users to install GIE before installation.
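As a sketch of what such a pre-flight check could look like (the CRD names are taken from the error log above; `check_gie_crds` is a hypothetical helper, not part of the installer):

```shell
# Hypothetical pre-flight check: verify the GIE CRDs exist before running
# the quickstart helm install. Not part of llmd-installer.sh.
check_gie_crds() {
  # $1: output of `kubectl get crd -o name`
  missing=""
  for crd in inferencepools.inference.networking.x-k8s.io \
             inferencemodels.inference.networking.x-k8s.io; do
    case "$1" in
      *"$crd"*) ;;                    # CRD present, nothing to do
      *) missing="$missing $crd" ;;   # CRD absent, remember it
    esac
  done
  echo "$missing"
}
```

Against a live cluster this could be used as `missing=$(check_gie_crds "$(kubectl get crd -o name)")`, failing fast with a pointer to chart-dependencies/ci-deps.sh when `$missing` is non-empty.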

Related

Related to #312.
Workaround for #321

@nerdalert
Member

nerdalert commented Jun 22, 2025

Hi @nekomeowww, thanks for taking a look! Can you check out the ci-deps.sh script in chart-dependencies/ci-deps.sh? It installs the CRDs from the latest GAIE v0.3.0 release with https://github.com/llm-d/llm-d-inference-scheduler/blob/main/deploy/components/crds-gie/kustomization.yaml. Not sure where the v0.8.0 GIE release in your patch came from?

I'm pasting the logs from a quick install from main. Can you share what you are getting? Thanks!

$ minikube start     --driver docker     --container-runtime docker     --gpus all     --memory no-limit --cpus no-limit
😄  minikube v1.35.0 on  24.04
🎉  minikube 1.36.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.36.0
💡  To disable this notice, run: 'minikube config set WantUpdateNotification false'

✨  Using the docker driver based on existing profile
👍  Starting "minikube" primary control-plane node in "minikube" cluster
🚜  Pulling base image v0.0.46 ...
🤷  docker "minikube" container is missing, will recreate.
🔥  Creating docker container (CPUs=no-limit, Memory=no-limit) ...
🐳  Preparing Kubernetes v1.32.0 on Docker 27.4.1 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image nvcr.io/nvidia/k8s-device-plugin:v0.17.0
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: nvidia-device-plugin, storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
@ip-172-31-39-185:~/v3-vanilla/llm-d-deployer/quickstart$ ./llmd-installer.sh --values-file examples/base/slim/base-slim.yaml --disable-metrics-collection
ℹ️  📂 Setting up script environment...
ℹ️  kubectl can reach to a running Kubernetes cluster.
✅ HF_TOKEN validated
ℹ️  🏗️ Installing GAIE Kubernetes infrastructure…
✅ 📜 Base CRDs: Installing...
customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/grpcroutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io created
✅ 🚪 GAIE CRDs: Installing...
customresourcedefinition.apiextensions.k8s.io/inferencemodels.inference.networking.x-k8s.io created
customresourcedefinition.apiextensions.k8s.io/inferencepools.inference.networking.x-k8s.io created
✅ 🎒 Gateway provider 'istio': Installing...
Release "istio-base" does not exist. Installing it now.
Pulled: gcr.io/istio-testing/charts/base:1.26-alpha.9befed2f1439d883120f8de70fd70d84ca0ebc3d
Digest: sha256:d022e2d190b4acdb5abbf160e34a63b5bbd94d4c76ef8ec46bcb48c1e6e6c9c5
NAME: istio-base
LAST DEPLOYED: Sun Jun 22 02:37:07 2025
NAMESPACE: istio-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Istio base successfully installed!

To learn more about the release, try:
  $ helm status istio-base -n istio-system
  $ helm get all istio-base -n istio-system
Release "istiod" does not exist. Installing it now.
Pulled: gcr.io/istio-testing/charts/istiod:1.26-alpha.9befed2f1439d883120f8de70fd70d84ca0ebc3d
Digest: sha256:b6a16f2823041fe7410e0fc089283902dd4795e9e1eaac2406f985fba111993a
NAME: istiod
LAST DEPLOYED: Sun Jun 22 02:37:09 2025
NAMESPACE: istio-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
"istiod" successfully installed!

To learn more about the release, try:
  $ helm status istiod -n istio-system
  $ helm get all istiod -n istio-system

Next steps:
  * Deploy a Gateway: https://istio.io/latest/docs/setup/additional-setup/gateway/
  * Try out our tasks to get started on common configurations:
    * https://istio.io/latest/docs/tasks/traffic-management
    * https://istio.io/latest/docs/tasks/security/
    * https://istio.io/latest/docs/tasks/policy-enforcement/
  * Review the list of actively supported releases, CVE publications and our hardening guide:
    * https://istio.io/latest/docs/releases/supported-releases/
    * https://istio.io/latest/news/security/
    * https://istio.io/latest/docs/ops/best-practices/security/

For further documentation see https://istio.io website
✅ GAIE infra applied
ℹ️  📦 Creating namespace llm-d...
namespace/llm-d created
Context "minikube" modified.
✅ Namespace ready
ℹ️  🔹 Using merged values: /tmp/tmp.s53EvFotBU
ℹ️  🔐 Creating/updating HF token secret...
secret/llm-d-hf-token created
✅ HF token secret created
ℹ️  Fetching OCP proxy UID...
ℹ️  No OpenShift SCC annotation found; defaulting PROXY_UID=0
ℹ️  📜 Applying modelservice CRD...
customresourcedefinition.apiextensions.k8s.io/modelservices.llm-d.ai created
✅ ModelService CRD applied
ℹ️  ⏭️ Model download to PVC skipped: BYO model via HF repo_id selected.
protocol hf chosen - models will be downloaded JIT in inferencing pods.
"bitnami" already exists with the same configuration, skipping
ℹ️  🛠️ Building Helm chart dependencies...
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 2 charts
Downloading common from repo https://charts.bitnami.com/bitnami
Downloading redis from repo https://charts.bitnami.com/bitnami
Pulled: registry-1.docker.io/bitnamicharts/redis:20.13.4
Digest: sha256:6a389e13237e8e639ec0d445e785aa246b57bfce711b087033a196a291d5c8d7
Deleting outdated charts
✅ Dependencies built
ℹ️  Metrics collection disabled by user request
ℹ️  Metrics collection disabled by user request
ℹ️  🚚 Deploying llm-d chart with /tmp/tmp.s53EvFotBU...
Release "llm-d" does not exist. Installing it now.
NAME: llm-d
LAST DEPLOYED: Sun Jun 22 02:37:25 2025
NAMESPACE: llm-d
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing llm-d.

Your release is named `llm-d`.

To learn more about the release, try:

$ helm status llm-d
$ helm get all llm-d

Following presets are available to your users:


| Name                                                              | Description                                                                                     |
| ----------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| basic-gpu-preset                                                  | Basic gpu inference                                                                             |
| basic-gpu-with-nixl-preset                                        | GPU inference with NIXL P/D KV transfer and cache offloading                                    |
| basic-gpu-with-nixl-and-redis-lookup-preset                       | GPU inference with NIXL P/D KV transfer, cache offloading and Redis lookup server               |
| basic-sim-preset                                                  | Basic simulation                                                                                |
✅ llm-d deployed
✅ 🎉 Installation complete.

@ip-172-31-39-185:~/v3-vanilla/llm-d-deployer/quickstart$ kubectl get pods     # --all-namespaces
NAME                                             READY   STATUS            RESTARTS   AGE
llm-d-inference-gateway-istio-5f968c59f5-ht7hj   1/1     Running           0          87s
llm-d-modelservice-567b57d87-8qhl6               1/1     Running           0          87s
qwen-qwen3-0-6b-decode-6cdd8996b9-nfk2l          1/2     PodInitializing   0          81s
qwen-qwen3-0-6b-epp-7568cb7c6b-grchw             1/1     Running           0          81s
@ip-172-31-39-185:~/v3-vanilla/llm-d-deployer/quickstart$ k logs llm-d-modelservice-567b57d87-8qhl6
{"level":"info","ts":"2025-06-22T02:37:31.899143389Z","logger":"setup","caller":"cmd/root.go:272","msg":"starting manager"}
{"level":"info","ts":"2025-06-22T02:37:31.89954771Z","caller":"manager/server.go:83","msg":"starting server","name":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2025-06-22T02:37:31.899727265Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1.Service"}
{"level":"info","ts":"2025-06-22T02:37:31.899748685Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1.ServiceAccount"}
{"level":"info","ts":"2025-06-22T02:37:31.899775226Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1alpha1.ModelService"}
{"level":"info","ts":"2025-06-22T02:37:31.899773946Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1.Deployment"}
{"level":"info","ts":"2025-06-22T02:37:31.899802347Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1.RoleBinding"}
{"level":"info","ts":"2025-06-22T02:37:31.899796596Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":"2025-06-22T02:37:31.899827507Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1alpha2.InferenceModel"}
{"level":"info","ts":"2025-06-22T02:37:31.899739985Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1alpha1.ModelService"}
{"level":"info","ts":"2025-06-22T02:37:31.899862458Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1alpha2.InferencePool"}
{"level":"info","ts":"2025-06-22T02:37:32.007953463Z","caller":"controller/controller.go:239","msg":"Starting Controller","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService"}
{"level":"info","ts":"2025-06-22T02:37:32.008074346Z","caller":"controller/controller.go:248","msg":"Starting workers","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","worker count":1}


@ip-172-31-39-185:~/v3-vanilla/llm-d-deployer/quickstart$ kubectl get inferencepool -o yaml
apiVersion: v1
items:
- apiVersion: inference.networking.x-k8s.io/v1alpha2
  kind: InferencePool
  metadata:
    creationTimestamp: "2025-06-22T03:14:23Z"
    generation: 1
    labels:
      llm-d.ai/inferenceServing: "true"
      llm-d.ai/model: qwen-qwen3-0-6b
    name: qwen-qwen3-0-6b-inference-pool
    namespace: llm-d
    ownerReferences:
    - apiVersion: llm-d.ai/v1alpha1
      kind: ModelService
      name: qwen-qwen3-0-6b
      uid: 830198b9-6f50-4e22-935e-bc07f690d57b
    resourceVersion: "3859"
    uid: aa8b7f07-a15c-4fe5-b649-833eff3b521e
  spec:
    extensionRef:
      failureMode: FailClose
      group: ""
      kind: Service
      name: qwen-qwen3-0-6b-epp-service
    selector:
      llm-d.ai/inferenceServing: "true"
      llm-d.ai/model: qwen-qwen3-0-6b
    targetPortNumber: 8000
kind: List
metadata:
  resourceVersion: ""


@ip-172-31-39-185:~/v3-vanilla/llm-d-deployer/quickstart$ ./test-request.sh
Namespace: llm-d
Model ID:  none; will be discover from first entry in /v1/models

1 -> Fetching available models from the decode pod at 10.244.0.13…
{"object":"list","data":[{"id":"Qwen/Qwen3-0.6B","object":"model","created":1750562211,"owned_by":"vllm","root":"Qwen/Qwen3-0.6B","parent":null,"max_model_len":40960,"permission":[{"id":"modelperm-bfb2112f67e14f429b43fe654b276ef5","object":"model_permission","created":1750562211,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}pod "curl-1183" deleted

Discovered model to use: Qwen/Qwen3-0.6B

2 -> Sending a completion request to the decode pod at 10.244.0.13…
{"id":"cmpl-45e564a36ecf418385cd4727d33b107f","object":"text_completion","created":1750562214,"model":"Qwen/Qwen3-0.6B","choices":[{"index":0,"text":" Can you help me with my homework? How do I find the equation of a","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":4,"total_tokens":20,"completion_tokens":16,"prompt_tokens_details":null},"kv_transfer_params":null}pod "curl-8689" deleted

3 -> Fetching available models via the gateway at llm-d-inference-gateway-istio.llm-d.svc.cluster.local…
{"object":"list","data":[{"id":"Qwen/Qwen3-0.6B","object":"model","created":1750562217,"owned_by":"vllm","root":"Qwen/Qwen3-0.6B","parent":null,"max_model_len":40960,"permission":[{"id":"modelperm-35373a792629416f8dbbc45acbdb29a4","object":"model_permission","created":1750562217,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}pod "curl-6990" deleted


4 -> Sending a completion request via the gateway at llm-d-inference-gateway-istio.llm-d.svc.cluster.local with model 'Qwen/Qwen3-0.6B'…
{"id":"cmpl-a9d16da9-fe1d-4b57-8cff-d628b93f1e7e","object":"text_completion","created":1750562220,"model":"Qwen/Qwen3-0.6B","choices":[{"index":0,"text":" What are your key characteristics?\n\nI'm a language model, but I also have","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":4,"total_tokens":20,"completion_tokens":16,"prompt_tokens_details":null},"kv_transfer_params":null}pod "curl-5627" deleted

All tests complete.

@nekomeowww
Author

> Hi @nekomeowww thanks for taking a look! Can you checkout the ci-deps.sh script in chart-dependencies/ci-deps.sh that installs the latest GAIE v0.3.0 latest release CRDs with https://github.com/llm-d/llm-d-inference-scheduler/blob/main/deploy/components/crds-gie/kustomization.yaml. Not sure where the v0.8.0 GIE release in your patch came from?
>
> Im pasting the logs from a quick install from main. Can you share what you are getting? Thanks!

Ah sure, thanks for pointing that out. I got the script from the Getting started section of the Kubernetes Gateway API Inference Extension docs, and it appears I pasted and adapted the wrong version from it.

Let me do a quick test of what you suggested, thanks.

@nekomeowww
Author

nekomeowww commented Jun 22, 2025

I tried installing llm-d on my macOS development machine over minikube; the output:

❯ ./llmd-installer.sh
ℹ️  📂 Setting up script environment...
ℹ️  kubectl can reach to a running Kubernetes cluster.
✅ HF_TOKEN validated
ℹ️  🏗️ Installing GAIE Kubernetes infrastructure…
✅ 📜 Base CRDs: Installing...
customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/grpcroutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io created
✅ 🚪 GAIE CRDs: Installing...
customresourcedefinition.apiextensions.k8s.io/inferencemodels.inference.networking.x-k8s.io created
customresourcedefinition.apiextensions.k8s.io/inferencepools.inference.networking.x-k8s.io created
✅ 🎒 Gateway provider '': Installing...
Release "istio-base" does not exist. Installing it now.
Pulled: gcr.io/istio-testing/charts/base:1.26-alpha.9befed2f1439d883120f8de70fd70d84ca0ebc3d
Digest: sha256:d022e2d190b4acdb5abbf160e34a63b5bbd94d4c76ef8ec46bcb48c1e6e6c9c5
NAME: istio-base
LAST DEPLOYED: Sun Jun 22 13:25:23 2025
NAMESPACE: istio-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Istio base successfully installed!

To learn more about the release, try:
  $ helm status istio-base -n istio-system
  $ helm get all istio-base -n istio-system
Release "istiod" does not exist. Installing it now.
Pulled: gcr.io/istio-testing/charts/istiod:1.26-alpha.9befed2f1439d883120f8de70fd70d84ca0ebc3d
Digest: sha256:b6a16f2823041fe7410e0fc089283902dd4795e9e1eaac2406f985fba111993a

This indeed shows that the GAIE CRDs are installed correctly without any problem.

But when I tried llm-d several days ago, I was installing it into our existing cluster (with NVIDIA GPUs, used for testing and development), where Istio had already been installed and configured by other team members (we have many other gateway CRs configured for other teams). After reading a small portion of ./llmd-installer.sh, I thought I should not use the script against an existing cluster (it might introduce resource conflicts), so I decided to install manually. Sadly, I obviously messed up the ../chart-dependencies/ci-deps.sh apply step, so GAIE was missing. It was a misunderstanding on my side; sorry for taking your time to review and try this.

To address this issue, would it be better if I wrote separate step-by-step documentation (a documentation version of llmd-installer.sh)? Maybe I could also help list the actual dependencies llm-d requires and uses, so engineers and contributors can understand how to integrate llm-d into their existing systems more easily.

@nerdalert
Member

Hi @nekomeowww, apologies for the slow reply. We definitely have a more manual documentation opportunity coming up once we land #321, if you're interested. Cheers.
