Added Gateway API Inference Extension (GIE) installation for quickstart docs #332


Open · wants to merge 3 commits into main

Conversation

nekomeowww

Summary

Currently, if the Gateway API Inference Extension (GIE) is not installed in a fresh test cluster, the following error is thrown when following the Quickstart guide, and for the Helm-installed release, the created llm-d-modelservice fails with a timeout while waiting for the CRD backing InferencePool.inference.networking.x-k8s.io:

{
  "level": "error",
  "ts": "2025-06-20T11:02:53.466245232Z",
  "logger": "controller-runtime.source.EventHandler",
  "caller": "source/kind.go:71",
  "msg": "if kind is a CRD, it should be installed before calling Start",
  "kind": "InferencePool.inference.networking.x-k8s.io",
  "error": "no matches for kind \"InferencePool\" in version \"inference.networking.x-k8s.io/v1alpha2\"",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:71\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33\nsigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:64"
}

While I did discover that #321 is currently pending merge to add the Inference Extension (GIE) as a subchart, for anyone trying out llm-d right now, the missing GIE pieces will clearly result in errors.

This pull request temporarily adds a new section asking users to install GIE before installation.
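As a sketch of what such a pre-flight check could look like (the CRD names are taken from the error log above; `check_gie_crds` is a hypothetical helper, not part of the installer):

```shell
# Hypothetical pre-flight check: verify the GIE CRDs exist before running
# the quickstart helm install. Not part of llmd-installer.sh.
check_gie_crds() {
  # $1: output of `kubectl get crd -o name`
  missing=""
  for crd in inferencepools.inference.networking.x-k8s.io \
             inferencemodels.inference.networking.x-k8s.io; do
    case "$1" in
      *"$crd"*) ;;                    # CRD present, nothing to do
      *) missing="$missing $crd" ;;   # CRD absent, remember it
    esac
  done
  echo "$missing"
}
```

Against a live cluster this could be used as `missing=$(check_gie_crds "$(kubectl get crd -o name)")`, failing fast with a pointer to chart-dependencies/ci-deps.sh when `$missing` is non-empty.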

Related

Related to #312.
Workaround for #321

@nerdalert
Member

nerdalert commented Jun 22, 2025

Hi @nekomeowww, thanks for taking a look! Can you check out the ci-deps.sh script in chart-dependencies/ci-deps.sh? It installs the CRDs from the latest GAIE v0.3.0 release with https://github.com/llm-d/llm-d-inference-scheduler/blob/main/deploy/components/crds-gie/kustomization.yaml. Not sure where the v0.8.0 GIE release in your patch came from?

I'm pasting the logs from a quick install from main. Can you share what you are getting? Thanks!

$ minikube start     --driver docker     --container-runtime docker     --gpus all     --memory no-limit --cpus no-limit
😄  minikube v1.35.0 on  24.04
🎉  minikube 1.36.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.36.0
💡  To disable this notice, run: 'minikube config set WantUpdateNotification false'

✨  Using the docker driver based on existing profile
👍  Starting "minikube" primary control-plane node in "minikube" cluster
🚜  Pulling base image v0.0.46 ...
🤷  docker "minikube" container is missing, will recreate.
🔥  Creating docker container (CPUs=no-limit, Memory=no-limit) ...
🐳  Preparing Kubernetes v1.32.0 on Docker 27.4.1 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image nvcr.io/nvidia/k8s-device-plugin:v0.17.0
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: nvidia-device-plugin, storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
@ip-172-31-39-185:~/v3-vanilla/llm-d-deployer/quickstart$ ./llmd-installer.sh --values-file examples/base/slim/base-slim.yaml --disable-metrics-collection
ℹ️  📂 Setting up script environment...
ℹ️  kubectl can reach to a running Kubernetes cluster.
✅ HF_TOKEN validated
ℹ️  🏗️ Installing GAIE Kubernetes infrastructure…
✅ 📜 Base CRDs: Installing...
customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/grpcroutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io created
✅ 🚪 GAIE CRDs: Installing...
customresourcedefinition.apiextensions.k8s.io/inferencemodels.inference.networking.x-k8s.io created
customresourcedefinition.apiextensions.k8s.io/inferencepools.inference.networking.x-k8s.io created
✅ 🎒 Gateway provider 'istio': Installing...
Release "istio-base" does not exist. Installing it now.
Pulled: gcr.io/istio-testing/charts/base:1.26-alpha.9befed2f1439d883120f8de70fd70d84ca0ebc3d
Digest: sha256:d022e2d190b4acdb5abbf160e34a63b5bbd94d4c76ef8ec46bcb48c1e6e6c9c5
NAME: istio-base
LAST DEPLOYED: Sun Jun 22 02:37:07 2025
NAMESPACE: istio-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Istio base successfully installed!

To learn more about the release, try:
  $ helm status istio-base -n istio-system
  $ helm get all istio-base -n istio-system
Release "istiod" does not exist. Installing it now.
Pulled: gcr.io/istio-testing/charts/istiod:1.26-alpha.9befed2f1439d883120f8de70fd70d84ca0ebc3d
Digest: sha256:b6a16f2823041fe7410e0fc089283902dd4795e9e1eaac2406f985fba111993a
NAME: istiod
LAST DEPLOYED: Sun Jun 22 02:37:09 2025
NAMESPACE: istio-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
"istiod" successfully installed!

To learn more about the release, try:
  $ helm status istiod -n istio-system
  $ helm get all istiod -n istio-system

Next steps:
  * Deploy a Gateway: https://istio.io/latest/docs/setup/additional-setup/gateway/
  * Try out our tasks to get started on common configurations:
    * https://istio.io/latest/docs/tasks/traffic-management
    * https://istio.io/latest/docs/tasks/security/
    * https://istio.io/latest/docs/tasks/policy-enforcement/
  * Review the list of actively supported releases, CVE publications and our hardening guide:
    * https://istio.io/latest/docs/releases/supported-releases/
    * https://istio.io/latest/news/security/
    * https://istio.io/latest/docs/ops/best-practices/security/

For further documentation see https://istio.io website
✅ GAIE infra applied
ℹ️  📦 Creating namespace llm-d...
namespace/llm-d created
Context "minikube" modified.
✅ Namespace ready
ℹ️  🔹 Using merged values: /tmp/tmp.s53EvFotBU
ℹ️  🔐 Creating/updating HF token secret...
secret/llm-d-hf-token created
✅ HF token secret created
ℹ️  Fetching OCP proxy UID...
ℹ️  No OpenShift SCC annotation found; defaulting PROXY_UID=0
ℹ️  📜 Applying modelservice CRD...
customresourcedefinition.apiextensions.k8s.io/modelservices.llm-d.ai created
✅ ModelService CRD applied
ℹ️  ⏭️ Model download to PVC skipped: BYO model via HF repo_id selected.
protocol hf chosen - models will be downloaded JIT in inferencing pods.
"bitnami" already exists with the same configuration, skipping
ℹ️  🛠️ Building Helm chart dependencies...
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 2 charts
Downloading common from repo https://charts.bitnami.com/bitnami
Downloading redis from repo https://charts.bitnami.com/bitnami
Pulled: registry-1.docker.io/bitnamicharts/redis:20.13.4
Digest: sha256:6a389e13237e8e639ec0d445e785aa246b57bfce711b087033a196a291d5c8d7
Deleting outdated charts
✅ Dependencies built
ℹ️  Metrics collection disabled by user request
ℹ️  Metrics collection disabled by user request
ℹ️  🚚 Deploying llm-d chart with /tmp/tmp.s53EvFotBU...
Release "llm-d" does not exist. Installing it now.
NAME: llm-d
LAST DEPLOYED: Sun Jun 22 02:37:25 2025
NAMESPACE: llm-d
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing llm-d.

Your release is named `llm-d`.

To learn more about the release, try:

$ helm status llm-d
$ helm get all llm-d

Following presets are available to your users:


| Name                                                              | Description                                                                                     |
| ----------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| basic-gpu-preset                                                  | Basic gpu inference                                                                             |
| basic-gpu-with-nixl-preset                                        | GPU inference with NIXL P/D KV transfer and cache offloading                                    |
| basic-gpu-with-nixl-and-redis-lookup-preset                       | GPU inference with NIXL P/D KV transfer, cache offloading and Redis lookup server               |
| basic-sim-preset                                                  | Basic simulation                                                                                |
✅ llm-d deployed
✅ 🎉 Installation complete.

@ip-172-31-39-185:~/v3-vanilla/llm-d-deployer/quickstart$ kubectl get pods     # --all-namespaces
NAME                                             READY   STATUS            RESTARTS   AGE
llm-d-inference-gateway-istio-5f968c59f5-ht7hj   1/1     Running           0          87s
llm-d-modelservice-567b57d87-8qhl6               1/1     Running           0          87s
qwen-qwen3-0-6b-decode-6cdd8996b9-nfk2l          1/2     PodInitializing   0          81s
qwen-qwen3-0-6b-epp-7568cb7c6b-grchw             1/1     Running           0          81s
@ip-172-31-39-185:~/v3-vanilla/llm-d-deployer/quickstart$ k logs llm-d-modelservice-567b57d87-8qhl6
{"level":"info","ts":"2025-06-22T02:37:31.899143389Z","logger":"setup","caller":"cmd/root.go:272","msg":"starting manager"}
{"level":"info","ts":"2025-06-22T02:37:31.89954771Z","caller":"manager/server.go:83","msg":"starting server","name":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2025-06-22T02:37:31.899727265Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1.Service"}
{"level":"info","ts":"2025-06-22T02:37:31.899748685Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1.ServiceAccount"}
{"level":"info","ts":"2025-06-22T02:37:31.899775226Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1alpha1.ModelService"}
{"level":"info","ts":"2025-06-22T02:37:31.899773946Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1.Deployment"}
{"level":"info","ts":"2025-06-22T02:37:31.899802347Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1.RoleBinding"}
{"level":"info","ts":"2025-06-22T02:37:31.899796596Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":"2025-06-22T02:37:31.899827507Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1alpha2.InferenceModel"}
{"level":"info","ts":"2025-06-22T02:37:31.899739985Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1alpha1.ModelService"}
{"level":"info","ts":"2025-06-22T02:37:31.899862458Z","caller":"controller/controller.go:204","msg":"Starting EventSource","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","source":"kind source: *v1alpha2.InferencePool"}
{"level":"info","ts":"2025-06-22T02:37:32.007953463Z","caller":"controller/controller.go:239","msg":"Starting Controller","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService"}
{"level":"info","ts":"2025-06-22T02:37:32.008074346Z","caller":"controller/controller.go:248","msg":"Starting workers","controller":"modelservice","controllerGroup":"llm-d.ai","controllerKind":"ModelService","worker count":1}


@ip-172-31-39-185:~/v3-vanilla/llm-d-deployer/quickstart$ kubectl get inferencepool -o yaml
apiVersion: v1
items:
- apiVersion: inference.networking.x-k8s.io/v1alpha2
  kind: InferencePool
  metadata:
    creationTimestamp: "2025-06-22T03:14:23Z"
    generation: 1
    labels:
      llm-d.ai/inferenceServing: "true"
      llm-d.ai/model: qwen-qwen3-0-6b
    name: qwen-qwen3-0-6b-inference-pool
    namespace: llm-d
    ownerReferences:
    - apiVersion: llm-d.ai/v1alpha1
      kind: ModelService
      name: qwen-qwen3-0-6b
      uid: 830198b9-6f50-4e22-935e-bc07f690d57b
    resourceVersion: "3859"
    uid: aa8b7f07-a15c-4fe5-b649-833eff3b521e
  spec:
    extensionRef:
      failureMode: FailClose
      group: ""
      kind: Service
      name: qwen-qwen3-0-6b-epp-service
    selector:
      llm-d.ai/inferenceServing: "true"
      llm-d.ai/model: qwen-qwen3-0-6b
    targetPortNumber: 8000
kind: List
metadata:
  resourceVersion: ""


@ip-172-31-39-185:~/v3-vanilla/llm-d-deployer/quickstart$ ./test-request.sh
Namespace: llm-d
Model ID:  none; will be discover from first entry in /v1/models

1 -> Fetching available models from the decode pod at 10.244.0.13…
{"object":"list","data":[{"id":"Qwen/Qwen3-0.6B","object":"model","created":1750562211,"owned_by":"vllm","root":"Qwen/Qwen3-0.6B","parent":null,"max_model_len":40960,"permission":[{"id":"modelperm-bfb2112f67e14f429b43fe654b276ef5","object":"model_permission","created":1750562211,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}pod "curl-1183" deleted

Discovered model to use: Qwen/Qwen3-0.6B

2 -> Sending a completion request to the decode pod at 10.244.0.13…
{"id":"cmpl-45e564a36ecf418385cd4727d33b107f","object":"text_completion","created":1750562214,"model":"Qwen/Qwen3-0.6B","choices":[{"index":0,"text":" Can you help me with my homework? How do I find the equation of a","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":4,"total_tokens":20,"completion_tokens":16,"prompt_tokens_details":null},"kv_transfer_params":null}pod "curl-8689" deleted

3 -> Fetching available models via the gateway at llm-d-inference-gateway-istio.llm-d.svc.cluster.local…
{"object":"list","data":[{"id":"Qwen/Qwen3-0.6B","object":"model","created":1750562217,"owned_by":"vllm","root":"Qwen/Qwen3-0.6B","parent":null,"max_model_len":40960,"permission":[{"id":"modelperm-35373a792629416f8dbbc45acbdb29a4","object":"model_permission","created":1750562217,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}pod "curl-6990" deleted


4 -> Sending a completion request via the gateway at llm-d-inference-gateway-istio.llm-d.svc.cluster.local with model 'Qwen/Qwen3-0.6B'…
{"id":"cmpl-a9d16da9-fe1d-4b57-8cff-d628b93f1e7e","object":"text_completion","created":1750562220,"model":"Qwen/Qwen3-0.6B","choices":[{"index":0,"text":" What are your key characteristics?\n\nI'm a language model, but I also have","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":4,"total_tokens":20,"completion_tokens":16,"prompt_tokens_details":null},"kv_transfer_params":null}pod "curl-5627" deleted

All tests complete.

@nekomeowww
Author

> Hi @nekomeowww thanks for taking a look! Can you checkout the ci-deps.sh script in chart-dependencies/ci-deps.sh that installs the latest GAIE v0.3.0 latest release CRDs with https://github.com/llm-d/llm-d-inference-scheduler/blob/main/deploy/components/crds-gie/kustomization.yaml. Not sure where the v0.8.0 GIE release in your patch came from?
>
> Im pasting the logs from a quick install from main. Can you share what you are getting? Thanks!

Ah sure, thanks for pointing that out. I got the script from the Getting started section of the Kubernetes Gateway API Inference Extension docs, and it appears I pasted and adapted the wrong version from it.

Let me do a quick test of what you suggested, thanks.

@nekomeowww
Author

nekomeowww commented Jun 22, 2025

I tried installing llm-d on my macOS development machine over minikube; the output:

❯ ./llmd-installer.sh
ℹ️  📂 Setting up script environment...
ℹ️  kubectl can reach to a running Kubernetes cluster.
✅ HF_TOKEN validated
ℹ️  🏗️ Installing GAIE Kubernetes infrastructure…
✅ 📜 Base CRDs: Installing...
customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/grpcroutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io created
✅ 🚪 GAIE CRDs: Installing...
customresourcedefinition.apiextensions.k8s.io/inferencemodels.inference.networking.x-k8s.io created
customresourcedefinition.apiextensions.k8s.io/inferencepools.inference.networking.x-k8s.io created
✅ 🎒 Gateway provider '': Installing...
Release "istio-base" does not exist. Installing it now.
Pulled: gcr.io/istio-testing/charts/base:1.26-alpha.9befed2f1439d883120f8de70fd70d84ca0ebc3d
Digest: sha256:d022e2d190b4acdb5abbf160e34a63b5bbd94d4c76ef8ec46bcb48c1e6e6c9c5
NAME: istio-base
LAST DEPLOYED: Sun Jun 22 13:25:23 2025
NAMESPACE: istio-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Istio base successfully installed!

To learn more about the release, try:
  $ helm status istio-base -n istio-system
  $ helm get all istio-base -n istio-system
Release "istiod" does not exist. Installing it now.
Pulled: gcr.io/istio-testing/charts/istiod:1.26-alpha.9befed2f1439d883120f8de70fd70d84ca0ebc3d
Digest: sha256:b6a16f2823041fe7410e0fc089283902dd4795e9e1eaac2406f985fba111993a

This indeed shows that the GAIE CRDs are installed correctly without any problem.

But when I tried llm-d several days ago, I was installing it into our existing cluster (with NVIDIA GPUs, used for testing and development), where Istio had already been installed and configured by other team members (we have many other gateway CRs configured for other teams). After reading a small portion of ./llmd-installer.sh, I thought I should not use the script against an existing cluster (it might introduce resource conflicts), so I decided to install manually. Sadly, I obviously messed up the ../chart-dependencies/ci-deps.sh apply step, so GAIE was missing. It was a misunderstanding on my side; sorry for taking your time to review and try this.

To address this issue, would it be better if I wrote separate step-by-step documentation (a documentation version of llmd-installer.sh)? Maybe I could also help list the actual dependencies llm-d requires and uses, so engineers and contributors can understand how to integrate llm-d into their existing systems more easily.

@nerdalert
Member

Hi @nekomeowww, apologies for the slow reply. We definitely have a more manual documentation opportunity coming up once we land #321, if you're interested. Cheers.
