feat: gaie helm chart based example (#2168)

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Unverified Commit 8248a116 authored Jul 29, 2025 by Biswa Panda Committed by GitHub Jul 29, 2025
16 changed files
--- a/deploy/inference-gateway/README.md
+++ b/deploy/inference-gateway/README.md
+## Inference Gateway Setup with Dynamo
+
+This setup treats each Dynamo deployment as a black box and routes traffic randomly among the deployments.
+Currently, only the kgateway-based Inference Gateway is supported.
+
+## Table of Contents
+
+- [Prerequisites](#prerequisites)
+- [Installation Steps](#installation-steps)
+- [Usage](#usage)
+
+## Prerequisites
+
+- Kubernetes cluster with kubectl configured
+- NVIDIA GPU drivers installed on worker nodes
+
+## Installation Steps
+
+1. **Install Dynamo Platform**
+
+See the [Quickstart Guide](../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
+
+
+2. **Deploy Inference Gateway**
+
+First, deploy an inference gateway service. In this example, we'll install the `kgateway`-based gateway implementation.
+You can use the script below or follow the steps manually.
+
+Script:
+```bash
+./install_gaie_crd_kgateway.sh
+```
+
+Manual steps:
+
+a. Deploy the Gateway API CRDs:
+```bash
+GATEWAY_API_VERSION=v1.3.0
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/$GATEWAY_API_VERSION/standard-install.yaml
+```
+
+b. Install the Inference Extension CRDs (InferenceModel and InferencePool):
+```bash
+INFERENCE_EXTENSION_VERSION=v0.5.1
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$INFERENCE_EXTENSION_VERSION/manifests.yaml -n  my-model
+```
+
+c. Install `kgateway` CRDs and kgateway.
+```bash
+KGATEWAY_VERSION=v2.0.3
+
+# Install the Kgateway CRDs
+helm upgrade -i --create-namespace --namespace kgateway-system --version $KGATEWAY_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
+
+# Install Kgateway
+helm upgrade -i --namespace kgateway-system --version $KGATEWAY_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
+```
+
+d. Deploy the Gateway Instance
+```bash
+kubectl create namespace my-model
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml -n  my-model
+```
+
+```bash
+kubectl get gateway inference-gateway -n my-model
+
+# Sample output
+# NAME                CLASS      ADDRESS   PROGRAMMED   AGE
+# inference-gateway   kgateway   x.x.x.x   True         1m
+```
+
+3. **Install the Dynamo model and the `dynamo-gaie` Helm chart**
+
+The Inference Gateway resources are packaged in the `dynamo-gaie` Helm chart and configured through a values file (`vllm_agg_qwen.yaml` in this example).
+
+Install the chart into your Kubernetes cluster:
+
+```bash
+cd deploy/inference-gateway
+helm install dynamo-gaie ./helm/dynamo-gaie -n my-model -f ./vllm_agg_qwen.yaml
+```
+
+Key configurations include:
+- An InferenceModel resource for the Qwen model
+- A service for the inference gateway
+- Required RBAC roles, bindings, and permissions
+
+4. **Verify Installation**
+
+Check that all resources are properly deployed:
+
+```bash
+kubectl get inferencepool -n my-model
+kubectl get inferencemodel -n my-model
+kubectl get httproute -n my-model
+kubectl get service -n my-model
+kubectl get gateway -n my-model
+```
+
+Sample output:
+
+```bash
+# kubectl get inferencepool
+NAME        AGE
+qwen-pool   33m
+
+# kubectl get inferencemodel
+NAME         MODEL NAME        INFERENCE POOL   CRITICALITY   AGE
+qwen-model   Qwen/Qwen3-0.6B   qwen-pool        Critical      33m
+
+# kubectl get httproute
+NAME        HOSTNAMES   AGE
+qwen-route               33m
+```
+
+## Usage
+
+The Inference Gateway provides HTTP endpoints for model inference.
+
+### 1: Populate gateway URL for your k8s cluster
+```bash
+export GATEWAY_URL=<Gateway-URL>
+```
+
+To test the gateway in minikube, use one of the following options:
+
+a. Use `minikube tunnel` to expose the gateway to the host.
+   This requires `sudo` access to the host machine. Alternatively, you can use port-forward to expose the gateway to the host, as shown in alternative (b).
+```bash
+# in first terminal
+minikube tunnel
+
+# in second terminal where you want to send inference requests
+GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}')
+echo $GATEWAY_URL
+```
+
+b. Use port-forward to expose the gateway to the host:
+```bash
+# in first terminal
+kubectl port-forward svc/inference-gateway 8000:80 -n my-model
+
+# in second terminal where you want to send inference requests
+GATEWAY_URL=http://localhost:8000
+```
+
+### 2: Check models deployed to inference gateway
+
+
+a. Query models:
+```bash
+# in the second terminal, where GATEWAY_URL is set
+
+curl $GATEWAY_URL/v1/models | jq .
+```
+Sample output:
+```json
+{
+  "data": [
+    {
+      "created": 1753768323,
+      "id": "Qwen/Qwen3-0.6B",
+      "object": "object",
+      "owned_by": "nvidia"
+    }
+  ],
+  "object": "list"
+}
+```
+
+b. Send inference request to gateway:
+
+```bash
+MODEL_NAME="Qwen/Qwen3-0.6B"
+curl $GATEWAY_URL/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+      "model": "'"${MODEL_NAME}"'",
+      "messages": [
+      {
+          "role": "user",
+          "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
+      }
+      ],
+      "stream":false,
+      "max_tokens": 30,
+      "temperature": 0.0
+    }'
+```
+
+Sample inference output:
+
+```json
+{
+  "choices": [
+    {
+      "finish_reason": "stop",
+      "index": 0,
+      "logprobs": null,
+      "message": {
+        "audio": null,
+        "content": "<think>\nOkay, I need to develop a character background for the user's query. Let me start by understanding the requirements. The character is an",
+        "function_call": null,
+        "refusal": null,
+        "role": "assistant",
+        "tool_calls": null
+      }
+    }
+  ],
+  "created": 1753768682,
+  "id": "chatcmpl-772289b8-5998-4f6d-bd61-3659b684b347",
+  "model": "Qwen/Qwen3-0.6B",
+  "object": "chat.completion",
+  "service_tier": null,
+  "system_fingerprint": null,
+  "usage": {
+    "completion_tokens": 29,
+    "completion_tokens_details": null,
+    "prompt_tokens": 196,
+    "prompt_tokens_details": null,
+    "total_tokens": 225
+  }
+}
+```
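The usage steps in the new README can be condensed into a small helper script. The following is a minimal sketch and is not part of this commit: `normalize_gateway_url` and `chat_payload` are hypothetical helper names, and the `localhost:8000` fallback assumes the port-forward option (b). It only prints the request it would send, so it runs without a cluster.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: make the scheme explicit and drop one trailing slash.
# `kubectl ... -o jsonpath='{.spec.clusterIP}'` returns a bare IP address.
normalize_gateway_url() {
  local raw="$1"
  case "$raw" in
    http://*|https://*) printf '%s\n' "${raw%/}" ;;
    *)                  printf 'http://%s\n' "${raw%/}" ;;
  esac
}

# Hypothetical helper: build a minimal chat-completions body.
# The prompt must not contain double quotes or backslashes (no JSON escaping here).
chat_payload() {
  local model="$1" prompt="$2"
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false,"max_tokens":30}\n' \
    "$model" "$prompt"
}

GATEWAY_URL="$(normalize_gateway_url "${GATEWAY_URL:-localhost:8000}")"
PAYLOAD="$(chat_payload "Qwen/Qwen3-0.6B" "Hello")"

echo "POST ${GATEWAY_URL}/v1/chat/completions"
echo "${PAYLOAD}"

# To actually send the request (needs a reachable gateway):
#   curl -s "${GATEWAY_URL}/v1/chat/completions" \
#     -H "Content-Type: application/json" -d "${PAYLOAD}" | jq .
```

Keeping the URL and payload construction in functions makes it easy to reuse the same request against either the `minikube tunnel` or the port-forward setup.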
--- a/deploy/inference-gateway/example/README.md
+++ b/deploy/inference-gateway/example/README.md
-# Installing Inference Gateway with Dynamo (Experimental)
-
-This is an experimental setup that treats each Dynamo deployment as a black box and routes traffic randomly among the deployments.
-
-This guide provides instructions for setting up the Inference Gateway with Dynamo for managing and routing inference requests.
-
-## Prerequisites
-
-- Kubernetes cluster with kubectl configured
-- NVIDIA GPU drivers installed on worker nodes
-
-## Installation Steps
-
-1. **Install Dynamo Cloud**
-
-[See Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
-
-
-2. **Deploy Inference Gateway**
-
-First, deploy an inference gateway service. In this example, we'll install `kgateway` based gateway implementation.
-
-Install the Inference Extension CRDs:
-```bash
-VERSION=v0.3.0
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$VERSION/manifests.yaml
-```
-
-Deploy an Inference Gateway. In this example, we'll install `Kgateway`:
-```bash
-KGTW_VERSION=v2.0.2
-
-# Install the Kgateway CRDs
-helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
-
-# Install Kgateway
-helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
-
-# Deploy the Gateway
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml
-```
-
-### Validate Resources
-```bash
-kubectl get gateway inference-gateway
-
-# Sample output
-# NAME                CLASS      ADDRESS   PROGRAMMED   AGE
-# inference-gateway   kgateway             True         1m
-```
-
-3. **Apply Dynamo-specific manifests**
-
-The Inference Gateway is configured through the `inference-gateway-resources.yaml` file.
-
-Deploy the Inference Gateway resources to your Kubernetes cluster:
-
-```bash
-cd deploy/inference-gateway/example
-kubectl apply -f resources
-```
-
-Key configurations include:
-- An InferenceModel resource for the DeepSeek model
-- A service for the inference gateway
-- Required RBAC roles and bindings
-- RBAC permissions
-
-5. **Verify Installation**
-
-Check that all resources are properly deployed:
-
-```bash
-kubectl get inferencepool
-kubectl get inferencemodel
-kubectl get httproute
-```
-
-Sample output:
-
-```bash
-# kubectl get inferencepool
-NAME              AGE
-dynamo-deepseek   6s
-
-# kubectl get inferencemodel
-NAME              MODEL NAME                                 INFERENCE POOL    CRITICALITY   AGE
-deep-seek-model   deepseek-ai/DeepSeek-R1-Distill-Llama-8B   dynamo-deepseek   Critical      6s
-
-# kubectl get httproute
-NAME        HOSTNAMES   AGE
-llm-route               6s
-```
-
-## Usage
-
-The Inference Gateway provides HTTP/2 endpoints for model inference. The default service is exposed on port 9002.
-
-### 1: Populate gateway URL for your k8s cluster
-```bash
-export GATEWAY_URL=<Gateway-URL>
-```
-
-To test the gateway in minikube, use the following command:
-```bash
-minikube tunnel &
-
-GATEWAY_URL=$(kubectl get svc inference-gateway -o yaml -o jsonpath='{.spec.clusterIP}')
-echo $GATEWAY_URL
-```
-
-### 2: Check models deployed to inference gateway
-
-Query models:
-```bash
-curl $GATEWAY_URL/v1/models | jq .
-```
-
-Send inference request to gateway:
-
-```bash
-curl $GATEWAY_URL/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
-    "messages": [
-    {
-        "role": "user",
-        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
-    }
-    ],
-    "stream":false,
-    "max_tokens": 30
-  }'
-```
\ No newline at end of file
--- a/deploy/inference-gateway/helm/dynamo-gaie/.helmignore
+++ b/deploy/inference-gateway/helm/dynamo-gaie/.helmignore
+# Patterns to ignore when building packages.
+# This supports shell glob matching, relative path matching, and
+# negation (prefixed with !). Only one pattern per line.
+.DS_Store
+# Common VCS dirs
+.git/
+.gitignore
+.bzr/
+.bzrignore
+.hg/
+.hgignore
+.svn/
+# Common backup files
+*.swp
+*.bak
+*.tmp
+*.orig
+*~
+# Various IDEs
+.project
+.idea/
+*.tmproj
+.vscode/
--- a/deploy/inference-gateway/helm/dynamo-gaie/Chart.yaml
+++ b/deploy/inference-gateway/helm/dynamo-gaie/Chart.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v2
+name: dynamo-gaie
+description: A Helm chart for installing the Gateway API Inference Extension with Dynamo
+
+# A chart can be either an 'application' or a 'library' chart.
+#
+# Application charts are a collection of templates that can be packaged into versioned archives
+# to be deployed.
+#
+# Library charts provide useful utilities or functions for the chart developer. They're included as
+# a dependency of application charts to inject those utilities and functions into the rendering
+# pipeline. Library charts do not define any templates and therefore cannot be deployed.
+type: application
+
+# This is the chart version. This version number should be incremented each time you make changes
+# to the chart and its templates, including the app version.
+# Versions are expected to follow Semantic Versioning (https://semver.org/)
+version: 0.1.0
+
+# This is the version number of the application being deployed. This version number should be
+# incremented each time you make changes to the application. Versions are not expected to
+# follow Semantic Versioning. They should reflect the version the application is using.
+# It is recommended to use it with quotes.
+appVersion: "1.16.0"
--- a/deploy/inference-gateway/helm/dynamo-gaie/templates/NOTES.txt
+++ b/deploy/inference-gateway/helm/dynamo-gaie/templates/NOTES.txt
--- a/deploy/inference-gateway/helm/dynamo-gaie/templates/_helpers.tpl
+++ b/deploy/inference-gateway/helm/dynamo-gaie/templates/_helpers.tpl
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+{{/*
+Expand the name of the chart.
+*/}}
+{{- define "dynamo-gaie.name" -}}
+{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
+{{- end }}
+
+{{/*
+Create a default fully qualified app name.
+We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
+If release name contains chart name it will be used as a full name.
+*/}}
+{{- define "dynamo-gaie.fullname" -}}
+{{- if .Values.fullnameOverride }}
+{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
+{{- else }}
+{{- $name := default .Chart.Name .Values.nameOverride }}
+{{- if contains $name .Release.Name }}
+{{- .Release.Name | trunc 63 | trimSuffix "-" }}
+{{- else }}
+{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
+{{- end }}
+{{- end }}
+{{- end }}
+
+{{/*
+Create chart name and version as used by the chart label.
+*/}}
+{{- define "dynamo-gaie.chart" -}}
+{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
+{{- end }}
+
+{{/*
+Common labels
+*/}}
+{{- define "dynamo-gaie.labels" -}}
+helm.sh/chart: {{ include "dynamo-gaie.chart" . }}
+{{ include "dynamo-gaie.selectorLabels" . }}
+{{- if .Chart.AppVersion }}
+app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
+{{- end }}
+app.kubernetes.io/managed-by: {{ .Release.Service }}
+{{- end }}
+
+{{/*
+Selector labels
+*/}}
+{{- define "dynamo-gaie.selectorLabels" -}}
+app.kubernetes.io/name: {{ include "dynamo-gaie.name" . }}
+app.kubernetes.io/instance: {{ .Release.Name }}
+{{- end }}
+
+{{/*
+Create the name of the service account to use
+*/}}
+{{- define "dynamo-gaie.serviceAccountName" -}}
+{{- if .Values.serviceAccount.create }}
+{{- default (include "dynamo-gaie.fullname" .) .Values.serviceAccount.name }}
+{{- else }}
+{{- default "default" .Values.serviceAccount.name }}
+{{- end }}
+{{- end }}
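The `dynamo-gaie.fullname` helper above follows the standard `helm create` naming rule: use the override if set, reuse the release name if it already contains the chart name, otherwise join the two, then truncate to 63 characters and trim one trailing `-`. As a rough illustration, the same rule can be sketched in shell. `helm_fullname` is a hypothetical function name, and the sketch assumes ASCII names. Note that the templates shown in this diff derive resource names from `model.shortName` rather than from this helper.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical shell rendition of the "dynamo-gaie.fullname" template logic.
# Args: release name, chart name (or nameOverride), optional fullnameOverride.
helm_fullname() {
  local release="$1" name="$2" override="${3:-}" full
  if [ -n "$override" ]; then
    full="$override"
  else
    case "$release" in
      *"$name"*) full="$release" ;;            # release already contains the chart name
      *)         full="${release}-${name}" ;;  # otherwise join release and chart name
    esac
  fi
  # Mirror `trunc 63 | trimSuffix "-"`.
  full="$(printf '%s' "$full" | cut -c1-63)"
  printf '%s\n' "${full%-}"
}

helm_fullname dynamo-gaie dynamo-gaie   # prints "dynamo-gaie"
helm_fullname qwen dynamo-gaie          # prints "qwen-dynamo-gaie"
```

With the README's `helm install dynamo-gaie ./helm/dynamo-gaie ...`, the release name equals the chart name, so the helper would resolve to `dynamo-gaie`.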
--- a/deploy/inference-gateway/example/resources/cluster-role-binding.yaml
+++ b/deploy/inference-gateway/example/resources/cluster-role-binding.yaml
@@ -19,7 +19,7 @@ metadata:
 subjects:
 - kind: ServiceAccount
  name: default
-  namespace: default
+  namespace: {{ .Release.Namespace }}
 roleRef:
  kind: ClusterRole
-  name: pod-read
+  name: pod-read
\ No newline at end of file
--- a/deploy/inference-gateway/example/resources/cluster-role.yaml
+++ b/deploy/inference-gateway/example/resources/cluster-role.yaml
--- a/deploy/inference-gateway/example/resources/dynamo-epp.yaml
+++ b/deploy/inference-gateway/example/resources/dynamo-epp.yaml
@@ -15,31 +15,31 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: dynamo-deepseek-epp
-  namespace: default
+  name: {{ .Values.model.shortName }}-epp
+  namespace: {{ .Release.Namespace }}
  labels:
-    app: dynamo-deepseek-epp
+    app: {{ .Values.model.shortName }}-epp
 spec:
  replicas: 1
  selector:
    matchLabels:
-      app: dynamo-deepseek-epp
+      app: {{ .Values.model.shortName }}-epp
  template:
    metadata:
      labels:
-        app: dynamo-deepseek-epp
+        app: {{ .Values.model.shortName }}-epp
    spec:
      # Conservatively, this timeout should mirror the longest grace period of the pods within the pool
      terminationGracePeriodSeconds: 130
      containers:
      - name: epp
-        image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:v0.4.0
+        image: {{ .Values.extension.image }}
        imagePullPolicy: IfNotPresent
        args:
        - -poolName
-        - "dynamo-deepseek"
+        - "{{ .Values.model.shortName }}-pool"
        - "-poolNamespace"
-        - "default"
+        - "{{ .Release.Namespace }}"
        - -v
        - "4"
        - --zap-encoder
@@ -64,4 +64,4 @@ spec:
            port: 9003
            service: inference-extension
          initialDelaySeconds: 5
-          periodSeconds: 10
+          periodSeconds: 10
\ No newline at end of file
--- a/deploy/inference-gateway/example/resources/http-router.yaml
+++ b/deploy/inference-gateway/example/resources/http-router.yaml
@@ -12,23 +12,28 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
+{{- if .Values.httpRoute.enabled }}
 apiVersion: gateway.networking.k8s.io/v1
 kind: HTTPRoute
 metadata:
-  name: llm-route
+  name: {{ .Values.model.shortName }}-route
 spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
-    name: inference-gateway
+    name: {{ .Values.httpRoute.gatewayName }}
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
-      name: dynamo-deepseek
+      name: {{ .Values.model.shortName }}-pool
+      port: {{ .Values.inferencePool.port }}
+      weight: 1
    matches:
    - path:
        type: PathPrefix
-        value: /
+        value: {{ .Values.httpRoute.path.prefix }}
    timeouts:
-      request: 300s
\ No newline at end of file
+      request: {{ .Values.httpRoute.timeout.request }}
+{{- end }}
\ No newline at end of file
--- a/deploy/inference-gateway/example/resources/inference-model.yaml
+++ b/deploy/inference-gateway/example/resources/inference-model.yaml
@@ -15,12 +15,12 @@
 apiVersion: inference.networking.x-k8s.io/v1alpha2
 kind: InferenceModel
 metadata:
-  name: deep-seek-model
-  namespace: default
+  name: {{ .Values.model.shortName }}-model
+  namespace: {{ .Release.Namespace }}
 spec:
-  criticality: Critical
-  modelName: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+  criticality: {{ .Values.model.criticality }}
+  modelName: {{ .Values.model.identifier }}
  poolRef:
    group: inference.networking.x-k8s.io
    kind: InferencePool
-    name: dynamo-deepseek
\ No newline at end of file
+    name: {{ .Values.model.shortName }}-pool
\ No newline at end of file
--- a/deploy/inference-gateway/example/resources/inference-pool.yaml
+++ b/deploy/inference-gateway/example/resources/inference-pool.yaml
@@ -15,14 +15,15 @@
 apiVersion: inference.networking.x-k8s.io/v1alpha2
 kind: InferencePool
 metadata:
-  name: dynamo-deepseek
-  namespace: default
+  name: {{ .Values.model.shortName }}-pool
+  namespace: {{ .Release.Namespace }}
 spec:
-  targetPortNumber: 8000
+  targetPortNumber: {{ .Values.inferencePool.port }}
  selector:
    nvidia.com/dynamo-component: Frontend
+    nvidia.com/dynamo-namespace: {{ .Values.dynamoNamespace }}
  extensionRef:
    failureMode: FailOpen
    group: ""
    kind: Service
-    name: dynamo-deepseek-epp
\ No newline at end of file
+    name: {{ .Values.model.shortName }}-epp
\ No newline at end of file
--- a/deploy/inference-gateway/example/resources/service.yaml
+++ b/deploy/inference-gateway/example/resources/service.yaml
@@ -15,11 +15,11 @@
 apiVersion: v1
 kind: Service
 metadata:
-  name: dynamo-deepseek-epp
-  namespace: default
+  name: {{ .Values.model.shortName }}-epp
+  namespace: {{ .Release.Namespace }}
 spec:
  selector:
-    app: dynamo-deepseek-epp
+    app: {{ .Values.model.shortName }}-epp
  ports:
    - protocol: TCP
      port: 9002
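Across the templated manifests above, every resource name is derived from `.Values.model.shortName` plus a fixed suffix. A small sketch of that mapping, assuming the suffixes shown in these hunks (`resource_names` is a hypothetical helper, not part of the chart):

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: list the resource names the templates derive
# from model.shortName, one "<Kind> <name>" pair per line.
resource_names() {
  local short="$1"
  printf 'InferencePool %s-pool\n'   "$short"
  printf 'InferenceModel %s-model\n' "$short"
  printf 'HTTPRoute %s-route\n'      "$short"
  printf 'Deployment %s-epp\n'       "$short"
  printf 'Service %s-epp\n'          "$short"
}

resource_names "${SHORT_NAME:-qwen}"
```

With the default `shortName: "qwen"`, this matches the `qwen-pool`, `qwen-model`, and `qwen-route` names in the README's sample output. Two releases in the same namespace would need distinct `shortName` values to avoid name collisions.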

--- a/deploy/inference-gateway/helm/dynamo-gaie/values.yaml
+++ b/deploy/inference-gateway/helm/dynamo-gaie/values.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Default values for dynamo-gaie.
+# This is a YAML-formatted file.
+# Declare variables to be passed into your templates.
+
+# This is the Dynamo namespace where the dynamo model is deployed
+dynamoNamespace: "vllm-agg"
+
+# Model configuration
+model:
+  # Model identifier used to route traffic to the Dynamo model, for example Qwen/Qwen3-0.6B
+  identifier: "Qwen/Qwen3-0.6B"
+  # Short name of the model, used to generate the resource names
+  # (for example, qwen for Qwen/Qwen3-0.6B)
+  shortName: "qwen"
+  # Criticality level for the inference model
+  criticality: "Critical"
+
+# InferencePool configuration
+inferencePool:
+  # Target port number for the inference pool
+  port: 8000
+
+# HTTPRoute configuration
+httpRoute:
+  # Enable the HTTPRoute
+  enabled: true
+  # Gateway parent reference configuration
+  gatewayName: "inference-gateway"
+  # Path matching configuration
+  path:
+    prefix: "/"
+  # Timeout configuration
+  timeout:
+    request: "300s"
+
+extension:
+  # Image for the GAIE endpoint picker (EPP) extension
+  image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:v0.4.0
\ No newline at end of file
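To deploy a different model, only the keys that differ from these defaults need to appear in a custom values file, because Helm merges files passed with `-f` over the chart's `values.yaml`. The following is a minimal sketch: the model identifier, short name, and timeout are hypothetical placeholders, and the script only writes the file and prints the install command instead of running it.

```bash
#!/usr/bin/env bash
set -eu

# Write a custom values file to a temporary directory.
workdir="$(mktemp -d)"
values_file="${workdir}/my_model_values.yaml"

# All keys below exist in the chart's values.yaml; the values are placeholders.
cat > "$values_file" <<'EOF'
dynamoNamespace: "vllm-agg"
model:
  identifier: "example-org/example-model"
  shortName: "example"
httpRoute:
  timeout:
    request: "120s"
EOF

echo "Wrote ${values_file}:"
cat "$values_file"

# Printed rather than executed, so the sketch runs without a cluster:
echo "helm install dynamo-gaie ./helm/dynamo-gaie -n my-model -f ${values_file}"
```

Keys that are omitted, such as `inferencePool.port` and `extension.image`, keep their defaults from `values.yaml`.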
--- a/deploy/inference-gateway/install_gaie_crd_kgateway.sh
+++ b/deploy/inference-gateway/install_gaie_crd_kgateway.sh
+#!/usr/bin/env bash
+
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -euo pipefail
+trap 'echo "Error at line $LINENO. Exiting."' ERR
+
+
+MODEL_NAMESPACE=my-model
+kubectl create namespace $MODEL_NAMESPACE || true
+
+# Install the Gateway API
+GATEWAY_API_VERSION=v1.3.0
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/$GATEWAY_API_VERSION/standard-install.yaml
+
+
+# Install the Inference Extension CRDs
+INFERENCE_EXTENSION_VERSION=v0.5.1
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$INFERENCE_EXTENSION_VERSION/manifests.yaml -n  $MODEL_NAMESPACE
+
+
+# Install the Kgateway CRDs and Kgateway
+KGATEWAY_VERSION=v2.0.3
+KGATEWAY_SYSTEM_NAMESPACE=kgateway-system
+helm repo add kgateway-dev oci://cr.kgateway.dev/kgateway-dev || true
+helm upgrade -i --create-namespace --namespace $KGATEWAY_SYSTEM_NAMESPACE --version $KGATEWAY_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
+helm upgrade -i --namespace $KGATEWAY_SYSTEM_NAMESPACE --version $KGATEWAY_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
+
+
+# Deploy the Gateway Instance
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml -n $MODEL_NAMESPACE
\ No newline at end of file
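The install script assumes `kubectl` and `helm` are already on `PATH`, and with `set -euo pipefail` it stops at the first command that is missing. A preflight check can report all missing tools up front. This is a sketch only: `missing_tools` is a hypothetical helper and is not part of the script in this commit.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the name of each required tool that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_tools kubectl helm)"
if [ -n "$missing" ]; then
  # A real preflight step would `exit 1` here; the sketch only reports.
  echo "Missing required tools: $(echo $missing)" >&2
else
  echo "kubectl and helm found; ready to run ./install_gaie_crd_kgateway.sh"
fi
```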
--- a/deploy/inference-gateway/vllm_agg_qwen.yaml
+++ b/deploy/inference-gateway/vllm_agg_qwen.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+# Default values for dynamo-gaie.
+# This is a YAML-formatted file.
+# Declare variables to be passed into your templates.
+
+# This is the Dynamo namespace where the dynamo model is deployed
+dynamoNamespace: "vllm-agg"
+
+# Model configuration
+model:
+  # Model identifier used to route traffic to the Dynamo model, for example Qwen/Qwen3-0.6B
+  identifier: "Qwen/Qwen3-0.6B"
+  # Short name of the model, used to generate the resource names
+  # (for example, qwen for Qwen/Qwen3-0.6B)
+  shortName: "qwen"