Commit 8248a116 authored by Biswa Panda, committed by GitHub

feat: gaie helm chart based example (#2168)


Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
parent 8b0a035a
## Inference Gateway Setup with Dynamo
This setup treats each Dynamo deployment as a black box and routes traffic randomly among the deployments.
Currently, this setup supports only the kgateway-based Inference Gateway.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Installation Steps](#installation-steps)
- [Usage](#usage)
## Prerequisites
- Kubernetes cluster with kubectl configured
- NVIDIA GPU drivers installed on worker nodes
## Installation Steps
1. **Install Dynamo Platform**
[See Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
2. **Deploy Inference Gateway**
First, deploy an inference gateway service. This example installs the `kgateway`-based gateway implementation.
You can use the script below or follow the steps manually.
Script:
```bash
./install_gaie_crd_kgateway.sh
```
Manual steps:
a. Deploy the Gateway API CRDs:
```bash
GATEWAY_API_VERSION=v1.3.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/$GATEWAY_API_VERSION/standard-install.yaml
```
b. Install the Inference Extension CRDs (InferenceModel and InferencePool CRDs):
```bash
INFERENCE_EXTENSION_VERSION=v0.5.1
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$INFERENCE_EXTENSION_VERSION/manifests.yaml -n my-model
```
c. Install `kgateway` CRDs and kgateway.
```bash
KGATEWAY_VERSION=v2.0.3
# Install the Kgateway CRDs
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGATEWAY_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
# Install Kgateway
helm upgrade -i --namespace kgateway-system --version $KGATEWAY_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
```
d. Deploy the Gateway Instance
```bash
kubectl create namespace my-model
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml -n my-model
```
```bash
kubectl get gateway inference-gateway -n my-model
# Sample output
# NAME CLASS ADDRESS PROGRAMMED AGE
# inference-gateway kgateway x.x.x.x True 1m
```
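The `PROGRAMMED` column can take a short while to turn `True`. A minimal polling sketch, assuming the gateway name and namespace used above; `wait_until` and `gateway_programmed` are hypothetical helper names, not part of this repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command up to <attempts> times, sleeping
# <delay> seconds after each failed attempt. Returns 0 on first success.
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Hypothetical helper: succeeds once the Gateway reports Programmed=True.
# Assumes the gateway is named inference-gateway in namespace my-model.
gateway_programmed() {
  local status
  status="$(kubectl get gateway inference-gateway -n my-model \
    -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}' 2>/dev/null)"
  [ "$status" = "True" ]
}

# Example (requires cluster access):
#   wait_until 30 2 gateway_programmed && echo "gateway is programmed"
```

If you prefer a built-in, `kubectl wait --for=condition=Programmed gateway/inference-gateway -n my-model --timeout=60s` should serve the same purpose.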
3. **Install the Dynamo model and the Dynamo GAIE Helm chart**
The Inference Gateway resources are packaged in the `dynamo-gaie` Helm chart and configured through a values file (`vllm_agg_qwen.yaml` in this example).
Deploy the Inference Gateway resources to your Kubernetes cluster:
```bash
cd deploy/inference-gateway
helm install dynamo-gaie ./helm/dynamo-gaie -n my-model -f ./vllm_agg_qwen.yaml
```
Key configurations include:
- An InferenceModel resource for the Qwen model
- A service for the inference gateway
- Required RBAC roles and bindings
- RBAC permissions
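To point the chart at a different deployment, the values file passed with `-f` needs the same three keys that `vllm_agg_qwen.yaml` sets. A minimal sketch that prints such a file; `render_values` is a hypothetical helper, and the arguments shown are simply the values from this example:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the chart): print a minimal values file
# for dynamo-gaie. The keys mirror vllm_agg_qwen.yaml.
#   $1: Dynamo namespace of the deployed model
#   $2: model identifier used for routing
#   $3: short name used to build resource names
render_values() {
  local dynamo_namespace="$1" identifier="$2" short_name="$3"
  cat <<EOF
dynamoNamespace: "${dynamo_namespace}"
model:
  identifier: "${identifier}"
  shortName: "${short_name}"
EOF
}

render_values vllm-agg Qwen/Qwen3-0.6B qwen
```

Redirect the output to a file and pass that file with `-f` in the `helm install` command above.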
4. **Verify Installation**
Check that all resources are properly deployed:
```bash
kubectl get inferencepool -n my-model
kubectl get inferencemodel -n my-model
kubectl get httproute -n my-model
kubectl get service -n my-model
kubectl get gateway -n my-model
```
Sample output:
```bash
# kubectl get inferencepool
NAME AGE
qwen-pool 33m
# kubectl get inferencemodel
NAME MODEL NAME INFERENCE POOL CRITICALITY AGE
qwen-model Qwen/Qwen3-0.6B qwen-pool Critical 33m
# kubectl get httproute
NAME HOSTNAMES AGE
qwen-route 33m
```
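The names in the sample output follow from the chart's `model.shortName` value: the templates append `-pool`, `-model`, `-route`, and `-epp` to it. A minimal sketch of that naming rule; `expected_resources` is a hypothetical helper, not part of the chart:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the resources the dynamo-gaie templates create
# for a given model.shortName, in kubectl's TYPE/NAME form.
expected_resources() {
  local short_name="$1"
  printf 'inferencepool/%s-pool\n' "$short_name"
  printf 'inferencemodel/%s-model\n' "$short_name"
  printf 'httproute/%s-route\n' "$short_name"
  printf 'deployment/%s-epp\n' "$short_name"
  printf 'service/%s-epp\n' "$short_name"
}

expected_resources qwen
```

Each printed line can be passed to `kubectl get` together with the release namespace (`-n my-model` in this guide) to check that the resource exists.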
## Usage
The Inference Gateway provides HTTP endpoints for model inference.
### 1: Populate gateway URL for your k8s cluster
```bash
export GATEWAY_URL=<Gateway-URL>
```
To test the gateway in minikube, use one of the following options:
a. Use `minikube tunnel` to expose the gateway to the host.
This requires `sudo` access on the host machine. Alternatively, you can use port-forward to expose the gateway to the host, as shown in option (b).
```bash
# in first terminal
minikube tunnel
# in second terminal where you want to send inference requests
GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}')
echo $GATEWAY_URL
```
b. Use port-forward to expose the gateway to the host.
```bash
# in first terminal
kubectl port-forward svc/inference-gateway 8000:80 -n my-model
# in second terminal where you want to send inference requests
GATEWAY_URL=http://localhost:8000
```
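Note that option (a) yields a bare IP address while option (b) yields a full URL. `curl` falls back to `http://` when no scheme is given, so both work, but a small guard can make the value uniform. A sketch, assuming a hypothetical `normalize_gateway_url` helper; the IP address below is only a placeholder:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prefix http:// when the value has no scheme and strip
# one trailing slash so "$GATEWAY_URL/v1/models" has no double slash.
normalize_gateway_url() {
  local url="$1"
  case "$url" in
    http://* | https://*) ;;
    *) url="http://$url" ;;
  esac
  printf '%s\n' "${url%/}"
}

normalize_gateway_url "10.96.0.12"              # bare IP, as in option (a)
normalize_gateway_url "http://localhost:8000/"  # full URL, as in option (b)
```

In practice you would run `GATEWAY_URL=$(normalize_gateway_url "$GATEWAY_URL")` once after setting the variable.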
### 2: Check models deployed to inference gateway
a. Query models:
```bash
# in the second terminal, where GATEWAY_URL is set
curl $GATEWAY_URL/v1/models | jq .
```
Sample output:
```json
{
  "data": [
    {
      "created": 1753768323,
      "id": "Qwen/Qwen3-0.6B",
      "object": "object",
      "owned_by": "nvidia"
    }
  ],
  "object": "list"
}
```
b. Send inference request to gateway:
```bash
MODEL_NAME="Qwen/Qwen3-0.6B"
curl $GATEWAY_URL/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "'"${MODEL_NAME}"'",
"messages": [
{
"role": "user",
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
}
],
"stream":false,
"max_tokens": 30,
"temperature": 0.0
}'
```
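The inline quoting in the request above (`"'"${MODEL_NAME}"'"`) is easy to get wrong when the prompt itself contains quotes. A minimal sketch that builds the same request body from shell variables; `json_escape` and `chat_payload` are hypothetical helpers, and the escaping covers only backslashes, double quotes, and newlines:

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, and newline only (an assumption of this sketch).
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  printf '%s' "$s"
}

# Hypothetical helper: print a chat completion request body.
#   $1: model name, $2: user prompt, $3: max_tokens (defaults to 30)
chat_payload() {
  local model="$1" prompt="$2" max_tokens="${3:-30}"
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false,"max_tokens":%s,"temperature":0.0}\n' \
    "$(json_escape "$model")" "$(json_escape "$prompt")" "$max_tokens"
}

chat_payload "Qwen/Qwen3-0.6B" 'Say "hello" in one sentence.'
```

The output can be sent with `curl "$GATEWAY_URL/v1/chat/completions" -H "Content-Type: application/json" -d "$(chat_payload "$MODEL_NAME" "$PROMPT")"`, where `PROMPT` is a variable you set yourself. If `jq` is available, `jq -n --arg` is a more robust way to build the body.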
Sample inference output:
```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "audio": null,
        "content": "<think>\nOkay, I need to develop a character background for the user's query. Let me start by understanding the requirements. The character is an",
        "function_call": null,
        "refusal": null,
        "role": "assistant",
        "tool_calls": null
      }
    }
  ],
  "created": 1753768682,
  "id": "chatcmpl-772289b8-5998-4f6d-bd61-3659b684b347",
  "model": "Qwen/Qwen3-0.6B",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 29,
    "completion_tokens_details": null,
    "prompt_tokens": 196,
    "prompt_tokens_details": null,
    "total_tokens": 225
  }
}
```
# Installing Inference Gateway with Dynamo (Experimental)
This is an experimental setup that treats each Dynamo deployment as a black box and routes traffic randomly among the deployments.
This guide provides instructions for setting up the Inference Gateway with Dynamo for managing and routing inference requests.
## Prerequisites
- Kubernetes cluster with kubectl configured
- NVIDIA GPU drivers installed on worker nodes
## Installation Steps
1. **Install Dynamo Cloud**
[See Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
2. **Deploy Inference Gateway**
First, deploy an inference gateway service. In this example, we'll install `kgateway` based gateway implementation.
Install the Inference Extension CRDs:
```bash
VERSION=v0.3.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$VERSION/manifests.yaml
```
Deploy an Inference Gateway. In this example, we'll install `Kgateway`:
```bash
KGTW_VERSION=v2.0.2
# Install the Kgateway CRDs
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
# Install Kgateway
helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
# Deploy the Gateway
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml
```
### Validate Resources
```bash
kubectl get gateway inference-gateway
# Sample output
# NAME CLASS ADDRESS PROGRAMMED AGE
# inference-gateway kgateway True 1m
```
3. **Apply Dynamo-specific manifests**
The Inference Gateway is configured through the `inference-gateway-resources.yaml` file.
Deploy the Inference Gateway resources to your Kubernetes cluster:
```bash
cd deploy/inference-gateway/example
kubectl apply -f resources
```
Key configurations include:
- An InferenceModel resource for the DeepSeek model
- A service for the inference gateway
- Required RBAC roles and bindings
- RBAC permissions
5. **Verify Installation**
Check that all resources are properly deployed:
```bash
kubectl get inferencepool
kubectl get inferencemodel
kubectl get httproute
```
Sample output:
```bash
# kubectl get inferencepool
NAME AGE
dynamo-deepseek 6s
# kubectl get inferencemodel
NAME MODEL NAME INFERENCE POOL CRITICALITY AGE
deep-seek-model deepseek-ai/DeepSeek-R1-Distill-Llama-8B dynamo-deepseek Critical 6s
# kubectl get httproute
NAME HOSTNAMES AGE
llm-route 6s
```
## Usage
The Inference Gateway provides HTTP/2 endpoints for model inference. The default service is exposed on port 9002.
### 1: Populate gateway URL for your k8s cluster
```bash
export GATEWAY_URL=<Gateway-URL>
```
To test the gateway in minikube, use the following command:
```bash
minikube tunnel &
GATEWAY_URL=$(kubectl get svc inference-gateway -o yaml -o jsonpath='{.spec.clusterIP}')
echo $GATEWAY_URL
```
### 2: Check models deployed to inference gateway
Query models:
```bash
curl $GATEWAY_URL/v1/models | jq .
```
Send inference request to gateway:
```bash
curl $GATEWAY_URL/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [
{
"role": "user",
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
}
],
"stream":false,
"max_tokens": 30
}'
```
\ No newline at end of file
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: v2
name: dynamo-gaie
description: A Helm chart for installing Inference Gateway Inference Extension with Dynamo
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
{{/*
Expand the name of the chart.
*/}}
{{- define "dynamo-gaie.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "dynamo-gaie.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "dynamo-gaie.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Common labels
*/}}
{{- define "dynamo-gaie.labels" -}}
helm.sh/chart: {{ include "dynamo-gaie.chart" . }}
{{ include "dynamo-gaie.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
{{/*
Selector labels
*/}}
{{- define "dynamo-gaie.selectorLabels" -}}
app.kubernetes.io/name: {{ include "dynamo-gaie.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{/*
Create the name of the service account to use
*/}}
{{- define "dynamo-gaie.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "dynamo-gaie.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
@@ -19,7 +19,7 @@ metadata:
 subjects:
 - kind: ServiceAccount
   name: default
-  namespace: default
+  namespace: {{ .Release.Namespace }}
 roleRef:
   kind: ClusterRole
-  name: pod-read
+  name: pod-read
\ No newline at end of file
@@ -15,31 +15,31 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: dynamo-deepseek-epp
-  namespace: default
+  name: {{ .Values.model.shortName }}-epp
+  namespace: {{ .Release.Namespace }}
   labels:
-    app: dynamo-deepseek-epp
+    app: {{ .Values.model.shortName }}-epp
 spec:
   replicas: 1
   selector:
     matchLabels:
-      app: dynamo-deepseek-epp
+      app: {{ .Values.model.shortName }}-epp
   template:
     metadata:
       labels:
-        app: dynamo-deepseek-epp
+        app: {{ .Values.model.shortName }}-epp
     spec:
       # Conservatively, this timeout should mirror the longest grace period of the pods within the pool
       terminationGracePeriodSeconds: 130
       containers:
       - name: epp
-        image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:v0.4.0
+        image: {{ .Values.extension.image }}
         imagePullPolicy: IfNotPresent
         args:
         - -poolName
-        - "dynamo-deepseek"
+        - "{{ .Values.model.shortName }}-pool"
         - "-poolNamespace"
-        - "default"
+        - "{{ .Release.Namespace }}"
         - -v
         - "4"
         - --zap-encoder
@@ -64,4 +64,4 @@ spec:
             port: 9003
             service: inference-extension
           initialDelaySeconds: 5
-          periodSeconds: 10
+          periodSeconds: 10
\ No newline at end of file
@@ -12,23 +12,28 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+{{- if .Values.httpRoute.enabled }}
 apiVersion: gateway.networking.k8s.io/v1
 kind: HTTPRoute
 metadata:
-  name: llm-route
+  name: {{ .Values.model.shortName }}-route
 spec:
   parentRefs:
   - group: gateway.networking.k8s.io
     kind: Gateway
-    name: inference-gateway
+    name: {{ .Values.httpRoute.gatewayName }}
   rules:
   - backendRefs:
     - group: inference.networking.x-k8s.io
       kind: InferencePool
-      name: dynamo-deepseek
+      name: {{ .Values.model.shortName }}-pool
+      port: {{ .Values.inferencePool.port }}
       weight: 1
     matches:
     - path:
         type: PathPrefix
-        value: /
+        value: {{ .Values.httpRoute.path.prefix }}
     timeouts:
-      request: 300s
\ No newline at end of file
+      request: {{ .Values.httpRoute.timeout.request }}
+{{- end }}
\ No newline at end of file
@@ -15,12 +15,12 @@
 apiVersion: inference.networking.x-k8s.io/v1alpha2
 kind: InferenceModel
 metadata:
-  name: deep-seek-model
-  namespace: default
+  name: {{ .Values.model.shortName }}-model
+  namespace: {{ .Release.Namespace }}
 spec:
-  criticality: Critical
-  modelName: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+  criticality: {{ .Values.model.criticality }}
+  modelName: {{ .Values.model.identifier }}
   poolRef:
     group: inference.networking.x-k8s.io
     kind: InferencePool
-    name: dynamo-deepseek
\ No newline at end of file
+    name: {{ .Values.model.shortName }}-pool
\ No newline at end of file
@@ -15,14 +15,15 @@
 apiVersion: inference.networking.x-k8s.io/v1alpha2
 kind: InferencePool
 metadata:
-  name: dynamo-deepseek
-  namespace: default
+  name: {{ .Values.model.shortName }}-pool
+  namespace: {{ .Release.Namespace }}
 spec:
-  targetPortNumber: 8000
+  targetPortNumber: {{ .Values.inferencePool.port }}
   selector:
     nvidia.com/dynamo-component: Frontend
+    nvidia.com/dynamo-namespace: {{ .Values.dynamoNamespace }}
   extensionRef:
     failureMode: FailOpen
     group: ""
     kind: Service
-    name: dynamo-deepseek-epp
\ No newline at end of file
+    name: {{ .Values.model.shortName }}-epp
\ No newline at end of file
@@ -15,11 +15,11 @@
 apiVersion: v1
 kind: Service
 metadata:
-  name: dynamo-deepseek-epp
-  namespace: default
+  name: {{ .Values.model.shortName }}-epp
+  namespace: {{ .Release.Namespace }}
 spec:
   selector:
-    app: dynamo-deepseek-epp
+    app: {{ .Values.model.shortName }}-epp
   ports:
   - protocol: TCP
     port: 9002
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Default values for dynamo-gaie.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# This is the Dynamo namespace where the dynamo model is deployed
dynamoNamespace: "vllm-agg"

# This is the port on which the model is exposed
model:
  # This is the model name that will be used to route traffic to the dynamo model
  # for example, if the model name is Qwen/Qwen3-0.6B, then the modelShortName should be qwen
  identifier: "Qwen/Qwen3-0.6B"
  # This is the short name of the model that will be used to generate the resource names
  shortName: "qwen"
  # Criticality level for the inference model
  criticality: "Critical"

# InferencePool configuration
inferencePool:
  # Target port number for the inference pool
  port: 8000

# HTTPRoute configuration
httpRoute:
  # Enable the HTTPRoute
  enabled: true
  # Gateway parent reference configuration
  gatewayName: "inference-gateway"
  # Path matching configuration
  path:
    prefix: "/"
  # Timeout configuration
  timeout:
    request: "300s"

extension:
  # the GAIE extension
  image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:v0.4.0
\ No newline at end of file
#!/usr/bin/env bash
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -euo pipefail
trap 'echo "Error at line $LINENO. Exiting."' ERR
MODEL_NAMESPACE=my-model
kubectl create namespace $MODEL_NAMESPACE || true
# Install the Gateway API
GATEWAY_API_VERSION=v1.3.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/$GATEWAY_API_VERSION/standard-install.yaml
# Install the Inference Extension CRDs
INFERENCE_EXTENSION_VERSION=v0.5.1
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$INFERENCE_EXTENSION_VERSION/manifests.yaml -n $MODEL_NAMESPACE
# Install the Kgateway CRDs and Kgateway
KGATEWAY_VERSION=v2.0.3
KGATEWAY_SYSTEM_NAMESPACE=kgateway-system
helm repo add kgateway-dev oci://cr.kgateway.dev/kgateway-dev || true
helm upgrade -i --create-namespace --namespace $KGATEWAY_SYSTEM_NAMESPACE --version $KGATEWAY_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
helm upgrade -i --namespace $KGATEWAY_SYSTEM_NAMESPACE --version $KGATEWAY_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
# Deploy the Gateway Instance
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml -n $MODEL_NAMESPACE
\ No newline at end of file
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
# Default values for dynamo-gaie.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# This is the Dynamo namespace where the dynamo model is deployed
dynamoNamespace: "vllm-agg"

# This is the port on which the model is exposed
model:
  # This is the model name that will be used to route traffic to the dynamo model
  # for example, if the model name is Qwen/Qwen3-0.6B, then the modelShortName should be qwen
  identifier: "Qwen/Qwen3-0.6B"
  # This is the short name of the model that will be used to generate the resource names
  shortName: "qwen"