@@ -8,7 +8,7 @@ When integrating Dynamo with the Inference Gateway you could either use the defa
The setup provided here uses the Dynamo custom EPP by default. Set `epp.useDynamo=false` in your deployment to pick approach 2.
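
As a rough sketch of how the `epp.useDynamo=false` override might be passed at install time: the release name `dynamo-gaie`, the namespace `my-model`, and the relative chart path `helm/dynamo-gaie` are assumptions for illustration, so adjust them to match your deployment.

```bash
# Illustrative values (assumptions, not taken from the chart itself).
RELEASE=dynamo-gaie          # hypothetical Helm release name
NAMESPACE=my-model           # namespace used for the gateway service in this guide
CHART_DIR=helm/dynamo-gaie   # chart directory, relative to the current directory

# Assemble the command first so it can be inspected before anything is applied.
HELM_CMD="helm upgrade --install ${RELEASE} ${CHART_DIR} --namespace ${NAMESPACE} --set epp.useDynamo=false"
echo "${HELM_CMD}"

# Once the values look right for your cluster, run it:
# eval "${HELM_CMD}"
```

Leaving out the `--set epp.useDynamo=false` flag keeps the default Dynamo custom EPP.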
EPP’s default kv-routing approach is token-aware only `by approximation` because the prompt is tokenized with a generic tokenizer unaware of the model deployed. But the Dynamo plugin uses a token-aware KV algorithm. It employs the dynamo router which implements kv routing by running your model’s tokenizer inline. The EPP plugin configuration lives in [`helm/dynamo-gaie/epp-config-dynamo.yaml`](helm/dynamo-gaie/epp-config-dynamo.yaml) per EPP [convention](https://gateway-api-inference-extension.sigs.k8s.io/guides/epp-configuration/config-text/).
EPP’s default kv-routing approach is not token-aware because the prompt is hashed without tokenization. But the Dynamo plugin uses a token-aware KV algorithm. It employs the dynamo router which implements kv routing by running your model’s tokenizer inline. The EPP plugin configuration lives in [`helm/dynamo-gaie/epp-config-dynamo.yaml`](helm/dynamo-gaie/epp-config-dynamo.yaml) per EPP [convention](https://gateway-api-inference-extension.sigs.k8s.io/guides/epp-configuration/config-text/).
Currently, these setups are only supported with the kGateway-based Inference Gateway.
...
@@ -32,29 +32,12 @@ Currently, these setups are only supported with the kGateway based Inference Gat
### 2. Deploy Inference Gateway ###
First, deploy an inference gateway service. In this example, we'll install the `kgateway`-based gateway implementation.
You can use the script below or follow the steps manually.
@@ -216,10 +205,6 @@ The Inference Gateway provides HTTP endpoints for model inference.
#### 1: Populate gateway URL for your k8s cluster ####
```bash
export GATEWAY_URL=<Gateway-URL>
```
To test the gateway in minikube, use one of the following options:
a. Use minikube tunnel to expose the gateway to the host
This requires `sudo` access to the host machine. Alternatively, you can use port-forward to expose the gateway to the host, as shown in alternative (b).
...
@@ -230,7 +215,7 @@ ps aux | grep "minikube tunnel" | grep -v grep # make sure minikube tunnel is no
minikube tunnel & # start the tunnel
# in second terminal where you want to send inference requests
GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -oyaml -ojsonpath='{.spec.clusterIP}')
GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -ojsonpath='{.spec.clusterIP}')