Unverified commit ee33acf3, authored by atchernych, committed by GitHub

fix: Add weight: 1 to the EPP config plugins (#6756)


Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
parent 8ed69ea2
@@ -46,6 +46,8 @@ schedulingProfiles:
   - name: decode
     plugins:
       - pluginRef: decode-filter
+        weight: 1
       - pluginRef: dyn-decode
         weight: 1
       - pluginRef: picker
+        weight: 1
@@ -278,7 +278,7 @@ ps aux | grep "minikube tunnel" | grep -v grep # make sure minikube tunnel is no
 minikube tunnel # start the tunnel
 # in second terminal where you want to send inference requests
-GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}') & echo $GATEWAY_URL
+GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}') && echo $GATEWAY_URL
 ```
 b. use port-forward to expose the gateway to the host
......
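The one-character change from `&` to `&&` in the hunk above matters because `&` sends the assignment to a background subshell, so the variable is never set in the current shell. A minimal sketch of the difference follows; `lookup_url` and the address it prints are hypothetical stand-ins for the `kubectl` call, not part of the original document.

```shell
#!/bin/sh
# Hypothetical stand-in for the kubectl lookup; the address is a placeholder.
lookup_url() { echo "10.0.0.1"; }

# Old form: `&` runs the assignment in a background subshell,
# so URL_BG stays unset in the current shell.
unset URL_BG
URL_BG=$(lookup_url) &
wait
echo "with &:  '${URL_BG}'"

# Fixed form: `&&` runs the assignment in the current shell and
# only echoes once the assignment has succeeded.
unset URL_AND
URL_AND=$(lookup_url) && echo "with &&: '${URL_AND}'"
```

With the old form the first `echo` prints an empty value; with `&&` the second prints the address returned by the lookup.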
@@ -14,6 +14,9 @@ spec:
       extraPodSpec:
         mainContainer:
           image: nvcr.io/nvidia/ai-dynamo/epp-image:my-tag
+          env:
+            - name: DYN_DECODE_FALLBACK
+              value: "true"
       eppConfig:
         config:
           plugins:
@@ -33,9 +36,11 @@ spec:
             - name: decode
               plugins:
                 - pluginRef: decode-filter
+                  weight: 1
                 - pluginRef: dyn-decode
                   weight: 1
                 - pluginRef: picker
+                  weight: 1
     VllmDecodeWorker:
       componentType: worker
       envFromSecret: hf-token-secret
......
@@ -46,9 +46,11 @@ spec:
             - name: decode
               plugins:
                 - pluginRef: decode-filter
+                  weight: 1
                 - pluginRef: dyn-decode
                   weight: 1
                 - pluginRef: picker
+                  weight: 1
     VllmDecodeWorker:
       componentType: worker
       envFromSecret: hf-token-secret
......
@@ -45,15 +45,19 @@ spec:
             - name: prefill
               plugins:
                 - pluginRef: prefill-filter
+                  weight: 1
                 - pluginRef: dyn-prefill
                   weight: 1
                 - pluginRef: picker
+                  weight: 1
             - name: decode
               plugins:
                 - pluginRef: decode-filter
+                  weight: 1
                 - pluginRef: dyn-decode
                   weight: 1
                 - pluginRef: picker
+                  weight: 1
     VllmPrefillWorker:
       componentType: worker
       subComponentType: prefill
......
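This commit gives every `pluginRef` entry in the scheduling profiles an explicit `weight: 1`. As a rough way to spot entries that still lack one, the sketch below scans a config line by line and prints each `pluginRef` that is not directly followed by a `weight:` line. The `check_weights` helper is hypothetical and is a text heuristic, not a YAML parser: it assumes `weight:` sits on the line right after `pluginRef:`, as in the configs in this commit.

```shell
#!/bin/sh
# Hypothetical helper: prints the name of every pluginRef entry whose next
# non-blank line is not a weight line. Reads files given as arguments,
# or stdin when none are given.
check_weights() {
  awk '
    /^[[:space:]]*-[[:space:]]*pluginRef:/ {
      if (pending != "") print pending   # previous entry had no weight
      pending = $NF                      # remember this plugin name
      next
    }
    /^[[:space:]]*weight:/ { pending = ""; next }
    /[^[:space:]]/ { if (pending != "") { print pending; pending = "" } }
    END { if (pending != "") print pending }
  ' "$@"
}

# Sample input modeled on the pre-fix decode profile.
missing=$(check_weights <<'EOF'
schedulingProfiles:
- name: decode
  plugins:
  - pluginRef: decode-filter
  - pluginRef: dyn-decode
    weight: 1
  - pluginRef: picker
EOF
)
echo "$missing"
```

On the pre-fix sample this reports `decode-filter` and `picker`, the two entries the commit adds weights to. Passing a file path instead of a heredoc (`check_weights path/to/config.yaml`) works the same way.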