Unverified commit ee33acf3, authored by atchernych, committed by GitHub

fix: Add weight: 1 to the EPP config plugins (#6756)


Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
parent 8ed69ea2
@@ -46,6 +46,8 @@ schedulingProfiles:
- name: decode
plugins:
- pluginRef: decode-filter
weight: 1
- pluginRef: dyn-decode
weight: 1
- pluginRef: picker
weight: 1
@@ -278,7 +278,7 @@ ps aux | grep "minikube tunnel" | grep -v grep # make sure minikube tunnel is no
minikube tunnel # start the tunnel
# in second terminal where you want to send inference requests
-GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}') & echo $GATEWAY_URL
+GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}') && echo $GATEWAY_URL
```
b. use port-forward to expose the gateway to the host
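
The one-character change in this hunk (`&` to `&&`) matters because a single `&` sends the assignment to a background subshell, so `GATEWAY_URL` is never set in the shell that later sends requests. A minimal sketch of the difference follows; `fake_lookup` and the address it prints are hypothetical stand-ins for the `kubectl` call, not part of the original docs.

```shell
# Hypothetical stand-in for the kubectl lookup; the address is a placeholder.
fake_lookup() { echo "10.0.0.1"; }

# Single '&': the assignment runs in a background subshell,
# so the variable stays unset in the current shell.
unset URL_BG
URL_BG=$(fake_lookup) &
wait
echo "with &:  '${URL_BG:-}'"

# '&&': the assignment runs in the current shell, and echo runs
# only if the assignment (and the command substitution) succeeded.
unset URL_SEQ
URL_SEQ=$(fake_lookup) && echo "with &&: '${URL_SEQ}'"
```

With `&&`, a failed lookup also skips the `echo`, which makes a missing service easier to notice.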
@@ -14,6 +14,9 @@ spec:
extraPodSpec:
mainContainer:
image: nvcr.io/nvidia/ai-dynamo/epp-image:my-tag
env:
- name: DYN_DECODE_FALLBACK
value: "true"
eppConfig:
config:
plugins:
@@ -33,9 +36,11 @@ spec:
- name: decode
plugins:
- pluginRef: decode-filter
weight: 1
- pluginRef: dyn-decode
weight: 1
- pluginRef: picker
weight: 1
VllmDecodeWorker:
componentType: worker
envFromSecret: hf-token-secret
@@ -46,9 +46,11 @@ spec:
- name: decode
plugins:
- pluginRef: decode-filter
weight: 1
- pluginRef: dyn-decode
weight: 1
- pluginRef: picker
weight: 1
VllmDecodeWorker:
componentType: worker
envFromSecret: hf-token-secret
@@ -45,15 +45,19 @@ spec:
- name: prefill
plugins:
- pluginRef: prefill-filter
weight: 1
- pluginRef: dyn-prefill
weight: 1
- pluginRef: picker
weight: 1
- name: decode
plugins:
- pluginRef: decode-filter
weight: 1
- pluginRef: dyn-decode
weight: 1
- pluginRef: picker
weight: 1
VllmPrefillWorker:
componentType: worker
subComponentType: prefill