Unverified Commit 0f087451 authored by Anna Pendleton's avatar Anna Pendleton Committed by GitHub
Browse files

[Doc] Add troubleshooting section to k8s deployment (#19377)


Signed-off-by: default avatarAnna Pendleton <pendleton@google.com>
parent 3597b06a
...@@ -5,19 +5,22 @@ title: Using Kubernetes ...@@ -5,19 +5,22 @@ title: Using Kubernetes
Deploying vLLM on Kubernetes is a scalable and efficient way to serve machine learning models. This guide walks you through deploying vLLM using native Kubernetes. Deploying vLLM on Kubernetes is a scalable and efficient way to serve machine learning models. This guide walks you through deploying vLLM using native Kubernetes.
* [Deployment with CPUs](#deployment-with-cpus) - [Deployment with CPUs](#deployment-with-cpus)
* [Deployment with GPUs](#deployment-with-gpus) - [Deployment with GPUs](#deployment-with-gpus)
- [Troubleshooting](#troubleshooting)
- [Startup Probe or Readiness Probe Failure, container log contains "KeyboardInterrupt: terminated"](#startup-probe-or-readiness-probe-failure-container-log-contains-keyboardinterrupt-terminated)
- [Conclusion](#conclusion)
Alternatively, you can deploy vLLM to Kubernetes using any of the following: Alternatively, you can deploy vLLM to Kubernetes using any of the following:
* [Helm](frameworks/helm.md) - [Helm](frameworks/helm.md)
* [InftyAI/llmaz](integrations/llmaz.md) - [InftyAI/llmaz](integrations/llmaz.md)
* [KServe](integrations/kserve.md) - [KServe](integrations/kserve.md)
* [kubernetes-sigs/lws](frameworks/lws.md) - [kubernetes-sigs/lws](frameworks/lws.md)
* [meta-llama/llama-stack](integrations/llamastack.md) - [meta-llama/llama-stack](integrations/llamastack.md)
* [substratusai/kubeai](integrations/kubeai.md) - [substratusai/kubeai](integrations/kubeai.md)
* [vllm-project/aibrix](https://github.com/vllm-project/aibrix) - [vllm-project/aibrix](https://github.com/vllm-project/aibrix)
* [vllm-project/production-stack](integrations/production-stack.md) - [vllm-project/production-stack](integrations/production-stack.md)
## Deployment with CPUs ## Deployment with CPUs
...@@ -351,6 +354,17 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit) ...@@ -351,6 +354,17 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
If the service is correctly deployed, you should receive a response from the vLLM model. If the service is correctly deployed, you should receive a response from the vLLM model.
## Troubleshooting
### Startup Probe or Readiness Probe Failure, container log contains "KeyboardInterrupt: terminated"
If the startup or readiness probe failureThreshold is too low for the time needed to startup the server, Kubernetes scheduler will kill the container. A couple of indications that this has happened:
1. container log contains "KeyboardInterrupt: terminated"
2. `kubectl get events` shows message `Container $NAME failed startup probe, will be restarted`
To mitigate, increase the failureThreshold to allow more time for the model server to start serving. You can identify an ideal failureThreshold by removing the probes from the manifest and measuring how much time it takes for the model server to show it's ready to serve.
## Conclusion ## Conclusion
Deploying vLLM with Kubernetes allows for efficient scaling and management of ML models leveraging GPU resources. By following the steps outlined above, you should be able to set up and test a vLLM deployment within your Kubernetes cluster. If you encounter any issues or have suggestions, please feel free to contribute to the documentation. Deploying vLLM with Kubernetes allows for efficient scaling and management of ML models leveraging GPU resources. By following the steps outlined above, you should be able to set up and test a vLLM deployment within your Kubernetes cluster. If you encounter any issues or have suggestions, please feel free to contribute to the documentation.
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment