Deploying vLLM on Kubernetes is a scalable and efficient way to serve machine learning models. This guide walks you through deploying vLLM using native Kubernetes.
Deploying vLLM on Kubernetes is a scalable and efficient way to serve machine learning models. This guide walks you through deploying vLLM using native Kubernetes.
*[Deployment with CPUs](#deployment-with-cpus)
*[Deployment with GPUs](#deployment-with-gpus)
Alternatively, you can deploy vLLM to Kubernetes using any of the following:
Alternatively, you can deploy vLLM to Kubernetes using any of the following:
*[Helm](frameworks/helm.md)
*[Helm](frameworks/helm.md)
*[InftyAI/llmaz](integrations/llmaz.md)
*[InftyAI/llmaz](integrations/llmaz.md)
...
@@ -14,11 +17,107 @@ Alternatively, you can deploy vLLM to Kubernetes using any of the following:
...
@@ -14,11 +17,107 @@ Alternatively, you can deploy vLLM to Kubernetes using any of the following: