Support to serve vLLM on Kubernetes with LWS (#4829)

Signed-off-by: kerthcet <kerthcet@gmail.com>

Commit 8e7fb5d4, authored May 17, 2024 by Kante Yin; committed by GitHub May 16, 2024

Showing 2 changed files with 13 additions and 0 deletions

docs/source/serving/deploying_with_lws.rst +12 -0

docs/source/serving/integrations.rst +1 -0

--- a/docs/source/serving/deploying_with_lws.rst
+++ b/docs/source/serving/deploying_with_lws.rst
+.. _deploying_with_lws:
+
+Deploying with LWS
+============================
+
+LeaderWorkerSet (LWS) is a Kubernetes API that aims to address common deployment patterns of AI/ML inference workloads.
+A major use case is for multi-host/multi-node distributed inference.
+
+vLLM can be deployed with `LWS <https://github.com/kubernetes-sigs/lws>`_ on Kubernetes for distributed model serving.
+
+Please see `this guide <https://github.com/kubernetes-sigs/lws/tree/main/docs/examples/vllm>`_ for more details on
+deploying vLLM on Kubernetes using LWS.
--- a/docs/source/serving/integrations.rst
+++ b/docs/source/serving/integrations.rst
@@ -8,4 +8,5 @@ Integrations
    deploying_with_kserve
    deploying_with_triton
    deploying_with_bentoml
+   deploying_with_lws
    serving_with_langchain