[Dependency] Remove default ray dependency (#36170)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Commit 384425f8 (unverified), authored Mar 08, 2026 by Wentao Ye, committed by GitHub Mar 08, 2026
Showing 3 changed files with 6 additions and 2 deletions

docs/serving/parallelism_scaling.md +6 -0
requirements/cuda.txt +0 -1
requirements/rocm.txt +0 -1

--- a/docs/serving/parallelism_scaling.md
+++ b/docs/serving/parallelism_scaling.md
@@ -68,6 +68,12 @@ vLLM uses Ray to manage the distributed execution of tasks across multiple nodes
 Ray also offers high-level APIs for large-scale [offline batch inference](https://docs.ray.io/en/latest/data/working-with-llms.html) and [online serving](https://docs.ray.io/en/latest/serve/llm) that can leverage vLLM as the engine. These APIs add production-grade fault tolerance, scaling, and distributed observability to vLLM workloads.
+Ray is an optional dependency. Install it explicitly before using Ray-based execution, for example:
+```bash
+pip install "ray[cgraph]"
+```
 For details, see the [Ray documentation](https://docs.ray.io/en/latest/index.html).
 ### Ray cluster setup with containers

--- a/requirements/cuda.txt
+++ b/requirements/cuda.txt
@@ -4,7 +4,6 @@
 numba == 0.61.2 # Required for N-gram speculative decoding
 # Dependencies for NVIDIA GPUs
-ray[cgraph]>=2.48.0
 torch==2.10.0
 torchaudio==2.10.0
 # These must be updated alongside torch

--- a/requirements/rocm.txt
+++ b/requirements/rocm.txt
@@ -10,7 +10,6 @@ numba == 0.61.2 # Required for N-gram speculative decoding
 # Dependencies for AMD GPUs
 datasets
-ray[cgraph]>=2.48.0
 peft
 pytest-asyncio
 tensorizer==2.10.1
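
Because this change drops `ray[cgraph]` from the default CUDA and ROCm requirements, code paths that rely on Ray may want to check for it up front and fail with an actionable message. The following is a minimal sketch of such a guard, not code from this commit; the helper names `is_installed` and `require_ray` are hypothetical, and the error text simply repeats the install command from the updated docs.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package) is not None


def require_ray() -> None:
    """Raise a clear error if Ray-based execution is requested without Ray.

    Hypothetical helper for illustration; it is not part of vLLM's API.
    """
    if not is_installed("ray"):
        raise ImportError(
            "Ray is an optional dependency and is not installed. "
            'Install it with: pip install "ray[cgraph]"'
        )
```

Calling `require_ray()` before selecting a Ray-based executor would surface the missing dependency at startup rather than as an import failure deep in a worker.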