Commit 384425f8 authored by Wentao Ye, committed by GitHub

[Dependency] Remove default ray dependency (#36170)


Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
parent a0f44bb6
@@ -68,6 +68,12 @@ vLLM uses Ray to manage the distributed execution of tasks across multiple nodes
 Ray also offers high-level APIs for large-scale [offline batch inference](https://docs.ray.io/en/latest/data/working-with-llms.html) and [online serving](https://docs.ray.io/en/latest/serve/llm) that can leverage vLLM as the engine. These APIs add production-grade fault tolerance, scaling, and distributed observability to vLLM workloads.
+Ray is an optional dependency. Install it explicitly before using Ray-based execution, for example:
+```bash
+pip install "ray[cgraph]"
+```
 For details, see the [Ray documentation](https://docs.ray.io/en/latest/index.html).
 ### Ray cluster setup with containers
......
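Because this change makes Ray an optional dependency, code that relies on Ray-based execution may want to check for it up front and fail with a clear message. The following is a minimal sketch, not taken from the commit and not vLLM's own logic: the helper names `is_installed` and `require_optional` are hypothetical, and the check uses only the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_optional(module_name: str, install_hint: str) -> None:
    """Raise a descriptive ImportError when an optional dependency is absent.

    Both this helper and its message format are illustrative assumptions.
    """
    if not is_installed(module_name):
        raise ImportError(
            f"{module_name!r} is not installed. Install it with: {install_hint}"
        )


if __name__ == "__main__":
    try:
        # The install command mirrors the one given in the documentation change.
        require_optional("ray", 'pip install "ray[cgraph]"')
        print("ray is available")
    except ImportError as exc:
        print(exc)
```

Checking with `find_spec` avoids importing Ray just to learn whether it is present, which keeps the check cheap on installations that do not use it.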
@@ -4,7 +4,6 @@
 numba == 0.61.2 # Required for N-gram speculative decoding
 # Dependencies for NVIDIA GPUs
-ray[cgraph]>=2.48.0
 torch==2.10.0
 torchaudio==2.10.0
 # These must be updated alongside torch
......
@@ -10,7 +10,6 @@ numba == 0.61.2 # Required for N-gram speculative decoding
 # Dependencies for AMD GPUs
 datasets
-ray[cgraph]>=2.48.0
 peft
 pytest-asyncio
 tensorizer==2.10.1
......