[Doc]: fix 2 hyperlinks leading to Ray site after they changed Ray's doc structure (#24438)

Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

[Doc]: fix 2 hyperlinks leading to Ray site after they changed Ray's doc structure (#24438)
Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
55be93ba · Didier Durand · GitHub · 717fc00e · 55be93ba
Unverified Commit 55be93ba authored Sep 08, 2025 by Didier Durand Committed by GitHub Sep 08, 2025
Show whitespace changes
Inline Side-by-side

Showing with 2 additions and 2 deletions

docs/serving/parallelism_scaling.md docs/serving/parallelism_scaling.md +2 -2

No files found.
--- a/docs/serving/parallelism_scaling.md
+++ b/docs/serving/parallelism_scaling.md
@@ -66,7 +66,7 @@ Ray is a distributed computing framework for scaling Python programs. Multi-node
 vLLM uses Ray to manage the distributed execution of tasks across multiple nodes and control where execution happens.
-Ray also offers high-level APIs for large-scale [offline batch inference](https://docs.ray.io/en/latest/data/working-with-llms.html) and [online serving](https://docs.ray.io/en/latest/serve/llm/serving-llms.html) that can leverage vLLM as the engine. These APIs add production-grade fault tolerance, scaling, and distributed observability to vLLM workloads.
+Ray also offers high-level APIs for large-scale [offline batch inference](https://docs.ray.io/en/latest/data/working-with-llms.html) and [online serving](https://docs.ray.io/en/latest/serve/llm) that can leverage vLLM as the engine. These APIs add production-grade fault tolerance, scaling, and distributed observability to vLLM workloads.
 For details, see the [Ray documentation](https://docs.ray.io/en/latest/index.html).
@@ -104,7 +104,7 @@ Note that `VLLM_HOST_IP` is unique for each worker. Keep the shells running thes
 From any node, enter a container and run `ray status` and `ray list nodes` to verify that Ray finds the expected number of nodes and GPUs.
 !!! tip
-    Alternatively, set up the Ray cluster using KubeRay. For more information, see [KubeRay vLLM documentation](https://docs.ray.io/en/latest/cluster/kubernetes/examples/vllm-rayservice.html).
+    Alternatively, set up the Ray cluster using KubeRay. For more information, see [KubeRay vLLM documentation](https://docs.ray.io/en/latest/cluster/kubernetes/examples/rayserve-llm-example.html).
 ### Running vLLM on a Ray cluster