OpenDAS / vllm_cscc
Commit e60d422f (Unverified)
Authored Jul 07, 2025 by Ricardo Decal
Committed by GitHub, Jul 07, 2025
[Docs] Improve docstring for ray data llm example (#20597)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
Parent: 0d914c81
Showing 1 changed file with 11 additions and 9 deletions
examples/offline_inference/batch_llm_inference.py (+11, -9)
@@ -3,17 +3,19 @@
 """
 This example shows how to use Ray Data for data parallel batch inference.
 
-Ray Data is a data processing framework that can handle large datasets
-and integrates tightly with vLLM for data-parallel inference.
-
-As of Ray 2.44, Ray Data has a native integration with
-vLLM (under ray.data.llm).
+Ray Data is a data processing framework that can process very large datasets
+with first-class support for vLLM.
 
 Ray Data provides functionality for:
-* Reading and writing to cloud storage (S3, GCS, etc.)
-* Automatic sharding and load-balancing across a cluster
-* Optimized configuration of vLLM using continuous batching
-* Compatible with tensor/pipeline parallel inference as well.
+* Reading and writing to most popular file formats and cloud object storage.
+* Streaming execution, so you can run inference on datasets that far exceed
+  the aggregate RAM of the cluster.
+* Scale up the workload without code changes.
+* Automatic sharding, load-balancing, and autoscaling across a Ray cluster,
+  with built-in fault-tolerance and retry semantics.
+* Continuous batching that keeps vLLM replicas saturated and maximizes GPU
+  utilization.
+* Compatible with tensor/pipeline parallel inference.
 
 Learn more about Ray Data's LLM integration:
 https://docs.ray.io/en/latest/data/working-with-llms.html
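The "streaming execution" bullet in the new docstring can be illustrated with a minimal plain-Python sketch. This is not Ray Data's implementation and uses none of its APIs; `stream_batches`, `run_inference`, and `fake_llm` are hypothetical names introduced only for illustration. The idea shown is that rows are pulled lazily in fixed-size batches, so only one batch is held in memory at a time regardless of the dataset's total size.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def stream_batches(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Lazily yield fixed-size batches; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_inference(
    rows: Iterable[str],
    infer_batch: Callable[[List[str]], List[str]],
    batch_size: int = 2,
) -> Iterator[str]:
    """Apply infer_batch to one batch at a time and stream the results out."""
    for batch in stream_batches(rows, batch_size):
        yield from infer_batch(batch)


def fake_llm(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for a model call (assumption, not a vLLM API).
    return [prompt.upper() for prompt in batch]


# The input is a generator, so the full dataset is never materialized.
prompts = (f"prompt {i}" for i in range(5))
results = list(run_inference(prompts, fake_llm, batch_size=2))
```

In the real integration, Ray Data also handles sharding, scheduling, and fault tolerance across a cluster; the sketch covers only the memory-bounded batching idea.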