"vllm/vscode:/vscode.git/clone" did not exist on "b311efd0bd84faffcb1fe47aaa27ffd8c53688be"
Unverified Commit c4e3b125 authored by Ricardo Decal's avatar Ricardo Decal Committed by GitHub
Browse files

[Docs] Add minimal demo of Ray Data API usage (#21080)


Signed-off-by: default avatarRicardo Decal <rdecal@anyscale.com>
parent 8dfb45ca
...@@ -30,8 +30,31 @@ This API adds several batteries-included capabilities that simplify large-scale, ...@@ -30,8 +30,31 @@ This API adds several batteries-included capabilities that simplify large-scale,
- Automatic sharding, load balancing, and autoscaling distribute work across a Ray cluster with built-in fault tolerance. - Automatic sharding, load balancing, and autoscaling distribute work across a Ray cluster with built-in fault tolerance.
- Continuous batching keeps vLLM replicas saturated and maximizes GPU utilization. - Continuous batching keeps vLLM replicas saturated and maximizes GPU utilization.
- Transparent support for tensor and pipeline parallelism enables efficient multi-GPU inference. - Transparent support for tensor and pipeline parallelism enables efficient multi-GPU inference.
- Reading and writing to most popular file formats and cloud object storage.
The following example shows how to run batched inference with Ray Data and vLLM: - Scaling up the workload without code changes.
<gh-file:examples/offline_inference/batch_llm_inference.py>
??? code
```python
import ray # Requires ray>=2.44.1
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor
config = vLLMEngineProcessorConfig(model_source="unsloth/Llama-3.2-1B-Instruct")
processor = build_llm_processor(
config,
preprocess=lambda row: {
"messages": [
{"role": "system", "content": "You are a bot that completes unfinished haikus."},
{"role": "user", "content": row["item"]},
],
"sampling_params": {"temperature": 0.3, "max_tokens": 250},
},
postprocess=lambda row: {"answer": row["generated_text"]},
)
ds = ray.data.from_items(["An old silent pond..."])
ds = processor(ds)
ds.write_parquet("local:///tmp/data/")
```
For more information about the Ray Data LLM API, see the [Ray Data LLM documentation](https://docs.ray.io/en/latest/data/working-with-llms.html). For more information about the Ray Data LLM API, see the [Ray Data LLM documentation](https://docs.ray.io/en/latest/data/working-with-llms.html).
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment