[Doc] Rename offline inference examples (#11927)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Commit 482cdc49 (unverified), authored by Harry Mellor on Jan 10, 2025 and committed by GitHub on Jan 10, 2025
20 changed files
--- a/examples/offline_inference/offline_chat_with_tools.py
+++ b/examples/offline_inference/offline_chat_with_tools.py
--- a/examples/offline_inference/offline_inference_classification.py
+++ b/examples/offline_inference/offline_inference_classification.py
--- a/examples/offline_inference/offline_inference_cli.py
+++ b/examples/offline_inference/offline_inference_cli.py
--- a/examples/offline_inference/offline_inference_distributed.py
+++ b/examples/offline_inference/offline_inference_distributed.py
--- a/examples/offline_inference/offline_inference_embedding.py
+++ b/examples/offline_inference/offline_inference_embedding.py
--- a/examples/offline_inference/offline_inference_encoder_decoder.py
+++ b/examples/offline_inference/offline_inference_encoder_decoder.py
--- a/examples/offline_inference/florence2_inference.py
+++ b/examples/offline_inference/florence2_inference.py
@@ -3,7 +3,7 @@ Demonstrate prompting of text-to-text
 encoder/decoder models, specifically Florence-2
 '''
 # TODO(Isotr0py):
-# Move to offline_inference/offline_inference_vision_language.py
+# Move to offline_inference/vision_language.py
 # after porting vision backbone
 from vllm import LLM, SamplingParams


--- a/examples/offline_inference/offline_inference_mlpspeculator.py
+++ b/examples/offline_inference/offline_inference_mlpspeculator.py
--- a/examples/offline_inference/offline_inference_neuron.py
+++ b/examples/offline_inference/offline_inference_neuron.py
--- a/examples/offline_inference/offline_inference_neuron_int8_quantization.py
+++ b/examples/offline_inference/offline_inference_neuron_int8_quantization.py
--- a/examples/offline_inference/offline_inference_openai/offline_inference_openai.md
+++ b/examples/offline_inference/offline_inference_openai/offline_inference_openai.md
@@ -8,7 +8,7 @@ This is a guide to performing batch inference using the OpenAI batch file format
 
 The OpenAI batch file format consists of a series of json objects on new lines.
 
-[See here for an example file.](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/offline_inference_openai/openai_example_batch.jsonl)
+[See here for an example file.](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/openai/openai_example_batch.jsonl)
 
 Each line represents a separate request. See the [OpenAI package reference](https://platform.openai.com/docs/api-reference/batch/requestInput) for more details.
 
@@ -31,13 +31,13 @@ We currently only support `/v1/chat/completions` and `/v1/embeddings` endpoints
 To follow along with this example, you can download the example batch, or create your own batch file in your working directory.

 ```
-wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/offline_inference_openai/openai_example_batch.jsonl
+wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl
 ```

 Once you've created your batch file it should look like this

 ```
-$ cat offline_inference/offline_inference_openai/openai_example_batch.jsonl
+$ cat offline_inference/openai/openai_example_batch.jsonl
 {"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
 {"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
 ```
@@ -49,7 +49,7 @@ The batch running tool is designed to be used from the command line.
 You can run the batch with the following command, which will write its results to a file called `results.jsonl`

 ```
-python -m vllm.entrypoints.openai.run_batch -i offline_inference/offline_inference_openai/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
+python -m vllm.entrypoints.openai.run_batch -i offline_inference/openai/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
 ```

 ### Step 3: Check your results
@@ -66,10 +66,10 @@ $ cat results.jsonl

 The batch runner supports remote input and output urls that are accessible via http/https.

-For example, to run against our example input file located at `https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/offline_inference_openai/openai_example_batch.jsonl`, you can run
+For example, to run against our example input file located at `https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl`, you can run

 ```
-python -m vllm.entrypoints.openai.run_batch -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/offline_inference_openai/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
+python -m vllm.entrypoints.openai.run_batch -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
 ```

 ## Example 3: Integrating with AWS S3
@@ -90,13 +90,13 @@ To integrate with cloud blob storage, we recommend using presigned urls.
 To follow along with this example, you can download the example batch, or create your own batch file in your working directory.

 ```
-wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/offline_inference_openai/openai_example_batch.jsonl
+wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl
 ```

 Once you've created your batch file it should look like this

 ```
-$ cat offline_inference/offline_inference_openai/openai_example_batch.jsonl
+$ cat offline_inference/openai/openai_example_batch.jsonl
 {"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
 {"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
 ```
@@ -104,7 +104,7 @@ $ cat offline_inference/offline_inference_openai/openai_example_batch.jsonl
 Now upload your batch file to your S3 bucket.

 ```
-aws s3 cp offline_inference/offline_inference_openai/openai_example_batch.jsonl s3://MY_BUCKET/MY_INPUT_FILE.jsonl
+aws s3 cp offline_inference/openai/openai_example_batch.jsonl s3://MY_BUCKET/MY_INPUT_FILE.jsonl
 ```

 ### Step 2: Generate your presigned urls

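The batch guide in this file describes the OpenAI batch file format as one JSON object per line, with each line a separate request, and says only `/v1/chat/completions` and `/v1/embeddings` are supported. The following is a minimal sketch, assuming Python 3.9+ and only the standard library. It shows how the two example requests could be generated, checked, and read back. `make_request`, `validate`, `to_jsonl`, and `from_jsonl` are hypothetical helpers for illustration and are not part of vLLM.

```python
import json

# Endpoints the guide says are currently supported.
SUPPORTED_URLS = {"/v1/chat/completions", "/v1/embeddings"}
# Top-level keys every line of the example batch file carries.
REQUIRED_KEYS = {"custom_id", "method", "url", "body"}


def make_request(custom_id: str, system: str, user: str,
                 model: str = "meta-llama/Meta-Llama-3-8B-Instruct") -> dict:
    """Hypothetical helper: build one chat request shaped like the example lines."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            "max_completion_tokens": 1000,
        },
    }


def validate(req: dict) -> None:
    """Raise ValueError if a request is missing keys or targets an unsupported endpoint."""
    missing = REQUIRED_KEYS - set(req)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if req["url"] not in SUPPORTED_URLS:
        raise ValueError(f"unsupported url: {req['url']}")


def to_jsonl(requests: list[dict]) -> str:
    """Serialize requests as JSON Lines: one JSON object per line."""
    for req in requests:
        validate(req)
    return "\n".join(json.dumps(req) for req in requests) + "\n"


def from_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text back into a list of requests, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


batch = [
    make_request("request-1", "You are a helpful assistant.", "Hello world!"),
    make_request("request-2", "You are an unhelpful assistant.", "Hello world!"),
]
text = to_jsonl(batch)
```

Writing `text` to `openai_example_batch.jsonl` should give a file equivalent to the example. Whitespace inside each line may differ from the upstream file, which a JSON parser ignores.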
--- a/examples/offline_inference/offline_inference_openai/openai_example_batch.jsonl
+++ b/examples/offline_inference/offline_inference_openai/openai_example_batch.jsonl
--- a/examples/offline_inference/offline_inference_pixtral.py
+++ b/examples/offline_inference/offline_inference_pixtral.py
--- a/examples/offline_inference/offline_inference_with_prefix.py
+++ b/examples/offline_inference/offline_inference_with_prefix.py
--- a/examples/offline_inference/offline_profile.py
+++ b/examples/offline_inference/offline_profile.py
@@ -363,7 +363,7 @@ Profile a model

    example:
    ```
-    python examples/offline_inference/offline_profile.py \\
+    python examples/offline_inference/profiling.py \\
        --model neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 --batch-size 4 \\
        --prompt-len 512 --max-num-batched-tokens 8196 --json Llama31-8b-FP8 \\
        --enforce-eager run_num_steps -n 2

--- a/examples/offline_inference/offline_inference_scoring.py
+++ b/examples/offline_inference/offline_inference_scoring.py
--- a/examples/offline_inference/offline_inference_with_profiler.py
+++ b/examples/offline_inference/offline_inference_with_profiler.py
--- a/examples/offline_inference/offline_inference_structured_outputs.py
+++ b/examples/offline_inference/offline_inference_structured_outputs.py
--- a/examples/offline_inference/offline_inference_tpu.py
+++ b/examples/offline_inference/offline_inference_tpu.py
--- a/examples/offline_inference/offline_inference_vision_language.py
+++ b/examples/offline_inference/offline_inference_vision_language.py
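After this rename, downstream docs or scripts that still use the old paths will point at files that no longer exist. The following is a minimal sketch for updating such references, assuming plain string substitution is enough. `RENAMES` lists only the three renames that are visible in the changed text of this commit. The new names of the other files are not shown on this page, so they are deliberately left out. `update_references` is a hypothetical helper and not part of vLLM.

```python
# Old -> new path fragments, taken only from the renames visible in this
# commit's hunks (openai batch guide URLs, the profiling example, and the
# Florence-2 TODO). Other renamed files are intentionally not listed.
RENAMES = {
    "offline_inference/offline_inference_openai/": "offline_inference/openai/",
    "offline_inference/offline_profile.py": "offline_inference/profiling.py",
    "offline_inference/offline_inference_vision_language.py":
        "offline_inference/vision_language.py",
}


def update_references(text: str) -> str:
    """Rewrite every old path fragment in `text` to its new name."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text
```

Because none of the new fragments contain an old one, running the function twice leaves already-updated text unchanged.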