@@ -13,7 +13,7 @@ The OpenAI batch file format consists of a series of json objects on new lines.
Each line represents a separate request. See the [OpenAI package reference](https://platform.openai.com/docs/api-reference/batch/requestInput) for more details.
```{note}
-We currently only support `/v1/chat/completions` and `/v1/embeddings` endpoints (completions coming soon).
+We currently support `/v1/chat/completions`, `/v1/embeddings`, and `/v1/score` endpoints (completions coming soon).
Add score requests to your batch file. The following is an example:
```
{"custom_id": "request-1", "method": "POST", "url": "/v1/score", "body": {"model": "BAAI/bge-reranker-v2-m3", "text_1": "What is the capital of France?", "text_2": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/score", "body": {"model": "BAAI/bge-reranker-v2-m3", "text_1": "What is the capital of France?", "text_2": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}
```
You can mix chat completion, embedding, and score requests in the batch file, as long as the model you are using supports them all (note that all requests must use the same model).
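
As a rough sketch of how such a file could be generated rather than written by hand, the snippet below builds score request lines using only the standard library. The helper names `make_score_request` and `to_jsonl` are illustrative assumptions, not part of vLLM; the field names mirror the example above.

```python
import json

# Every request in one batch must name the same model.
MODEL = "BAAI/bge-reranker-v2-m3"


def make_score_request(custom_id, text_1, text_2, model=MODEL):
    """Build one batch line for /v1/score (hypothetical helper, not a vLLM API)."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/score",
        "body": {"model": model, "text_1": text_1, "text_2": text_2},
    }


def to_jsonl(requests):
    """Serialize requests as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in requests)


candidates = [
    "The capital of Brazil is Brasilia.",
    "The capital of France is Paris.",
]
requests = [
    make_score_request(f"request-{i}", "What is the capital of France?", candidates)
    for i in (1, 2)
]
batch_text = to_jsonl(requests)
print(batch_text)
```

Writing `batch_text` to a file gives an input that has the same shape as the hand-written example. Chat completion or embedding requests could be appended to the same list, provided the chosen model supports those endpoints.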
### Step 2: Run the batch
You can run the batch using the same command as in earlier examples.
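
The earlier examples are not reproduced here, so the following is only a sketch under the assumption that they use the same flags as the test added in this change (`-i`, `-o`, `--model` on the `vllm.entrypoints.openai.run_batch` module). The helper name `build_run_batch_command` and the file names are illustrative.

```python
import sys


def build_run_batch_command(input_path, output_path, model):
    """Return the argv list for the batch runner (helper name is illustrative)."""
    return [
        sys.executable, "-m", "vllm.entrypoints.openai.run_batch",
        "-i", input_path,
        "-o", output_path,
        "--model", model,
    ]


cmd = build_run_batch_command(
    "score_batch.jsonl", "results.jsonl", "BAAI/bge-reranker-v2-m3"
)
print(" ".join(cmd))

# To actually run it (requires vLLM installed and the model available):
# import subprocess
# subprocess.run(cmd, check=True)
```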
### Step 3: Check your results
You can check your results by running `cat results.jsonl`
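
For larger batches, a short script may be easier to read than raw `cat` output. The sketch below assumes each output line follows the OpenAI batch output shape (`custom_id`, `response`, `error`). The sample records, including the ids, the score value, and the layout of `body`, are made up for illustration and are not real vLLM output.

```python
import json

# Hypothetical output lines; ids, score, and body layout are assumptions.
SAMPLE_RESULTS = "\n".join([
    json.dumps({
        "id": "batch-a",
        "custom_id": "request-1",
        "response": {"status_code": 200,
                     "body": {"data": [{"index": 0, "score": 0.5}]}},
        "error": None,
    }),
    json.dumps({
        "id": "batch-b",
        "custom_id": "request-2",
        "response": None,
        "error": {"message": "example failure"},
    }),
])


def summarize_results(jsonl_text):
    """Map each custom_id to 'ok' or 'failed' using the error field and status code."""
    summary = {}
    for line in jsonl_text.strip().split("\n"):
        record = json.loads(line)
        response = record.get("response") or {}
        ok = record.get("error") is None and response.get("status_code") == 200
        summary[record["custom_id"]] = "ok" if ok else "failed"
    return summary


print(summarize_results(SAMPLE_RESULTS))
```

On a real run, the same function could be called on the contents of `results.jsonl` in place of `SAMPLE_RESULTS`.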
INPUT_SCORE_BATCH="""{"custom_id": "request-1", "method": "POST", "url": "/v1/score", "body": {"model": "BAAI/bge-reranker-v2-m3", "text_1": "What is the capital of France?", "text_2": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/score", "body": {"model": "BAAI/bge-reranker-v2-m3", "text_1": "What is the capital of France?", "text_2": ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]}}"""
def test_empty_file():
    with tempfile.NamedTemporaryFile(
...
@@ -102,3 +106,36 @@ def test_embeddings():
            # Ensure that the output format conforms to the openai api.
            # Validation should throw if the schema is wrong.
            BatchRequestOutput.model_validate_json(line)
def test_score():
    with tempfile.NamedTemporaryFile(
            "w") as input_file, tempfile.NamedTemporaryFile(
                "r") as output_file:
        input_file.write(INPUT_SCORE_BATCH)
        input_file.flush()
        proc = subprocess.Popen([
            sys.executable,
            "-m",
            "vllm.entrypoints.openai.run_batch",
            "-i",
            input_file.name,
            "-o",
            output_file.name,
            "--model",
            "BAAI/bge-reranker-v2-m3",
        ], )
        proc.communicate()
        proc.wait()
        assert proc.returncode == 0, f"{proc=}"

        contents = output_file.read()
        for line in contents.strip().split("\n"):
            # Ensure that the output format conforms to the openai api.