Unverified commit af107d5a, authored by Harry Mellor, committed by GitHub

Make distinct `code` and `console` admonitions so readers are less likely to miss them (#20585)


Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent 31c5d0a1
...@@ -76,7 +76,7 @@ Currently, there are no pre-built CPU wheels. ...@@ -76,7 +76,7 @@ Currently, there are no pre-built CPU wheels.
### Build image from source ### Build image from source
??? Commands ??? console "Commands"
```bash ```bash
docker build -f docker/Dockerfile.cpu \ docker build -f docker/Dockerfile.cpu \
...@@ -149,7 +149,7 @@ vllm serve facebook/opt-125m ...@@ -149,7 +149,7 @@ vllm serve facebook/opt-125m
- If using vLLM CPU backend on a machine with hyper-threading, it is recommended to bind only one OpenMP thread on each physical CPU core using `VLLM_CPU_OMP_THREADS_BIND` or using auto thread binding feature by default. On a hyper-threading enabled platform with 16 logical CPU cores / 8 physical CPU cores: - If using vLLM CPU backend on a machine with hyper-threading, it is recommended to bind only one OpenMP thread on each physical CPU core using `VLLM_CPU_OMP_THREADS_BIND` or using auto thread binding feature by default. On a hyper-threading enabled platform with 16 logical CPU cores / 8 physical CPU cores:
??? Commands ??? console "Commands"
```console ```console
$ lscpu -e # check the mapping between logical CPU cores and physical CPU cores $ lscpu -e # check the mapping between logical CPU cores and physical CPU cores
......
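The thread-binding advice in the hunk above (one OpenMP thread per physical core) can be sketched as follows. This is a minimal illustration, not vLLM code: the helper `one_cpu_per_core`, the sample topology, and the comma-separated value format are assumptions. On a real machine the (logical CPU, core) pairs would come from `lscpu -e`.

```python
def one_cpu_per_core(cpu_core_pairs):
    """Return the lowest-numbered logical CPU of each physical core."""
    first_cpu = {}
    for cpu, core in sorted(cpu_core_pairs):
        # Keep only the first logical CPU seen for each physical core.
        first_cpu.setdefault(core, cpu)
    return sorted(first_cpu.values())


# Hypothetical topology: 16 logical CPUs on 8 physical cores,
# where logical CPUs n and n + 8 share physical core n.
pairs = [(cpu, cpu % 8) for cpu in range(16)]

selected = one_cpu_per_core(pairs)
# Assumed value format: a comma-separated list of logical CPU ids.
bind_value = ",".join(str(cpu) for cpu in selected)
print(f"VLLM_CPU_OMP_THREADS_BIND={bind_value}")
```

With this sample topology, one logical CPU is selected per physical core, so eight threads are bound rather than sixteen.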
...@@ -95,7 +95,7 @@ Currently, there are no pre-built ROCm wheels. ...@@ -95,7 +95,7 @@ Currently, there are no pre-built ROCm wheels.
4. Build vLLM. For example, vLLM on ROCM 6.3 can be built with the following steps: 4. Build vLLM. For example, vLLM on ROCM 6.3 can be built with the following steps:
??? Commands ??? console "Commands"
```bash ```bash
pip install --upgrade pip pip install --upgrade pip
...@@ -206,7 +206,7 @@ DOCKER_BUILDKIT=1 docker build \ ...@@ -206,7 +206,7 @@ DOCKER_BUILDKIT=1 docker build \
To run the above docker image `vllm-rocm`, use the below command: To run the above docker image `vllm-rocm`, use the below command:
??? Command ??? console "Command"
```bash ```bash
docker run -it \ docker run -it \
......
...@@ -237,7 +237,7 @@ As an example, if a request of 3 sequences, with max sequence length of 412 come ...@@ -237,7 +237,7 @@ As an example, if a request of 3 sequences, with max sequence length of 412 come
Warmup is an optional, but highly recommended step occurring before vLLM server starts listening. It executes a forward pass for each bucket with dummy data. The goal is to pre-compile all graphs and not incur any graph compilation overheads within bucket boundaries during server runtime. Each warmup step is logged during vLLM startup: Warmup is an optional, but highly recommended step occurring before vLLM server starts listening. It executes a forward pass for each bucket with dummy data. The goal is to pre-compile all graphs and not incur any graph compilation overheads within bucket boundaries during server runtime. Each warmup step is logged during vLLM startup:
??? Logs ??? console "Logs"
```text ```text
INFO 08-01 22:26:47 hpu_model_runner.py:1066] [Warmup][Prompt][1/24] batch_size:4 seq_len:1024 free_mem:79.16 GiB INFO 08-01 22:26:47 hpu_model_runner.py:1066] [Warmup][Prompt][1/24] batch_size:4 seq_len:1024 free_mem:79.16 GiB
...@@ -286,7 +286,7 @@ When there's large amount of requests pending, vLLM scheduler will attempt to fi ...@@ -286,7 +286,7 @@ When there's large amount of requests pending, vLLM scheduler will attempt to fi
Each described step is logged by vLLM server, as follows (negative values correspond to memory being released): Each described step is logged by vLLM server, as follows (negative values correspond to memory being released):
??? Logs ??? console "Logs"
```text ```text
INFO 08-02 17:37:44 hpu_model_runner.py:493] Prompt bucket config (min, step, max_warmup) bs:[1, 32, 4], seq:[128, 128, 1024] INFO 08-02 17:37:44 hpu_model_runner.py:493] Prompt bucket config (min, step, max_warmup) bs:[1, 32, 4], seq:[128, 128, 1024]
......
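To make the warmup description above concrete, here is a rough sketch of how a `(min, step, max_warmup)` bucket config could expand into the list of buckets that warmup iterates over. It is not the `hpu_model_runner.py` implementation: the doubling rule for batch sizes and the helper names are assumptions made for illustration, and the real bucketing logic may differ.

```python
def ramp_up_values(minimum, maximum):
    """Doubling sequence starting at `minimum`, capped at `maximum` (assumed rule)."""
    if minimum < 1:
        raise ValueError("minimum must be at least 1")
    values = []
    value = minimum
    while value < maximum:
        values.append(value)
        value *= 2
    values.append(maximum)
    return values


def linear_values(minimum, step, maximum):
    """Evenly spaced values from `minimum` to `maximum`, inclusive."""
    return list(range(minimum, maximum + 1, step))


# Values taken from the logged config: bs:[1, 32, 4], seq:[128, 128, 1024].
batch_sizes = ramp_up_values(1, 4)
seq_lens = linear_values(128, 128, 1024)

# Warmup would run one forward pass with dummy data per (batch_size, seq_len) bucket.
buckets = [(bs, seq) for bs in batch_sizes for seq in seq_lens]
print(len(buckets), "prompt buckets")
```

Under these assumptions the config expands to 3 batch sizes and 8 sequence lengths, which would be consistent with the `[1/24]` counter in the warmup log if both logs came from the same configuration.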
...@@ -147,7 +147,7 @@ curl http://localhost:8000/v1/completions \ ...@@ -147,7 +147,7 @@ curl http://localhost:8000/v1/completions \
Since this server is compatible with OpenAI API, you can use it as a drop-in replacement for any applications using OpenAI API. For example, another way to query the server is via the `openai` Python package: Since this server is compatible with OpenAI API, you can use it as a drop-in replacement for any applications using OpenAI API. For example, another way to query the server is via the `openai` Python package:
??? Code ??? code
```python ```python
from openai import OpenAI from openai import OpenAI
...@@ -186,7 +186,7 @@ curl http://localhost:8000/v1/chat/completions \ ...@@ -186,7 +186,7 @@ curl http://localhost:8000/v1/chat/completions \
Alternatively, you can use the `openai` Python package: Alternatively, you can use the `openai` Python package:
??? Code ??? code
```python ```python
from openai import OpenAI from openai import OpenAI
......
...@@ -39,6 +39,8 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link . ...@@ -39,6 +39,8 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
:root { :root {
--md-admonition-icon--announcement: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" width="16" height="16"><path d="M3.25 9a.75.75 0 0 1 .75.75c0 2.142.456 3.828.733 4.653a.122.122 0 0 0 .05.064.212.212 0 0 0 .117.033h1.31c.085 0 .18-.042.258-.152a.45.45 0 0 0 .075-.366A16.743 16.743 0 0 1 6 9.75a.75.75 0 0 1 1.5 0c0 1.588.25 2.926.494 3.85.293 1.113-.504 2.4-1.783 2.4H4.9c-.686 0-1.35-.41-1.589-1.12A16.4 16.4 0 0 1 2.5 9.75.75.75 0 0 1 3.25 9Z"></path><path d="M0 6a4 4 0 0 1 4-4h2.75a.75.75 0 0 1 .75.75v6.5a.75.75 0 0 1-.75.75H4a4 4 0 0 1-4-4Zm4-2.5a2.5 2.5 0 1 0 0 5h2v-5Z"></path><path d="M15.59.082A.75.75 0 0 1 16 .75v10.5a.75.75 0 0 1-1.189.608l-.002-.001h.001l-.014-.01a5.775 5.775 0 0 0-.422-.25 10.63 10.63 0 0 0-1.469-.64C11.576 10.484 9.536 10 6.75 10a.75.75 0 0 1 0-1.5c2.964 0 5.174.516 6.658 1.043.423.151.787.302 1.092.443V2.014c-.305.14-.669.292-1.092.443C11.924 2.984 9.713 3.5 6.75 3.5a.75.75 0 0 1 0-1.5c2.786 0 4.826-.484 6.155-.957.665-.236 1.154-.47 1.47-.64.144-.077.284-.161.421-.25l.014-.01a.75.75 0 0 1 .78-.061Z"></path></svg>'); --md-admonition-icon--announcement: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" width="16" height="16"><path d="M3.25 9a.75.75 0 0 1 .75.75c0 2.142.456 3.828.733 4.653a.122.122 0 0 0 .05.064.212.212 0 0 0 .117.033h1.31c.085 0 .18-.042.258-.152a.45.45 0 0 0 .075-.366A16.743 16.743 0 0 1 6 9.75a.75.75 0 0 1 1.5 0c0 1.588.25 2.926.494 3.85.293 1.113-.504 2.4-1.783 2.4H4.9c-.686 0-1.35-.41-1.589-1.12A16.4 16.4 0 0 1 2.5 9.75.75.75 0 0 1 3.25 9Z"></path><path d="M0 6a4 4 0 0 1 4-4h2.75a.75.75 0 0 1 .75.75v6.5a.75.75 0 0 1-.75.75H4a4 4 0 0 1-4-4Zm4-2.5a2.5 2.5 0 1 0 0 5h2v-5Z"></path><path d="M15.59.082A.75.75 0 0 1 16 .75v10.5a.75.75 0 0 1-1.189.608l-.002-.001h.001l-.014-.01a5.775 5.775 0 0 0-.422-.25 10.63 10.63 0 0 0-1.469-.64C11.576 10.484 9.536 10 6.75 10a.75.75 0 0 1 0-1.5c2.964 0 5.174.516 6.658 1.043.423.151.787.302 
1.092.443V2.014c-.305.14-.669.292-1.092.443C11.924 2.984 9.713 3.5 6.75 3.5a.75.75 0 0 1 0-1.5c2.786 0 4.826-.484 6.155-.957.665-.236 1.154-.47 1.47-.64.144-.077.284-.161.421-.25l.014-.01a.75.75 0 0 1 .78-.061Z"></path></svg>');
--md-admonition-icon--important: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" width="16" height="16"><path d="M4.47.22A.749.749 0 0 1 5 0h6c.199 0 .389.079.53.22l4.25 4.25c.141.14.22.331.22.53v6a.749.749 0 0 1-.22.53l-4.25 4.25A.749.749 0 0 1 11 16H5a.749.749 0 0 1-.53-.22L.22 11.53A.749.749 0 0 1 0 11V5c0-.199.079-.389.22-.53Zm.84 1.28L1.5 5.31v5.38l3.81 3.81h5.38l3.81-3.81V5.31L10.69 1.5ZM8 4a.75.75 0 0 1 .75.75v3.5a.75.75 0 0 1-1.5 0v-3.5A.75.75 0 0 1 8 4Zm0 8a1 1 0 1 1 0-2 1 1 0 0 1 0 2Z"></path></svg>'); --md-admonition-icon--important: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" width="16" height="16"><path d="M4.47.22A.749.749 0 0 1 5 0h6c.199 0 .389.079.53.22l4.25 4.25c.141.14.22.331.22.53v6a.749.749 0 0 1-.22.53l-4.25 4.25A.749.749 0 0 1 11 16H5a.749.749 0 0 1-.53-.22L.22 11.53A.749.749 0 0 1 0 11V5c0-.199.079-.389.22-.53Zm.84 1.28L1.5 5.31v5.38l3.81 3.81h5.38l3.81-3.81V5.31L10.69 1.5ZM8 4a.75.75 0 0 1 .75.75v3.5a.75.75 0 0 1-1.5 0v-3.5A.75.75 0 0 1 8 4Zm0 8a1 1 0 1 1 0-2 1 1 0 0 1 0 2Z"></path></svg>');
--md-admonition-icon--code: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="m11.28 3.22 4.25 4.25a.75.75 0 0 1 0 1.06l-4.25 4.25a.749.749 0 0 1-1.275-.326.75.75 0 0 1 .215-.734L13.94 8l-3.72-3.72a.749.749 0 0 1 .326-1.275.75.75 0 0 1 .734.215m-6.56 0a.75.75 0 0 1 1.042.018.75.75 0 0 1 .018 1.042L2.06 8l3.72 3.72a.749.749 0 0 1-.326 1.275.75.75 0 0 1-.734-.215L.47 8.53a.75.75 0 0 1 0-1.06Z"/></svg>');
--md-admonition-icon--console: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M0 2.75C0 1.784.784 1 1.75 1h12.5c.966 0 1.75.784 1.75 1.75v10.5A1.75 1.75 0 0 1 14.25 15H1.75A1.75 1.75 0 0 1 0 13.25Zm1.75-.25a.25.25 0 0 0-.25.25v10.5c0 .138.112.25.25.25h12.5a.25.25 0 0 0 .25-.25V2.75a.25.25 0 0 0-.25-.25ZM7.25 8a.75.75 0 0 1-.22.53l-2.25 2.25a.749.749 0 0 1-1.275-.326.75.75 0 0 1 .215-.734L5.44 8 3.72 6.28a.749.749 0 0 1 .326-1.275.75.75 0 0 1 .734.215l2.25 2.25c.141.14.22.331.22.53m1.5 1.5h3a.75.75 0 0 1 0 1.5h-3a.75.75 0 0 1 0-1.5"/></svg>');
} }
.md-typeset .admonition.announcement, .md-typeset .admonition.announcement,
...@@ -49,6 +51,14 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link . ...@@ -49,6 +51,14 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
.md-typeset details.important { .md-typeset details.important {
border-color: rgb(239, 85, 82); border-color: rgb(239, 85, 82);
} }
.md-typeset .admonition.code,
.md-typeset details.code {
border-color: #64dd17
}
.md-typeset .admonition.console,
.md-typeset details.console {
border-color: #64dd17
}
.md-typeset .announcement > .admonition-title, .md-typeset .announcement > .admonition-title,
.md-typeset .announcement > summary { .md-typeset .announcement > summary {
...@@ -58,6 +68,14 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link . ...@@ -58,6 +68,14 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
.md-typeset .important > summary { .md-typeset .important > summary {
background-color: rgb(239, 85, 82, 0.1); background-color: rgb(239, 85, 82, 0.1);
} }
.md-typeset .code > .admonition-title,
.md-typeset .code > summary {
background-color: #64dd171a;
}
.md-typeset .console > .admonition-title,
.md-typeset .console > summary {
background-color: #64dd171a;
}
.md-typeset .announcement > .admonition-title::before, .md-typeset .announcement > .admonition-title::before,
.md-typeset .announcement > summary::before { .md-typeset .announcement > summary::before {
...@@ -71,6 +89,18 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link . ...@@ -71,6 +89,18 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
-webkit-mask-image: var(--md-admonition-icon--important); -webkit-mask-image: var(--md-admonition-icon--important);
mask-image: var(--md-admonition-icon--important); mask-image: var(--md-admonition-icon--important);
} }
.md-typeset .code > .admonition-title::before,
.md-typeset .code > summary::before {
background-color: #64dd17;
-webkit-mask-image: var(--md-admonition-icon--code);
mask-image: var(--md-admonition-icon--code);
}
.md-typeset .console > .admonition-title::before,
.md-typeset .console > summary::before {
background-color: #64dd17;
-webkit-mask-image: var(--md-admonition-icon--console);
mask-image: var(--md-admonition-icon--console);
}
/* Make label fully visible on hover */ /* Make label fully visible on hover */
.md-content__button[href*="edit"]:hover::after { .md-content__button[href*="edit"]:hover::after {
......
...@@ -85,7 +85,7 @@ and automatically applies the model's [chat template](https://huggingface.co/doc ...@@ -85,7 +85,7 @@ and automatically applies the model's [chat template](https://huggingface.co/doc
In general, only instruction-tuned models have a chat template. In general, only instruction-tuned models have a chat template.
Base models may perform poorly as they are not trained to respond to the chat conversation. Base models may perform poorly as they are not trained to respond to the chat conversation.
??? Code ??? code
```python ```python
from vllm import LLM from vllm import LLM
......
...@@ -642,7 +642,7 @@ Specified using `--task generate`. ...@@ -642,7 +642,7 @@ Specified using `--task generate`.
For the best results, we recommend using the following dependency versions (tested on A10 and L40): For the best results, we recommend using the following dependency versions (tested on A10 and L40):
??? Dependency versions ??? code "Dependency versions"
```text ```text
# Core vLLM-compatible dependencies with Molmo accuracy setup (tested on L40) # Core vLLM-compatible dependencies with Molmo accuracy setup (tested on L40)
......
...@@ -13,7 +13,7 @@ pip install langchain langchain_community -q ...@@ -13,7 +13,7 @@ pip install langchain langchain_community -q
To run inference on a single or multiple GPUs, use `VLLM` class from `langchain`. To run inference on a single or multiple GPUs, use `VLLM` class from `langchain`.
??? Code ??? code
```python ```python
from langchain_community.llms import VLLM from langchain_community.llms import VLLM
......
...@@ -15,7 +15,7 @@ vllm serve NousResearch/Meta-Llama-3-8B-Instruct \ ...@@ -15,7 +15,7 @@ vllm serve NousResearch/Meta-Llama-3-8B-Instruct \
To call the server, in your preferred text editor, create a script that uses an HTTP client. Include any messages that you want to send to the model. Then run that script. Below is an example script using the [official OpenAI Python client](https://github.com/openai/openai-python). To call the server, in your preferred text editor, create a script that uses an HTTP client. Include any messages that you want to send to the model. Then run that script. Below is an example script using the [official OpenAI Python client](https://github.com/openai/openai-python).
??? Code ??? code
```python ```python
from openai import OpenAI from openai import OpenAI
...@@ -146,7 +146,7 @@ completion = client.chat.completions.create( ...@@ -146,7 +146,7 @@ completion = client.chat.completions.create(
Only `X-Request-Id` HTTP request header is supported for now. It can be enabled Only `X-Request-Id` HTTP request header is supported for now. It can be enabled
with `--enable-request-id-headers`. with `--enable-request-id-headers`.
??? Code ??? code
```python ```python
completion = client.chat.completions.create( completion = client.chat.completions.create(
...@@ -185,7 +185,7 @@ Code example: <gh-file:examples/online_serving/openai_completion_client.py> ...@@ -185,7 +185,7 @@ Code example: <gh-file:examples/online_serving/openai_completion_client.py>
The following [sampling parameters][sampling-params] are supported. The following [sampling parameters][sampling-params] are supported.
??? Code ??? code
```python ```python
--8<-- "vllm/entrypoints/openai/protocol.py:completion-sampling-params" --8<-- "vllm/entrypoints/openai/protocol.py:completion-sampling-params"
...@@ -193,7 +193,7 @@ The following [sampling parameters][sampling-params] are supported. ...@@ -193,7 +193,7 @@ The following [sampling parameters][sampling-params] are supported.
The following extra parameters are supported: The following extra parameters are supported:
??? Code ??? code
```python ```python
--8<-- "vllm/entrypoints/openai/protocol.py:completion-extra-params" --8<-- "vllm/entrypoints/openai/protocol.py:completion-extra-params"
...@@ -217,7 +217,7 @@ Code example: <gh-file:examples/online_serving/openai_chat_completion_client.py> ...@@ -217,7 +217,7 @@ Code example: <gh-file:examples/online_serving/openai_chat_completion_client.py>
The following [sampling parameters][sampling-params] are supported. The following [sampling parameters][sampling-params] are supported.
??? Code ??? code
```python ```python
--8<-- "vllm/entrypoints/openai/protocol.py:chat-completion-sampling-params" --8<-- "vllm/entrypoints/openai/protocol.py:chat-completion-sampling-params"
...@@ -225,7 +225,7 @@ The following [sampling parameters][sampling-params] are supported. ...@@ -225,7 +225,7 @@ The following [sampling parameters][sampling-params] are supported.
The following extra parameters are supported: The following extra parameters are supported:
??? Code ??? code
```python ```python
--8<-- "vllm/entrypoints/openai/protocol.py:chat-completion-extra-params" --8<-- "vllm/entrypoints/openai/protocol.py:chat-completion-extra-params"
...@@ -268,7 +268,7 @@ and passing a list of `messages` in the request. Refer to the examples below for ...@@ -268,7 +268,7 @@ and passing a list of `messages` in the request. Refer to the examples below for
Since the request schema is not defined by OpenAI client, we post a request to the server using the lower-level `requests` library: Since the request schema is not defined by OpenAI client, we post a request to the server using the lower-level `requests` library:
??? Code ??? code
```python ```python
import requests import requests
...@@ -327,7 +327,7 @@ The following [pooling parameters][pooling-params] are supported. ...@@ -327,7 +327,7 @@ The following [pooling parameters][pooling-params] are supported.
The following extra parameters are supported by default: The following extra parameters are supported by default:
??? Code ??? code
```python ```python
--8<-- "vllm/entrypoints/openai/protocol.py:embedding-extra-params" --8<-- "vllm/entrypoints/openai/protocol.py:embedding-extra-params"
...@@ -335,7 +335,7 @@ The following extra parameters are supported by default: ...@@ -335,7 +335,7 @@ The following extra parameters are supported by default:
For chat-like input (i.e. if `messages` is passed), these extra parameters are supported instead: For chat-like input (i.e. if `messages` is passed), these extra parameters are supported instead:
??? Code ??? code
```python ```python
--8<-- "vllm/entrypoints/openai/protocol.py:chat-embedding-extra-params" --8<-- "vllm/entrypoints/openai/protocol.py:chat-embedding-extra-params"
...@@ -358,7 +358,7 @@ Code example: <gh-file:examples/online_serving/openai_transcription_client.py> ...@@ -358,7 +358,7 @@ Code example: <gh-file:examples/online_serving/openai_transcription_client.py>
The following [sampling parameters][sampling-params] are supported. The following [sampling parameters][sampling-params] are supported.
??? Code ??? code
```python ```python
--8<-- "vllm/entrypoints/openai/protocol.py:transcription-sampling-params" --8<-- "vllm/entrypoints/openai/protocol.py:transcription-sampling-params"
...@@ -366,7 +366,7 @@ The following [sampling parameters][sampling-params] are supported. ...@@ -366,7 +366,7 @@ The following [sampling parameters][sampling-params] are supported.
The following extra parameters are supported: The following extra parameters are supported:
??? Code ??? code
```python ```python
--8<-- "vllm/entrypoints/openai/protocol.py:transcription-extra-params" --8<-- "vllm/entrypoints/openai/protocol.py:transcription-extra-params"
...@@ -446,7 +446,7 @@ curl -v "http://127.0.0.1:8000/classify" \ ...@@ -446,7 +446,7 @@ curl -v "http://127.0.0.1:8000/classify" \
}' }'
``` ```
??? Response ??? console "Response"
```bash ```bash
{ {
...@@ -494,7 +494,7 @@ curl -v "http://127.0.0.1:8000/classify" \ ...@@ -494,7 +494,7 @@ curl -v "http://127.0.0.1:8000/classify" \
}' }'
``` ```
??? Response ??? console "Response"
```bash ```bash
{ {
...@@ -564,7 +564,7 @@ curl -X 'POST' \ ...@@ -564,7 +564,7 @@ curl -X 'POST' \
}' }'
``` ```
??? Response ??? console "Response"
```bash ```bash
{ {
...@@ -589,7 +589,7 @@ You can pass a string to `text_1` and a list to `text_2`, forming multiple sente ...@@ -589,7 +589,7 @@ You can pass a string to `text_1` and a list to `text_2`, forming multiple sente
where each pair is built from `text_1` and a string in `text_2`. where each pair is built from `text_1` and a string in `text_2`.
The total number of pairs is `len(text_2)`. The total number of pairs is `len(text_2)`.
??? Request ??? console "Request"
```bash ```bash
curl -X 'POST' \ curl -X 'POST' \
...@@ -606,7 +606,7 @@ The total number of pairs is `len(text_2)`. ...@@ -606,7 +606,7 @@ The total number of pairs is `len(text_2)`.
}' }'
``` ```
??? Response ??? console "Response"
```bash ```bash
{ {
...@@ -634,7 +634,7 @@ You can pass a list to both `text_1` and `text_2`, forming multiple sentence pai ...@@ -634,7 +634,7 @@ You can pass a list to both `text_1` and `text_2`, forming multiple sentence pai
where each pair is built from a string in `text_1` and the corresponding string in `text_2` (similar to `zip()`). where each pair is built from a string in `text_1` and the corresponding string in `text_2` (similar to `zip()`).
The total number of pairs is `len(text_2)`. The total number of pairs is `len(text_2)`.
??? Request ??? console "Request"
```bash ```bash
curl -X 'POST' \ curl -X 'POST' \
...@@ -655,7 +655,7 @@ The total number of pairs is `len(text_2)`. ...@@ -655,7 +655,7 @@ The total number of pairs is `len(text_2)`.
}' }'
``` ```
??? Response ??? console "Response"
```bash ```bash
{ {
...@@ -716,7 +716,7 @@ Code example: <gh-file:examples/online_serving/jinaai_rerank_client.py> ...@@ -716,7 +716,7 @@ Code example: <gh-file:examples/online_serving/jinaai_rerank_client.py>
Note that the `top_n` request parameter is optional and will default to the length of the `documents` field. Note that the `top_n` request parameter is optional and will default to the length of the `documents` field.
Result documents will be sorted by relevance, and the `index` property can be used to determine original order. Result documents will be sorted by relevance, and the `index` property can be used to determine original order.
??? Request ??? console "Request"
```bash ```bash
curl -X 'POST' \ curl -X 'POST' \
...@@ -734,7 +734,7 @@ Result documents will be sorted by relevance, and the `index` property can be us ...@@ -734,7 +734,7 @@ Result documents will be sorted by relevance, and the `index` property can be us
}' }'
``` ```
??? Response ??? console "Response"
```bash ```bash
{ {
......
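The pairing and reranking rules described in the hunks above can be sketched in plain Python. The helpers `build_pairs` and `rank_documents` are hypothetical and only mirror the documented behaviour. They are not vLLM functions, the `relevance_score` field name is illustrative, and the length check is an assumption about how mismatched lists would be handled.

```python
def build_pairs(text_1, text_2):
    """Form sentence pairs from `text_1` and `text_2` as the docs describe."""
    if isinstance(text_2, str):
        text_2 = [text_2]
    if isinstance(text_1, str):
        # One string against a list: pair the string with every entry of text_2.
        return [(text_1, other) for other in text_2]
    if len(text_1) != len(text_2):
        # Assumption: list inputs must line up one-to-one.
        raise ValueError("text_1 and text_2 must have the same length")
    # Two lists: pair corresponding entries, like zip().
    return list(zip(text_1, text_2))


def rank_documents(scores, documents, top_n=None):
    """Sort documents by score; `index` records each document's original position."""
    if top_n is None:
        top_n = len(documents)  # documented default
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [
        {"index": i, "document": documents[i], "relevance_score": scores[i]}
        for i in order[:top_n]
    ]


one_to_many = build_pairs("q", ["a", "b", "c"])
one_to_one = build_pairs(["q1", "q2"], ["a1", "a2"])
ranked = rank_documents([0.1, 0.9, 0.5], ["d0", "d1", "d2"])
print(len(one_to_many), len(one_to_one), [r["index"] for r in ranked])
```

In both pairing modes the number of pairs equals `len(text_2)`, and sorting the ranked results by `index` restores the original document order.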
...@@ -12,7 +12,7 @@ vllm serve unsloth/Llama-3.2-1B-Instruct ...@@ -12,7 +12,7 @@ vllm serve unsloth/Llama-3.2-1B-Instruct
Then query the endpoint to get the latest metrics from the server: Then query the endpoint to get the latest metrics from the server:
??? Output ??? console "Output"
```console ```console
$ curl http://0.0.0.0:8000/metrics $ curl http://0.0.0.0:8000/metrics
...@@ -33,7 +33,7 @@ Then query the endpoint to get the latest metrics from the server: ...@@ -33,7 +33,7 @@ Then query the endpoint to get the latest metrics from the server:
The following metrics are exposed: The following metrics are exposed:
??? Code ??? code
```python ```python
--8<-- "vllm/engine/metrics.py:metrics-definitions" --8<-- "vllm/engine/metrics.py:metrics-definitions"
......
...@@ -60,7 +60,7 @@ To identify the particular CUDA operation that causes the error, you can add `-- ...@@ -60,7 +60,7 @@ To identify the particular CUDA operation that causes the error, you can add `--
If GPU/CPU communication cannot be established, you can use the following Python script and follow the instructions below to confirm whether the GPU/CPU communication is working correctly. If GPU/CPU communication cannot be established, you can use the following Python script and follow the instructions below to confirm whether the GPU/CPU communication is working correctly.
??? Code ??? code
```python ```python
# Test PyTorch NCCL # Test PyTorch NCCL
...@@ -170,7 +170,7 @@ WARNING 12-11 14:50:37 multiproc_worker_utils.py:281] CUDA was previously ...@@ -170,7 +170,7 @@ WARNING 12-11 14:50:37 multiproc_worker_utils.py:281] CUDA was previously
or an error from Python that looks like this: or an error from Python that looks like this:
??? Logs ??? console "Logs"
```console ```console
RuntimeError: RuntimeError:
...@@ -214,7 +214,7 @@ if __name__ == '__main__': ...@@ -214,7 +214,7 @@ if __name__ == '__main__':
vLLM heavily depends on `torch.compile` to optimize the model for better performance, which introduces the dependency on the `torch.compile` functionality and the `triton` library. By default, we use `torch.compile` to [optimize some functions](gh-pr:10406) in the model. Before running vLLM, you can check if `torch.compile` is working as expected by running the following script: vLLM heavily depends on `torch.compile` to optimize the model for better performance, which introduces the dependency on the `torch.compile` functionality and the `triton` library. By default, we use `torch.compile` to [optimize some functions](gh-pr:10406) in the model. Before running vLLM, you can check if `torch.compile` is working as expected by running the following script:
??? Code ??? code
```python ```python
import torch import torch
......
...@@ -10,7 +10,7 @@ The list of data collected by the latest version of vLLM can be found here: <gh- ...@@ -10,7 +10,7 @@ The list of data collected by the latest version of vLLM can be found here: <gh-
Here is an example as of v0.4.0: Here is an example as of v0.4.0:
??? Output ??? console "Output"
```json ```json
{ {
......