Make distinct `code` and `console` admonitions so readers are less likely to miss them (#20585)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Commit af107d5a (unverified), authored Jul 08, 2025 by Harry Mellor, committed by GitHub Jul 07, 2025
12 changed files
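
The Markdown hunks below all apply the same rewrite: an untyped collapsible block (`??? Title`) gains an explicit admonition type, either `code` or `console`. As a rough illustration only, that rule can be sketched in Python. The helper name `retype_admonition` and the `TITLE_TO_TYPE` table are hypothetical, built from the titles visible in this diff; the commit does not ship such a script and may well have been edited by hand.

```python
import re

# Titles that appear in this diff, mapped to the admonition type they received.
# Hypothetical table for illustration; it is not part of the commit.
TITLE_TO_TYPE = {
    "Code": "code",
    "Dependency versions": "code",
    "Command": "console",
    "Commands": "console",
    "Logs": "console",
    "Output": "console",
    "Request": "console",
    "Response": "console",
}

# Matches an untyped collapsible block such as "??? Logs" at any indentation.
# Already-typed blocks ("??? code", '??? console "Logs"') start with a
# lowercase type name and therefore do not match.
UNTYPED = re.compile(r"^(?P<indent>\s*)\?\?\? (?P<title>[A-Z][A-Za-z ]*)$")


def retype_admonition(line: str) -> str:
    """Return `line` with an explicit admonition type when the title is known.

    Expects a single line without its trailing newline.
    """
    match = UNTYPED.match(line)
    if match is None:
        return line
    indent, title = match["indent"], match["title"]
    kind = TITLE_TO_TYPE.get(title)
    if kind is None:
        # Unknown title: leave the line untouched rather than guess a type.
        return line
    if title.lower() == kind:
        # "??? Code" becomes "??? code"; no separate title is written.
        return f"{indent}??? {kind}"
    # Any other title is kept as a quoted custom title.
    return f'{indent}??? {kind} "{title}"'
```

Applied line by line (for example over `text.splitlines()`), this reproduces the `-`/`+` pairs in the hunks that follow. The new CSS rules in `extra.css` then style the resulting `code` and `console` classes.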
--- a/docs/getting_started/installation/cpu.md
+++ b/docs/getting_started/installation/cpu.md
@@ -76,7 +76,7 @@ Currently, there are no pre-built CPU wheels.

 ### Build image from source

-??? Commands
+??? console "Commands"

    ```bash
    docker build -f docker/Dockerfile.cpu \
@@ -149,7 +149,7 @@ vllm serve facebook/opt-125m

 - If using vLLM CPU backend on a machine with hyper-threading, it is recommended to bind only one OpenMP thread on each physical CPU core using `VLLM_CPU_OMP_THREADS_BIND` or using auto thread binding feature by default. On a hyper-threading enabled platform with 16 logical CPU cores / 8 physical CPU cores:

-??? Commands
+??? console "Commands"

    ```console
    $ lscpu -e # check the mapping between logical CPU cores and physical CPU cores

--- a/docs/getting_started/installation/gpu/rocm.inc.md
+++ b/docs/getting_started/installation/gpu/rocm.inc.md
@@ -95,7 +95,7 @@ Currently, there are no pre-built ROCm wheels.

 4. Build vLLM. For example, vLLM on ROCM 6.3 can be built with the following steps:

-    ??? Commands
+    ??? console "Commands"

        ```bash
        pip install --upgrade pip
@@ -206,7 +206,7 @@ DOCKER_BUILDKIT=1 docker build \

 To run the above docker image `vllm-rocm`, use the below command:

-??? Command
+??? console "Command"

    ```bash
    docker run -it \

--- a/docs/getting_started/installation/intel_gaudi.md
+++ b/docs/getting_started/installation/intel_gaudi.md
@@ -237,7 +237,7 @@ As an example, if a request of 3 sequences, with max sequence length of 412 come

 Warmup is an optional, but highly recommended step occurring before vLLM server starts listening. It executes a forward pass for each bucket with dummy data. The goal is to pre-compile all graphs and not incur any graph compilation overheads within bucket boundaries during server runtime. Each warmup step is logged during vLLM startup:

-??? Logs
+??? console "Logs"

    ```text
    INFO 08-01 22:26:47 hpu_model_runner.py:1066] [Warmup][Prompt][1/24] batch_size:4 seq_len:1024 free_mem:79.16 GiB
@@ -286,7 +286,7 @@ When there's large amount of requests pending, vLLM scheduler will attempt to fi

 Each described step is logged by vLLM server, as follows (negative values correspond to memory being released):

-??? Logs
+??? console "Logs"

    ```text
    INFO 08-02 17:37:44 hpu_model_runner.py:493] Prompt bucket config (min, step, max_warmup) bs:[1, 32, 4], seq:[128, 128, 1024]

--- a/docs/getting_started/quickstart.md
+++ b/docs/getting_started/quickstart.md
@@ -147,7 +147,7 @@ curl http://localhost:8000/v1/completions \

 Since this server is compatible with OpenAI API, you can use it as a drop-in replacement for any applications using OpenAI API. For example, another way to query the server is via the `openai` Python package:

-??? Code
+??? code

    ```python
    from openai import OpenAI
@@ -186,7 +186,7 @@ curl http://localhost:8000/v1/chat/completions \

 Alternatively, you can use the `openai` Python package:

-??? Code
+??? code

    ```python
    from openai import OpenAI

--- a/docs/mkdocs/stylesheets/extra.css
+++ b/docs/mkdocs/stylesheets/extra.css
@@ -39,6 +39,8 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
 :root {
  --md-admonition-icon--announcement: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" width="16" height="16"><path d="M3.25 9a.75.75 0 0 1 .75.75c0 2.142.456 3.828.733 4.653a.122.122 0 0 0 .05.064.212.212 0 0 0 .117.033h1.31c.085 0 .18-.042.258-.152a.45.45 0 0 0 .075-.366A16.743 16.743 0 0 1 6 9.75a.75.75 0 0 1 1.5 0c0 1.588.25 2.926.494 3.85.293 1.113-.504 2.4-1.783 2.4H4.9c-.686 0-1.35-.41-1.589-1.12A16.4 16.4 0 0 1 2.5 9.75.75.75 0 0 1 3.25 9Z"></path><path d="M0 6a4 4 0 0 1 4-4h2.75a.75.75 0 0 1 .75.75v6.5a.75.75 0 0 1-.75.75H4a4 4 0 0 1-4-4Zm4-2.5a2.5 2.5 0 1 0 0 5h2v-5Z"></path><path d="M15.59.082A.75.75 0 0 1 16 .75v10.5a.75.75 0 0 1-1.189.608l-.002-.001h.001l-.014-.01a5.775 5.775 0 0 0-.422-.25 10.63 10.63 0 0 0-1.469-.64C11.576 10.484 9.536 10 6.75 10a.75.75 0 0 1 0-1.5c2.964 0 5.174.516 6.658 1.043.423.151.787.302 1.092.443V2.014c-.305.14-.669.292-1.092.443C11.924 2.984 9.713 3.5 6.75 3.5a.75.75 0 0 1 0-1.5c2.786 0 4.826-.484 6.155-.957.665-.236 1.154-.47 1.47-.64.144-.077.284-.161.421-.25l.014-.01a.75.75 0 0 1 .78-.061Z"></path></svg>');
  --md-admonition-icon--important: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" width="16" height="16"><path d="M4.47.22A.749.749 0 0 1 5 0h6c.199 0 .389.079.53.22l4.25 4.25c.141.14.22.331.22.53v6a.749.749 0 0 1-.22.53l-4.25 4.25A.749.749 0 0 1 11 16H5a.749.749 0 0 1-.53-.22L.22 11.53A.749.749 0 0 1 0 11V5c0-.199.079-.389.22-.53Zm.84 1.28L1.5 5.31v5.38l3.81 3.81h5.38l3.81-3.81V5.31L10.69 1.5ZM8 4a.75.75 0 0 1 .75.75v3.5a.75.75 0 0 1-1.5 0v-3.5A.75.75 0 0 1 8 4Zm0 8a1 1 0 1 1 0-2 1 1 0 0 1 0 2Z"></path></svg>');
+  --md-admonition-icon--code: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="m11.28 3.22 4.25 4.25a.75.75 0 0 1 0 1.06l-4.25 4.25a.749.749 0 0 1-1.275-.326.75.75 0 0 1 .215-.734L13.94 8l-3.72-3.72a.749.749 0 0 1 .326-1.275.75.75 0 0 1 .734.215m-6.56 0a.75.75 0 0 1 1.042.018.75.75 0 0 1 .018 1.042L2.06 8l3.72 3.72a.749.749 0 0 1-.326 1.275.75.75 0 0 1-.734-.215L.47 8.53a.75.75 0 0 1 0-1.06Z"/></svg>');
+  --md-admonition-icon--console: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M0 2.75C0 1.784.784 1 1.75 1h12.5c.966 0 1.75.784 1.75 1.75v10.5A1.75 1.75 0 0 1 14.25 15H1.75A1.75 1.75 0 0 1 0 13.25Zm1.75-.25a.25.25 0 0 0-.25.25v10.5c0 .138.112.25.25.25h12.5a.25.25 0 0 0 .25-.25V2.75a.25.25 0 0 0-.25-.25ZM7.25 8a.75.75 0 0 1-.22.53l-2.25 2.25a.749.749 0 0 1-1.275-.326.75.75 0 0 1 .215-.734L5.44 8 3.72 6.28a.749.749 0 0 1 .326-1.275.75.75 0 0 1 .734.215l2.25 2.25c.141.14.22.331.22.53m1.5 1.5h3a.75.75 0 0 1 0 1.5h-3a.75.75 0 0 1 0-1.5"/></svg>');
 }

 .md-typeset .admonition.announcement,
@@ -49,6 +51,14 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
 .md-typeset details.important {
  border-color: rgb(239, 85, 82);
 }
+.md-typeset .admonition.code,
+.md-typeset details.code {
+  border-color: #64dd17
+}
+.md-typeset .admonition.console,
+.md-typeset details.console {
+  border-color: #64dd17
+}

 .md-typeset .announcement > .admonition-title,
 .md-typeset .announcement > summary {
@@ -58,6 +68,14 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
 .md-typeset .important > summary {
  background-color: rgb(239, 85, 82, 0.1);
 }
+.md-typeset .code > .admonition-title,
+.md-typeset .code > summary {
+  background-color: #64dd171a;
+}
+.md-typeset .console > .admonition-title,
+.md-typeset .console > summary {
+  background-color: #64dd171a;
+}

 .md-typeset .announcement > .admonition-title::before,
 .md-typeset .announcement > summary::before {
@@ -71,6 +89,18 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
  -webkit-mask-image: var(--md-admonition-icon--important);
          mask-image: var(--md-admonition-icon--important);
 }
+.md-typeset .code > .admonition-title::before,
+.md-typeset .code > summary::before {
+  background-color: #64dd17;
+  -webkit-mask-image: var(--md-admonition-icon--code);
+          mask-image: var(--md-admonition-icon--code);
+}
+.md-typeset .console > .admonition-title::before,
+.md-typeset .console > summary::before {
+  background-color: #64dd17;
+  -webkit-mask-image: var(--md-admonition-icon--console);
+          mask-image: var(--md-admonition-icon--console);
+}

 /* Make label fully visible on hover */
 .md-content__button[href*="edit"]:hover::after {

--- a/docs/models/generative_models.md
+++ b/docs/models/generative_models.md
@@ -85,7 +85,7 @@ and automatically applies the model's [chat template](https://huggingface.co/doc
    In general, only instruction-tuned models have a chat template.
    Base models may perform poorly as they are not trained to respond to the chat conversation.

-??? Code
+??? code

    ```python
    from vllm import LLM

--- a/docs/models/supported_models.md
+++ b/docs/models/supported_models.md
@@ -642,7 +642,7 @@ Specified using `--task generate`.

    For the best results, we recommend using the following dependency versions (tested on A10 and L40):

-    ??? Dependency versions
+    ??? code "Dependency versions"

        ```text
        # Core vLLM-compatible dependencies with Molmo accuracy setup (tested on L40)

--- a/docs/serving/integrations/langchain.md
+++ b/docs/serving/integrations/langchain.md
@@ -13,7 +13,7 @@ pip install langchain langchain_community -q

 To run inference on a single or multiple GPUs, use `VLLM` class from `langchain`.

-??? Code
+??? code

    ```python
    from langchain_community.llms import VLLM

--- a/docs/serving/openai_compatible_server.md
+++ b/docs/serving/openai_compatible_server.md
@@ -15,7 +15,7 @@ vllm serve NousResearch/Meta-Llama-3-8B-Instruct \

 To call the server, in your preferred text editor, create a script that uses an HTTP client. Include any messages that you want to send to the model. Then run that script. Below is an example script using the [official OpenAI Python client](https://github.com/openai/openai-python).

-??? Code
+??? code

    ```python
    from openai import OpenAI
@@ -146,7 +146,7 @@ completion = client.chat.completions.create(
 Only `X-Request-Id` HTTP request header is supported for now. It can be enabled
 with `--enable-request-id-headers`.

-??? Code
+??? code

    ```python
    completion = client.chat.completions.create(
@@ -185,7 +185,7 @@ Code example: <gh-file:examples/online_serving/openai_completion_client.py>

 The following [sampling parameters][sampling-params] are supported.

-??? Code
+??? code

    ```python
    --8<-- "vllm/entrypoints/openai/protocol.py:completion-sampling-params"
@@ -193,7 +193,7 @@ The following [sampling parameters][sampling-params] are supported.

 The following extra parameters are supported:

-??? Code
+??? code

    ```python
    --8<-- "vllm/entrypoints/openai/protocol.py:completion-extra-params"
@@ -217,7 +217,7 @@ Code example: <gh-file:examples/online_serving/openai_chat_completion_client.py>

 The following [sampling parameters][sampling-params] are supported.

-??? Code
+??? code

    ```python
    --8<-- "vllm/entrypoints/openai/protocol.py:chat-completion-sampling-params"
@@ -225,7 +225,7 @@ The following [sampling parameters][sampling-params] are supported.

 The following extra parameters are supported:

-??? Code
+??? code

    ```python
    --8<-- "vllm/entrypoints/openai/protocol.py:chat-completion-extra-params"
@@ -268,7 +268,7 @@ and passing a list of `messages` in the request. Refer to the examples below for

    Since the request schema is not defined by OpenAI client, we post a request to the server using the lower-level `requests` library:

-    ??? Code
+    ??? code

        ```python
        import requests
@@ -327,7 +327,7 @@ The following [pooling parameters][pooling-params] are supported.

 The following extra parameters are supported by default:

-??? Code
+??? code

    ```python
    --8<-- "vllm/entrypoints/openai/protocol.py:embedding-extra-params"
@@ -335,7 +335,7 @@ The following extra parameters are supported by default:

 For chat-like input (i.e. if `messages` is passed), these extra parameters are supported instead:

-??? Code
+??? code

    ```python
    --8<-- "vllm/entrypoints/openai/protocol.py:chat-embedding-extra-params"
@@ -358,7 +358,7 @@ Code example: <gh-file:examples/online_serving/openai_transcription_client.py>

 The following [sampling parameters][sampling-params] are supported.

-??? Code
+??? code

    ```python
    --8<-- "vllm/entrypoints/openai/protocol.py:transcription-sampling-params"
@@ -366,7 +366,7 @@ The following [sampling parameters][sampling-params] are supported.

 The following extra parameters are supported:

-??? Code
+??? code

    ```python
    --8<-- "vllm/entrypoints/openai/protocol.py:transcription-extra-params"
@@ -446,7 +446,7 @@ curl -v "http://127.0.0.1:8000/classify" \
  }'
 ```

-??? Response
+??? console "Response"

    ```bash
    {
@@ -494,7 +494,7 @@ curl -v "http://127.0.0.1:8000/classify" \
  }'
 ```

-??? Response
+??? console "Response"

    ```bash
    {
@@ -564,7 +564,7 @@ curl -X 'POST' \
 }'
 ```

-??? Response
+??? console "Response"

    ```bash
    {
@@ -589,7 +589,7 @@ You can pass a string to `text_1` and a list to `text_2`, forming multiple sente
 where each pair is built from `text_1` and a string in `text_2`.
 The total number of pairs is `len(text_2)`.

-??? Request
+??? console "Request"

    ```bash
    curl -X 'POST' \
@@ -606,7 +606,7 @@ The total number of pairs is `len(text_2)`.
    }'
    ```

-??? Response
+??? console "Response"

    ```bash
    {
@@ -634,7 +634,7 @@ You can pass a list to both `text_1` and `text_2`, forming multiple sentence pai
 where each pair is built from a string in `text_1` and the corresponding string in `text_2` (similar to `zip()`).
 The total number of pairs is `len(text_2)`.

-??? Request
+??? console "Request"

    ```bash
    curl -X 'POST' \
@@ -655,7 +655,7 @@ The total number of pairs is `len(text_2)`.
    }'
    ```

-??? Response
+??? console "Response"

    ```bash
    {
@@ -716,7 +716,7 @@ Code example: <gh-file:examples/online_serving/jinaai_rerank_client.py>
 Note that the `top_n` request parameter is optional and will default to the length of the `documents` field.
 Result documents will be sorted by relevance, and the `index` property can be used to determine original order.

-??? Request
+??? console "Request"

    ```bash
    curl -X 'POST' \
@@ -734,7 +734,7 @@ Result documents will be sorted by relevance, and the `index` property can be us
    }'
    ```

-??? Response
+??? console "Response"

    ```bash
    {

--- a/docs/usage/metrics.md
+++ b/docs/usage/metrics.md
@@ -12,7 +12,7 @@ vllm serve unsloth/Llama-3.2-1B-Instruct

 Then query the endpoint to get the latest metrics from the server:

-??? Output
+??? console "Output"

    ```console
    $ curl http://0.0.0.0:8000/metrics
@@ -33,7 +33,7 @@ Then query the endpoint to get the latest metrics from the server:

 The following metrics are exposed:

-??? Code
+??? code

    ```python
    --8<-- "vllm/engine/metrics.py:metrics-definitions"

--- a/docs/usage/troubleshooting.md
+++ b/docs/usage/troubleshooting.md
@@ -60,7 +60,7 @@ To identify the particular CUDA operation that causes the error, you can add `--

 If GPU/CPU communication cannot be established, you can use the following Python script and follow the instructions below to confirm whether the GPU/CPU communication is working correctly.

-??? Code
+??? code

    ```python
    # Test PyTorch NCCL
@@ -170,7 +170,7 @@ WARNING 12-11 14:50:37 multiproc_worker_utils.py:281] CUDA was previously

 or an error from Python that looks like this:

-??? Logs
+??? console "Logs"

    ```console
    RuntimeError:
@@ -214,7 +214,7 @@ if __name__ == '__main__':

 vLLM heavily depends on `torch.compile` to optimize the model for better performance, which introduces the dependency on the `torch.compile` functionality and the `triton` library. By default, we use `torch.compile` to [optimize some functions](gh-pr:10406) in the model. Before running vLLM, you can check if `torch.compile` is working as expected by running the following script:

-??? Code
+??? code

    ```python
    import torch

--- a/docs/usage/usage_stats.md
+++ b/docs/usage/usage_stats.md
@@ -10,7 +10,7 @@ The list of data collected by the latest version of vLLM can be found here: <gh-

 Here is an example as of v0.4.0:

-??? Output
+??? console "Output"

    ```json
    {