[Docs] Fix syntax highlighting of shell commands (#19870)

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

Commit c3649e4f (unverified), authored by Lukas Geiger and committed by GitHub on Jun 23, 2025
13 changed files
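The change is largely mechanical. Every ```console fence in the touched files becomes ```bash. Where a block showed a `$ ` prompt (the Docker commands in xpu.inc.md and the `cat` invocations in the batch README), the prompt is dropped, because the content is now labelled as bash rather than as a console session. As a rough sketch of that transformation, here is a hypothetical helper, `console_to_bash`, built on POSIX awk. It is not part of vLLM or this commit. It assumes that fences are written exactly as ```console and that a prompt is a literal `$ ` at the start of a line, possibly after indentation. It does not reproduce the other whitespace fix in this diff (the leading space removed before `vllm serve` in distributed_serving.md). Like the commit, it leaves output lines untouched, such as the JSON printed after `cat results.jsonl`.

```bash
# Hypothetical helper (not part of vLLM): rewrite Markdown ```console fences
# to ```bash and drop a leading "$ " prompt inside those blocks.
# Reads the files given as arguments, or stdin if none; writes to stdout.
console_to_bash() {
  awk '
    # Opening fence, possibly indented (e.g. inside a numbered list).
    !in_block && /^[ \t]*```console[ \t]*$/ {
      sub(/```console/, "```bash")
      in_block = 1
      print
      next
    }
    # Closing fence of a converted block.
    in_block && /^[ \t]*```[ \t]*$/ {
      in_block = 0
      print
      next
    }
    # Inside a converted block: strip "$ " but keep any indentation before it.
    in_block && match($0, /^[ \t]*\$ /) {
      $0 = substr($0, 1, RLENGTH - 2) substr($0, RLENGTH + 1)
    }
    { print }
  ' "$@"
}
```

For example, `console_to_bash docs/usage/metrics.md > metrics.md.new` writes a converted copy of one file. Blocks already fenced as ```bash pass through unchanged, so running it twice is harmless.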
--- a/docs/getting_started/installation/gpu/xpu.inc.md
+++ b/docs/getting_started/installation/gpu/xpu.inc.md
@@ -25,7 +25,7 @@ Currently, there are no pre-built XPU wheels.
 - First, install required driver and Intel OneAPI 2025.0 or later.
 - Second, install Python packages for vLLM XPU backend building:

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 pip install --upgrade pip
@@ -34,7 +34,7 @@ pip install -v -r requirements/xpu.txt

 - Then, build and install vLLM XPU backend:

-```console
+```bash
 VLLM_TARGET_DEVICE=xpu python setup.py install
 ```

@@ -53,9 +53,9 @@ Currently, there are no pre-built XPU images.
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]

-```console
-$ docker build -f docker/Dockerfile.xpu -t vllm-xpu-env --shm-size=4g .
-$ docker run -it \
+```bash
+docker build -f docker/Dockerfile.xpu -t vllm-xpu-env --shm-size=4g .
+docker run -it \
             --rm \
             --network=host \
             --device /dev/dri \
@@ -68,7 +68,7 @@ $ docker run -it \

 XPU platform supports **tensor parallel** inference/serving and also supports **pipeline parallel** as a beta feature for online serving. We require Ray as the distributed runtime backend. For example, a reference execution like following:

-```console
+```bash
 python -m vllm.entrypoints.openai.api_server \
     --model=facebook/opt-13b \
     --dtype=bfloat16 \

--- a/docs/getting_started/installation/intel_gaudi.md
+++ b/docs/getting_started/installation/intel_gaudi.md
@@ -24,7 +24,7 @@ please follow the methods outlined in the

 To verify that the Intel Gaudi software was correctly installed, run:

-```console
+```bash
 hl-smi # verify that hl-smi is in your PATH and each Gaudi accelerator is visible
 apt list --installed | grep habana # verify that habanalabs-firmware-tools, habanalabs-graph, habanalabs-rdma-core, habanalabs-thunk and habanalabs-container-runtime are installed
 pip list | grep habana # verify that habana-torch-plugin, habana-torch-dataloader, habana-pyhlml and habana-media-loader are installed
@@ -42,7 +42,7 @@ for more details.

 Use the following commands to run a Docker image:

-```console
+```bash
 docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
 docker run \
  -it \
@@ -65,7 +65,7 @@ Currently, there are no pre-built Intel Gaudi wheels.

 To build and install vLLM from source, run:

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 pip install -r requirements/hpu.txt
@@ -74,7 +74,7 @@ python setup.py develop

 Currently, the latest features and performance optimizations are developed in Gaudi's [vLLM-fork](https://github.com/HabanaAI/vllm-fork) and we periodically upstream them to vLLM main repo. To install latest [HabanaAI/vLLM-fork](https://github.com/HabanaAI/vllm-fork), run the following:

-```console
+```bash
 git clone https://github.com/HabanaAI/vllm-fork.git
 cd vllm-fork
 git checkout habana_main
@@ -90,7 +90,7 @@ Currently, there are no pre-built Intel Gaudi images.

 ### Build image from source

-```console
+```bash
 docker build -f docker/Dockerfile.hpu -t vllm-hpu-env  .
 docker run \
  -it \

--- a/docs/getting_started/installation/python_env_setup.inc.md
+++ b/docs/getting_started/installation/python_env_setup.inc.md
 It's recommended to use [uv](https://docs.astral.sh/uv/), a very fast Python environment manager, to create and manage Python environments. Please follow the [documentation](https://docs.astral.sh/uv/#getting-started) to install `uv`. After installing `uv`, you can create a new Python environment and install vLLM using the following commands:

-```console
+```bash
 uv venv --python 3.12 --seed
 source .venv/bin/activate
 ```
--- a/docs/getting_started/quickstart.md
+++ b/docs/getting_started/quickstart.md
@@ -19,7 +19,7 @@ If you are using NVIDIA GPUs, you can install vLLM using [pip](https://pypi.org/

 It's recommended to use [uv](https://docs.astral.sh/uv/), a very fast Python environment manager, to create and manage Python environments. Please follow the [documentation](https://docs.astral.sh/uv/#getting-started) to install `uv`. After installing `uv`, you can create a new Python environment and install vLLM using the following commands:

-```console
+```bash
 uv venv --python 3.12 --seed
 source .venv/bin/activate
 uv pip install vllm --torch-backend=auto
@@ -29,13 +29,13 @@ uv pip install vllm --torch-backend=auto

 Another delightful way is to use `uv run` with `--with [dependency]` option, which allows you to run commands such as `vllm serve` without creating any permanent environment:

-```console
+```bash
 uv run --with vllm vllm --help
 ```

 You can also use [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) to create and manage Python environments. You can install `uv` to the conda environment through `pip` if you want to manage it within the environment.

-```console
+```bash
 conda create -n myenv python=3.12 -y
 conda activate myenv
 pip install --upgrade uv
@@ -110,7 +110,7 @@ By default, it starts the server at `http://localhost:8000`. You can specify the

 Run the following command to start the vLLM server with the [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model:

-```console
+```bash
 vllm serve Qwen/Qwen2.5-1.5B-Instruct
 ```

@@ -124,7 +124,7 @@ vllm serve Qwen/Qwen2.5-1.5B-Instruct

 This server can be queried in the same format as OpenAI API. For example, to list the models:

-```console
+```bash
 curl http://localhost:8000/v1/models
 ```

@@ -134,7 +134,7 @@ You can pass in the argument `--api-key` or environment variable `VLLM_API_KEY`

 Once your server is started, you can query the model with input prompts:

-```console
+```bash
 curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
@@ -172,7 +172,7 @@ vLLM is designed to also support the OpenAI Chat Completions API. The chat inter

 You can use the [create chat completion](https://platform.openai.com/docs/api-reference/chat/completions/create) endpoint to interact with the model:

-```console
+```bash
 curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{

--- a/docs/models/extensions/runai_model_streamer.md
+++ b/docs/models/extensions/runai_model_streamer.md
@@ -9,27 +9,27 @@ Further reading can be found in [Run:ai Model Streamer Documentation](https://gi
 vLLM supports loading weights in Safetensors format using the Run:ai Model Streamer.
 You first need to install vLLM RunAI optional dependency:

-```console
+```bash
 pip3 install vllm[runai]
 ```

 To run it as an OpenAI-compatible server, add the `--load-format runai_streamer` flag:

-```console
+```bash
 vllm serve /home/meta-llama/Llama-3.2-3B-Instruct \
    --load-format runai_streamer
 ```

 To run model from AWS S3 object store run:

-```console
+```bash
 vllm serve s3://core-llm/Llama-3-8b \
    --load-format runai_streamer
 ```

 To run model from a S3 compatible object store run:

-```console
+```bash
 RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 \
 AWS_EC2_METADATA_DISABLED=true \
 AWS_ENDPOINT_URL=https://storage.googleapis.com \
@@ -44,7 +44,7 @@ You can tune parameters using `--model-loader-extra-config`:
 You can tune `concurrency` that controls the level of concurrency and number of OS threads reading tensors from the file to the CPU buffer.
 For reading from S3, it will be the number of client instances the host is opening to the S3 server.

-```console
+```bash
 vllm serve /home/meta-llama/Llama-3.2-3B-Instruct \
    --load-format runai_streamer \
    --model-loader-extra-config '{"concurrency":16}'
@@ -53,7 +53,7 @@ vllm serve /home/meta-llama/Llama-3.2-3B-Instruct \
 You can control the size of the CPU Memory buffer to which tensors are read from the file, and limit this size.
 You can read further about CPU buffer memory limiting [here](https://github.com/run-ai/runai-model-streamer/blob/master/docs/src/env-vars.md#runai_streamer_memory_limit).

-```console
+```bash
 vllm serve /home/meta-llama/Llama-3.2-3B-Instruct \
    --load-format runai_streamer \
    --model-loader-extra-config '{"memory_limit":5368709120}'
@@ -66,13 +66,13 @@ vllm serve /home/meta-llama/Llama-3.2-3B-Instruct \

 vLLM also supports loading sharded models using Run:ai Model Streamer. This is particularly useful for large models that are split across multiple files. To use this feature, use the `--load-format runai_streamer_sharded` flag:

-```console
+```bash
 vllm serve /path/to/sharded/model --load-format runai_streamer_sharded
 ```

 The sharded loader expects model files to follow the same naming pattern as the regular sharded state loader: `model-rank-{rank}-part-{part}.safetensors`. You can customize this pattern using the `pattern` parameter in `--model-loader-extra-config`:

-```console
+```bash
 vllm serve /path/to/sharded/model \
    --load-format runai_streamer_sharded \
    --model-loader-extra-config '{"pattern":"custom-model-rank-{rank}-part-{part}.safetensors"}'
@@ -82,7 +82,7 @@ To create sharded model files, you can use the script provided in <gh-file:examp

 The sharded loader supports all the same tunable parameters as the regular Run:ai Model Streamer, including `concurrency` and `memory_limit`. These can be configured in the same way:

-```console
+```bash
 vllm serve /path/to/sharded/model \
    --load-format runai_streamer_sharded \
    --model-loader-extra-config '{"concurrency":16, "memory_limit":5368709120}'

--- a/docs/models/supported_models.md
+++ b/docs/models/supported_models.md
@@ -178,7 +178,7 @@ Alternatively, you can [open an issue on GitHub](https://github.com/vllm-project

 If you prefer, you can use the Hugging Face CLI to [download a model](https://huggingface.co/docs/huggingface_hub/guides/cli#huggingface-cli-download) or specific files from a model repository:

-```console
+```bash
 # Download a model
 huggingface-cli download HuggingFaceH4/zephyr-7b-beta

@@ -193,7 +193,7 @@ huggingface-cli download HuggingFaceH4/zephyr-7b-beta eval_results.json

 Use the Hugging Face CLI to [manage models](https://huggingface.co/docs/huggingface_hub/guides/manage-cache#scan-your-cache) stored in local cache:

-```console
+```bash
 # List cached models
 huggingface-cli scan-cache


--- a/docs/serving/distributed_serving.md
+++ b/docs/serving/distributed_serving.md
@@ -34,15 +34,15 @@ output = llm.generate("San Francisco is a")

 To run multi-GPU serving, pass in the `--tensor-parallel-size` argument when starting the server. For example, to run API server on 4 GPUs:

-```console
- vllm serve facebook/opt-13b \
+```bash
+vllm serve facebook/opt-13b \
     --tensor-parallel-size 4
 ```

 You can also additionally specify `--pipeline-parallel-size` to enable pipeline parallelism. For example, to run API server on 8 GPUs with pipeline parallelism and tensor parallelism:

-```console
- vllm serve gpt2 \
+```bash
+vllm serve gpt2 \
     --tensor-parallel-size 4 \
     --pipeline-parallel-size 2
 ```
@@ -55,7 +55,7 @@ The first step, is to start containers and organize them into a cluster. We have

 Pick a node as the head node, and run the following command:

-```console
+```bash
 bash run_cluster.sh \
                vllm/vllm-openai \
                ip_of_head_node \
@@ -66,7 +66,7 @@ bash run_cluster.sh \

 On the rest of the worker nodes, run the following command:

-```console
+```bash
 bash run_cluster.sh \
                vllm/vllm-openai \
                ip_of_head_node \
@@ -87,7 +87,7 @@ Then, on any node, use `docker exec -it node /bin/bash` to enter the container,

 After that, on any node, use `docker exec -it node /bin/bash` to enter the container again. **In the container**, you can use vLLM as usual, just as you have all the GPUs on one node: vLLM will be able to leverage GPU resources of all nodes in the Ray cluster, and therefore, only run the `vllm` command on this node but not other nodes. The common practice is to set the tensor parallel size to the number of GPUs in each node, and the pipeline parallel size to the number of nodes. For example, if you have 16 GPUs in 2 nodes (8 GPUs per node), you can set the tensor parallel size to 8 and the pipeline parallel size to 2:

-```console
+```bash
 vllm serve /path/to/the/model/in/the/container \
     --tensor-parallel-size 8 \
     --pipeline-parallel-size 2
@@ -95,7 +95,7 @@ After that, on any node, use `docker exec -it node /bin/bash` to enter the conta

 You can also use tensor parallel without pipeline parallel, just set the tensor parallel size to the number of GPUs in the cluster. For example, if you have 16 GPUs in 2 nodes (8 GPUs per node), you can set the tensor parallel size to 16:

-```console
+```bash
 vllm serve /path/to/the/model/in/the/container \
     --tensor-parallel-size 16
 ```

--- a/docs/serving/integrations/langchain.md
+++ b/docs/serving/integrations/langchain.md
@@ -7,7 +7,7 @@ vLLM is also available via [LangChain](https://github.com/langchain-ai/langchain

 To install LangChain, run

-```console
+```bash
 pip install langchain langchain_community -q
 ```


--- a/docs/serving/integrations/llamaindex.md
+++ b/docs/serving/integrations/llamaindex.md
@@ -7,7 +7,7 @@ vLLM is also available via [LlamaIndex](https://github.com/run-llama/llama_index

 To install LlamaIndex, run

-```console
+```bash
 pip install llama-index-llms-vllm -q
 ```


--- a/docs/usage/metrics.md
+++ b/docs/usage/metrics.md
@@ -6,7 +6,7 @@ OpenAI compatible API server.

 You can start the server using Python, or using [Docker][deployment-docker]:

-```console
+```bash
 vllm serve unsloth/Llama-3.2-1B-Instruct
 ```


--- a/docs/usage/troubleshooting.md
+++ b/docs/usage/troubleshooting.md
@@ -127,13 +127,13 @@ If GPU/CPU communication cannot be established, you can use the following Python

 If you are testing with a single node, adjust `--nproc-per-node` to the number of GPUs you want to use:

-```console
+```bash
 NCCL_DEBUG=TRACE torchrun --nproc-per-node=<number-of-GPUs> test.py
 ```

 If you are testing with multi-nodes, adjust `--nproc-per-node` and `--nnodes` according to your setup and set `MASTER_ADDR` to the correct IP address of the master node, reachable from all nodes. Then, run:

-```console
+```bash
 NCCL_DEBUG=TRACE torchrun --nnodes 2 \
    --nproc-per-node=2 \
    --rdzv_backend=c10d \

--- a/examples/offline_inference/openai_batch/README.md
+++ b/examples/offline_inference/openai_batch/README.md
@@ -29,14 +29,14 @@ We currently support `/v1/chat/completions`, `/v1/embeddings`, and `/v1/score` e

 To follow along with this example, you can download the example batch, or create your own batch file in your working directory.

-```console
+```bash
 wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl
 ```

 Once you've created your batch file it should look like this

-```console
-$ cat offline_inference/openai_batch/openai_example_batch.jsonl
+```bash
+cat offline_inference/openai_batch/openai_example_batch.jsonl
 {"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
 {"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
 ```
@@ -47,7 +47,7 @@ The batch running tool is designed to be used from the command line.

 You can run the batch with the following command, which will write its results to a file called `results.jsonl`

-```console
+```bash
 python -m vllm.entrypoints.openai.run_batch \
    -i offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
@@ -56,7 +56,7 @@ python -m vllm.entrypoints.openai.run_batch \

 or use command-line:

-```console
+```bash
 vllm run-batch \
    -i offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
@@ -67,8 +67,8 @@ vllm run-batch \

 You should now have your results at `results.jsonl`. You can check your results by running `cat results.jsonl`

-```console
-$ cat results.jsonl
+```bash
+cat results.jsonl
 {"id":"vllm-383d1c59835645aeb2e07d004d62a826","custom_id":"request-1","response":{"id":"cmpl-61c020e54b964d5a98fa7527bfcdd378","object":"chat.completion","created":1715633336,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! It's great to meet you! I'm here to help with any questions or tasks you may have. What's on your mind today?"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":25,"total_tokens":56,"completion_tokens":31}},"error":null}
 {"id":"vllm-42e3d09b14b04568afa3f1797751a267","custom_id":"request-2","response":{"id":"cmpl-f44d049f6b3a42d4b2d7850bb1e31bcc","object":"chat.completion","created":1715633336,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"*silence*"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":27,"total_tokens":32,"completion_tokens":5}},"error":null}
 ```
@@ -79,7 +79,7 @@ The batch runner supports remote input and output urls that are accessible via h

 For example, to run against our example input file located at `https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl`, you can run

-```console
+```bash
 python -m vllm.entrypoints.openai.run_batch \
    -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
@@ -88,7 +88,7 @@ python -m vllm.entrypoints.openai.run_batch \

 or use command-line:

-```console
+```bash
 vllm run-batch \
    -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
@@ -112,21 +112,21 @@ To integrate with cloud blob storage, we recommend using presigned urls.

 To follow along with this example, you can download the example batch, or create your own batch file in your working directory.

-```console
+```bash
 wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl
 ```

 Once you've created your batch file it should look like this

-```console
-$ cat offline_inference/openai_batch/openai_example_batch.jsonl
+```bash
+cat offline_inference/openai_batch/openai_example_batch.jsonl
 {"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
 {"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
 ```

 Now upload your batch file to your S3 bucket.

-```console
+```bash
 aws s3 cp offline_inference/openai_batch/openai_example_batch.jsonl s3://MY_BUCKET/MY_INPUT_FILE.jsonl
 ```

@@ -181,7 +181,7 @@ output_url='https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_OUTPUT_FILE.jsonl?AW

 You can now run the batch runner, using the urls generated in the previous section.

-```console
+```bash
 python -m vllm.entrypoints.openai.run_batch \
    -i "https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_INPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091" \
    -o "https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_OUTPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091" \
@@ -190,7 +190,7 @@ python -m vllm.entrypoints.openai.run_batch \

 or use command-line:

-```console
+```bash
 vllm run-batch \
    -i "https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_INPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091" \
    -o "https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_OUTPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091" \
@@ -201,7 +201,7 @@ vllm run-batch \

 Your results are now on S3. You can view them in your terminal by running

-```console
+```bash
 aws s3 cp s3://MY_BUCKET/MY_OUTPUT_FILE.jsonl -
 ```

@@ -230,8 +230,8 @@ You can run the batch using the same command as in earlier examples.

 You can check your results by running `cat results.jsonl`

-```console
-$ cat results.jsonl
+```bash
+cat results.jsonl
 {"id":"vllm-db0f71f7dec244e6bce530e0b4ef908b","custom_id":"request-1","response":{"status_code":200,"request_id":"vllm-batch-3580bf4d4ae54d52b67eee266a6eab20","body":{"id":"embd-33ac2efa7996430184461f2e38529746","object":"list","created":444647,"model":"intfloat/e5-mistral-7b-instruct","data":[{"index":0,"object":"embedding","embedding":[0.016204833984375,0.0092010498046875,0.0018358230590820312,-0.0028228759765625,0.001422882080078125,-0.0031147003173828125,...]}],"usage":{"prompt_tokens":8,"total_tokens":8,"completion_tokens":0}}},"error":null}
 ...
 ```
@@ -261,8 +261,8 @@ You can run the batch using the same command as in earlier examples.

 You can check your results by running `cat results.jsonl`

-```console
-$ cat results.jsonl
+```bash
+cat results.jsonl
 {"id":"vllm-f87c5c4539184f618e555744a2965987","custom_id":"request-1","response":{"status_code":200,"request_id":"vllm-batch-806ab64512e44071b37d3f7ccd291413","body":{"id":"score-4ee45236897b4d29907d49b01298cdb1","object":"list","created":1737847944,"model":"BAAI/bge-reranker-v2-m3","data":[{"index":0,"object":"score","score":0.0010900497436523438},{"index":1,"object":"score","score":1.0}],"usage":{"prompt_tokens":37,"total_tokens":37,"completion_tokens":0,"prompt_tokens_details":null}}},"error":null}
 {"id":"vllm-41990c51a26d4fac8419077f12871099","custom_id":"request-2","response":{"status_code":200,"request_id":"vllm-batch-73ce66379026482699f81974e14e1e99","body":{"id":"score-13f2ffe6ba40460fbf9f7f00ad667d75","object":"list","created":1737847944,"model":"BAAI/bge-reranker-v2-m3","data":[{"index":0,"object":"score","score":0.001094818115234375},{"index":1,"object":"score","score":1.0}],"usage":{"prompt_tokens":37,"total_tokens":37,"completion_tokens":0,"prompt_tokens_details":null}}},"error":null}
 ```
--- a/examples/online_serving/opentelemetry/README.md
+++ b/examples/online_serving/opentelemetry/README.md
@@ -2,7 +2,7 @@

 1. Install OpenTelemetry packages:

-    ```console
+    ```bash
    pip install \
      'opentelemetry-sdk>=1.26.0,<1.27.0' \
      'opentelemetry-api>=1.26.0,<1.27.0' \
@@ -12,7 +12,7 @@

 1. Start Jaeger in a docker container:

-    ```console
+    ```bash
    # From: https://www.jaegertracing.io/docs/1.57/getting-started/
    docker run --rm --name jaeger \
        -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
@@ -31,14 +31,14 @@

 1. In a new shell, export Jaeger IP:

-    ```console
+    ```bash
    export JAEGER_IP=$(docker inspect   --format '{{ .NetworkSettings.IPAddress }}' jaeger)
    export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
    ```

    Then set vLLM's service name for OpenTelemetry, enable insecure connections to Jaeger and run vLLM:

-    ```console
+    ```bash
    export OTEL_SERVICE_NAME="vllm-server"
    export OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
    vllm serve facebook/opt-125m --otlp-traces-endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
@@ -46,7 +46,7 @@

 1. In a new shell, send requests with trace context from a dummy client

-    ```console
+    ```bash
    export JAEGER_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' jaeger)
    export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
    export OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
@@ -67,7 +67,7 @@
 OpenTelemetry supports either `grpc` or `http/protobuf` as the transport protocol for trace data in the exporter.
 By default, `grpc` is used. To set `http/protobuf` as the protocol, configure the `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` environment variable as follows:

-```console
+```bash
 export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf
 export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://$JAEGER_IP:4318/v1/traces
 vllm serve facebook/opt-125m --otlp-traces-endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
@@ -79,13 +79,13 @@ OpenTelemetry allows automatic instrumentation of FastAPI.

 1. Install the instrumentation library

-    ```console
+    ```bash
    pip install opentelemetry-instrumentation-fastapi
    ```

 1. Run vLLM with `opentelemetry-instrument`

-    ```console
+    ```bash
    opentelemetry-instrument vllm serve facebook/opt-125m
    ```