[Docs] Fix syntax highlighting of shell commands (#19870)

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

Commit c3649e4f (unverified) · authored Jun 23, 2025 by Lukas Geiger · committed by GitHub Jun 23, 2025
20 changed files
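
Note: the bulk of this change relabels opening code fences from `console` to `bash`. The sketch below shows one way such a relabeling could be done mechanically. It is illustrative only and not taken from this commit; the function name `relabel_console_fences` and the sample text are hypothetical. A blanket pass like this would still need manual review, since at least one block in this diff (`Dockerfile.nginx` in `docs/deployment/nginx.md`) is relabeled `dockerfile` rather than `bash`.

```python
import re

# A Markdown fence marker, built indirectly so this example can itself
# sit inside a fenced block without closing it early.
TICKS = "`" * 3

# An opening fence labeled "console", with optional leading indentation
# (fences nested under list items or admonitions are indented).
CONSOLE_FENCE = re.compile(
    rf"^(?P<indent>[ \t]*){TICKS}console[ \t]*$", re.MULTILINE
)


def relabel_console_fences(markdown: str, new_label: str = "bash") -> str:
    """Return the text with every `console` opening fence relabeled.

    Closing fences and all other lines are left untouched.
    """
    return CONSOLE_FENCE.sub(
        lambda m: f"{m.group('indent')}{TICKS}{new_label}", markdown
    )


# Hypothetical sample mirroring two of the patterns in this diff:
# a top-level fence and a fence indented by four spaces.
sample = "\n".join([
    "To install the client, run:",
    "",
    f"{TICKS}console",
    "pip install cerebrium",
    TICKS,
    "",
    f"    {TICKS}console",
    "    export VLLM_IMAGE_FETCH_TIMEOUT=<timeout>",
    f"    {TICKS}",
])

converted = relabel_console_fences(sample)
print(converted)
```

Matching only whole lines that consist of the fence plus the `console` label keeps the substitution from touching the word "console" in prose or inside commands.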
--- a/.buildkite/nightly-benchmarks/nightly-annotation.md
+++ b/.buildkite/nightly-benchmarks/nightly-annotation.md
@@ -16,7 +16,7 @@ Please download the visualization scripts in the post
  - Download `nightly-benchmarks.zip`.
  - In the same folder, run the following code:

-  ```console
+  ```bash
  export HF_TOKEN=<your HF token>
  apt update
  apt install -y git

--- a/docs/deployment/docker.md
+++ b/docs/deployment/docker.md
@@ -10,7 +10,7 @@ title: Using Docker
 vLLM offers an official Docker image for deployment.
 The image can be used to run OpenAI compatible server and is available on Docker Hub as [vllm/vllm-openai](https://hub.docker.com/r/vllm/vllm-openai/tags).

-```console
+```bash
 docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
@@ -22,7 +22,7 @@ docker run --runtime nvidia --gpus all \

 This image can also be used with other container engines such as [Podman](https://podman.io/).

-```console
+```bash
 podman run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
@@ -71,7 +71,7 @@ You can add any other [engine-args][engine-args] you need after the image tag (`

 You can build and run vLLM from source via the provided <gh-file:docker/Dockerfile>. To build vLLM:

-```console
+```bash
 # optionally specifies: --build-arg max_jobs=8 --build-arg nvcc_threads=2
 DOCKER_BUILDKIT=1 docker build . \
    --target vllm-openai \
@@ -99,7 +99,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--

 ??? Command

-    ```console
+    ```bash
    # Example of building on Nvidia GH200 server. (Memory usage: ~15GB, Build time: ~1475s / ~25 min, Image size: 6.93GB)
    python3 use_existing_torch.py
    DOCKER_BUILDKIT=1 docker build . \
@@ -118,7 +118,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--

    Run the following command on your host machine to register QEMU user static handlers:

-    ```console
+    ```bash
    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
    ```

@@ -128,7 +128,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--

 To run vLLM with the custom-built Docker image:

-```console
+```bash
 docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \

--- a/docs/deployment/frameworks/anything-llm.md
+++ b/docs/deployment/frameworks/anything-llm.md
@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve Qwen/Qwen1.5-32B-Chat-AWQ --max-model-len 4096
 ```


--- a/docs/deployment/frameworks/autogen.md
+++ b/docs/deployment/frameworks/autogen.md
@@ -11,7 +11,7 @@ title: AutoGen

 - Setup [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment

-```console
+```bash
 pip install vllm

 # Install AgentChat and OpenAI client from Extensions
@@ -23,7 +23,7 @@ pip install -U "autogen-agentchat" "autogen-ext[openai]"

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.2
 ```

--- a/docs/deployment/frameworks/cerebrium.md
+++ b/docs/deployment/frameworks/cerebrium.md
@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [Cerebrium](https://www.cerebr

 To install the Cerebrium client, run:

-```console
+```bash
 pip install cerebrium
 cerebrium login
 ```

 Next, create your Cerebrium project, run:

-```console
+```bash
 cerebrium init vllm-project
 ```

@@ -58,7 +58,7 @@ Next, let us add our code to handle inference for the LLM of your choice (`mistr

 Then, run the following code to deploy it to the cloud:

-```console
+```bash
 cerebrium deploy
 ```


--- a/docs/deployment/frameworks/chatbox.md
+++ b/docs/deployment/frameworks/chatbox.md
@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```


--- a/docs/deployment/frameworks/dify.md
+++ b/docs/deployment/frameworks/dify.md
@@ -18,13 +18,13 @@ This guide walks you through deploying Dify using a vLLM backend.

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve Qwen/Qwen1.5-7B-Chat
 ```

 - Start the Dify server with docker compose ([details](https://github.com/langgenius/dify?tab=readme-ov-file#quick-start)):

-```console
+```bash
 git clone https://github.com/langgenius/dify.git
 cd dify
 cd docker

--- a/docs/deployment/frameworks/dstack.md
+++ b/docs/deployment/frameworks/dstack.md
@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [dstack](https://dstack.ai/),

 To install dstack client, run:

-```console
+```bash
 pip install "dstack[all]
 dstack server
 ```

 Next, to configure your dstack project, run:

-```console
+```bash
 mkdir -p vllm-dstack
 cd vllm-dstack
 dstack init

--- a/docs/deployment/frameworks/haystack.md
+++ b/docs/deployment/frameworks/haystack.md
@@ -13,7 +13,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

 - Setup vLLM and Haystack environment

-```console
+```bash
 pip install vllm haystack-ai
 ```

@@ -21,7 +21,7 @@ pip install vllm haystack-ai

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve mistralai/Mistral-7B-Instruct-v0.1
 ```


--- a/docs/deployment/frameworks/helm.md
+++ b/docs/deployment/frameworks/helm.md
@@ -22,7 +22,7 @@ Before you begin, ensure that you have the following:

 To install the chart with the release name `test-vllm`:

-```console
+```bash
 helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f values.yaml --set secrets.s3endpoint=$ACCESS_POINT --set secrets.s3bucketname=$BUCKET --set secrets.s3accesskeyid=$ACCESS_KEY --set secrets.s3accesskey=$SECRET_KEY
 ```

@@ -30,7 +30,7 @@ helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f val

 To uninstall the `test-vllm` deployment:

-```console
+```bash
 helm uninstall test-vllm --namespace=ns-vllm
 ```


--- a/docs/deployment/frameworks/litellm.md
+++ b/docs/deployment/frameworks/litellm.md
@@ -18,7 +18,7 @@ And LiteLLM supports all models on VLLM.

 - Setup vLLM and litellm environment

-```console
+```bash
 pip install vllm litellm
 ```

@@ -28,7 +28,7 @@ pip install vllm litellm

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```

@@ -56,7 +56,7 @@ vllm serve qwen/Qwen1.5-0.5B-Chat

 - Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
 vllm serve BAAI/bge-base-en-v1.5
 ```


--- a/docs/deployment/frameworks/open-webui.md
+++ b/docs/deployment/frameworks/open-webui.md
@@ -7,13 +7,13 @@ title: Open WebUI

 2. Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```

 1. Start the [Open WebUI](https://github.com/open-webui/open-webui) docker container (replace the vllm serve host and vllm serve port):

-```console
+```bash
 docker run -d -p 3000:8080 \
 --name open-webui \
 -v open-webui:/app/backend/data \

--- a/docs/deployment/frameworks/retrieval_augmented_generation.md
+++ b/docs/deployment/frameworks/retrieval_augmented_generation.md
@@ -15,7 +15,7 @@ Here are the integrations:

 - Setup vLLM and langchain environment

-```console
+```bash
 pip install -U vllm \
            langchain_milvus langchain_openai \
            langchain_community beautifulsoup4 \
@@ -26,14 +26,14 @@ pip install -U vllm \

 - Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
 # Start embedding service (port 8000)
 vllm serve ssmits/Qwen2-7B-Instruct-embed-base
 ```

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 # Start chat service (port 8001)
 vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
 ```
@@ -52,7 +52,7 @@ python retrieval_augmented_generation_with_langchain.py

 - Setup vLLM and llamaindex environment

-```console
+```bash
 pip install vllm \
            llama-index llama-index-readers-web \
            llama-index-llms-openai-like    \
@@ -64,14 +64,14 @@ pip install vllm \

 - Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
 # Start embedding service (port 8000)
 vllm serve ssmits/Qwen2-7B-Instruct-embed-base
 ```

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 # Start chat service (port 8001)
 vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
 ```

--- a/docs/deployment/frameworks/skypilot.md
+++ b/docs/deployment/frameworks/skypilot.md
@@ -15,7 +15,7 @@ vLLM can be **run and scaled to multiple service replicas on clouds and Kubernet
 - Check that you have installed SkyPilot ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)).
 - Check that `sky check` shows clouds or Kubernetes are enabled.

-```console
+```bash
 pip install skypilot-nightly
 sky check
 ```
@@ -71,7 +71,7 @@ See the vLLM SkyPilot YAML for serving, [serving.yaml](https://github.com/skypil

 Start the serving the Llama-3 8B model on any of the candidate GPUs listed (L4, A10g, ...):

-```console
+```bash
 HF_TOKEN="your-huggingface-token" sky launch serving.yaml --env HF_TOKEN
 ```

@@ -83,7 +83,7 @@ Check the output of the command. There will be a shareable gradio link (like the

 **Optional**: Serve the 70B model instead of the default 8B and use more GPU:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" \
  sky launch serving.yaml \
  --gpus A100:8 \
@@ -159,7 +159,7 @@ SkyPilot can scale up the service to multiple service replicas with built-in aut

 Start the serving the Llama-3 8B model on multiple replicas:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" \
  sky serve up -n vllm serving.yaml \
  --env HF_TOKEN
@@ -167,7 +167,7 @@ HF_TOKEN="your-huggingface-token" \

 Wait until the service is ready:

-```console
+```bash
 watch -n10 sky serve status vllm
 ```

@@ -271,13 +271,13 @@ This will scale the service up to when the QPS exceeds 2 for each replica.

 To update the service with the new config:

-```console
+```bash
 HF_TOKEN="your-huggingface-token" sky serve update vllm serving.yaml --env HF_TOKEN
 ```

 To stop the service:

-```console
+```bash
 sky serve down vllm
 ```

@@ -317,7 +317,7 @@ It is also possible to access the Llama-3 service with a separate GUI frontend,

 1. Start the chat web UI:

-    ```console
+    ```bash
    sky launch \
      -c gui ./gui.yaml \
      --env ENDPOINT=$(sky serve status --endpoint vllm)

--- a/docs/deployment/frameworks/streamlit.md
+++ b/docs/deployment/frameworks/streamlit.md
@@ -15,13 +15,13 @@ It can be quickly integrated with vLLM as a backend API server, enabling powerfu

 - Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```

 - Install streamlit and openai:

-```console
+```bash
 pip install streamlit openai
 ```

@@ -29,7 +29,7 @@ pip install streamlit openai

 - Start the streamlit web UI and start to chat:

-```console
+```bash
 streamlit run streamlit_openai_chatbot_webserver.py

 # or specify the VLLM_API_BASE or VLLM_API_KEY

--- a/docs/deployment/integrations/llamastack.md
+++ b/docs/deployment/integrations/llamastack.md
@@ -7,7 +7,7 @@ vLLM is also available via [Llama Stack](https://github.com/meta-llama/llama-sta

 To install Llama Stack, run

-```console
+```bash
 pip install llama-stack -q
 ```


--- a/docs/deployment/k8s.md
+++ b/docs/deployment/k8s.md
@@ -115,7 +115,7 @@ Next, start the vLLM server as a Kubernetes Deployment and Service:

 We can verify that the vLLM server has started successfully via the logs (this might take a couple of minutes to download the model):

-```console
+```bash
 kubectl logs -l app.kubernetes.io/name=vllm
 ...
 INFO:     Started server process [1]
@@ -358,14 +358,14 @@ INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

      Apply the deployment and service configurations using `kubectl apply -f <filename>`:

-      ```console
+      ```bash
      kubectl apply -f deployment.yaml
      kubectl apply -f service.yaml
      ```

      To test the deployment, run the following `curl` command:

-      ```console
+      ```bash
      curl http://mistral-7b.default.svc.cluster.local/v1/completions \
        -H "Content-Type: application/json" \
        -d '{

--- a/docs/deployment/nginx.md
+++ b/docs/deployment/nginx.md
@@ -11,13 +11,13 @@ This document shows how to launch multiple vLLM serving containers and use Nginx

 This guide assumes that you have just cloned the vLLM project and you're currently in the vllm root directory.

-```console
+```bash
 export vllm_root=`pwd`
 ```

 Create a file named `Dockerfile.nginx`:

-```console
+```dockerfile
 FROM nginx:latest
 RUN rm /etc/nginx/conf.d/default.conf
 EXPOSE 80
@@ -26,7 +26,7 @@ CMD ["nginx", "-g", "daemon off;"]

 Build the container:

-```console
+```bash
 docker build . -f Dockerfile.nginx --tag nginx-lb
 ```

@@ -60,14 +60,14 @@ Create a file named `nginx_conf/nginx.conf`. Note that you can add as many serve

 ## Build vLLM Container

-```console
+```bash
 cd $vllm_root
 docker build -f docker/Dockerfile . --tag vllm
 ```

 If you are behind proxy, you can pass the proxy settings to the docker build command as shown below:

-```console
+```bash
 cd $vllm_root
 docker build \
    -f docker/Dockerfile . \
@@ -80,7 +80,7 @@ docker build \

 ## Create Docker Network

-```console
+```bash
 docker network create vllm_nginx
 ```

@@ -129,7 +129,7 @@ Notes:

 ## Launch Nginx

-```console
+```bash
 docker run \
    -itd \
    -p 8000:80 \
@@ -142,7 +142,7 @@ docker run \

 ## Verify That vLLM Servers Are Ready

-```console
+```bash
 docker logs vllm0 | grep Uvicorn
 docker logs vllm1 | grep Uvicorn
 ```

--- a/docs/features/multimodal_inputs.md
+++ b/docs/features/multimodal_inputs.md
@@ -307,7 +307,7 @@ Full example: <gh-file:examples/online_serving/openai_chat_completion_client_for
    By default, the timeout for fetching images through HTTP URL is `5` seconds.
    You can override this by setting the environment variable:

-    ```console
+    ```bash
    export VLLM_IMAGE_FETCH_TIMEOUT=<timeout>
    ```

@@ -370,7 +370,7 @@ Full example: <gh-file:examples/online_serving/openai_chat_completion_client_for
    By default, the timeout for fetching videos through HTTP URL is `30` seconds.
    You can override this by setting the environment variable:

-    ```console
+    ```bash
    export VLLM_VIDEO_FETCH_TIMEOUT=<timeout>
    ```

@@ -476,7 +476,7 @@ Full example: <gh-file:examples/online_serving/openai_chat_completion_client_for
    By default, the timeout for fetching audios through HTTP URL is `10` seconds.
    You can override this by setting the environment variable:

-    ```console
+    ```bash
    export VLLM_AUDIO_FETCH_TIMEOUT=<timeout>
    ```


--- a/docs/features/quantization/auto_awq.md
+++ b/docs/features/quantization/auto_awq.md
@@ -9,7 +9,7 @@ The main benefits are lower latency and memory usage.

 You can quantize your own models by installing AutoAWQ or picking one of the [6500+ models on Huggingface](https://huggingface.co/models?search=awq).

-```console
+```bash
 pip install autoawq
 ```

@@ -43,7 +43,7 @@ After installing AutoAWQ, you are ready to quantize a model. Please refer to the

 To run an AWQ model with vLLM, you can use [TheBloke/Llama-2-7b-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7b-Chat-AWQ) with the following command:

-```console
+```bash
 python examples/offline_inference/llm_engine_example.py \
    --model TheBloke/Llama-2-7b-Chat-AWQ \
    --quantization awq