[Docs] Switch to better markdown linting pre-commit hook (#21851)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

[Docs] Switch to better markdown linting pre-commit hook (#21851)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
ba5c5e54 · Harry Mellor · GitHub · 555e7225 · ba5c5e54 · ba5c5e54
Unverified Commit ba5c5e54 authored Jul 30, 2025 by Harry Mellor Committed by GitHub Jul 29, 2025
20 changed files
--- a/.buildkite/nightly-benchmarks/README.md
+++ b/.buildkite/nightly-benchmarks/README.md
@@ -28,6 +28,7 @@ See [vLLM performance dashboard](https://perf.vllm.ai) for the latest performanc
 ## Trigger the benchmark
 Performance benchmark will be triggered when:
 - A PR being merged into vllm.
 - Every commit for those PRs with `perf-benchmarks` label AND `ready` label.
@@ -38,6 +39,7 @@ bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh
 ```
 Runtime environment variables:
 - `ON_CPU`: set the value to '1' on Intel® Xeon® Processors. Default value is 0.
 - `SERVING_JSON`: JSON file to use for the serving tests. Default value is empty string (use default file).
 - `LATENCY_JSON`: JSON file to use for the latency tests. Default value is empty string (use default file).
@@ -46,12 +48,14 @@ Runtime environment variables:
 - `REMOTE_PORT`: Port for the remote vLLM service to benchmark. Default value is empty string.
 Nightly benchmark will be triggered when:
 - Every commit for those PRs with `perf-benchmarks` label and `nightly-benchmarks` label.
 ## Performance benchmark details
 See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests.json`, `tests/throughput-tests.json`, `tests/serving-tests.json` to configure the test cases.
 > NOTE: For Intel® Xeon® Processors, use `tests/latency-tests-cpu.json`, `tests/throughput-tests-cpu.json`, `tests/serving-tests-cpu.json` instead.
+>
 ### Latency test
 Here is an example of one test inside `latency-tests.json`:
@@ -149,6 +153,7 @@ Here is an example using the script to compare result_a and result_b without det
 Here is an example using the script to compare result_a and result_b with detail test name.
 `python3 compare-json-results.py -f results_a/benchmark_results.json -f results_b/benchmark_results.json`
 |   | results_a/benchmark_results.json_name | results_a/benchmark_results.json | results_b/benchmark_results.json_name | results_b/benchmark_results.json | perf_ratio        |
 |---|---------------------------------------------|----------------------------------------|---------------------------------------------|----------------------------------------|----------|
 | 0 | serving_llama8B_tp1_sharegpt_qps_1          | 142.633982                             | serving_llama8B_tp1_sharegpt_qps_1          | 156.526018                             | 1.097396 |

--- a/.buildkite/nightly-benchmarks/nightly-annotation.md
+++ b/.buildkite/nightly-benchmarks/nightly-annotation.md
+# Nightly benchmark annotation
 ## Description
@@ -13,15 +14,15 @@ Please download the visualization scripts in the post
 - Find the docker we use in `benchmarking pipeline`
 - Deploy the docker, and inside the docker:
-  - Download `nightly-benchmarks.zip`.
+    - Download `nightly-benchmarks.zip`.
-  - In the same folder, run the following code:
+    - In the same folder, run the following code:
-  ```bash
+    ```bash
-  export HF_TOKEN=<your HF token>
+    export HF_TOKEN=<your HF token>
-  apt update
+    apt update
-  apt install -y git
+    apt install -y git
-  unzip nightly-benchmarks.zip
+    unzip nightly-benchmarks.zip
-  VLLM_SOURCE_CODE_LOC=./ bash .buildkite/nightly-benchmarks/scripts/run-nightly-benchmarks.sh
+    VLLM_SOURCE_CODE_LOC=./ bash .buildkite/nightly-benchmarks/scripts/run-nightly-benchmarks.sh
-  ```
+    ```
 And the results will be inside `./benchmarks/results`.
--- a/.buildkite/nightly-benchmarks/nightly-descriptions.md
+++ b/.buildkite/nightly-benchmarks/nightly-descriptions.md
@@ -13,25 +13,25 @@ Latest reproduction guilde: [github issue link](https://github.com/vllm-project/
 ## Setup
 - Docker images:
-  - vLLM: `vllm/vllm-openai:v0.6.2`
+    - vLLM: `vllm/vllm-openai:v0.6.2`
-  - SGLang: `lmsysorg/sglang:v0.3.2-cu121`
+    - SGLang: `lmsysorg/sglang:v0.3.2-cu121`
-  - LMDeploy: `openmmlab/lmdeploy:v0.6.1-cu12`
+    - LMDeploy: `openmmlab/lmdeploy:v0.6.1-cu12`
-  - TensorRT-LLM: `nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3`
+    - TensorRT-LLM: `nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3`
-    - *NOTE: we uses r24.07 as the current implementation only works for this version. We are going to bump this up.*
+        - *NOTE: we uses r24.07 as the current implementation only works for this version. We are going to bump this up.*
-  - Check [nightly-pipeline.yaml](nightly-pipeline.yaml) for the concrete docker images, specs and commands we use for the benchmark.
+    - Check [nightly-pipeline.yaml](nightly-pipeline.yaml) for the concrete docker images, specs and commands we use for the benchmark.
 - Hardware
-  - 8x Nvidia A100 GPUs
+    - 8x Nvidia A100 GPUs
 - Workload:
-  - Dataset
+    - Dataset
-    - ShareGPT dataset
+        - ShareGPT dataset
-    - Prefill-heavy dataset (in average 462 input tokens, 16 tokens as output)
+        - Prefill-heavy dataset (in average 462 input tokens, 16 tokens as output)
-    - Decode-heavy dataset (in average 462 input tokens, 256 output tokens)
+        - Decode-heavy dataset (in average 462 input tokens, 256 output tokens)
-    - Check [nightly-tests.json](tests/nightly-tests.json) for the concrete configuration of datasets we use.
+        - Check [nightly-tests.json](tests/nightly-tests.json) for the concrete configuration of datasets we use.
-  - Models: llama-3 8B, llama-3 70B.
+    - Models: llama-3 8B, llama-3 70B.
-    - We do not use llama 3.1 as it is incompatible with trt-llm r24.07. ([issue](https://github.com/NVIDIA/TensorRT-LLM/issues/2105)).
+        - We do not use llama 3.1 as it is incompatible with trt-llm r24.07. ([issue](https://github.com/NVIDIA/TensorRT-LLM/issues/2105)).
-  - Average QPS (query per second): 2, 4, 8, 16, 32 and inf.
+    - Average QPS (query per second): 2, 4, 8, 16, 32 and inf.
-    - Queries are randomly sampled, and arrival patterns are determined via Poisson process, but all with fixed random seed.
+        - Queries are randomly sampled, and arrival patterns are determined via Poisson process, but all with fixed random seed.
-  - Evaluation metrics: Throughput (higher the better), TTFT (time to the first token, lower the better), ITL (inter-token latency, lower the better).
+    - Evaluation metrics: Throughput (higher the better), TTFT (time to the first token, lower the better), ITL (inter-token latency, lower the better).
 ## Known issues

--- a/.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md
+++ b/.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md
+# Performance benchmarks descriptions
 ## Latency tests

--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
-## Essential Elements of an Effective PR Description Checklist
+# Essential Elements of an Effective PR Description Checklist
 - [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
 - [ ] The test plan, such as providing test command.
 - [ ] The test results, such as pasting the results comparison before and after, or e2e results
@@ -14,5 +15,4 @@ PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE B
 ## (Optional) Documentation Update
-<!--- pyml disable-next-line no-emphasis-as-heading -->
 **BEFORE SUBMITTING, PLEASE READ <https://docs.vllm.ai/en/latest/contributing>** (anything written below this line will be removed by GitHub Actions)
--- a/.markdownlint.yaml
+++ b/.markdownlint.yaml
+MD007:
+  indent: 4
+MD013: false
+MD024:
+  siblings_only: true
+MD033: false
+MD042: false
+MD045: false
+MD046: false
+MD051: false
+MD052: false
+MD053: false
+MD059: false
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -35,12 +35,11 @@ repos:
    exclude: 'csrc/(moe/topk_softmax_kernels.cu|quantization/gguf/(ggml-common.h|dequantize.cuh|vecdotq.cuh|mmq.cuh|mmvq.cuh))|vllm/third_party/.*'
    types_or: [c++, cuda]
    args: [--style=file, --verbose]
- repo: https://github.com/jackdewinter/pymarkdown
+- repo: https://github.com/igorshubovych/markdownlint-cli
-  rev: v0.9.29
+  rev: v0.45.0
  hooks:
-  - id: pymarkdown
+  - id: markdownlint-fix
    exclude: '.*\.inc\.md'
-    args: [fix]
 - repo: https://github.com/rhysd/actionlint
  rev: v1.7.7
  hooks:

--- a/README.md
+++ b/README.md
+<!-- markdownlint-disable MD001 MD041 -->
 <p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/vllm-project/vllm/main/docs/assets/logos/vllm-logo-text-dark.png">
@@ -16,6 +17,7 @@ Easy, fast, and cheap LLM serving for everyone
 ---
 *Latest News* 🔥
 - [2025/05] We hosted [NYC vLLM Meetup](https://lu.ma/c1rqyf1f)! Please find the meetup slides [here](https://docs.google.com/presentation/d/1_q_aW_ioMJWUImf1s1YM-ZhjXz8cUeL0IJvaquOYBeA/edit?usp=sharing).
 - [2025/05] vLLM is now a hosted project under PyTorch Foundation! Please find the announcement [here](https://pytorch.org/blog/pytorch-foundation-welcomes-vllm/).
 - [2025/04] We hosted [Asia Developer Day](https://www.sginnovate.com/event/limited-availability-morning-evening-slots-remaining-inaugural-vllm-asia-developer-day)! Please find the meetup slides from the vLLM team [here](https://docs.google.com/presentation/d/19cp6Qu8u48ihB91A064XfaXruNYiBOUKrBxAmDOllOo/edit?usp=sharing).
@@ -46,6 +48,7 @@ Easy, fast, and cheap LLM serving for everyone
 </details>
 ---
 ## About
 vLLM is a fast and easy-to-use library for LLM inference and serving.
@@ -75,6 +78,7 @@ vLLM is flexible and easy to use with:
 - Multi-LoRA support
 vLLM seamlessly supports most popular open-source models on HuggingFace, including:
 - Transformer-like LLMs (e.g., Llama)
 - Mixture-of-Expert LLMs (e.g., Mixtral, Deepseek-V2 and V3)
 - Embedding Models (e.g., E5-Mistral)
@@ -91,6 +95,7 @@ pip install vllm
 ```
 Visit our [documentation](https://docs.vllm.ai/en/latest/) to learn more.
 - [Installation](https://docs.vllm.ai/en/latest/getting_started/installation.html)
 - [Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)
 - [List of Supported Models](https://docs.vllm.ai/en/latest/models/supported_models.html)
@@ -107,6 +112,7 @@ vLLM is a community project. Our compute resources for development and testing a
 <!-- Note: Please sort them in alphabetical order. -->
 <!-- Note: Please keep these consistent with docs/community/sponsors.md -->
 Cash Donations:
 - a16z
 - Dropbox
 - Sequoia Capital
@@ -114,6 +120,7 @@ Cash Donations:
 - ZhenFund
 Compute Resources:
 - AMD
 - Anyscale
 - AWS

--- a/RELEASE.md
+++ b/RELEASE.md
@@ -60,9 +60,10 @@ Please note: **No feature work allowed for cherry picks**. All PRs that are cons
 Before each release, we perform end-to-end performance validation to ensure no regressions are introduced. This validation uses the [vllm-benchmark workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-benchmark.yml) on PyTorch CI.
 **Current Coverage:**
 * Models: Llama3, Llama4, and Mixtral
 * Hardware: NVIDIA H100 and AMD MI300x
-* *Note: Coverage may change based on new model releases and hardware availability*
+* _Note: Coverage may change based on new model releases and hardware availability_
 **Performance Validation Process:**
@@ -71,11 +72,13 @@ Request write access to the [pytorch/pytorch-integration-testing](https://github
 **Step 2: Review Benchmark Setup**
 Familiarize yourself with the benchmark configurations:
 * [CUDA setup](https://github.com/pytorch/pytorch-integration-testing/tree/main/vllm-benchmarks/benchmarks/cuda)
 * [ROCm setup](https://github.com/pytorch/pytorch-integration-testing/tree/main/vllm-benchmarks/benchmarks/rocm)
 **Step 3: Run the Benchmark**
 Navigate to the [vllm-benchmark workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-benchmark.yml) and configure:
 * **vLLM branch**: Set to the release branch (e.g., `releases/v0.9.2`)
 * **vLLM commit**: Set to the RC commit hash

--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -4,7 +4,7 @@ This README guides you through running benchmark tests with the extensive
 datasets supported on vLLM. It’s a living document, updated as new features and datasets
 become available.
-**Dataset Overview**
+## Dataset Overview
 <table style="width:100%; border-collapse: collapse;">
  <thead>
@@ -81,9 +81,10 @@ become available.
 **Note**: HuggingFace dataset's `dataset-name` should be set to `hf`
---
+## 🚀 Example - Online Benchmark
 <details>
-<summary><b>🚀 Example - Online Benchmark</b></summary>
+<summary>Show more</summary>
 <br/>
@@ -109,7 +110,7 @@ vllm bench serve \
 If successful, you will see the following output
-```
+```text
 ============ Serving Benchmark Result ============
 Successful requests:                     10
 Benchmark duration (s):                  5.78
@@ -133,11 +134,11 @@ P99 ITL (ms):                            8.39
 ==================================================
 ```
-**Custom Dataset**
+### Custom Dataset
 If the dataset you want to benchmark is not supported yet in vLLM, even then you can benchmark on it using `CustomDataset`. Your data needs to be in `.jsonl` format and needs to have "prompt" field per entry, e.g., data.jsonl
-```
+```json
 {"prompt": "What is the capital of India?"}
 {"prompt": "What is the capital of Iran?"}
 {"prompt": "What is the capital of China?"}
@@ -166,7 +167,7 @@ vllm bench serve --port 9001 --save-result --save-detailed \
 You can skip applying chat template if your data already has it by using `--custom-skip-chat-template`.
-**VisionArena Benchmark for Vision Language Models**
+### VisionArena Benchmark for Vision Language Models
 ```bash
 # need a model with vision capability here
@@ -184,7 +185,7 @@ vllm bench serve \
  --num-prompts 1000
 ```
-**InstructCoder Benchmark with Speculative Decoding**
+### InstructCoder Benchmark with Speculative Decoding
 ``` bash
 VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
@@ -201,13 +202,13 @@ vllm bench serve \
    --num-prompts 2048
 ```
-**Other HuggingFaceDataset Examples**
+### Other HuggingFaceDataset Examples
 ```bash
 vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
 ```
-**`lmms-lab/LLaVA-OneVision-Data`**
+`lmms-lab/LLaVA-OneVision-Data`:
 ```bash
 vllm bench serve \
@@ -221,7 +222,7 @@ vllm bench serve \
  --num-prompts 10
 ```
-**`Aeala/ShareGPT_Vicuna_unfiltered`**
+`Aeala/ShareGPT_Vicuna_unfiltered`:
 ```bash
 vllm bench serve \
@@ -234,7 +235,7 @@ vllm bench serve \
  --num-prompts 10
 ```
-**`AI-MO/aimo-validation-aime`**
+`AI-MO/aimo-validation-aime`:
 ``` bash
 vllm bench serve \
@@ -245,7 +246,7 @@ vllm bench serve \
    --seed 42
 ```
-**`philschmid/mt-bench`**
+`philschmid/mt-bench`:
 ``` bash
 vllm bench serve \
@@ -255,7 +256,7 @@ vllm bench serve \
    --num-prompts 80
 ```
-**Running With Sampling Parameters**
+### Running With Sampling Parameters
 When using OpenAI-compatible backends such as `vllm`, optional sampling
 parameters can be specified. Example client command:
@@ -273,25 +274,29 @@ vllm bench serve \
  --num-prompts 10
 ```
-**Running With Ramp-Up Request Rate**
+### Running With Ramp-Up Request Rate
 The benchmark tool also supports ramping up the request rate over the
 duration of the benchmark run. This can be useful for stress testing the
 server or finding the maximum throughput that it can handle, given some latency budget.
 Two ramp-up strategies are supported:
 - `linear`: Increases the request rate linearly from a start value to an end value.
 - `exponential`: Increases the request rate exponentially.
 The following arguments can be used to control the ramp-up:
 - `--ramp-up-strategy`: The ramp-up strategy to use (`linear` or `exponential`).
 - `--ramp-up-start-rps`: The request rate at the beginning of the benchmark.
 - `--ramp-up-end-rps`: The request rate at the end of the benchmark.
 </details>
+## 📈 Example - Offline Throughput Benchmark
 <details>
-<summary><b>📈 Example - Offline Throughput Benchmark</b></summary>
+<summary>Show more</summary>
 <br/>
@@ -305,15 +310,15 @@ vllm bench throughput \
 If successful, you will see the following output
-```
+```text
 Throughput: 7.15 requests/s, 4656.00 total tokens/s, 1072.15 output tokens/s
 Total num prompt tokens:  5014
 Total num output tokens:  1500
 ```
-**VisionArena Benchmark for Vision Language Models**
+### VisionArena Benchmark for Vision Language Models
-``` bash
+```bash
 vllm bench throughput \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --backend vllm-chat \
@@ -325,13 +330,13 @@ vllm bench throughput \
 The `num prompt tokens` now includes image token counts
-```
+```text
 Throughput: 2.55 requests/s, 4036.92 total tokens/s, 326.90 output tokens/s
 Total num prompt tokens:  14527
 Total num output tokens:  1280
 ```
-**InstructCoder Benchmark with Speculative Decoding**
+### InstructCoder Benchmark with Speculative Decoding
 ``` bash
 VLLM_WORKER_MULTIPROC_METHOD=spawn \
@@ -349,15 +354,15 @@ vllm bench throughput \
    "prompt_lookup_min": 2}'
 ```
-```
+```text
 Throughput: 104.77 requests/s, 23836.22 total tokens/s, 10477.10 output tokens/s
 Total num prompt tokens:  261136
 Total num output tokens:  204800
 ```
-**Other HuggingFaceDataset Examples**
+### Other HuggingFaceDataset Examples
-**`lmms-lab/LLaVA-OneVision-Data`**
+`lmms-lab/LLaVA-OneVision-Data`:
 ```bash
 vllm bench throughput \
@@ -370,7 +375,7 @@ vllm bench throughput \
  --num-prompts 10
 ```
-**`Aeala/ShareGPT_Vicuna_unfiltered`**
+`Aeala/ShareGPT_Vicuna_unfiltered`:
 ```bash
 vllm bench throughput \
@@ -382,7 +387,7 @@ vllm bench throughput \
  --num-prompts 10
 ```
-**`AI-MO/aimo-validation-aime`**
+`AI-MO/aimo-validation-aime`:
 ```bash
 vllm bench throughput \
@@ -394,7 +399,7 @@ vllm bench throughput \
  --num-prompts 10
 ```
-**Benchmark with LoRA Adapters**
+Benchmark with LoRA adapters:
 ``` bash
 # download dataset
@@ -413,20 +418,22 @@ vllm bench throughput \
 </details>
+## 🛠️ Example - Structured Output Benchmark
 <details>
-<summary><b>🛠️ Example - Structured Output Benchmark</b></summary>
+<summary>Show more</summary>
 <br/>
 Benchmark the performance of structured output generation (JSON, grammar, regex).
-**Server Setup**
+### Server Setup
 ```bash
 vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
 ```
-**JSON Schema Benchmark**
+### JSON Schema Benchmark
 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -438,7 +445,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
  --num-prompts 1000
 ```
-**Grammar-based Generation Benchmark**
+### Grammar-based Generation Benchmark
 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -450,7 +457,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
  --num-prompts 1000
 ```
-**Regex-based Generation Benchmark**
+### Regex-based Generation Benchmark
 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -461,7 +468,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
  --num-prompts 1000
 ```
-**Choice-based Generation Benchmark**
+### Choice-based Generation Benchmark
 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -472,7 +479,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
  --num-prompts 1000
 ```
-**XGrammar Benchmark Dataset**
+### XGrammar Benchmark Dataset
 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -485,14 +492,16 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 </details>
+## 📚 Example - Long Document QA Benchmark
 <details>
-<summary><b>📚 Example - Long Document QA Benchmark</b></summary>
+<summary>Show more</summary>
 <br/>
 Benchmark the performance of long document question-answering with prefix caching.
-**Basic Long Document QA Test**
+### Basic Long Document QA Test
 ```bash
 python3 benchmarks/benchmark_long_document_qa_throughput.py \
@@ -504,7 +513,7 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
  --repeat-count 5
 ```
-**Different Repeat Modes**
+### Different Repeat Modes
 ```bash
 # Random mode (default) - shuffle prompts randomly
@@ -537,14 +546,16 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
 </details>
+## 🗂️ Example - Prefix Caching Benchmark
 <details>
-<summary><b>🗂️ Example - Prefix Caching Benchmark</b></summary>
+<summary>Show more</summary>
 <br/>
 Benchmark the efficiency of automatic prefix caching.
-**Fixed Prompt with Prefix Caching**
+### Fixed Prompt with Prefix Caching
 ```bash
 python3 benchmarks/benchmark_prefix_caching.py \
@@ -555,7 +566,7 @@ python3 benchmarks/benchmark_prefix_caching.py \
  --input-length-range 128:256
 ```
-**ShareGPT Dataset with Prefix Caching**
+### ShareGPT Dataset with Prefix Caching
 ```bash
 # download dataset
@@ -572,14 +583,16 @@ python3 benchmarks/benchmark_prefix_caching.py \
 </details>
+## ⚡ Example - Request Prioritization Benchmark
 <details>
-<summary><b>⚡ Example - Request Prioritization Benchmark</b></summary>
+<summary>Show more</summary>
 <br/>
 Benchmark the performance of request prioritization in vLLM.
-**Basic Prioritization Test**
+### Basic Prioritization Test
 ```bash
 python3 benchmarks/benchmark_prioritization.py \
@@ -590,7 +603,7 @@ python3 benchmarks/benchmark_prioritization.py \
  --scheduling-policy priority
 ```
-**Multiple Sequences per Prompt**
+### Multiple Sequences per Prompt
 ```bash
 python3 benchmarks/benchmark_prioritization.py \

--- a/benchmarks/auto_tune/README.md
+++ b/benchmarks/auto_tune/README.md
@@ -3,6 +3,7 @@
 This script automates the process of finding the optimal server parameter combination (`max-num-seqs` and `max-num-batched-tokens`) to maximize throughput for a vLLM server. It also supports additional constraints such as E2E latency and prefix cache hit rate.
 ## Table of Contents
 - [Prerequisites](#prerequisites)
 - [Configuration](#configuration)
 - [How to Run](#how-to-run)
@@ -52,7 +53,7 @@ You must set the following variables at the top of the script before execution.
 1. **Configure**: Edit the script and set the variables in the [Configuration](#configuration) section.
 2. **Execute**: Run the script. Since the process can take a long time, it is highly recommended to use a terminal multiplexer like `tmux` or `screen` to prevent the script from stopping if your connection is lost.
-```
+```bash
 cd <FOLDER_OF_THIS_SCRIPT>
 bash auto_tune.sh
 ```
@@ -64,6 +65,7 @@ bash auto_tune.sh
 Here are a few examples of how to configure the script for different goals:
 ### 1. Maximize Throughput (No Latency Constraint)
 - **Goal**: Find the best `max-num-seqs` and `max-num-batched-tokens` to get the highest possible throughput for 1800 input tokens and 20 output tokens.
 - **Configuration**:
@@ -76,6 +78,7 @@ MAX_LATENCY_ALLOWED_MS=100000000000 # A very large number
 ```
 #### 2. Maximize Throughput with a Latency Requirement
 - **Goal**: Find the best server parameters when P99 end-to-end latency must be below 500ms.
 - **Configuration**:
@@ -88,6 +91,7 @@ MAX_LATENCY_ALLOWED_MS=500
 ```
 #### 3. Maximize Throughput with Prefix Caching and Latency Requirements
 - **Goal**: Find the best server parameters assuming a 60% prefix cache hit rate and a latency requirement of 500ms.
 - **Configuration**:
@@ -109,7 +113,7 @@ After the script finishes, you will find the results in a new, timestamped direc
 - **Final Result Summary**: A file named `result.txt` is created in the log directory. It contains a summary of each tested combination and concludes with the overall best parameters found.
-```
+```text
 # Example result.txt content
 hash:a1b2c3d4...
 max_num_seqs: 128, max_num_batched_tokens: 2048, request_rate: 10.0, e2el: 450.5, throughput: 9.8, goodput: 9.8

--- a/benchmarks/kernels/deepgemm/README.md
+++ b/benchmarks/kernels/deepgemm/README.md
@@ -8,7 +8,7 @@ Currently this just includes dense GEMMs and only works on Hopper GPUs.
 You need to install vLLM in your usual fashion, then install DeepGEMM from source in its own directory:
-```
+```bash
 git clone --recursive https://github.com/deepseek-ai/DeepGEMM
 cd DeepGEMM
 python setup.py install
@@ -17,7 +17,7 @@ uv pip install -e .
 ## Usage
-```
+```console
 python benchmark_fp8_block_dense_gemm.py
 INFO 02-26 21:55:13 [__init__.py:207] Automatically detected platform cuda.
 ===== STARTING FP8 GEMM BENCHMARK =====

--- a/csrc/quantization/cutlass_w8a8/Epilogues.md
+++ b/csrc/quantization/cutlass_w8a8/Epilogues.md
@@ -86,6 +86,7 @@ D = s_a s_b \widehat A \widehat B
 ```
 Epilogue parameters:
 - `scale_a` is the scale for activations, can be per-tensor (scalar) or per-token (column-vector).
 - `scale_b` is the scale for weights, can be per-tensor (scalar) or per-channel (row-vector).
@@ -135,7 +136,7 @@ That is precomputed and stored in `azp_with_adj` as a row-vector.
 Epilogue parameters:
 - `scale_a` is the scale for activations, can be per-tensor (scalar) or per-token (column-vector).
-  - Generally this will be per-tensor as the zero-points are per-tensor.
+    - Generally this will be per-tensor as the zero-points are per-tensor.
 - `scale_b` is the scale for weights, can be per-tensor (scalar) or per-channel (row-vector).
 - `azp_with_adj` is the precomputed zero-point term ($` z_a J_a \widehat B `$), is per-channel (row-vector).
 - `bias` is the bias, is always per-channel (row-vector).
@@ -152,7 +153,7 @@ That means the zero-point term $` z_a J_a \widehat B `$ becomes an outer product
 Epilogue parameters:
 - `scale_a` is the scale for activations, can be per-tensor (scalar) or per-token (column-vector).
-  - Generally this will be per-token as the zero-points are per-token.
+    - Generally this will be per-token as the zero-points are per-token.
 - `scale_b` is the scale for weights, can be per-tensor (scalar) or per-channel (row-vector).
 - `azp_adj` is the precomputed zero-point adjustment term ($` \mathbf 1 \widehat B `$), is per-channel (row-vector).
 - `azp` is the zero-point (`z_a`), is per-token (column-vector).

--- a/docs/cli/README.md
+++ b/docs/cli/README.md
@@ -6,13 +6,13 @@ toc_depth: 4
 The vllm command-line tool is used to run and manage vLLM models. You can start by viewing the help message with:
-```
+```bash
 vllm --help
 ```
 Available Commands:
-```
+```bash
 vllm {chat,complete,serve,bench,collect-env,run-batch}
 ```

--- a/docs/configuration/tpu.md
+++ b/docs/configuration/tpu.md
@@ -40,6 +40,7 @@ Although the first compilation can take some time, for all subsequent server lau
 Use `VLLM_XLA_CACHE_PATH` environment variable to write to shareable storage for future deployed nodes (like when using autoscaling).
 #### Reducing compilation time
 This initial compilation time ranges significantly and is impacted by many of the arguments discussed in this optimization doc. Factors that influence the length of time to compile are things like model size and `--max-num-batch-tokens`. Other arguments you can tune are things like `VLLM_TPU_MOST_MODEL_LEN`.
 ### Optimize based on your data
@@ -71,12 +72,15 @@ The fewer tokens we pad, the less unnecessary computation TPU does, the better p
 However, you need to be careful to choose the padding gap. If the gap is too small, it means the number of buckets is large, leading to increased warmup (precompile) time and higher memory to store the compiled graph. Too many compilaed graphs may lead to HBM OOM. Conversely, an overly large gap yields no performance improvement compared to the default exponential padding.
-**If possible, use the precision that matches the chip’s hardware acceleration**
+#### Quantization
+If possible, use the precision that matches the chip’s hardware acceleration:
 - v5e has int4/int8 hardware acceleration in the MXU
 - v6e has int4/int8 hardware acceleration in the MXU
-Supported quantized formats and features in vLLM on TPU [Jul '25]
+Supported quantized formats and features in vLLM on TPU [Jul '25]:
 - INT8 W8A8
 - INT8 W8A16
 - FP8 KV cache
@@ -84,11 +88,13 @@ Supported quantized formats and features in vLLM on TPU [Jul '25]
 - [WIP] AWQ
 - [WIP] FP4 W4A8
-**Don't set TP to be less than the number of chips on a single-host deployment**
+#### Parallelization
+Don't set TP to be less than the number of chips on a single-host deployment.
 Although it’s common to do this with GPUs, don't try to fragment 2 or 8 different workloads across 8 chips on a single host. If you need 1 or 4 chips, just create an instance with 1 or 4 chips (these are partial-host machine types).
-### Tune your workloads!
+### Tune your workloads
 Although we try to have great default configs, we strongly recommend you check out the [vLLM auto-tuner](../../benchmarks/auto_tune/README.md) to optimize your workloads for your use case.
@@ -99,6 +105,7 @@ Although we try to have great default configs, we strongly recommend you check o
 The auto-tuner provides a profile of optimized configurations as its final step. However, interpreting this profile can be challenging for new users. We plan to expand this section in the future with more detailed guidance. In the meantime, you can learn how to collect a TPU profile using vLLM's native profiling tools [here](../examples/offline_inference/profiling_tpu.md). This profile can provide valuable insights into your workload's performance.
 #### SPMD
 More details to come.
 **Want us to cover something that isn't listed here? Open up an issue please and cite this doc. We'd love to hear your questions or tips.**
--- a/docs/contributing/ci/failures.md
+++ b/docs/contributing/ci/failures.md
@@ -20,19 +20,19 @@ the failure?
 - **Use this title format:**
-    ```
+    ```text
    [CI Failure]: failing-test-job - regex/matching/failing:test
    ```
 - **For the environment field:**
-    ```
+    ```text
- Still failing on main as of commit abcdef123
+    Still failing on main as of commit abcdef123
    ```
 - **In the description, include failing tests:**
-    ```
+    ```text
    FAILED failing/test.py:failing_test1 - Failure description
    FAILED failing/test.py:failing_test2 - Failure description
    https://github.com/orgs/vllm-project/projects/20

--- a/docs/contributing/ci/update_pytorch_version.md
+++ b/docs/contributing/ci/update_pytorch_version.md
@@ -106,6 +106,7 @@ releases (which would take too much time), they can be built from
 source to unblock the update process.
 ### FlashInfer
 Here is how to build and install it from source with `torch2.7.0+cu128` in vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271):
 ```bash
@@ -121,6 +122,7 @@ public location for immediate installation, such as [this FlashInfer wheel link]
 team if you want to get the package published there.
 ### xFormers
 Similar to FlashInfer, here is how to build and install xFormers from source:
 ```bash
@@ -138,7 +140,7 @@ uv pip install --system \
 ### causal-conv1d
-```
+```bash
 uv pip install 'git+https://github.com/Dao-AILab/causal-conv1d@v1.5.0.post8'
 ```

--- a/docs/contributing/deprecation_policy.md
+++ b/docs/contributing/deprecation_policy.md
@@ -31,7 +31,7 @@ Features that fall under this policy include (at a minimum) the following:
 The deprecation process consists of several clearly defined stages that span
 multiple Y releases:
-**1. Deprecated (Still On By Default)**
+### 1. Deprecated (Still On By Default)
 - **Action**: Feature is marked as deprecated.
 - **Timeline**: A removal version is explicitly stated in the deprecation
@@ -46,7 +46,7 @@ warning (e.g., "This will be removed in v0.10.0").
    - GitHub Issue (RFC) for feedback
    - Documentation and use of the `@typing_extensions.deprecated` decorator for Python APIs
-**2.Deprecated (Off By Default)**
+### 2.Deprecated (Off By Default)
 - **Action**: Feature is disabled by default, but can still be re-enabled via a
 CLI flag or environment variable. Feature throws an error when used without
@@ -55,7 +55,7 @@ re-enabling.
 while signaling imminent removal. Ensures any remaining usage is clearly
 surfaced and blocks silent breakage before full removal.
-**3. Removed**
+### 3. Removed
 - **Action**: Feature is completely removed from the codebase.
 - **Note**: Only features that have passed through the previous deprecation

--- a/docs/contributing/profiling.md
+++ b/docs/contributing/profiling.md
@@ -112,13 +112,13 @@ vllm bench serve \
 In practice, you should set the `--duration` argument to a large value. Whenever you want the server to stop profiling, run:
-```
+```bash
 nsys sessions list
 ```
 to get the session id in the form of `profile-XXXXX`, then run:
-```
+```bash
 nsys stop --session=profile-XXXXX
 ```

--- a/docs/contributing/vulnerability_management.md
+++ b/docs/contributing/vulnerability_management.md
@@ -32,9 +32,9 @@ We prefer to keep all vulnerability-related communication on the security report
 on GitHub. However, if you need to contact the VMT directly for an urgent issue,
 you may contact the following individuals:
- Simon Mo - simon.mo@hey.com
+- Simon Mo - <simon.mo@hey.com>
- Russell Bryant - rbryant@redhat.com
+- Russell Bryant - <rbryant@redhat.com>
- Huzaifa Sidhpurwala - huzaifas@redhat.com
+- Huzaifa Sidhpurwala - <huzaifas@redhat.com>
 ## Slack Discussion