Allow `markdownlint` to run locally (#36398)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Commit a0f44bb6 (unverified), authored Mar 09, 2026 by Harry Mellor, committed by GitHub Mar 08, 2026
7 changed files
--- a/docs/serving/openai_compatible_server.md
+++ b/docs/serving/openai_compatible_server.md
@@ -596,7 +596,7 @@ Audio must be sent as base64-encoded PCM16 audio at 16kHz sample rate, mono chan
 #### Client → Server Events
 | Event | Description |
-|-------|-------------|
+| ----- | ----------- |
 | `input_audio_buffer.append` | Send base64-encoded audio chunk: `{"type": "input_audio_buffer.append", "audio": "<base64>"}` |
 | `input_audio_buffer.commit` | Trigger transcription processing or end: `{"type": "input_audio_buffer.commit", "final": bool}` |
 | `session.update` | Configure session: `{"type": "session.update", "model": "model-name"}` |
@@ -604,7 +604,7 @@ Audio must be sent as base64-encoded PCM16 audio at 16kHz sample rate, mono chan
 #### Server → Client Events
 | Event | Description |
-|-------|-------------|
+| ----- | ----------- |
 | `session.created` | Connection established with session ID and timestamp |
 | `transcription.delta` | Incremental transcription text: `{"type": "transcription.delta", "delta": "text"}` |
 | `transcription.done` | Final transcription with usage stats |

--- a/docs/usage/v1_guide.md
+++ b/docs/usage/v1_guide.md
@@ -83,13 +83,13 @@ based on assigned priority, with FCFS as a tie-breaker), configurable via the
 ### Hardware
-| Hardware         | Status                                        |
+| Hardware      | Status          |
-|------------------|-----------------------------------------------|
+| --------------| --------------- |
-| **NVIDIA**       | <nobr>🟢</nobr>                               |
+| **NVIDIA**    | <nobr>🟢</nobr> |
-| **AMD**          | <nobr>🟢</nobr>                               |
+| **AMD**       | <nobr>🟢</nobr> |
-| **INTEL GPU**    | <nobr>🟢</nobr>                               |
+| **INTEL GPU** | <nobr>🟢</nobr> |
-| **TPU**          | <nobr>🟢</nobr>                               |
+| **TPU**       | <nobr>🟢</nobr> |
-| **CPU**          | <nobr>🟢</nobr>                               |
+| **CPU**       | <nobr>🟢</nobr> |
 !!! note
@@ -104,13 +104,13 @@ based on assigned priority, with FCFS as a tie-breaker), configurable via the
 ### Models
-| Model Type                  | Status                                                                  |
+| Model Type                 | Status                                  |
-|-----------------------------|-------------------------------------------------------------------------|
+| -------------------------- | --------------------------------------- |
-| **Decoder-only Models**     | <nobr>🟢</nobr>                                                         |
+| **Decoder-only Models**    | <nobr>🟢</nobr>                         |
-| **Encoder-Decoder Models**  | <nobr>🟢 (Whisper), 🔴 (Others) </nobr>                                |
+| **Encoder-Decoder Models** | <nobr>🟢 (Whisper), 🔴 (Others) </nobr> |
-| **Pooling Models**          | <nobr>🟢</nobr>                                                         |
+| **Pooling Models**         | <nobr>🟢</nobr>                         |
-| **Mamba Models**            | <nobr>🟢</nobr>                                                         |
+| **Mamba Models**           | <nobr>🟢</nobr>                         |
-| **Multimodal Models**       | <nobr>🟢</nobr>                                                         |
+| **Multimodal Models**      | <nobr>🟢</nobr>                         |
 See below for the status of models that are not yet supported or have more features planned in V1.
@@ -145,7 +145,7 @@ following a similar pattern by implementing support through the [plugin system](
 ### Features
 | Feature                                     | Status                                                                            |
-|---------------------------------------------|-----------------------------------------------------------------------------------|
+| ------------------------------------------- | --------------------------------------------------------------------------------- |
 | **Prefix Caching**                          | <nobr>🟢 Functional</nobr>                                                        |
 | **Chunked Prefill**                         | <nobr>🟢 Functional</nobr>                                                        |
 | **LoRA**                                    | <nobr>🟢 Functional</nobr>                                                        |

--- a/examples/online_serving/dashboards/README.md
+++ b/examples/online_serving/dashboards/README.md
@@ -34,7 +34,7 @@ deployment methods:
 Both platforms provide equivalent monitoring capabilities:
 | Dashboard | Description |
-|-----------|-------------|
+| --------- | ----------- |
 | **Performance Statistics** | Tracks latency, throughput, and performance metrics |
 | **Query Statistics** | Monitors request volume, query performance, and KPIs |

--- a/examples/online_serving/disaggregated_encoder/README.md
+++ b/examples/online_serving/disaggregated_encoder/README.md
@@ -95,7 +95,7 @@ If you enable prefill instance (`--prefill-servers-urls` not disabled), you will
 ## Proxy Instance Flags (`disagg_epd_proxy.py`)
 | Flag | Description |
-|------|-------------|
+| ---- | ----------- |
 | `--encode-servers-urls` | Comma-separated list of encoder endpoints. Every multimodal item extracted from the request is fanned out to one of these URLs in a round-robin fashion. |
 | `--prefill-servers-urls` | Comma-separated list of prefill endpoints. Set to `disable`, `none`, or `""` to skip the dedicated prefill phase and run E+PD (encoder + combined prefill/decode). |
 | `--decode-servers-urls` | Comma-separated list of decode endpoints. Non-stream and stream paths both round-robin over this list. |

--- a/examples/pooling/embed/openai_embedding_long_text/README.md
+++ b/examples/pooling/embed/openai_embedding_long_text/README.md
@@ -34,7 +34,7 @@ python client.py
 ## 📁 Files
 | File | Description |
-|------|-------------|
+| ---- | ----------- |
 | `service.sh` | Server startup script with chunked processing enabled |
 | `client.py` | Comprehensive test client for long text embedding |
@@ -61,7 +61,7 @@ The key parameters for chunked processing are in the `--pooler-config`:
 Chunked processing uses **MEAN aggregation** for cross-chunk combination when input exceeds the model's native maximum length:
 | Component | Behavior | Description |
-|-----------|----------|-------------|
+| --------- | -------- | ----------- |
 | **Within chunks** | Model's native pooling | Uses the model's configured pooling strategy |
 | **Cross-chunk aggregation** | Always MEAN | Weighted averaging based on chunk token counts |
 | **Performance** | Optimal | All chunks processed for complete semantic coverage |
@@ -69,7 +69,7 @@ Chunked processing uses **MEAN aggregation** for cross-chunk combination when in
 ### Environment Variables
 | Variable | Default | Description |
-|----------|---------|-------------|
+| -------- | ------- | ----------- |
 | `MODEL_NAME` | `intfloat/multilingual-e5-large` | Embedding model to use (supports multiple models) |
 | `PORT` | `31090` | Server port |
 | `GPU_COUNT` | `1` | Number of GPUs to use |
@@ -106,7 +106,7 @@ With `MAX_EMBED_LEN=3072000`, you can process:
 ### Chunked Processing Performance
 | Aspect | Behavior | Performance |
-|--------|----------|-------------|
+| ------ | -------- | ----------- |
 | **Chunk Processing** | All chunks processed with native pooling | Consistent with input length |
 | **Cross-chunk Aggregation** | MEAN weighted averaging | Minimal overhead |
 | **Memory Usage** | Proportional to number of chunks | Moderate, scalable |

--- a/tools/pre_commit/generate_attention_backend_docs.py
+++ b/tools/pre_commit/generate_attention_backend_docs.py
@@ -1153,11 +1153,11 @@ def _render_table(
 ) -> list[str]:
    """Render a markdown table from column specs and backend data."""
    header = "| " + " | ".join(name for name, _ in columns) + " |"
-    sep = "|" + "|".join("-" * (len(name) + 2) for name, _ in columns) + "|"
+    sep = "| " + " | ".join("-" * len(name) for name, _ in columns) + " |"
    lines = [header, sep]
    for info in sorted(backends, key=_sort_key):
        row = "| " + " | ".join(fmt(info) for _, fmt in columns) + " |"
-        lines.append(row)
+        lines.append(row.replace("  ", " "))
    return lines
@@ -1268,7 +1268,7 @@ def _priority_table(title: str, backends: list[str]) -> list[str]:
        f"**{title}:**",
        "",
        "| Priority | Backend |",
-        "|----------|---------|",
+        "| -------- | ------- |",
        *[f"| {i} | `{b}` |" for i, b in enumerate(backends, 1)],
        "",
    ]
@@ -1317,7 +1317,7 @@ def generate_legend() -> str:
    return """## Legend
 | Column | Description |
-|--------|-------------|
+| ------ | ----------- |
 | **Dtypes** | Supported model data types (fp16, bf16, fp32) |
 | **KV Dtypes** | Supported KV cache data types (`auto`, `fp8`, `fp8_e4m3`, etc.) |
 | **Block Sizes** | Supported KV cache block sizes (%N means multiples of N) |
@@ -1348,7 +1348,7 @@ def generate_mla_section(
        "configuration.",
        "",
        "| Backend | Description | Compute Cap. | Enable | Disable | Notes |",
-        "|---------|-------------|--------------|--------|---------|-------|",
+        "| ------- | ----------- | ------------ | ------ | ------- | ----- |",
    ]
    for backend in prefill_backends:
@@ -1360,7 +1360,7 @@ def generate_mla_section(
            backend["disable"],
            backend.get("notes", ""),
        )
-        lines.append(row)
+        lines.append(row.replace("  ", " "))
    lines.extend(
        [
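
Note on the `generate_attention_backend_docs.py` hunks above: the generator now emits the same spaced separator style that the hand-edited Markdown tables in this commit use, and collapses double spaces in rendered rows. The following is a minimal standalone sketch of that behavior, not the script's actual code. The helper names `old_separator`, `new_separator`, and `render_row` are hypothetical, and the column specs are simplified to plain header strings.

```python
# Standalone sketch of the separator and row changes in the diff above.
# Helper names are hypothetical; the real script builds these strings
# inline inside _render_table and generate_mla_section.


def old_separator(names: list[str]) -> str:
    """Pre-change style: dashes padded by 2, no spaces around the pipes."""
    return "|" + "|".join("-" * (len(name) + 2) for name in names) + "|"


def new_separator(names: list[str]) -> str:
    """Post-change style: dashes match the header width, spaces around pipes."""
    return "| " + " | ".join("-" * len(name) for name in names) + " |"


def render_row(cells: list[str]) -> str:
    """Join cells into a row, then collapse each double space to a single one.

    An empty cell would otherwise render as "|  |" (two spaces).
    """
    row = "| " + " | ".join(cells) + " |"
    return row.replace("  ", " ")


if __name__ == "__main__":
    names = ["Priority", "Backend"]
    print(old_separator(names))  # |----------|---------|
    print(new_separator(names))  # | -------- | ------- |
    print(render_row(["FLASH_ATTN", "", "x"]))  # | FLASH_ATTN | | x |
```

One thing to keep in mind with this approach: `str.replace("  ", " ")` is a single non-overlapping pass over the whole row, so it also touches double spaces inside cell text, and a run of three spaces becomes two rather than one.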

--- a/vllm/lora/ops/triton_ops/README_TUNING.md
+++ b/vllm/lora/ops/triton_ops/README_TUNING.md
@@ -43,14 +43,14 @@ Multi-lora shrink/expand Triton kernel tuning follows a similar methodology from
 ### File Naming
-| Kernel Type               | File Name Template                          | Example                                     |
+| Kernel Type               | File Name Template                          | Example                                      |
-|---------------------------|--------------------------------------------|---------------------------------------------|
+| ------------------------- | ------------------------------------------- | -------------------------------------------- |
-| shrink                    | `{gpu_name}_SHRINK.json`                   | `NVIDIA_H200_SHRINK.json`                  |
+| shrink                    | `{gpu_name}_SHRINK.json`                    | `NVIDIA_H200_SHRINK.json`                    |
-| expand                    | `{gpu_name}_EXPAND_{add_input}.json`       | `NVIDIA_H200_EXPAND_TRUE.json`             |
+| expand                    | `{gpu_name}_EXPAND_{add_input}.json`        | `NVIDIA_H200_EXPAND_TRUE.json`               |
 | fused_moe_lora_w13_shrink | `{gpu_name}_FUSED_MOE_LORA_W13_SHRINK.json` | `NVIDIA_H200_FUSED_MOE_LORA_W13_SHRINK.json` |
 | fused_moe_lora_w13_expand | `{gpu_name}_FUSED_MOE_LORA_W13_EXPAND.json` | `NVIDIA_H200_FUSED_MOE_LORA_W13_EXPAND.json` |
-| fused_moe_lora_w2_shrink  | `{gpu_name}_FUSED_MOE_LORA_W2_SHRINK.json`  | `NVIDIA_H200_FUSED_MOE_LORA_W2_SHRINK.json` |
+| fused_moe_lora_w2_shrink  | `{gpu_name}_FUSED_MOE_LORA_W2_SHRINK.json`  | `NVIDIA_H200_FUSED_MOE_LORA_W2_SHRINK.json`  |
-| fused_moe_lora_w2_expand  | `{gpu_name}_FUSED_MOE_LORA_W2_EXPAND.json`  | `NVIDIA_H200_FUSED_MOE_LORA_W2_EXPAND.json` |
+| fused_moe_lora_w2_expand  | `{gpu_name}_FUSED_MOE_LORA_W2_EXPAND.json`  | `NVIDIA_H200_FUSED_MOE_LORA_W2_EXPAND.json`  |
 The `gpu_name` can be automatically detected by calling `torch.cuda.get_device_name()`.