[CI/Build] Add markdown linter (#11857)

Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>

Commit 43f3d9e6 (unverified), authored Jan 12, 2025 by Rafael Vasquez; committed by GitHub on Jan 12, 2025
9 changed files
--- a/docs/source/serving/metrics.md
+++ b/docs/source/serving/metrics.md
@@ -7,7 +7,7 @@ OpenAI compatible API server.
 You can start the server using Python, or using [Docker](#deployment-docker):
 ```console
-$ vllm serve unsloth/Llama-3.2-1B-Instruct
+vllm serve unsloth/Llama-3.2-1B-Instruct
 ```
 Then query the endpoint to get the latest metrics from the server:

--- a/docs/source/serving/multimodal_inputs.md
+++ b/docs/source/serving/multimodal_inputs.md
@@ -303,6 +303,7 @@ vllm serve llava-hf/llava-onevision-qwen2-0.5b-ov-hf --task generate --max-model
 ```
 Then, you can use the OpenAI client as follows:
 ```python
 from openai import OpenAI

--- a/docs/source/serving/offline_inference.md
+++ b/docs/source/serving/offline_inference.md
@@ -64,7 +64,7 @@ Dynamic quantization is also supported via the `quantization` option -- see [her
 #### Context length and batch size
-You can further reduce memory usage by limit the context length of the model (`max_model_len` option)
+You can further reduce memory usage by limiting the context length of the model (`max_model_len` option)
 and the maximum batch size (`max_num_seqs` option).
 ```python

--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -5,11 +5,13 @@
 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more!
 You can start the server via the [`vllm serve`](#vllm-serve) command, or through [Docker](#deployment-docker):
 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
 ```
 To call the server, you can use the [official OpenAI Python client](https://github.com/openai/openai-python), or any other HTTP client.
 ```python
 from openai import OpenAI
 client = OpenAI(
@@ -50,6 +52,7 @@ In addition, we have the following custom APIs:
  - Only applicable to [cross-encoder models](../models/pooling_models.md) (`--task score`).
 (chat-template)=
 ## Chat Template
 In order for the language model to support chat protocol, vLLM requires the model to include
@@ -71,6 +74,7 @@ vLLM community provides a set of chat templates for popular models. You can find
 With the inclusion of multi-modal chat APIs, the OpenAI spec now accepts chat messages in a new format which specifies
 both a `type` and a `text` field. An example is provided below:
 ```python
 completion = client.chat.completions.create(
  model="NousResearch/Meta-Llama-3-8B-Instruct",
@@ -80,7 +84,7 @@ completion = client.chat.completions.create(
 )
 ```
-Most chat templates for LLMs expect the `content` field to be a string, but there are some newer models like 
+Most chat templates for LLMs expect the `content` field to be a string, but there are some newer models like
 `meta-llama/Llama-Guard-3-1B` that expect the content to be formatted according to the OpenAI schema in the
 request. vLLM provides best-effort support to detect this automatically, which is logged as a string like
 *"Detected the chat template content format to be..."*, and internally converts incoming requests to match
@@ -115,12 +119,12 @@ completion = client.chat.completions.create(
 ## Extra HTTP Headers
 Only `X-Request-Id` HTTP request header is supported for now. It can be enabled
-with `--enable-request-id-headers`. 
+with `--enable-request-id-headers`.
 > Note that enablement of the headers can impact performance significantly at high QPS
 > rates. We recommend implementing HTTP headers at the router level (e.g. via Istio),
 > rather than within the vLLM layer for this reason.
-> See https://github.com/vllm-project/vllm/pull/11529 for more details.
+> See [this PR](https://github.com/vllm-project/vllm/pull/11529) for more details.
 ```python
 completion = client.chat.completions.create(
@@ -147,6 +151,7 @@ print(completion._request_id)
 ## CLI Reference
 (vllm-serve)=
 ### `vllm serve`
 The `vllm serve` command is used to launch the OpenAI-compatible server.
@@ -175,7 +180,7 @@ uvicorn-log-level: "info"
 To use the above config file:
 ```bash
-$ vllm serve SOME_MODEL --config config.yaml
+vllm serve SOME_MODEL --config config.yaml
 ```
 ```{note}
@@ -186,6 +191,7 @@ The order of priorities is `command line > config file values > defaults`.
 ## API Reference
 (completions-api)=
 ### Completions API
 Our Completions API is compatible with [OpenAI's Completions API](https://platform.openai.com/docs/api-reference/completions);
@@ -212,6 +218,7 @@ The following extra parameters are supported:
 ```
 (chat-api)=
 ### Chat API
 Our Chat API is compatible with [OpenAI's Chat Completions API](https://platform.openai.com/docs/api-reference/chat);
@@ -243,6 +250,7 @@ The following extra parameters are supported:
 ```
 (embeddings-api)=
 ### Embeddings API
 Our Embeddings API is compatible with [OpenAI's Embeddings API](https://platform.openai.com/docs/api-reference/embeddings);
@@ -284,6 +292,7 @@ For chat-like input (i.e. if `messages` is passed), these extra parameters are s
 ```
 (tokenizer-api)=
 ### Tokenizer API
 Our Tokenizer API is a simple wrapper over [HuggingFace-style tokenizers](https://huggingface.co/docs/transformers/en/main_classes/tokenizer).
@@ -293,6 +302,7 @@ It consists of two endpoints:
 - `/detokenize` corresponds to calling `tokenizer.decode()`.
 (pooling-api)=
 ### Pooling API
 Our Pooling API encodes input prompts using a [pooling model](../models/pooling_models.md) and returns the corresponding hidden states.
@@ -302,6 +312,7 @@ The input format is the same as [Embeddings API](#embeddings-api), but the outpu
 Code example: <gh-file:examples/online_serving/openai_pooling_client.py>
 (score-api)=
 ### Score API
 Our Score API applies a cross-encoder model to predict scores for sentence pairs.

--- a/format.sh
+++ b/format.sh
@@ -41,7 +41,7 @@ MYPY_VERSION=$(mypy --version | awk '{print $2}')
 CODESPELL_VERSION=$(codespell --version)
 ISORT_VERSION=$(isort --vn)
 CLANGFORMAT_VERSION=$(clang-format --version | awk '{print $3}')
-SPHINX_LINT_VERSION=$(sphinx-lint --version | awk '{print $2}')
+PYMARKDOWNLNT_VERSION=$(pymarkdownlnt version | awk '{print $1}')
 # # params: tool name, tool version, required version
 tool_version_check() {
@@ -58,7 +58,7 @@ tool_version_check "mypy" "$MYPY_VERSION"
 tool_version_check "isort" "$ISORT_VERSION"
 tool_version_check "codespell" "$CODESPELL_VERSION"
 tool_version_check "clang-format" "$CLANGFORMAT_VERSION"
-tool_version_check "sphinx-lint" "$SPHINX_LINT_VERSION"
+tool_version_check "pymarkdownlnt" "$PYMARKDOWNLNT_VERSION"
 YAPF_FLAGS=(
    '--recursive'
@@ -316,6 +316,6 @@ else
    echo "✨🎉 Format check passed! Congratulations! 🎉✨"
 fi
-echo 'vLLM sphinx-lint:'
+echo 'vLLM doc-lint:'
-tools/sphinx-lint.sh
+tools/doc-lint.sh
-echo 'vLLM sphinx-lint: Done'
+echo 'vLLM doc-lint: Done'
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -101,3 +101,9 @@ markers = [
    "skip_v1: do not run this test with v1",
    "optional: optional tests that are automatically skipped, include --optional to run them",
 ]
+[tool.pymarkdown]
+plugins.md013.enabled = false # line-length
+plugins.md041.enabled = false # first-line-h1
+plugins.md033.enabled = false # inline-html
+plugins.md024.allow_different_nesting = true # no-duplicate-headers
--- a/requirements-lint.txt
+++ b/requirements-lint.txt
@@ -6,7 +6,7 @@ ruff==0.6.5
 codespell==2.3.0
 isort==5.13.2
 clang-format==18.1.5
-sphinx-lint==1.0.0
+pymarkdownlnt==0.9.26
 # type checking
 mypy==1.11.1

--- a/tools/doc-lint.sh
+++ b/tools/doc-lint.sh
+#!/bin/bash
+pymarkdownlnt scan docs -r
--- a/tools/sphinx-lint.sh
+++ b/tools/sphinx-lint.sh
-#!/bin/bash
-sphinx-lint --disable trailing-whitespace,missing-final-newline docs
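
Note on the documentation fixes in this commit: the hunks above remove trailing whitespace, drop `$` prompts from command blocks, and wrap a bare URL in a link. The following is a minimal sketch of what such checks look like, assuming the markdownlint-style rule IDs MD009, MD014, and MD034. It is a simplified illustration, not pymarkdownlnt's implementation: the real rules have more nuance (for example, exceptions for intentional line-break spaces), and `find_issues` and `BARE_URL` are hypothetical names.

```python
import re

# A URL counts as "bare" here when it is not directly preceded by one of
# [ ( < or a backtick, i.e. not already part of a link, autolink, or code span.
BARE_URL = re.compile(r"(?<![\[(<`])https?://\S+")


def find_issues(text):
    """Return (line_number, rule, message) tuples for a Markdown string."""
    issues = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            # Toggle on every fence line; fence lines themselves are not checked.
            in_fence = not in_fence
            continue
        if in_fence:
            # Simplification: flag every prompt-prefixed command inside a fence.
            if stripped.startswith("$ "):
                issues.append((number, "MD014", "dollar sign before command"))
            continue
        if line != line.rstrip():
            issues.append((number, "MD009", "trailing whitespace"))
        if BARE_URL.search(line):
            issues.append((number, "MD034", "bare URL"))
    return issues


# Sample built from the "before" and "after" lines of the hunks in this commit.
sample = "\n".join([
    "with `--enable-request-id-headers`. ",
    "> See https://github.com/vllm-project/vllm/pull/11529 for more details.",
    "```console",
    "$ vllm serve SOME_MODEL --config config.yaml",
    "```",
    "> See [this PR](https://github.com/vllm-project/vllm/pull/11529) for more details.",
])

for number, rule, message in find_issues(sample):
    print(f"line {number}: {rule} {message}")
```

The first, second, and fourth sample lines are the pre-commit forms and are flagged. The last line is the post-commit form of the link and passes.
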