Migrate docs from Sphinx to MkDocs (#18145)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Unverified commit a1fe24d9 authored May 23, 2025 by Harry Mellor; committed by GitHub May 23, 2025
20 changed files
--- a/.buildkite/test-pipeline.yaml
+++ b/.buildkite/test-pipeline.yaml
@@ -33,14 +33,13 @@ steps:
 - label: Documentation Build # 2min
  mirror_hardwares: [amdexperimental]
-  working_dir: "/vllm-workspace/test_docs/docs"
+  working_dir: "/vllm-workspace/test_docs"
  fast_check: true
  no_gpu: True
  commands:
-  - pip install -r ../../requirements/docs.txt
-  - SPHINXOPTS=\"-W\" make html
-  # Check API reference (if it fails, you may have missing mock imports)
-  - grep \"sig sig-object py\" build/html/api/vllm/vllm.sampling_params.html
+  - pip install -r ../requirements/docs.txt
+  # TODO: add `--strict` once warnings in docstrings are fixed
+  - mkdocs build
 - label: Async Engine, Inputs, Utils, Worker Test # 24min
  mirror_hardwares: [amdexperimental]
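The docs step now runs `mkdocs build` from `/vllm-workspace/test_docs`, and the TODO notes that `--strict` (MkDocs' flag for aborting on warnings) will be added later. Below is a minimal sketch of reproducing the step locally. It assumes you run it from a checkout root that contains `mkdocs.yaml` and that `mkdocs` is installed from `requirements/docs.txt`. The helpers `docs_build_command` and `build_docs` are hypothetical, not part of vLLM.

```python
import shutil
import subprocess
from pathlib import Path


def docs_build_command(strict: bool = False) -> list[str]:
    """Return the MkDocs build command used by the CI step.

    strict=True appends --strict, which the pipeline's TODO plans to
    enable once docstring warnings are fixed.
    """
    cmd = ["mkdocs", "build"]
    if strict:
        cmd.append("--strict")
    return cmd


def build_docs(root: Path, strict: bool = False) -> int:
    """Run the build from `root`, the directory holding mkdocs.yaml."""
    if not (root / "mkdocs.yaml").is_file():
        raise FileNotFoundError(f"no mkdocs.yaml in {root}")
    if shutil.which("mkdocs") is None:
        raise RuntimeError("mkdocs is not installed; see requirements/docs.txt")
    # Return the exit code so callers can treat warnings-as-errors failures.
    return subprocess.run(docs_build_command(strict), cwd=root).returncode


if __name__ == "__main__":
    print(" ".join(docs_build_command(strict=True)))
```

Calling `build_docs(Path("."), strict=True)` from the repository root previews what the pipeline would run once the TODO is resolved.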

--- a/.gitignore
+++ b/.gitignore
@@ -77,11 +77,6 @@ instance/
 # Scrapy stuff:
 .scrapy
-# Sphinx documentation
-docs/_build/
-docs/source/getting_started/examples/
-docs/source/api/vllm
 # PyBuilder
 .pybuilder/
 target/
@@ -151,6 +146,7 @@ venv.bak/
 # mkdocs documentation
 /site
+docs/getting_started/examples
 # mypy
 .mypy_cache/

--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -39,6 +39,7 @@ repos:
  rev: v0.9.29
  hooks:
  - id: pymarkdown
+    exclude: '.*\.inc\.md'
    args: [fix]
 - repo: https://github.com/rhysd/actionlint
  rev: v1.7.7
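pre-commit treats a hook's `exclude` value as a Python regular expression that it searches for in each file path. The new pattern therefore keeps `pymarkdown` away from the `*.inc.md` files, which are presumably snippets included into other pages. The quick check below uses hypothetical sample paths. Because the pattern is not anchored with `$`, it also matches any path that merely contains `.inc.md`.

```python
import re

# The exclude pattern added to .pre-commit-config.yaml for pymarkdown.
EXCLUDE = re.compile(r".*\.inc\.md")


def is_excluded(path: str) -> bool:
    # Mirror pre-commit's behaviour of searching (not full-matching) the path.
    return EXCLUDE.search(path) is not None


for p in ["docs/serving/foo.inc.md", "docs/README.md", "docs/foo.inc.md.bak"]:
    print(f"{p}: {'skipped' if is_excluded(p) else 'checked'}")
```

If only files that end in `.inc.md` should be skipped, `.*\.inc\.md$` would be the stricter form.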

--- a/.readthedocs.yaml
+++ b/.readthedocs.yaml
@@ -8,12 +8,8 @@ build:
  tools:
    python: "3.12"
-sphinx:
-  configuration: docs/source/conf.py
-  fail_on_warning: true
+mkdocs:
+  configuration: mkdocs.yaml
-# If using Sphinx, optionally build your docs in additional formats such as PDF
-formats: []
 # Optionally declare the Python requirements required to build your docs
 python:

--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -329,7 +329,9 @@ COPY vllm/v1 /usr/local/lib/python3.12/dist-packages/vllm/v1
 # will not be imported by other tests
 RUN mkdir test_docs
 RUN mv docs test_docs/
+RUN cp -r examples test_docs/
 RUN mv vllm test_docs/
+RUN mv mkdocs.yaml test_docs/
 #################### TEST IMAGE ####################
 #################### OPENAI API SERVER ####################
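After these `RUN` steps, the test image's `test_docs` directory holds `docs/`, `examples/`, `vllm/` and `mkdocs.yaml`. That layout matches the CI step's new `working_dir` and its relative `../requirements/docs.txt` path. `examples/` is presumably copied because the git-ignored pages under `docs/getting_started/examples` are generated from it during the build. Below is a hedged sketch of a pre-flight check. The function name and the choice of these four entries as "required" are assumptions drawn from the Dockerfile lines, not an existing vLLM script.

```python
import tempfile
from pathlib import Path

# Entries the Dockerfile moves or copies into test_docs.
REQUIRED_DIRS = ("docs", "examples", "vllm")
REQUIRED_FILES = ("mkdocs.yaml",)


def missing_entries(test_docs: Path) -> list[str]:
    """Return the required entries that are absent from `test_docs`."""
    missing = [d for d in REQUIRED_DIRS if not (test_docs / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (test_docs / f).is_file()]
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "docs").mkdir()
        (root / "vllm").mkdir()
        (root / "mkdocs.yaml").write_text("")
        print(missing_entries(root))  # examples/ was never created
```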

--- a/docs/.nav.yml
+++ b/docs/.nav.yml
+nav:
+  - Home: 
+    - vLLM: README.md
+    - Getting Started:
+      - getting_started/quickstart.md
+      - getting_started/installation
+    - Examples:
+      - LMCache: getting_started/examples/lmcache
+      - getting_started/examples/offline_inference
+      - getting_started/examples/online_serving
+      - getting_started/examples/other
+    - Roadmap: https://roadmap.vllm.ai
+    - Releases: https://github.com/vllm-project/vllm/releases
+  - User Guide:
+    - Inference and Serving:
+      - serving/offline_inference.md
+      - serving/openai_compatible_server.md
+      - serving/*
+      - serving/integrations
+    - Training: training
+    - Deployment:
+      - deployment/*
+      - deployment/frameworks
+      - deployment/integrations
+    - Performance: performance
+    - Models:
+      - models/supported_models.md
+      - models/generative_models.md
+      - models/pooling_models.md
+      - models/extensions
+    - Features:
+      - features/compatibility_matrix.md
+      - features/*
+      - features/quantization
+    - Other:
+      - getting_started/*
+  - Developer Guide:
+    - contributing/overview.md
+    - glob: contributing/*
+      flatten_single_child_sections: true
+    - contributing/model
+    - Design Documents:
+      - V0: design
+      - V1: design/v1
+  - API Reference:
+    - api/README.md
+    - glob: api/vllm/*
+      preserve_directory_names: true
+  - Community:
+    - community/*
+    - vLLM Blog: https://blog.vllm.ai
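Sections such as *Inference and Serving* mix explicit pages with a glob (`serving/*`). The sketch below is a simplified model of how such a section could expand, and it has not been verified against the nav plugin's source. It assumes three things: a glob matches the direct children of its directory, matches come back sorted, and anything the section lists explicitly is not repeated. Under those assumptions, `offline_inference.md` and `openai_compatible_server.md` stay first and are not duplicated. File names not present in the nav are hypothetical.

```python
from pathlib import PurePosixPath


def expand_section(entries: list[str], files: list[str]) -> list[str]:
    """Expand the glob entries of one nav section (simplified model).

    Assumptions: a glob matches direct children only, matches are sorted,
    and paths listed explicitly in the section are not repeated.
    """
    explicit = {e for e in entries if "*" not in e}
    result: list[str] = []
    for entry in entries:
        if "*" not in entry:
            result.append(entry)
            continue
        # PurePosixPath.match compares from the right, so "serving/*"
        # matches "serving/a.md" but not "serving/integrations/b.md".
        for f in sorted(files):
            if PurePosixPath(f).match(entry) and f not in explicit and f not in result:
                result.append(f)
    return result


section = [
    "serving/offline_inference.md",
    "serving/openai_compatible_server.md",
    "serving/*",
    "serving/integrations",
]
files = [
    "serving/offline_inference.md",
    "serving/openai_compatible_server.md",
    "serving/distributed_serving.md",     # hypothetical file
    "serving/integrations",
    "serving/integrations/langchain.md",  # hypothetical file
]
print(expand_section(section, files))
```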
--- a/docs/Makefile
+++ b/docs/Makefile
-# Minimal makefile for Sphinx documentation
-#
-# You can set these variables from the command line, and also
-# from the environment for the first two.
-SPHINXOPTS    ?=
-SPHINXBUILD   ?= sphinx-build
-SOURCEDIR     = source
-BUILDDIR      = build
-# Put it first so that "make" without argument is like "make help".
-help:
-	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
-.PHONY: help Makefile
-# Catch-all target: route all unknown targets to Sphinx using the new
-# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
-%: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
-clean:
-	@$(SPHINXBUILD) -M clean "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
-	rm -rf "$(SOURCEDIR)/getting_started/examples"
-	rm -rf "$(SOURCEDIR)/api/vllm"
--- a/docs/README.md
+++ b/docs/README.md
-# vLLM documents
-## Build the docs
- Make sure in `docs` directory
-```bash
-cd docs
-```
- Install the dependencies:
-```bash
-pip install -r ../requirements/docs.txt
-```
- Clean the previous build (optional but recommended):
-```bash
-make clean
-```
- Generate the HTML documentation:
-```bash
-make html
-```
-## Open the docs with your browser
- Serve the documentation locally:
-```bash
-python -m http.server -d build/html/
-```
-This will start a local server at http://localhost:8000. You can now open your browser and view the documentation.
-If port 8000 is already in use, you can specify a different port, for example:
-```bash
-python -m http.server 3000 -d build/html/
-```
+# Welcome to vLLM
+<figure markdown="span">
+  ![](./assets/logos/vllm-logo-text-light.png){ align="center" alt="vLLM" class="no-scaled-link" width="60%" }
+</figure>
+<p style="text-align:center">
+<strong>Easy, fast, and cheap LLM serving for everyone
+</strong>
+</p>
+<p style="text-align:center">
+<script async defer src="https://buttons.github.io/buttons.js"></script>
+<a class="github-button" href="https://github.com/vllm-project/vllm" data-show-count="true" data-size="large" aria-label="Star">Star</a>
+<a class="github-button" href="https://github.com/vllm-project/vllm/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch">Watch</a>
+<a class="github-button" href="https://github.com/vllm-project/vllm/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
+</p>
+vLLM is a fast and easy-to-use library for LLM inference and serving.
+Originally developed in the [Sky Computing Lab](https://sky.cs.berkeley.edu) at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.
+vLLM is fast with:
+- State-of-the-art serving throughput
+- Efficient management of attention key and value memory with [**PagedAttention**](https://blog.vllm.ai/2023/06/20/vllm.html)
+- Continuous batching of incoming requests
+- Fast model execution with CUDA/HIP graph
+- Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), INT4, INT8, and FP8
+- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer.
+- Speculative decoding
+- Chunked prefill
+vLLM is flexible and easy to use with:
+- Seamless integration with popular HuggingFace models
+- High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
+- Tensor parallelism and pipeline parallelism support for distributed inference
+- Streaming outputs
+- OpenAI-compatible API server
+- Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs, Gaudi® accelerators and GPUs, IBM Power CPUs, TPU, and AWS Trainium and Inferentia Accelerators.
+- Prefix caching support
+- Multi-lora support
+For more information, check out the following:
+- [vLLM announcing blog post](https://vllm.ai) (intro to PagedAttention)
+- [vLLM paper](https://arxiv.org/abs/2309.06180) (SOSP 2023)
+- [How continuous batching enables 23x throughput in LLM inference while reducing p50 latency](https://www.anyscale.com/blog/continuous-batching-llm-inference) by Cade Daniel et al.
+- [vLLM Meetups][meetups]
--- a/docs/api/README.md
+++ b/docs/api/README.md
+# Summary
+[](){ #configuration }
+## Configuration
+API documentation for vLLM's configuration classes.
+- [vllm.config.ModelConfig][]
+- [vllm.config.CacheConfig][]
+- [vllm.config.TokenizerPoolConfig][]
+- [vllm.config.LoadConfig][]
+- [vllm.config.ParallelConfig][]
+- [vllm.config.SchedulerConfig][]
+- [vllm.config.DeviceConfig][]
+- [vllm.config.SpeculativeConfig][]
+- [vllm.config.LoRAConfig][]
+- [vllm.config.PromptAdapterConfig][]
+- [vllm.config.MultiModalConfig][]
+- [vllm.config.PoolerConfig][]
+- [vllm.config.DecodingConfig][]
+- [vllm.config.ObservabilityConfig][]
+- [vllm.config.KVTransferConfig][]
+- [vllm.config.CompilationConfig][]
+- [vllm.config.VllmConfig][]
+[](){ #offline-inference-api }
+## Offline Inference
+LLM Class.
+- [vllm.LLM][]
+LLM Inputs.
+- [vllm.inputs.PromptType][]
+- [vllm.inputs.TextPrompt][]
+- [vllm.inputs.TokensPrompt][]
+## vLLM Engines
+Engine classes for offline and online inference.
+- [vllm.LLMEngine][]
+- [vllm.AsyncLLMEngine][]
+## Inference Parameters
+Inference parameters for vLLM APIs.
+[](){ #sampling-params }
+[](){ #pooling-params }
+- [vllm.SamplingParams][]
+- [vllm.PoolingParams][]
+[](){ #multi-modality }
+## Multi-Modality
+vLLM provides experimental support for multi-modal models through the [vllm.multimodal][] package.
+Multi-modal inputs can be passed alongside text and token prompts to [supported models][supported-mm-models]
+via the `multi_modal_data` field in [vllm.inputs.PromptType][].
+Looking to add your own multi-modal model? Please follow the instructions listed [here][supports-multimodal].
+- [vllm.multimodal.MULTIMODAL_REGISTRY][]
+### Inputs
+User-facing inputs.
+- [vllm.multimodal.inputs.MultiModalDataDict][]
+Internal data structures.
+- [vllm.multimodal.inputs.PlaceholderRange][]
+- [vllm.multimodal.inputs.NestedTensors][]
+- [vllm.multimodal.inputs.MultiModalFieldElem][]
+- [vllm.multimodal.inputs.MultiModalFieldConfig][]
+- [vllm.multimodal.inputs.MultiModalKwargsItem][]
+- [vllm.multimodal.inputs.MultiModalKwargs][]
+- [vllm.multimodal.inputs.MultiModalInputs][]
+### Data Parsing
+- [vllm.multimodal.parse][]
+### Data Processing
+- [vllm.multimodal.processing][]
+### Memory Profiling
+- [vllm.multimodal.profiling][]
+### Registry
+- [vllm.multimodal.registry][]
+## Model Development
+- [vllm.model_executor.models.interfaces_base][]
+- [vllm.model_executor.models.interfaces][]
+- [vllm.model_executor.models.adapters][]
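The *Multi-Modality* section says multi-modal inputs travel in the `multi_modal_data` field of a prompt. The sketch below shows that prompt shape. It assumes the `TextPrompt` layout: a `prompt` string plus a `multi_modal_data` mapping keyed by modality, here `"image"`. The local `TextPromptSketch` type, the helper `make_image_prompt` and the `<image>` placeholder are hypothetical, since the correct placeholder depends on each model's prompt format. To stay runnable without vLLM or a GPU, the code only builds the dict. With vLLM installed, you would pass it to `vllm.LLM.generate`.

```python
from typing import Any, TypedDict


class TextPromptSketch(TypedDict, total=False):
    """Local stand-in for the assumed shape of vllm.inputs.TextPrompt."""

    prompt: str
    multi_modal_data: dict[str, Any]


def make_image_prompt(question: str, image: Any) -> TextPromptSketch:
    # "<image>" is a hypothetical placeholder; real models define their own.
    return {
        "prompt": f"USER: <image>\n{question}\nASSISTANT:",
        "multi_modal_data": {"image": image},
    }


# Stand-in object; in practice this would typically be a PIL image.
fake_image = object()
request = make_image_prompt("What is in this picture?", fake_image)
print(sorted(request))
# With vLLM installed (MODEL_NAME is a hypothetical placeholder):
#   from vllm import LLM
#   outputs = LLM(model=MODEL_NAME).generate(request)
```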
--- a/docs/api/vllm/.meta.yml
+++ b/docs/api/vllm/.meta.yml
+search:
+  boost: 0.5
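The `.meta.yml` file appears to set `search.boost: 0.5` for every page under `docs/api/vllm`, so auto-generated API pages should rank below hand-written guides with similar relevance. The toy illustration below assumes the boost acts as a plain multiplier on a page's relevance score, which simplifies what the search engine actually does. The page paths and raw scores are hypothetical.

```python
API_PREFIX = "api/vllm/"
API_BOOST = 0.5  # value from docs/api/vllm/.meta.yml


def boosted_score(path: str, raw_score: float) -> float:
    # Pages under the API directory get the boost; others keep 1.0.
    return raw_score * (API_BOOST if path.startswith(API_PREFIX) else 1.0)


def rank(results: dict[str, float]) -> list[str]:
    # Highest boosted score first.
    return sorted(results, key=lambda p: boosted_score(p, results[p]), reverse=True)


results = {
    "api/vllm/sampling_params.md": 8.0,   # hypothetical raw score
    "serving/offline_inference.md": 5.0,  # hypothetical raw score
}
print(rank(results))
```

Under this assumption, an API page needs more than twice the raw score of a guide to outrank it.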
--- a/docs/assets/contributing/dockerfile-stages-dependency.png
+++ b/docs/assets/contributing/dockerfile-stages-dependency.png
--- a/docs/source/assets/deployment/anything-llm-chat-with-doc.png
+++ b/docs/source/assets/deployment/anything-llm-chat-with-doc.png
--- a/docs/source/assets/deployment/anything-llm-chat-without-doc.png
+++ b/docs/source/assets/deployment/anything-llm-chat-without-doc.png
--- a/docs/source/assets/deployment/anything-llm-provider.png
+++ b/docs/source/assets/deployment/anything-llm-provider.png
--- a/docs/source/assets/deployment/anything-llm-upload-doc.png
+++ b/docs/source/assets/deployment/anything-llm-upload-doc.png
--- a/docs/source/assets/deployment/architecture_helm_deployment.png
+++ b/docs/source/assets/deployment/architecture_helm_deployment.png
--- a/docs/source/assets/deployment/chatbox-chat.png
+++ b/docs/source/assets/deployment/chatbox-chat.png
--- a/docs/source/assets/deployment/chatbox-settings.png
+++ b/docs/source/assets/deployment/chatbox-settings.png
--- a/docs/source/assets/deployment/dify-chat.png
+++ b/docs/source/assets/deployment/dify-chat.png
--- a/docs/source/assets/deployment/dify-create-chatbot.png
+++ b/docs/source/assets/deployment/dify-create-chatbot.png