Migrate docs from Sphinx to MkDocs (#18145)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Commit a1fe24d9 (unverified), authored May 23, 2025 by Harry Mellor, committed by GitHub May 23, 2025
20 changed files
--- a/docs/source/assets/kernel/key.png
+++ b/docs/source/assets/kernel/key.png
--- a/docs/source/assets/kernel/logits_vec.png
+++ b/docs/source/assets/kernel/logits_vec.png
--- a/docs/source/assets/kernel/q_vecs.png
+++ b/docs/source/assets/kernel/q_vecs.png
--- a/docs/source/assets/kernel/query.png
+++ b/docs/source/assets/kernel/query.png
--- a/docs/source/assets/kernel/v_vec.png
+++ b/docs/source/assets/kernel/v_vec.png
--- a/docs/source/assets/kernel/value.png
+++ b/docs/source/assets/kernel/value.png
--- a/docs/source/assets/logos/vllm-logo-only-light.ico
+++ b/docs/source/assets/logos/vllm-logo-only-light.ico
--- a/docs/source/assets/logos/vllm-logo-only-light.png
+++ b/docs/source/assets/logos/vllm-logo-only-light.png
--- a/docs/source/assets/logos/vllm-logo-text-dark.png
+++ b/docs/source/assets/logos/vllm-logo-text-dark.png
--- a/docs/source/assets/logos/vllm-logo-text-light.png
+++ b/docs/source/assets/logos/vllm-logo-text-light.png
--- a/docs/source/community/meetups.md
+++ b/docs/source/community/meetups.md
-(meetups)=
+---
+title: vLLM Meetups
-# vLLM Meetups
+---
+[](){ #meetups }
 We host regular meetups in San Francisco Bay Area every 2 months. We will share the project updates from the vLLM team and have guest speakers from the industry to share their experience and insights. Please find the materials of our previous meetups below:

--- a/docs/source/community/sponsors.md
+++ b/docs/source/community/sponsors.md
--- a/docs/source/contributing/deprecation_policy.md
+++ b/docs/source/contributing/deprecation_policy.md
--- a/docs/source/contributing/dockerfile/dockerfile.md
+++ b/docs/source/contributing/dockerfile/dockerfile.md
 # Dockerfile
 We provide a <gh-file:docker/Dockerfile> to construct the image for running an OpenAI compatible server with vLLM.
-More information about deploying with Docker can be found [here](#deployment-docker).
+More information about deploying with Docker can be found [here][deployment-docker].
 Below is a visual representation of the multi-stage Dockerfile. The build graph contains the following nodes:
@@ -17,11 +17,9 @@ The edges of the build graph represent:
 - `RUN --mount=(.\*)from=...` dependencies (with a dotted line and an empty diamond arrow head)
-  > :::{figure} /assets/contributing/dockerfile-stages-dependency.png
+  > <figure markdown="span">
-  > :align: center
+  >   ![](../../assets/contributing/dockerfile-stages-dependency.png){ align="center" alt="query" width="100%" }
-  > :alt: query
+  > </figure>
-  > :width: 100%
-  > :::
  >
  > Made using: <https://github.com/patrickhoefler/dockerfilegraph>
  >

--- a/docs/contributing/model/README.md
+++ b/docs/contributing/model/README.md
+---
+title: Adding a New Model
+---
+[](){ #new-model }
+This section provides more information on how to integrate a [PyTorch](https://pytorch.org/) model into vLLM.
+Contents:
+- [Basic](basic.md)
+- [Registration](registration.md)
+- [Tests](tests.md)
+- [Multimodal](multimodal.md)
+!!! note
+    The complexity of adding a new model depends heavily on the model's architecture.
+    The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM.
+    However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.
+!!! tip
+    If you are encountering issues while integrating your model into vLLM, feel free to open a [GitHub issue](https://github.com/vllm-project/vllm/issues)
+    or ask on our [developer slack](https://slack.vllm.ai).
+    We will be happy to help you out!
--- a/docs/source/contributing/model/basic.md
+++ b/docs/source/contributing/model/basic.md
-(new-model-basic)=
+---
+title: Implementing a Basic Model
-# Implementing a Basic Model
+---
+[](){ #new-model-basic }
 This guide walks you through the steps to implement a basic vLLM model.
@@ -10,9 +11,8 @@ First, clone the PyTorch model code from the source repository.
 For instance, vLLM's [OPT model](gh-file:vllm/model_executor/models/opt.py) was adapted from
 HuggingFace's [modeling_opt.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py) file.
-:::{warning}
+!!! warning
-Make sure to review and adhere to the original code's copyright and licensing terms!
+    Make sure to review and adhere to the original code's copyright and licensing terms!
-:::
 ## 2. Make your code compatible with vLLM
@@ -67,7 +67,7 @@ class MyModel(nn.Module):
        ... 
 ```
- Rewrite the {meth}`~torch.nn.Module.forward` method of your model to remove any unnecessary code, such as training-specific code. Modify the input parameters to treat `input_ids` and `positions` as flattened tensors with a single batch size dimension, without a max-sequence length dimension.
+- Rewrite the [forward][torch.nn.Module.forward] method of your model to remove any unnecessary code, such as training-specific code. Modify the input parameters to treat `input_ids` and `positions` as flattened tensors with a single batch size dimension, without a max-sequence length dimension.
 ```python
 def forward(
@@ -78,10 +78,9 @@ def forward(
    ...
 ```
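To illustrate what "flattened tensors with a single batch size dimension" means in the bullet above, here is a minimal pure-Python sketch. The helper name `flatten_batch` is hypothetical and not part of vLLM, and the sketch assumes every sequence is processed from its first token, so positions restart at 0 for each sequence. Plain lists stand in for tensors.

```python
from typing import List, Tuple


def flatten_batch(sequences: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate per-sequence token IDs into one flat list.

    Returns (input_ids, positions). Both have length sum(len(seq)),
    so there is no padded max-sequence-length dimension.
    """
    input_ids: List[int] = []
    positions: List[int] = []
    for seq in sequences:
        input_ids.extend(seq)
        # Assumption: each sequence starts at position 0.
        positions.extend(range(len(seq)))
    return input_ids, positions


input_ids, positions = flatten_batch([[11, 12, 13], [21, 22]])
print(input_ids)  # [11, 12, 13, 21, 22]
print(positions)  # [0, 1, 2, 0, 1]
```

A padded layout would instead hold a 2 x 3 grid with one wasted slot; the flattened layout carries only the five real tokens.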
-:::{note}
+!!! note
-Currently, vLLM supports the basic multi-head attention mechanism and its variant with rotary positional embeddings.
+    Currently, vLLM supports the basic multi-head attention mechanism and its variant with rotary positional embeddings.
-If your model employs a different attention mechanism, you will need to implement a new attention layer in vLLM.
+    If your model employs a different attention mechanism, you will need to implement a new attention layer in vLLM.
-:::
 For reference, check out our [Llama implementation](gh-file:vllm/model_executor/models/llama.py). vLLM already supports a large number of models. It is recommended to find a model similar to yours and adapt it to your model's architecture. Check out <gh-dir:vllm/model_executor/models> for more examples.
@@ -89,7 +88,7 @@ For reference, check out our [Llama implementation](gh-file:vllm/model_executor/
 If your model is too large to fit into a single GPU, you can use tensor parallelism to manage it.
 To do this, substitute your model's linear and embedding layers with their tensor-parallel versions.
-For the embedding layer, you can simply replace {class}`torch.nn.Embedding` with `VocabParallelEmbedding`. For the output LM head, you can use `ParallelLMHead`.
+For the embedding layer, you can simply replace [torch.nn.Embedding][] with `VocabParallelEmbedding`. For the output LM head, you can use `ParallelLMHead`.
 When it comes to the linear layers, we provide the following options to parallelize them:
 - `ReplicatedLinear`: Replicates the inputs and weights across multiple GPUs. No memory saving.
@@ -107,7 +106,7 @@ This method should load the weights from the HuggingFace's checkpoint file and a
 ## 5. Register your model
-See [this page](#new-model-registration) for instructions on how to register your new model to be used by vLLM.
+See [this page][new-model-registration] for instructions on how to register your new model to be used by vLLM.
 ## Frequently Asked Questions

--- a/docs/contributing/model/multimodal.md
+++ b/docs/contributing/model/multimodal.md
--- a/docs/source/contributing/model/registration.md
+++ b/docs/source/contributing/model/registration.md
-(new-model-registration)=
+---
+title: Registering a Model to vLLM
-# Registering a Model to vLLM
+---
+[](){ #new-model-registration }
 vLLM relies on a model registry to determine how to run each model.
-A list of pre-registered architectures can be found [here](#supported-models).
+A list of pre-registered architectures can be found [here][supported-models].
 If your model is not on this list, you must register it to vLLM.
 This page provides detailed instructions on how to do so.
 ## Built-in models
-To add a model directly to the vLLM library, start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source](#build-from-source).
+To add a model directly to the vLLM library, start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source][build-from-source].
 This gives you the ability to modify the codebase and test your model.
-After you have implemented your model (see [tutorial](#new-model-basic)), put it into the <gh-dir:vllm/model_executor/models> directory.
+After you have implemented your model (see [tutorial][new-model-basic]), put it into the <gh-dir:vllm/model_executor/models> directory.
 Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
-Finally, update our [list of supported models](#supported-models) to promote your model!
+Finally, update our [list of supported models][supported-models] to promote your model!
-:::{important}
+!!! warning
-The list of models in each section should be maintained in alphabetical order.
+    The list of models in each section should be maintained in alphabetical order.
-:::
 ## Out-of-tree models
 You can load an external model using a plugin without modifying the vLLM codebase.
-:::{seealso}
+!!! info
-[vLLM's Plugin System](#plugin-system)
+    [vLLM's Plugin System][plugin-system]
-:::
 To register the model, use the following code:
@@ -45,11 +44,9 @@ from vllm import ModelRegistry
 ModelRegistry.register_model("YourModelForCausalLM", "your_code:YourModelForCausalLM")
 ```
-:::{important}
+!!! warning
-If your model is a multimodal model, ensure the model class implements the {class}`~vllm.model_executor.models.interfaces.SupportsMultiModal` interface.
+    If your model is a multimodal model, ensure the model class implements the [SupportsMultiModal][vllm.model_executor.models.interfaces.SupportsMultiModal] interface.
-Read more about that [here](#supports-multimodal).
+    Read more about that [here][supports-multimodal].
-:::
-:::{note}
+!!! note
-Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
+    Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
-:::
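As a rough sketch of the plugin approach recommended in the note above, the registration call can be wrapped in a function that a plugin exposes. The package layout and the function name `register` are hypothetical; only `ModelRegistry.register_model` and its arguments come from the snippet in this page. The import is deferred into the function body so that importing the plugin package does not itself import vLLM.

```python
# my_vllm_plugin/__init__.py  (hypothetical package name)


def register() -> None:
    """Register the out-of-tree model with vLLM's model registry."""
    # Deferred import: vLLM is only needed when the plugin is invoked.
    from vllm import ModelRegistry

    # Same call as in the registration snippet; the "module:Class" string
    # lets the model class be imported lazily by the registry.
    ModelRegistry.register_model(
        "YourModelForCausalLM",
        "your_code:YourModelForCausalLM",
    )
```

How the function is discovered (assumed here to be a Python entry point declared in the plugin's packaging metadata) is covered by the plugin system page linked above and should be checked there.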
--- a/docs/source/contributing/model/tests.md
+++ b/docs/source/contributing/model/tests.md
-(new-model-tests)=
+---
+title: Writing Unit Tests
-# Writing Unit Tests
+---
+[](){ #new-model-tests }
 This page explains how to write unit tests to verify the implementation of your model.
@@ -14,14 +15,12 @@ Without them, the CI for your PR will fail.
 Include an example HuggingFace repository for your model in <gh-file:tests/models/registry.py>.
 This enables a unit test that loads dummy weights to ensure that the model can be initialized in vLLM.
-:::{important}
+!!! warning
-The list of models in each section should be maintained in alphabetical order.
+    The list of models in each section should be maintained in alphabetical order.
-:::
-:::{tip}
+!!! tip
-If your model requires a development version of HF Transformers, you can set
+    If your model requires a development version of HF Transformers, you can set
-`min_transformers_version` to skip the test in CI until the model is released.
+    `min_transformers_version` to skip the test in CI until the model is released.
-:::
 ## Optional Tests
@@ -34,16 +33,16 @@ These tests compare the model outputs of vLLM against [HF Transformers](https://
 #### Generative models
-For [generative models](#generative-models), there are two levels of correctness tests, as defined in <gh-file:tests/models/utils.py>:
+For [generative models][generative-models], there are two levels of correctness tests, as defined in <gh-file:tests/models/utils.py>:
 - Exact correctness (`check_outputs_equal`): The text outputted by vLLM should exactly match the text outputted by HF.
 - Logprobs similarity (`check_logprobs_close`): The logprobs outputted by vLLM should be in the top-k logprobs outputted by HF, and vice versa.
 #### Pooling models
-For [pooling models](#pooling-models), we simply check the cosine similarity, as defined in <gh-file:tests/models/embedding/utils.py>.
+For [pooling models][pooling-models], we simply check the cosine similarity, as defined in <gh-file:tests/models/embedding/utils.py>.
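For readers unfamiliar with the pooling-model check mentioned above, a cosine-similarity comparison can be sketched as follows. This is a standalone illustration, not the helper in `tests/models/embedding/utils.py`; the function names and the tolerance `tol` are assumptions chosen for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embeddings_close(a: Sequence[float], b: Sequence[float],
                     tol: float = 1e-2) -> bool:
    """True if the two embeddings point in nearly the same direction."""
    # tol is a hypothetical threshold, not the value used by the vLLM tests.
    return 1.0 - cosine_similarity(a, b) <= tol
```

Because cosine similarity ignores vector magnitude, such a check tolerates a uniform scaling difference between the vLLM and HF embeddings but flags a change in direction.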
-(mm-processing-tests)=
+[](){ #mm-processing-tests }
 ### Multi-modal processing

--- a/docs/source/contributing/overview.md
+++ b/docs/source/contributing/overview.md
@@ -27,7 +27,21 @@ See <gh-file:LICENSE>.
 ## Developing
 Depending on the kind of development you'd like to do (e.g. Python, CUDA), you can choose to build vLLM with or without compilation.
-Check out the [building from source](#build-from-source) documentation for details.
+Check out the [building from source][build-from-source] documentation for details.
+### Building the docs
+Install the dependencies:
+```bash
+pip install -r requirements/docs.txt
+```
+Start the autoreloading MkDocs server:
+```bash
+mkdocs serve
+```
 ## Testing
@@ -48,29 +62,25 @@ pre-commit run mypy-3.9 --hook-stage manual --all-files
 pytest tests/
 ```
-:::{tip}
+!!! tip
-Since the <gh-file:docker/Dockerfile> ships with Python 3.12, all tests in CI (except `mypy`) are run with Python 3.12.
+    Since the <gh-file:docker/Dockerfile> ships with Python 3.12, all tests in CI (except `mypy`) are run with Python 3.12.
-Therefore, we recommend developing with Python 3.12 to minimise the chance of your local environment clashing with our CI environment.
+    Therefore, we recommend developing with Python 3.12 to minimise the chance of your local environment clashing with our CI environment.
-:::
-:::{note}
+!!! note
-Currently, the repository is not fully checked by `mypy`.
+    Currently, the repository is not fully checked by `mypy`.
-:::
-:::{note}
+!!! note
-Currently, not all unit tests pass when run on CPU platforms. If you don't have access to a GPU
+    Currently, not all unit tests pass when run on CPU platforms. If you don't have access to a GPU
-platform to run unit tests locally, rely on the continuous integration system to run the tests for
+    platform to run unit tests locally, rely on the continuous integration system to run the tests for
-now.
+    now.
-:::
 ## Issues
 If you encounter a bug or have a feature request, please [search existing issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [file a new issue](https://github.com/vllm-project/vllm/issues/new/choose), providing as much relevant information as possible.
-:::{important}
+!!! warning
-If you discover a security vulnerability, please follow the instructions [here](gh-file:SECURITY.md#reporting-a-vulnerability).
+    If you discover a security vulnerability, please follow the instructions [here](gh-file:SECURITY.md#reporting-a-vulnerability).
-:::
 ## Pull Requests & Code Reviews
@@ -106,9 +116,8 @@ appropriately to indicate the type of change. Please use one of the following:
 - `[Misc]` for PRs that do not fit the above categories. Please use this
  sparingly.
-:::{note}
+!!! note
-If the PR spans more than one category, please include all relevant prefixes.
+    If the PR spans more than one category, please include all relevant prefixes.
-:::
 ### Code Quality