Merge tag 'v0.9.0' into v0.9.0-ori

7a985548 · zhuwenwen · 45d3785c · dc1440cf
Commit 7a985548 authored May 22, 2025 by zhuwenwen
20 changed files
--- a/docs/source/api/multimodal/parse.md
+++ b/docs/source/api/multimodal/parse.md
-# Data Parsing
-## Module Contents
-```{eval-rst}
-.. automodule:: vllm.multimodal.parse
-    :members:
-    :member-order: bysource
-```
--- a/docs/source/api/multimodal/processing.md
+++ b/docs/source/api/multimodal/processing.md
-# Data Processing
-## Module Contents
-```{eval-rst}
-.. automodule:: vllm.multimodal.processing
-    :members:
-    :member-order: bysource
-```
--- a/docs/source/api/multimodal/profiling.md
+++ b/docs/source/api/multimodal/profiling.md
-# Memory Profiling
-## Module Contents
-```{eval-rst}
-.. automodule:: vllm.multimodal.profiling
-    :members:
-    :member-order: bysource
-```
--- a/docs/source/api/multimodal/registry.md
+++ b/docs/source/api/multimodal/registry.md
-# Registry
-## Module Contents
-```{eval-rst}
-.. automodule:: vllm.multimodal.registry
-    :members:
-    :member-order: bysource
-```
--- a/docs/source/api/offline_inference/index.md
+++ b/docs/source/api/offline_inference/index.md
-# Offline Inference
-:::{toctree}
-:caption: Contents
-:maxdepth: 1
-llm
-llm_inputs
-:::
--- a/docs/source/api/offline_inference/llm.md
+++ b/docs/source/api/offline_inference/llm.md
-# LLM Class
-```{eval-rst}
-.. autoclass:: vllm.LLM
-    :members:
-    :show-inheritance:
-```
--- a/docs/source/api/offline_inference/llm_inputs.md
+++ b/docs/source/api/offline_inference/llm_inputs.md
-# LLM Inputs
-```{eval-rst}
-.. autodata:: vllm.inputs.PromptType
-```
-```{eval-rst}
-.. autoclass:: vllm.inputs.TextPrompt
-    :show-inheritance:
-    :members:
-    :member-order: bysource
-```
-```{eval-rst}
-.. autoclass:: vllm.inputs.TokensPrompt
-    :show-inheritance:
-    :members:
-    :member-order: bysource
-```
--- a/docs/source/api/summary.md
+++ b/docs/source/api/summary.md
+# Summary
+(configuration)=
+## Configuration
+API documentation for vLLM's configuration classes.
+```{autodoc2-summary}
+    vllm.config.ModelConfig
+    vllm.config.CacheConfig
+    vllm.config.TokenizerPoolConfig
+    vllm.config.LoadConfig
+    vllm.config.ParallelConfig
+    vllm.config.SchedulerConfig
+    vllm.config.DeviceConfig
+    vllm.config.SpeculativeConfig
+    vllm.config.LoRAConfig
+    vllm.config.PromptAdapterConfig
+    vllm.config.MultiModalConfig
+    vllm.config.PoolerConfig
+    vllm.config.DecodingConfig
+    vllm.config.ObservabilityConfig
+    vllm.config.KVTransferConfig
+    vllm.config.CompilationConfig
+    vllm.config.VllmConfig
+```
+(offline-inference-api)=
+## Offline Inference
+LLM Class.
+```{autodoc2-summary}
+    vllm.LLM
+```
+LLM Inputs.
+```{autodoc2-summary}
+    vllm.inputs.PromptType
+    vllm.inputs.TextPrompt
+    vllm.inputs.TokensPrompt
+```
+## vLLM Engines
+Engine classes for offline and online inference.
+```{autodoc2-summary}
+    vllm.LLMEngine
+    vllm.AsyncLLMEngine
+```
+## Inference Parameters
+Inference parameters for vLLM APIs.
+(sampling-params)=
+(pooling-params)=
+```{autodoc2-summary}
+    vllm.SamplingParams
+    vllm.PoolingParams
+```
+(multi-modality)=
+## Multi-Modality
+vLLM provides experimental support for multi-modal models through the {mod}`vllm.multimodal` package.
+Multi-modal inputs can be passed alongside text and token prompts to [supported models](#supported-mm-models)
+via the `multi_modal_data` field in {class}`vllm.inputs.PromptType`.
+Looking to add your own multi-modal model? Please follow the instructions listed [here](#supports-multimodal).
+```{autodoc2-summary}
+    vllm.multimodal.MULTIMODAL_REGISTRY
+```
+### Inputs
+User-facing inputs.
+```{autodoc2-summary}
+    vllm.multimodal.inputs.MultiModalDataDict
+```
+Internal data structures.
+```{autodoc2-summary}
+    vllm.multimodal.inputs.PlaceholderRange
+    vllm.multimodal.inputs.NestedTensors
+    vllm.multimodal.inputs.MultiModalFieldElem
+    vllm.multimodal.inputs.MultiModalFieldConfig
+    vllm.multimodal.inputs.MultiModalKwargsItem
+    vllm.multimodal.inputs.MultiModalKwargs
+    vllm.multimodal.inputs.MultiModalInputs
+```
+### Data Parsing
+```{autodoc2-summary}
+    vllm.multimodal.parse
+```
+### Data Processing
+```{autodoc2-summary}
+    vllm.multimodal.processing
+```
+### Memory Profiling
+```{autodoc2-summary}
+    vllm.multimodal.profiling
+```
+### Registry
+```{autodoc2-summary}
+    vllm.multimodal.registry
+```
+## Model Development
+```{autodoc2-summary}
+    vllm.model_executor.models.interfaces_base
+    vllm.model_executor.models.interfaces
+    vllm.model_executor.models.adapters
+```
--- a/docs/source/assets/contributing/dockerfile-stages-dependency.png
+++ b/docs/source/assets/contributing/dockerfile-stages-dependency.png
--- a/docs/source/assets/deployment/chatbox-chat.png
+++ b/docs/source/assets/deployment/chatbox-chat.png
--- a/docs/source/assets/deployment/chatbox-settings.png
+++ b/docs/source/assets/deployment/chatbox-settings.png
--- a/docs/source/assets/deployment/dify-chat.png
+++ b/docs/source/assets/deployment/dify-chat.png
--- a/docs/source/assets/deployment/dify-create-chatbot.png
+++ b/docs/source/assets/deployment/dify-create-chatbot.png
--- a/docs/source/assets/deployment/dify-settings.png
+++ b/docs/source/assets/deployment/dify-settings.png
--- a/docs/source/assets/deployment/streamlit-chat.png
+++ b/docs/source/assets/deployment/streamlit-chat.png
--- a/docs/source/autodoc2_docstring_parser.py
+++ b/docs/source/autodoc2_docstring_parser.py
+# SPDX-License-Identifier: Apache-2.0
+from docutils import nodes
+from myst_parser.parsers.sphinx_ import MystParser
+from sphinx.ext.napoleon import docstring
+class NapoleonParser(MystParser):
+    def parse(self, input_string: str, document: nodes.document) -> None:
+        # Get the Sphinx configuration
+        config = document.settings.env.config
+        parsed_content = str(
+            docstring.GoogleDocstring(
+                str(docstring.NumpyDocstring(input_string, config)),
+                config,
+            ))
+        return super().parse(parsed_content, document)
+Parser = NapoleonParser
--- a/docs/source/community/meetups.md
+++ b/docs/source/community/meetups.md
@@ -4,6 +4,7 @@
 We host regular meetups in the San Francisco Bay Area every 2 months. We share project updates from the vLLM team and have guest speakers from the industry share their experience and insights. Please find the materials from our previous meetups below:
+- [NYC vLLM Meetup](https://lu.ma/c1rqyf1f), May 7th 2025. [[Slides]](https://docs.google.com/presentation/d/1_q_aW_ioMJWUImf1s1YM-ZhjXz8cUeL0IJvaquOYBeA/edit?usp=sharing).
 - [Asia Developer Day](https://www.sginnovate.com/event/limited-availability-morning-evening-slots-remaining-inaugural-vllm-asia-developer-day), April 3rd 2025. [[Slides]](https://docs.google.com/presentation/d/19cp6Qu8u48ihB91A064XfaXruNYiBOUKrBxAmDOllOo/edit?usp=sharing).
 - [vLLM x Ollama Inference Night](https://lu.ma/vllm-ollama), March 27th 2025. [[Slides]](https://docs.google.com/presentation/d/16T2PDD1YwRnZ4Tu8Q5r6n53c5Lr5c73UV9Vd2_eBo4U/edit?usp=sharing).
 - [The first vLLM China Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg), March 16th 2025. [[Slides]](https://docs.google.com/presentation/d/1REHvfQMKGnvz6p3Fd23HhSO4c8j5WPGZV0bKYLwnHyQ/edit?usp=sharing).

--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -13,16 +13,17 @@
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 import datetime
-import inspect
 import logging
 import os
+import re
 import sys
+from pathlib import Path
 import requests
-from sphinx.ext import autodoc
 logger = logging.getLogger(__name__)
-sys.path.append(os.path.abspath("../.."))
+REPO_ROOT = Path(__file__).resolve().parent.parent.parent
+sys.path.append(os.path.abspath(REPO_ROOT))
 # -- Project information -----------------------------------------------------
@@ -40,8 +41,7 @@ extensions = [
    "sphinx.ext.linkcode",
    "sphinx.ext.intersphinx",
    "sphinx_copybutton",
-    "sphinx.ext.autodoc",
+    "autodoc2",
-    "sphinx.ext.autosummary",
    "myst_parser",
    "sphinxarg.ext",
    "sphinx_design",
@@ -49,7 +49,19 @@ extensions = [
 ]
 myst_enable_extensions = [
    "colon_fence",
+    "fieldlist",
 ]
+autodoc2_packages = [
+    {
+        "path": "../../vllm",
+        "exclude_dirs": ["__pycache__", "third_party"],
+    },
+]
+autodoc2_output_dir = "api"
+autodoc2_render_plugin = "myst"
+autodoc2_hidden_objects = ["dunder", "private", "inherited"]
+autodoc2_sort_names = True
+autodoc2_index_template = None
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
@@ -77,6 +89,11 @@ html_theme_options = {
    'repository_url': 'https://github.com/vllm-project/vllm',
    'use_repository_button': True,
    'use_edit_page_button': True,
+    # Prevents the full API being added to the left sidebar of every page.
+    # Reduces build time by 2.5x and reduces build size from ~225MB to ~95MB.
+    'collapse_navbar': True,
+    # Makes API visible in the right sidebar on API reference pages.
+    'show_toc_level': 3,
 }
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
@@ -164,73 +181,64 @@ def linkcode_resolve(domain, info):
        return None
    if not info['module']:
        return None
-    filename = info['module'].replace('.', '/')
-    module = info['module']
+    # Get path from module name
+    file = Path(f"{info['module'].replace('.', '/')}.py")
-    # try to determine the correct file and line number to link to
+    path = REPO_ROOT / file
-    obj = sys.modules[module]
+    if not path.exists():
+        path = REPO_ROOT / file.with_suffix("") / "__init__.py"
-    # get as specific as we can
+    if not path.exists():
-    lineno: int = 0
+        return None
-    filename: str = ""
-    try:
+    # Get the line number of the object
-        for part in info['fullname'].split('.'):
+    with open(path) as f:
-            obj = getattr(obj, part)
+        lines = f.readlines()
+    name = info['fullname'].split(".")[-1]
-            # Skip decorator wrappers by checking if the object is a function
+    pattern = fr"^( {{4}})*((def|class) )?{name}\b.*"
-            # and has a __wrapped__ attribute (which decorators typically set)
+    for lineno, line in enumerate(lines, 1):
-            while hasattr(obj, '__wrapped__'):
+        if not line or line.startswith("#"):
-                obj = obj.__wrapped__
+            continue
+        if re.match(pattern, line):
-            if not (inspect.isclass(obj) or inspect.isfunction(obj)
+            break
-                    or inspect.ismethod(obj)):
-                obj = obj.__class__  # Get the class of the instance
+    # If the line number is not found, return None
+    if lineno == len(lines):
-            lineno = inspect.getsourcelines(obj)[1]
+        return None
-            filename = (inspect.getsourcefile(obj)
-                        or f"{filename}.py").split("vllm/", 1)[1]
+    # If the line number is found, create the URL
-    except Exception:
+    filename = path.relative_to(REPO_ROOT)
-        # For some things, like a class member, won't work, so
+    if "checkouts" in path.parts:
-        # we'll use the line number of the parent (the class)
-        pass
-    if filename.startswith("checkouts/"):
        # a PR build on readthedocs
-        pr_number = filename.split("/")[1]
+        pr_number = REPO_ROOT.name
-        filename = filename.split("/", 2)[2]
        base, branch = get_repo_base_and_branch(pr_number)
        if base and branch:
            return f"https://github.com/{base}/blob/{branch}/{filename}#L{lineno}"
    # Otherwise, link to the source file on the main branch
    return f"https://github.com/vllm-project/vllm/blob/main/{filename}#L{lineno}"
-# Mock out external dependencies here, otherwise the autodoc pages may be blank.
+# Mock out external dependencies here, otherwise sphinx-argparse won't work.
 autodoc_mock_imports = [
+    "huggingface_hub",
+    "pydantic",
+    "zmq",
+    "cloudpickle",
+    "aiohttp",
+    "starlette",
    "blake3",
-    "compressed_tensors",
    "cpuinfo",
-    "cv2",
-    "torch",
    "transformers",
    "psutil",
-    "prometheus_client",
-    "sentencepiece",
    "vllm._C",
    "PIL",
    "numpy",
-    'triton',
    "tqdm",
-    "tensorizer",
+    # The mocks below are required by
-    "pynvml",
+    # docs/source/serving/openai_compatible_server.md's
-    "outlines",
+    # vllm.entrypoints.openai.cli_args
-    "xgrammar",
+    "openai",
-    "librosa",
+    "fastapi",
-    "soundfile",
+    "partial_json_parser",
-    "gguf",
-    "lark",
-    "decord",
 ]
 for mock_target in autodoc_mock_imports:
@@ -241,18 +249,6 @@ for mock_target in autodoc_mock_imports:
            "been loaded into sys.modules when the sphinx build starts.",
            mock_target)
-class MockedClassDocumenter(autodoc.ClassDocumenter):
-    """Remove note about base class when a class is derived from object."""
-    def add_line(self, line: str, source: str, *lineno: int) -> None:
-        if line == "   Bases: :py:class:`object`":
-            return
-        super().add_line(line, source, *lineno)
-autodoc.ClassDocumenter = MockedClassDocumenter
 intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "typing_extensions":
@@ -264,7 +260,4 @@ intersphinx_mapping = {
    "psutil": ("https://psutil.readthedocs.io/en/stable", None),
 }
-autodoc_preserve_defaults = True
-autodoc_warningiserror = True
 navigation_with_keys = False
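The rewritten `linkcode_resolve` above no longer imports modules. Instead it maps `info['module']` to `<module>.py` (or `<package>/__init__.py`) under `REPO_ROOT` and scans that file for the first line matching the object's short name. The following is a minimal stdlib sketch of that lookup, not the vLLM code itself. The helper names `module_to_relpath`, `find_definition_line` and `source_url` are hypothetical, and a set of repo-relative paths stands in for the filesystem so the sketch runs without a checkout. It makes one deliberate change. As written in the diff, `if lineno == len(lines): return None` also rejects a genuine match on the file's last line and leaves `lineno` unbound for an empty file. The sketch returns from inside the loop instead, so neither case is misreported.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath
from typing import AbstractSet, Optional


def module_to_relpath(module: str, existing: AbstractSet[str]) -> Optional[PurePosixPath]:
    """Try `a/b.py`, then `a/b/__init__.py`, mirroring the diff's lookup.

    `existing` stands in for the filesystem: repo-relative POSIX paths.
    """
    file = PurePosixPath(module.replace(".", "/") + ".py")
    if str(file) in existing:
        return file
    package_init = file.with_suffix("") / "__init__.py"
    if str(package_init) in existing:
        return package_init
    return None


def find_definition_line(source: str, name: str) -> Optional[int]:
    """Return the 1-based line where `name` first appears as a definition, else None.

    Same pattern as the diff: any number of 4-space indents, an optional
    `def`/`class` keyword, then the bare name at a word boundary.
    """
    pattern = re.compile(fr"^( {{4}})*((def|class) )?{re.escape(name)}\b.*")
    for lineno, line in enumerate(source.splitlines(), 1):
        # Skip blank lines and top-level comments, as the diff does.
        if not line or line.startswith("#"):
            continue
        if pattern.match(line):
            return lineno
    # Nothing matched (this also covers an empty file).
    return None


def source_url(relpath: PurePosixPath, lineno: int, branch: str = "main") -> str:
    """Build a GitHub permalink in the same shape as the diff's non-PR fallback."""
    return f"https://github.com/vllm-project/vllm/blob/{branch}/{relpath}#L{lineno}"
```

Because the match is purely textual, a call statement such as `    generate(...)` that appears before the real `def generate` would also match. The sketch keeps that behavior to stay faithful to the pattern in the diff.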
--- a/docs/source/contributing/deprecation_policy.md
+++ b/docs/source/contributing/deprecation_policy.md
+# Deprecation Policy
+This document outlines the official policy and process for deprecating features
+in the vLLM project.
+## Overview
+vLLM uses a structured "deprecation pipeline" to guide the lifecycle of
+deprecated features. This policy ensures that users are given clear and
+sufficient notice when a feature is deprecated and that deprecations proceed in
+a consistent and predictable manner.
+We aim to strike a balance between continued innovation and respecting users’
+reliance on existing functionality. Deprecations are tied to our **minor (Y)
+releases** following semantic versioning (X.Y.Z), where:
+- **X** is a major version (rare)
+- **Y** is a minor version (used for significant changes, including deprecations/removals)
+- **Z** is a patch version (used for fixes and safer enhancements)
+Features that fall under this policy include (at a minimum) the following:
+- CLI flags
+- Environment variables
+- Configuration files
+- APIs in the OpenAI-compatible API server
+- Public Python APIs for the `vllm` library
+## Deprecation Pipeline
+The deprecation process consists of several clearly defined stages that span
+multiple Y releases:
+**1. Deprecated (Still On By Default)**
+- **Action**: Feature is marked as deprecated.
+- **Timeline**: A removal version is explicitly stated in the deprecation
+warning (e.g., "This will be removed in v0.10.0").
+- **Communication**: Deprecation is noted in the following, as applicable:
+  - Help strings
+  - Log output
+  - API responses
+  - `/metrics` output (for metrics features)
+  - User-facing documentation
+  - Release notes
+  - GitHub Issue (RFC) for feedback
+  - Documentation and use of the `@typing_extensions.deprecated` decorator for Python APIs
+**2. Deprecated (Off By Default)**
+- **Action**: Feature is disabled by default, but can still be re-enabled via a
+CLI flag or environment variable. Feature throws an error when used without
+re-enabling.
+- **Purpose**: Allows users who missed earlier warnings a temporary escape hatch
+while signaling imminent removal. Ensures any remaining usage is clearly
+surfaced and blocks silent breakage before full removal.
+**3. Removed**
+- **Action**: Feature is completely removed from the codebase.
+- **Note**: Only features that have passed through the previous deprecation
+stages will be removed.
+## Example Timeline
+Assume a feature is deprecated in `v0.9.0`.
+| Release       | Status                                                                                          |
+|---------------|-------------------------------------------------------------------------------------------------|
+| `v0.9.0`      | Feature is deprecated with clear removal version listed.                                        |
+| `v0.10.0`     | Feature is now off by default, throws an error when used, and can be re-enabled for legacy use. |
+| `v0.11.0`     | Feature is removed.                                                                             |
+## Important Guidelines
+- **No Removals in Patch Releases**: Removing deprecated features in patch
+(`.Z`) releases is disallowed to avoid surprising users.
+- **Grace Period for Existing Deprecations**: Any feature deprecated **before
+this policy** will have its grace period start **now**, not retroactively.
+- **Documentation is Critical**: Ensure every stage of the pipeline is
+documented clearly for users.
+## Final Notes
+This policy is a living document and may evolve as the needs of the project and
+its users change. Community feedback is welcome and encouraged as we refine the
+process.
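The pipeline in this policy is a process, not an API. To make the stages concrete, here is a hypothetical sketch (not vLLM code) of a decorator that picks its behavior from the release a feature was deprecated in and the current release. It assumes the minimal cadence from the example timeline (one Y release per stage), ignores patch (Z) numbers, takes the current release as an argument, and uses an invented opt-in variable name. The policy itself prescribes `@typing_extensions.deprecated` for Python APIs. This sketch uses only the standard library's `warnings` module, so it reproduces the runtime `DeprecationWarning` but not the static type-checker signal that decorator also provides.

```python
import functools
import os
import warnings
from typing import Callable, Mapping, Optional, Tuple


class DeprecatedFeatureError(RuntimeError):
    """Raised when a feature is used later in the pipeline than allowed."""


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse 'vX.Y.Z' (the leading 'v' is optional) into integers."""
    major, minor, patch = (int(part) for part in tag.lstrip("v").split("."))
    return major, minor, patch


def deprecation_stage(deprecated_in: str, release: str) -> int:
    """Return 0 (not yet deprecated), 1 (deprecated, on by default),
    2 (deprecated, off by default) or 3 (removed).

    Stages move only on minor (Y) releases, so the patch number is ignored.
    Assumes one Y release per stage, as in the example timeline.
    """
    dep_major, dep_minor, _ = parse_version(deprecated_in)
    rel_major, rel_minor, _ = parse_version(release)
    if rel_major != dep_major:
        raise ValueError("this sketch only handles one major (X) version")
    return max(0, min(rel_minor - dep_minor + 1, 3))


def deprecated_feature(deprecated_in: str, removal: str, release: str,
                       opt_in_env: str,
                       env: Optional[Mapping[str, str]] = None) -> Callable:
    """Hypothetical decorator applying the pipeline stage to one Python API."""
    stage = deprecation_stage(deprecated_in, release)

    def decorator(func: Callable) -> Callable:
        msg = f"{func.__name__} is deprecated and will be removed in {removal}."

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            environ = os.environ if env is None else env
            if stage >= 3:
                # Stage 3: the real pipeline deletes the code; raising stands in.
                raise DeprecatedFeatureError(
                    f"{func.__name__} was removed in {removal}.")
            if stage == 2 and environ.get(opt_in_env) != "1":
                # Stage 2: off by default; fail loudly unless re-enabled.
                raise DeprecatedFeatureError(
                    f"{msg} Set {opt_in_env}=1 to re-enable it temporarily.")
            if stage >= 1:
                # Stage 1, or stage 2 after opting in: still works, but warns
                # and names the removal version.
                warnings.warn(msg, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Passing `env` explicitly keeps the opt-in check testable. A real CLI would more likely expose the same escape hatch as a flag, which the policy also allows.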
--- a/docs/source/contributing/overview.md
+++ b/docs/source/contributing/overview.md
@@ -17,7 +17,7 @@ Unsure on where to start? Check out the following links for tasks to work on:
 - [Good first issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22)
  - [Selected onboarding tasks](gh-project:6)
-- [New model requests](https://github.com/vllm-project/vllm/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22new%20model%22)
+- [New model requests](https://github.com/vllm-project/vllm/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22new-model%22)
  - [Models with multi-modal capabilities](gh-project:10)
 ## License
@@ -40,6 +40,10 @@ pre-commit install --hook-type pre-commit --hook-type commit-msg
 # You can manually run pre-commit with
 pre-commit run --all-files
+# To manually run something from CI that does not run
+# locally by default, you can run:
+pre-commit run mypy-3.9 --hook-stage manual --all-files
 # Unit tests
 pytest tests/
 ```
@@ -54,6 +58,12 @@ Therefore, we recommend developing with Python 3.12 to minimise the chance of yo
 Currently, the repository is not fully checked by `mypy`.
 :::
+:::{note}
+Currently, not all unit tests pass when run on CPU platforms. If you don't have access to a GPU
+platform to run unit tests locally, rely on the continuous integration system to run the tests for
+now.
+:::
 ## Issues
 If you encounter a bug or have a feature request, please [search existing issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [file a new issue](https://github.com/vllm-project/vllm/issues/new/choose), providing as much relevant information as possible.