Commit 7a985548 authored by zhuwenwen

Merge tag 'v0.9.0' into v0.9.0-ori

parents 45d3785c dc1440cf
# Data Parsing
## Module Contents
```{eval-rst}
.. automodule:: vllm.multimodal.parse
    :members:
    :member-order: bysource
```
# Data Processing
## Module Contents
```{eval-rst}
.. automodule:: vllm.multimodal.processing
    :members:
    :member-order: bysource
```
# Memory Profiling
## Module Contents
```{eval-rst}
.. automodule:: vllm.multimodal.profiling
    :members:
    :member-order: bysource
```
# Registry
## Module Contents
```{eval-rst}
.. automodule:: vllm.multimodal.registry
    :members:
    :member-order: bysource
```
# Offline Inference
:::{toctree}
:caption: Contents
:maxdepth: 1
llm
llm_inputs
:::
# LLM Class
```{eval-rst}
.. autoclass:: vllm.LLM
    :members:
    :show-inheritance:
```
# LLM Inputs
```{eval-rst}
.. autodata:: vllm.inputs.PromptType
```
```{eval-rst}
.. autoclass:: vllm.inputs.TextPrompt
    :show-inheritance:
    :members:
    :member-order: bysource
```
```{eval-rst}
.. autoclass:: vllm.inputs.TokensPrompt
    :show-inheritance:
    :members:
    :member-order: bysource
```
# Summary
(configuration)=
## Configuration
API documentation for vLLM's configuration classes.
```{autodoc2-summary}
vllm.config.ModelConfig
vllm.config.CacheConfig
vllm.config.TokenizerPoolConfig
vllm.config.LoadConfig
vllm.config.ParallelConfig
vllm.config.SchedulerConfig
vllm.config.DeviceConfig
vllm.config.SpeculativeConfig
vllm.config.LoRAConfig
vllm.config.PromptAdapterConfig
vllm.config.MultiModalConfig
vllm.config.PoolerConfig
vllm.config.DecodingConfig
vllm.config.ObservabilityConfig
vllm.config.KVTransferConfig
vllm.config.CompilationConfig
vllm.config.VllmConfig
```
(offline-inference-api)=
## Offline Inference
LLM Class.
```{autodoc2-summary}
vllm.LLM
```
LLM Inputs.
```{autodoc2-summary}
vllm.inputs.PromptType
vllm.inputs.TextPrompt
vllm.inputs.TokensPrompt
```
## vLLM Engines
Engine classes for offline and online inference.
```{autodoc2-summary}
vllm.LLMEngine
vllm.AsyncLLMEngine
```
## Inference Parameters
Inference parameters for vLLM APIs.
(sampling-params)=
(pooling-params)=
```{autodoc2-summary}
vllm.SamplingParams
vllm.PoolingParams
```
(multi-modality)=
## Multi-Modality
vLLM provides experimental support for multi-modal models through the {mod}`vllm.multimodal` package.
Multi-modal inputs can be passed alongside text and token prompts to [supported models](#supported-mm-models)
via the `multi_modal_data` field in {class}`vllm.inputs.PromptType`.
Looking to add your own multi-modal model? Please follow the instructions listed [here](#supports-multimodal).
```{autodoc2-summary}
vllm.multimodal.MULTIMODAL_REGISTRY
```
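
As a rough illustration of the paragraph above, a text prompt that carries multi-modal data can be thought of as a dictionary with the text under `prompt` and the extra inputs under `multi_modal_data`. The sketch below builds such a dictionary without importing vLLM. The helper name `build_image_prompt`, the `<image>` placeholder, the chat template, and the `"image"` modality key are assumptions made for this example; the exact template depends on the model.

```python
from typing import Any


def build_image_prompt(question: str, image: Any) -> dict[str, Any]:
    """Return a dict-style prompt carrying one image next to the text."""
    return {
        # "<image>" is a placeholder token used by some models; the exact
        # template is model-specific and is an assumption here.
        "prompt": f"USER: <image>\n{question}\nASSISTANT:",
        # Keys are modality names; "image" is the assumed modality key.
        "multi_modal_data": {"image": image},
    }


# Stand-in object; a real call would pass actual image data instead.
fake_image = object()
prompt = build_image_prompt("What is in this picture?", fake_image)
```

With vLLM installed and a supported multi-modal model loaded, a dictionary of this shape would typically be passed to `LLM.generate` in place of a plain string prompt.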
### Inputs
User-facing inputs.
```{autodoc2-summary}
vllm.multimodal.inputs.MultiModalDataDict
```
Internal data structures.
```{autodoc2-summary}
vllm.multimodal.inputs.PlaceholderRange
vllm.multimodal.inputs.NestedTensors
vllm.multimodal.inputs.MultiModalFieldElem
vllm.multimodal.inputs.MultiModalFieldConfig
vllm.multimodal.inputs.MultiModalKwargsItem
vllm.multimodal.inputs.MultiModalKwargs
vllm.multimodal.inputs.MultiModalInputs
```
### Data Parsing
```{autodoc2-summary}
vllm.multimodal.parse
```
### Data Processing
```{autodoc2-summary}
vllm.multimodal.processing
```
### Memory Profiling
```{autodoc2-summary}
vllm.multimodal.profiling
```
### Registry
```{autodoc2-summary}
vllm.multimodal.registry
```
## Model Development
```{autodoc2-summary}
vllm.model_executor.models.interfaces_base
vllm.model_executor.models.interfaces
vllm.model_executor.models.adapters
```
# SPDX-License-Identifier: Apache-2.0
from docutils import nodes
from myst_parser.parsers.sphinx_ import MystParser
from sphinx.ext.napoleon import docstring


class NapoleonParser(MystParser):

    def parse(self, input_string: str, document: nodes.document) -> None:
        # Get the Sphinx configuration
        config = document.settings.env.config

        # Run the text through Napoleon's NumPy converter and then its Google
        # converter, so both docstring styles are normalised before MyST
        # parses the result.
        parsed_content = str(
            docstring.GoogleDocstring(
                str(docstring.NumpyDocstring(input_string, config)),
                config,
            ))
        return super().parse(parsed_content, document)


Parser = NapoleonParser
...@@ -4,6 +4,7 @@ ...@@ -4,6 +4,7 @@
We host regular meetups in San Francisco Bay Area every 2 months. We will share the project updates from the vLLM team and have guest speakers from the industry to share their experience and insights. Please find the materials of our previous meetups below:
- [NYC vLLM Meetup](https://lu.ma/c1rqyf1f), May 7th, 2025. [[Slides]](https://docs.google.com/presentation/d/1_q_aW_ioMJWUImf1s1YM-ZhjXz8cUeL0IJvaquOYBeA/edit?usp=sharing)
- [Asia Developer Day](https://www.sginnovate.com/event/limited-availability-morning-evening-slots-remaining-inaugural-vllm-asia-developer-day), April 3rd 2025. [[Slides]](https://docs.google.com/presentation/d/19cp6Qu8u48ihB91A064XfaXruNYiBOUKrBxAmDOllOo/edit?usp=sharing).
- [vLLM x Ollama Inference Night](https://lu.ma/vllm-ollama), March 27th 2025. [[Slides]](https://docs.google.com/presentation/d/16T2PDD1YwRnZ4Tu8Q5r6n53c5Lr5c73UV9Vd2_eBo4U/edit?usp=sharing).
- [The first vLLM China Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg), March 16th 2025. [[Slides]](https://docs.google.com/presentation/d/1REHvfQMKGnvz6p3Fd23HhSO4c8j5WPGZV0bKYLwnHyQ/edit?usp=sharing).
......
...@@ -13,16 +13,17 @@ ...@@ -13,16 +13,17 @@
# documentation root, use os.path.abspath to make it absolute, like shown here. # documentation root, use os.path.abspath to make it absolute, like shown here.
import datetime import datetime
import inspect
import logging import logging
import os import os
import re
import sys import sys
from pathlib import Path
import requests import requests
from sphinx.ext import autodoc
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
sys.path.append(os.path.abspath("../..")) REPO_ROOT = Path(__file__).resolve().parent.parent.parent
sys.path.append(os.path.abspath(REPO_ROOT))
# -- Project information ----------------------------------------------------- # -- Project information -----------------------------------------------------
...@@ -40,8 +41,7 @@ extensions = [ ...@@ -40,8 +41,7 @@ extensions = [
"sphinx.ext.linkcode", "sphinx.ext.linkcode",
"sphinx.ext.intersphinx", "sphinx.ext.intersphinx",
"sphinx_copybutton", "sphinx_copybutton",
"sphinx.ext.autodoc", "autodoc2",
"sphinx.ext.autosummary",
"myst_parser", "myst_parser",
"sphinxarg.ext", "sphinxarg.ext",
"sphinx_design", "sphinx_design",
...@@ -49,7 +49,19 @@ extensions = [ ...@@ -49,7 +49,19 @@ extensions = [
] ]
myst_enable_extensions = [ myst_enable_extensions = [
"colon_fence", "colon_fence",
"fieldlist",
] ]
autodoc2_packages = [
{
"path": "../../vllm",
"exclude_dirs": ["__pycache__", "third_party"],
},
]
autodoc2_output_dir = "api"
autodoc2_render_plugin = "myst"
autodoc2_hidden_objects = ["dunder", "private", "inherited"]
autodoc2_sort_names = True
autodoc2_index_template = None
# Add any paths that contain templates here, relative to this directory. # Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates'] templates_path = ['_templates']
...@@ -77,6 +89,11 @@ html_theme_options = { ...@@ -77,6 +89,11 @@ html_theme_options = {
'repository_url': 'https://github.com/vllm-project/vllm', 'repository_url': 'https://github.com/vllm-project/vllm',
'use_repository_button': True, 'use_repository_button': True,
'use_edit_page_button': True, 'use_edit_page_button': True,
# Prevents the full API being added to the left sidebar of every page.
# Reduces build time by 2.5x and reduces build size from ~225MB to ~95MB.
'collapse_navbar': True,
# Makes API visible in the right sidebar on API reference pages.
'show_toc_level': 3,
} }
# Add any paths that contain custom static files (such as style sheets) here, # Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files, # relative to this directory. They are copied after the builtin static files,
...@@ -164,73 +181,64 @@ def linkcode_resolve(domain, info): ...@@ -164,73 +181,64 @@ def linkcode_resolve(domain, info):
return None return None
if not info['module']: if not info['module']:
return None return None
filename = info['module'].replace('.', '/')
module = info['module'] # Get path from module name
file = Path(f"{info['module'].replace('.', '/')}.py")
# try to determine the correct file and line number to link to path = REPO_ROOT / file
obj = sys.modules[module] if not path.exists():
path = REPO_ROOT / file.with_suffix("") / "__init__.py"
# get as specific as we can if not path.exists():
lineno: int = 0 return None
filename: str = ""
try: # Get the line number of the object
for part in info['fullname'].split('.'): with open(path) as f:
obj = getattr(obj, part) lines = f.readlines()
name = info['fullname'].split(".")[-1]
# Skip decorator wrappers by checking if the object is a function pattern = fr"^( {{4}})*((def|class) )?{name}\b.*"
# and has a __wrapped__ attribute (which decorators typically set) for lineno, line in enumerate(lines, 1):
while hasattr(obj, '__wrapped__'): if not line or line.startswith("#"):
obj = obj.__wrapped__ continue
if re.match(pattern, line):
if not (inspect.isclass(obj) or inspect.isfunction(obj) break
or inspect.ismethod(obj)):
obj = obj.__class__ # Get the class of the instance # If the line number is not found, return None
if lineno == len(lines):
lineno = inspect.getsourcelines(obj)[1] return None
filename = (inspect.getsourcefile(obj)
or f"{filename}.py").split("vllm/", 1)[1] # If the line number is found, create the URL
except Exception: filename = path.relative_to(REPO_ROOT)
# For some things, like a class member, won't work, so if "checkouts" in path.parts:
# we'll use the line number of the parent (the class)
pass
if filename.startswith("checkouts/"):
# a PR build on readthedocs # a PR build on readthedocs
pr_number = filename.split("/")[1] pr_number = REPO_ROOT.name
filename = filename.split("/", 2)[2]
base, branch = get_repo_base_and_branch(pr_number) base, branch = get_repo_base_and_branch(pr_number)
if base and branch: if base and branch:
return f"https://github.com/{base}/blob/{branch}/{filename}#L{lineno}" return f"https://github.com/{base}/blob/{branch}/{filename}#L{lineno}"
# Otherwise, link to the source file on the main branch # Otherwise, link to the source file on the main branch
return f"https://github.com/vllm-project/vllm/blob/main/{filename}#L{lineno}" return f"https://github.com/vllm-project/vllm/blob/main/{filename}#L{lineno}"
# Mock out external dependencies here, otherwise the autodoc pages may be blank. # Mock out external dependencies here, otherwise sphinx-argparse won't work.
autodoc_mock_imports = [ autodoc_mock_imports = [
"huggingface_hub",
"pydantic",
"zmq",
"cloudpickle",
"aiohttp",
"starlette",
"blake3", "blake3",
"compressed_tensors",
"cpuinfo", "cpuinfo",
"cv2",
"torch",
"transformers", "transformers",
"psutil", "psutil",
"prometheus_client",
"sentencepiece",
"vllm._C", "vllm._C",
"PIL", "PIL",
"numpy", "numpy",
'triton',
"tqdm", "tqdm",
"tensorizer", # The mocks below are required by
"pynvml", # docs/source/serving/openai_compatible_server.md's
"outlines", # vllm.entrypoints.openai.cli_args
"xgrammar", "openai",
"librosa", "fastapi",
"soundfile", "partial_json_parser",
"gguf",
"lark",
"decord",
] ]
for mock_target in autodoc_mock_imports: for mock_target in autodoc_mock_imports:
...@@ -241,18 +249,6 @@ for mock_target in autodoc_mock_imports: ...@@ -241,18 +249,6 @@ for mock_target in autodoc_mock_imports:
"been loaded into sys.modules when the sphinx build starts.", "been loaded into sys.modules when the sphinx build starts.",
mock_target) mock_target)
class MockedClassDocumenter(autodoc.ClassDocumenter):
"""Remove note about base class when a class is derived from object."""
def add_line(self, line: str, source: str, *lineno: int) -> None:
if line == " Bases: :py:class:`object`":
return
super().add_line(line, source, *lineno)
autodoc.ClassDocumenter = MockedClassDocumenter
intersphinx_mapping = { intersphinx_mapping = {
"python": ("https://docs.python.org/3", None), "python": ("https://docs.python.org/3", None),
"typing_extensions": "typing_extensions":
...@@ -264,7 +260,4 @@ intersphinx_mapping = { ...@@ -264,7 +260,4 @@ intersphinx_mapping = {
"psutil": ("https://psutil.readthedocs.io/en/stable", None), "psutil": ("https://psutil.readthedocs.io/en/stable", None),
} }
autodoc_preserve_defaults = True
autodoc_warningiserror = True
navigation_with_keys = False navigation_with_keys = False
# Deprecation Policy
This document outlines the official policy and process for deprecating features
in the vLLM project.
## Overview
vLLM uses a structured "deprecation pipeline" to guide the lifecycle of
deprecated features. This policy ensures that users are given clear and
sufficient notice when a feature is deprecated and that deprecations proceed in
a consistent and predictable manner.
We aim to strike a balance between continued innovation and respecting users’
reliance on existing functionality. Deprecations are tied to our **minor (Y)
releases** following semantic versioning (X.Y.Z), where:
- **X** is a major version (rare)
- **Y** is a minor version (used for significant changes, including deprecations/removals)
- **Z** is a patch version (used for fixes and safer enhancements)
Features that fall under this policy include (at a minimum) the following:
- CLI flags
- Environment variables
- Configuration files
- APIs in the OpenAI-compatible API server
- Public Python APIs for the `vllm` library
## Deprecation Pipeline
The deprecation process consists of several clearly defined stages that span
multiple Y releases:
**1. Deprecated (Still On By Default)**
- **Action**: Feature is marked as deprecated.
- **Timeline**: A removal version is explicitly stated in the deprecation
warning (e.g., "This will be removed in v0.10.0").
- **Communication**: Deprecation is noted in the following, as applicable:
- Help strings
- Log output
- API responses
- `/metrics` output (for metrics features)
- User-facing documentation
- Release notes
- GitHub Issue (RFC) for feedback
- Documentation and use of the `@typing_extensions.deprecated` decorator for Python APIs
**2. Deprecated (Off By Default)**
- **Action**: Feature is disabled by default, but can still be re-enabled via a
CLI flag or environment variable. Feature throws an error when used without
re-enabling.
- **Purpose**: Allows users who missed earlier warnings a temporary escape hatch
while signaling imminent removal. Ensures any remaining usage is clearly
surfaced and blocks silent breakage before full removal.
**3. Removed**
- **Action**: Feature is completely removed from the codebase.
- **Note**: Only features that have passed through the previous deprecation
stages will be removed.
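
The first two stages can be sketched in plain Python. This is a minimal illustration, not vLLM code: it uses the standard-library `warnings` module rather than the `@typing_extensions.deprecated` decorator named above, and the feature name, the removal version, and the `EXAMPLE_ENABLE_LEGACY_FEATURE` environment variable are all hypothetical.

```python
import os
import warnings
from typing import Mapping, Optional

# Hypothetical names used only for this illustration.
REMOVAL_VERSION = "v0.11.0"
REENABLE_ENV_VAR = "EXAMPLE_ENABLE_LEGACY_FEATURE"


def legacy_feature_stage1() -> str:
    """Stage 1: still on by default, but warns and names the removal version."""
    warnings.warn(
        f"legacy_feature is deprecated and will be removed in {REMOVAL_VERSION}.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "legacy result"


def legacy_feature_stage2(env: Optional[Mapping[str, str]] = None) -> str:
    """Stage 2: off by default; raises unless explicitly re-enabled."""
    env = os.environ if env is None else env
    if env.get(REENABLE_ENV_VAR) != "1":
        raise RuntimeError(
            f"legacy_feature is disabled by default and will be removed in "
            f"{REMOVAL_VERSION}. Set {REENABLE_ENV_VAR}=1 to re-enable it.")
    return "legacy result"
```

Stage 3 has no code counterpart: the function is deleted, so callers fail with an ordinary `AttributeError` or `ImportError`.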
## Example Timeline
Assume a feature is deprecated in `v0.9.0`.
| Release | Status |
|---------------|-------------------------------------------------------------------------------------------------|
| `v0.9.0` | Feature is deprecated with clear removal version listed. |
| `v0.10.0` | Feature is now off by default, throws an error when used, and can be re-enabled for legacy use. |
| `v0.11.0` | Feature is removed. |
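
The table can be reproduced with a few lines of Python. The sketch assumes, as in the table, that each stage advances by exactly one minor (Y) release; the policy itself only says the stages span multiple Y releases. The function name `deprecation_timeline` is hypothetical.

```python
def deprecation_timeline(deprecated_in: str) -> dict[str, str]:
    """Map each release in the pipeline to the feature's status."""
    # Parse "vX.Y.Z"; the patch component does not affect the timeline.
    major, minor, _patch = (int(p) for p in deprecated_in.lstrip("v").split("."))
    stages = [
        "deprecated (on by default)",
        "deprecated (off by default)",
        "removed",
    ]
    # Assumption: one stage per minor release, reported as the .0 release.
    return {f"v{major}.{minor + i}.0": s for i, s in enumerate(stages)}


timeline = deprecation_timeline("v0.9.0")
```

For `v0.9.0` this yields entries for `v0.9.0`, `v0.10.0`, and `v0.11.0`, matching the three rows above.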
## Important Guidelines
- **No Removals in Patch Releases**: Removing deprecated features in patch
(`.Z`) releases is disallowed to avoid surprising users.
- **Grace Period for Existing Deprecations**: Any feature deprecated **before
this policy** will have its grace period start **now**, not retroactively.
- **Documentation is Critical**: Ensure every stage of the pipeline is
documented clearly for users.
## Final Notes
This policy is a living document and may evolve as the needs of the project and
its users change. Community feedback is welcome and encouraged as we refine the
process.
...@@ -17,7 +17,7 @@ Unsure on where to start? Check out the following links for tasks to work on: ...@@ -17,7 +17,7 @@ Unsure on where to start? Check out the following links for tasks to work on:
- [Good first issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22) - [Good first issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22)
- [Selected onboarding tasks](gh-project:6) - [Selected onboarding tasks](gh-project:6)
- [New model requests](https://github.com/vllm-project/vllm/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22new%20model%22) - [New model requests](https://github.com/vllm-project/vllm/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22new-model%22)
- [Models with multi-modal capabilities](gh-project:10) - [Models with multi-modal capabilities](gh-project:10)
## License ## License
...@@ -40,6 +40,10 @@ pre-commit install --hook-type pre-commit --hook-type commit-msg ...@@ -40,6 +40,10 @@ pre-commit install --hook-type pre-commit --hook-type commit-msg
# You can manually run pre-commit with # You can manually run pre-commit with
pre-commit run --all-files pre-commit run --all-files
# To manually run something from CI that does not run
# locally by default, you can run:
pre-commit run mypy-3.9 --hook-stage manual --all-files
# Unit tests # Unit tests
pytest tests/ pytest tests/
``` ```
...@@ -54,6 +58,12 @@ Therefore, we recommend developing with Python 3.12 to minimise the chance of yo ...@@ -54,6 +58,12 @@ Therefore, we recommend developing with Python 3.12 to minimise the chance of yo
Currently, the repository is not fully checked by `mypy`. Currently, the repository is not fully checked by `mypy`.
::: :::
:::{note}
Currently, not all unit tests pass when run on CPU platforms. If you don't have access to a GPU
platform to run unit tests locally, rely on the continuous integration system to run the tests for
now.
:::
## Issues ## Issues
If you encounter a bug or have a feature request, please [search existing issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [file a new issue](https://github.com/vllm-project/vllm/issues/new/choose), providing as much relevant information as possible.
......