Commit 6cd40a5b authored by Cyrus Leung, committed by GitHub

[Doc][4/N] Reorganize API Reference (#11843)


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent aba8d6ee
@@ -38,7 +38,7 @@ steps:
 - pip install -r requirements-docs.txt
 - SPHINXOPTS=\"-W\" make html
 # Check API reference (if it fails, you may have missing mock imports)
-- grep \"sig sig-object py\" build/html/dev/sampling_params.html
+- grep \"sig sig-object py\" build/html/api/params.html
 - label: Async Engine, Inputs, Utils, Worker Test # 24min
 fast_check: true
......
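The grep step above passes only when the built page contains at least one `sig sig-object py` element, the CSS classes Sphinx puts on each rendered Python object signature; the pipeline comment suggests that missing mock imports are the usual cause when it finds nothing. A minimal Python sketch of the same check follows. The function names are hypothetical; only the marker string and the page path come from the pipeline.

```python
from pathlib import Path

# Sphinx puts these classes on the <dt> element of every Python object
# signature that autodoc renders, e.g. <dt class="sig sig-object py" ...>.
SIGNATURE_MARKER = "sig sig-object py"


def count_signatures(html: str) -> int:
    """Count rendered Python signatures in a built HTML page."""
    return html.count(SIGNATURE_MARKER)


def check_api_page(path: Path) -> bool:
    """Mirror the CI grep: pass only if the page has at least one signature."""
    html = path.read_text(encoding="utf-8")
    return count_signatures(html) > 0
```

After `make html`, `check_api_page(Path("build/html/api/params.html"))` would return `False` in the same situation where the grep step fails.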
@@ -2,8 +2,8 @@
 # to run the OpenAI compatible server.
 # Please update any changes made here to
-# docs/source/dev/dockerfile/dockerfile.md and
-# docs/source/assets/dev/dockerfile-stages-dependency.png
+# docs/source/contributing/dockerfile/dockerfile.md and
+# docs/source/assets/contributing/dockerfile-stages-dependency.png
 ARG CUDA_VERSION=12.4.1
 #################### BASE BUILD IMAGE ####################
......
@@ -11,18 +11,8 @@ vLLM provides experimental support for multi-modal models through the {mod}`vllm
 Multi-modal inputs can be passed alongside text and token prompts to [supported models](#supported-mm-models)
 via the `multi_modal_data` field in {class}`vllm.inputs.PromptType`.
-Currently, vLLM only has built-in support for image data. You can extend vLLM to process additional modalities
-by following [this guide](#adding-multimodal-plugin).
 Looking to add your own multi-modal model? Please follow the instructions listed [here](#enabling-multimodal-inputs).
-## Guides
-```{toctree}
-:maxdepth: 1
-adding_multimodal_plugin
-```
 ## Module Contents
......
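The hunk above points readers to the `multi_modal_data` field of `vllm.inputs.PromptType`. A minimal sketch of a text prompt that carries one image, assuming the `"image"` key that vLLM uses for image inputs; `TextPromptWithImage` and `make_image_prompt` are hypothetical helpers, and the string placeholder stands in for a real image object such as a PIL image.

```python
from typing import Any, Dict, TypedDict


class TextPromptWithImage(TypedDict):
    """Hypothetical mirror of a text prompt carrying multi-modal data."""
    prompt: str
    multi_modal_data: Dict[str, Any]


def make_image_prompt(text: str, image: Any) -> TextPromptWithImage:
    # Assumption: images are keyed by "image" inside multi_modal_data.
    return {"prompt": text, "multi_modal_data": {"image": image}}


# In real use `image` would be an image object (e.g. a PIL image); a string
# placeholder keeps this sketch free of third-party dependencies.
request = make_image_prompt("What is shown in this picture?", "<image placeholder>")
```

Such a dict would then be passed as the prompt to `LLM.generate`. Many models also expect a model-specific image placeholder token inside the prompt text, so check the model's documentation before reusing the prompt string.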
# Optional Parameters
Optional parameters for vLLM APIs.
(sampling-params)=
## Sampling Parameters
```{eval-rst}
.. autoclass:: vllm.SamplingParams
:members:
```
(pooling-params)=
## Pooling Parameters
```{eval-rst}
.. autoclass:: vllm.PoolingParams
:members:
```
@@ -17,7 +17,7 @@ The edges of the build graph represent:
 - `RUN --mount=(.\*)from=...` dependencies (with a dotted line and an empty diamond arrow head)
-> ```{figure} ../../assets/dev/dockerfile-stages-dependency.png
+> ```{figure} /assets/contributing/dockerfile-stages-dependency.png
 > :align: center
 > :alt: query
 > :width: 100%
......
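The build graph described in the hunk above has one edge per stage-to-stage reference. The sketch below extracts such edges from Dockerfile text with regular expressions: the `RUN --mount=...from=...` pattern follows the one quoted in the document, while `FROM <stage>` and `COPY --from=<stage>` are assumed to be the other edge kinds, since they are the standard ways one build stage can depend on another. It ignores line continuations and references to external images, and it is not the tool that produced the vLLM diagram.

```python
import re
from typing import List, Tuple

# One regex per kind of stage reference. RUN_MOUNT_RE mirrors the
# `RUN --mount=(.*)from=...` pattern; the other two are assumptions.
FROM_RE = re.compile(r"^FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)
COPY_FROM_RE = re.compile(r"^COPY\s+.*--from=(\S+)", re.IGNORECASE)
RUN_MOUNT_RE = re.compile(r"^RUN\s+--mount=.*from=([^,\s]+)", re.IGNORECASE)


def stage_edges(dockerfile: str) -> List[Tuple[str, str, str]]:
    """Return (source_stage, target_stage, kind) for each stage reference."""
    stages = set()    # names declared so far with `FROM ... AS <name>`
    current = None    # stage whose instructions are being read
    stage_number = -1
    edges = []
    for raw in dockerfile.splitlines():
        line = raw.strip()
        match = FROM_RE.match(line)
        if match:
            stage_number += 1
            parent, name = match.group(1), match.group(2)
            # Unnamed stages get a placeholder label based on their position.
            current = name or f"stage{stage_number}"
            if parent in stages:
                edges.append((parent, current, "FROM"))
            if name:
                stages.add(name)
            continue
        for kind, pattern in (("COPY --from", COPY_FROM_RE), ("RUN --mount", RUN_MOUNT_RE)):
            match = pattern.match(line)
            # Keep only references to earlier named stages, not external images.
            if match and current is not None and match.group(1) in stages:
                edges.append((match.group(1), current, kind))
    return edges
```

Each kind of edge could then be drawn with its own line style, matching the dotted/diamond convention the document describes for `RUN --mount` edges.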
@@ -53,7 +53,7 @@ for output in outputs:
 ```
 More API details can be found in the {doc}`Offline Inference
-</dev/offline_inference/offline_index>` section of the API docs.
+</api/offline_inference/index>` section of the API docs.
 The code for the `LLM` class can be found in <gh-file:vllm/entrypoints/llm.py>.
......
(adding-multimodal-plugin)=
# Adding a Multimodal Plugin
This document teaches you how to add a new modality to vLLM.
Each modality in vLLM is represented by a {class}`~vllm.multimodal.MultiModalPlugin` and registered to {data}`~vllm.multimodal.MULTIMODAL_REGISTRY`.
For vLLM to recognize a new modality type, you have to create a new plugin and then pass it to {meth}`~vllm.multimodal.MultiModalRegistry.register_plugin`.
The remainder of this document details how to define custom {class}`~vllm.multimodal.MultiModalPlugin` s.
```{note}
This article is a work in progress.
```
% TODO: Add more instructions on how to add new plugins once embeddings is in.
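The guide above describes a registry pattern: each modality is represented by a plugin object, and the registry dispatches each entry of `multi_modal_data` to the plugin registered for its key. The sketch below illustrates that pattern in plain Python. Every class and method in it is hypothetical (only the name `register_plugin` is borrowed from the text), and it is not vLLM's `MultiModalPlugin` or `MultiModalRegistry` implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ModalityPlugin(ABC):
    """Hypothetical base class: one plugin handles one modality."""

    @abstractmethod
    def data_key(self) -> str:
        """Key under which this modality appears in `multi_modal_data`."""

    @abstractmethod
    def map_input(self, data: Any) -> Dict[str, Any]:
        """Convert raw user data into model keyword arguments."""


class PluginRegistry:
    """Hypothetical registry that dispatches inputs to plugins by key."""

    def __init__(self) -> None:
        self._plugins: Dict[str, ModalityPlugin] = {}

    def register_plugin(self, plugin: ModalityPlugin) -> None:
        key = plugin.data_key()
        if key in self._plugins:
            raise ValueError(f"A plugin is already registered for {key!r}")
        self._plugins[key] = plugin

    def map_input(self, multi_modal_data: Dict[str, Any]) -> Dict[str, Any]:
        kwargs: Dict[str, Any] = {}
        for key, value in multi_modal_data.items():
            if key not in self._plugins:
                raise KeyError(f"No plugin registered for modality {key!r}")
            kwargs.update(self._plugins[key].map_input(value))
        return kwargs


class AudioPlugin(ModalityPlugin):
    """Example plugin for a new modality; the output key is made up."""

    def data_key(self) -> str:
        return "audio"

    def map_input(self, data: Any) -> Dict[str, Any]:
        return {"audio_samples": list(data)}


REGISTRY = PluginRegistry()
REGISTRY.register_plugin(AudioPlugin())
```

Rejecting duplicate registrations keeps the key-to-plugin mapping unambiguous, and an unknown key fails loudly instead of being silently dropped.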
# Pooling Parameters
```{eval-rst}
.. autoclass:: vllm.PoolingParams
:members:
```
# Sampling Parameters
```{eval-rst}
.. autoclass:: vllm.SamplingParams
:members:
```
@@ -42,7 +42,7 @@ The first line of this example imports the classes {class}`~vllm.LLM` and {class
 from vllm import LLM, SamplingParams
 ```
-The next section defines a list of input prompts and sampling parameters for text generation. The [sampling temperature](https://arxiv.org/html/2402.05201v1) is set to `0.8` and the [nucleus sampling probability](https://en.wikipedia.org/wiki/Top-p_sampling) is set to `0.95`. You can find more information about the sampling parameters [here](https://docs.vllm.ai/en/stable/dev/sampling_params.html).
+The next section defines a list of input prompts and sampling parameters for text generation. The [sampling temperature](https://arxiv.org/html/2402.05201v1) is set to `0.8` and the [nucleus sampling probability](https://en.wikipedia.org/wiki/Top-p_sampling) is set to `0.95`. You can find more information about the sampling parameters [here](#sampling-params).
 ```python
 prompts = [
......
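The quickstart sets the sampling temperature to `0.8` and the nucleus sampling probability to `0.95`. The following sketch illustrates what these two settings do to a toy next-token distribution: temperature divides the logits before the softmax, and nucleus (top-p) sampling keeps the smallest set of most likely tokens whose cumulative probability reaches `top_p`, then renormalizes. It is a generic pure-Python illustration with made-up logits, not vLLM's sampler.

```python
import math
import random
from typing import List, Optional


def temperature_softmax(logits: List[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature; lower temperature sharpens the result."""
    if temperature <= 0:
        raise ValueError("this sketch requires temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> List[float]:
    """Zero out tokens outside the nucleus and renormalize the rest."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits: List[float], temperature: float = 0.8,
                 top_p: float = 0.95, rng: Optional[random.Random] = None) -> int:
    """Draw one token index using temperature scaling plus top-p filtering."""
    if rng is None:
        rng = random.Random()
    probs = top_p_filter(temperature_softmax(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1, -3.0]
```

With these toy logits and the quickstart's settings, the last token ends up with roughly 0.14% probability after temperature scaling, falls outside the 0.95 nucleus, and can never be sampled.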
@@ -137,10 +137,10 @@ community/sponsors
 :caption: API Reference
 :maxdepth: 2
-dev/sampling_params
-dev/pooling_params
-dev/offline_inference/offline_index
-dev/engine/engine_index
+api/offline_inference/index
+api/engine/index
+api/multimodal/index
+api/params
 ```
 % Design Documents: Details about vLLM internals
@@ -154,7 +154,6 @@ design/huggingface_integration
 design/plugin_system
 design/kernel/paged_attention
 design/input_processing/model_inputs_index
-design/multimodal/multimodal_index
 design/automatic_prefix_caching
 design/multiprocessing
 ```
......
@@ -23,7 +23,7 @@ The available APIs depend on the type of model that is being run:
 Please refer to the above pages for more details about each API.
 ```{seealso}
-[API Reference](/dev/offline_inference/offline_index)
+[API Reference](/api/offline_inference/index)
 ```
 ## Configuration Options
......
@@ -195,7 +195,7 @@ Code example: <gh-file:examples/online_serving/openai_completion_client.py>
 #### Extra parameters
-The following [sampling parameters (click through to see documentation)](../dev/sampling_params.md) are supported.
+The following [sampling parameters](#sampling-params) are supported.
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -226,7 +226,7 @@ Code example: <gh-file:examples/online_serving/openai_chat_completion_client.py>
 #### Extra parameters
-The following [sampling parameters (click through to see documentation)](../dev/sampling_params.md) are supported.
+The following [sampling parameters](#sampling-params) are supported.
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -259,7 +259,7 @@ Code example: <gh-file:examples/online_serving/openai_embedding_client.py>
 #### Extra parameters
-The following [pooling parameters (click through to see documentation)](../dev/pooling_params.md) are supported.
+The following [pooling parameters](#pooling-params) are supported.
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -447,7 +447,7 @@ Response:
 #### Extra parameters
-The following [pooling parameters (click through to see documentation)](../dev/pooling_params.md) are supported.
+The following [pooling parameters](#pooling-params) are supported.
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
......
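The "Extra parameters" sections above list vLLM-specific sampling and pooling parameters that the OpenAI-compatible server accepts in addition to the standard OpenAI fields. A minimal sketch of a completions request body that includes two such sampling parameters, `top_k` and `repetition_penalty`, assuming a server at `http://localhost:8000` and a hypothetical model name `my-model`; the `protocol.py` excerpts referenced above are the authority on which fields are actually supported.

```python
import json
import urllib.request
from typing import Any


def build_completion_request(base_url: str, model: str, prompt: str,
                             **extra_sampling_params: Any) -> urllib.request.Request:
    """Build (but do not send) a POST request to the completions endpoint."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 16,
        "temperature": 0.8,
        "top_p": 0.95,
    }
    # vLLM-specific extras are plain top-level JSON fields in the request body.
    body.update(extra_sampling_params)
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request(
    "http://localhost:8000", "my-model", "San Francisco is a",
    top_k=50, repetition_penalty=1.1,
)
```

With a server running, `urllib.request.urlopen(req)` would send the request. When using the official `openai` Python client instead, the same extra fields go in the `extra_body` argument.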