Commit a1fe24d9 authored by Harry Mellor, committed by GitHub

Migrate docs from Sphinx to MkDocs (#18145)


Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent d0bc2f81
...@@ -33,14 +33,13 @@ steps: ...@@ -33,14 +33,13 @@ steps:
- label: Documentation Build # 2min - label: Documentation Build # 2min
mirror_hardwares: [amdexperimental] mirror_hardwares: [amdexperimental]
working_dir: "/vllm-workspace/test_docs/docs" working_dir: "/vllm-workspace/test_docs"
fast_check: true fast_check: true
no_gpu: True no_gpu: True
commands: commands:
- pip install -r ../../requirements/docs.txt - pip install -r ../requirements/docs.txt
- SPHINXOPTS=\"-W\" make html # TODO: add `--strict` once warnings in docstrings are fixed
# Check API reference (if it fails, you may have missing mock imports) - mkdocs build
- grep \"sig sig-object py\" build/html/api/vllm/vllm.sampling_params.html
- label: Async Engine, Inputs, Utils, Worker Test # 24min - label: Async Engine, Inputs, Utils, Worker Test # 24min
mirror_hardwares: [amdexperimental] mirror_hardwares: [amdexperimental]
......
...@@ -77,11 +77,6 @@ instance/ ...@@ -77,11 +77,6 @@ instance/
# Scrapy stuff: # Scrapy stuff:
.scrapy .scrapy
# Sphinx documentation
docs/_build/
docs/source/getting_started/examples/
docs/source/api/vllm
# PyBuilder # PyBuilder
.pybuilder/ .pybuilder/
target/ target/
...@@ -151,6 +146,7 @@ venv.bak/ ...@@ -151,6 +146,7 @@ venv.bak/
# mkdocs documentation # mkdocs documentation
/site /site
docs/getting_started/examples
# mypy # mypy
.mypy_cache/ .mypy_cache/
......
...@@ -39,6 +39,7 @@ repos: ...@@ -39,6 +39,7 @@ repos:
rev: v0.9.29 rev: v0.9.29
hooks: hooks:
- id: pymarkdown - id: pymarkdown
exclude: '.*\.inc\.md'
args: [fix] args: [fix]
- repo: https://github.com/rhysd/actionlint - repo: https://github.com/rhysd/actionlint
rev: v1.7.7 rev: v1.7.7
......
...@@ -8,12 +8,8 @@ build: ...@@ -8,12 +8,8 @@ build:
tools: tools:
python: "3.12" python: "3.12"
sphinx: mkdocs:
configuration: docs/source/conf.py configuration: mkdocs.yaml
fail_on_warning: true
# If using Sphinx, optionally build your docs in additional formats such as PDF
formats: []
# Optionally declare the Python requirements required to build your docs # Optionally declare the Python requirements required to build your docs
python: python:
......
...@@ -329,7 +329,9 @@ COPY vllm/v1 /usr/local/lib/python3.12/dist-packages/vllm/v1 ...@@ -329,7 +329,9 @@ COPY vllm/v1 /usr/local/lib/python3.12/dist-packages/vllm/v1
# will not be imported by other tests # will not be imported by other tests
RUN mkdir test_docs RUN mkdir test_docs
RUN mv docs test_docs/ RUN mv docs test_docs/
RUN cp -r examples test_docs/
RUN mv vllm test_docs/ RUN mv vllm test_docs/
RUN mv mkdocs.yaml test_docs/
#################### TEST IMAGE #################### #################### TEST IMAGE ####################
#################### OPENAI API SERVER #################### #################### OPENAI API SERVER ####################
......
nav:
- Home:
- vLLM: README.md
- Getting Started:
- getting_started/quickstart.md
- getting_started/installation
- Examples:
- LMCache: getting_started/examples/lmcache
- getting_started/examples/offline_inference
- getting_started/examples/online_serving
- getting_started/examples/other
- Roadmap: https://roadmap.vllm.ai
- Releases: https://github.com/vllm-project/vllm/releases
- User Guide:
- Inference and Serving:
- serving/offline_inference.md
- serving/openai_compatible_server.md
- serving/*
- serving/integrations
- Training: training
- Deployment:
- deployment/*
- deployment/frameworks
- deployment/integrations
- Performance: performance
- Models:
- models/supported_models.md
- models/generative_models.md
- models/pooling_models.md
- models/extensions
- Features:
- features/compatibility_matrix.md
- features/*
- features/quantization
- Other:
- getting_started/*
- Developer Guide:
- contributing/overview.md
- glob: contributing/*
flatten_single_child_sections: true
- contributing/model
- Design Documents:
- V0: design
- V1: design/v1
- API Reference:
- api/README.md
- glob: api/vllm/*
preserve_directory_names: true
- Community:
- community/*
- vLLM Blog: https://blog.vllm.ai
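The nav file above mixes explicit pages (for example `serving/offline_inference.md`) with glob entries such as `serving/*`. The diff does not name the plugin that expands these entries or state its rules, so the following is only a rough sketch of one plausible reading: entries are resolved in order, explicitly listed pages keep their position, and a glob picks up only the remaining pages directly inside that directory. The function `resolve_section` and the page names `serving/example_page.md` and `serving/integrations/nested.md` are hypothetical and exist only for illustration.

```python
from pathlib import PurePosixPath


def resolve_section(entries, pages):
    """Expand a list of nav entries into an ordered list of pages.

    Assumption (not confirmed by the diff): explicit paths are placed
    first-come, and a glob only matches pages that were not listed
    explicitly anywhere in the same section.
    """
    explicit = {e for e in entries if "*" not in e}
    ordered = []
    for entry in entries:
        if "*" not in entry:
            # Explicit page: keep it at the position where it is listed.
            if entry in pages:
                ordered.append(entry)
            continue
        # Glob entry: add every not-yet-claimed page that matches.
        for page in sorted(pages):
            if page in explicit or page in ordered:
                continue
            if PurePosixPath(page).match(entry):
                ordered.append(page)
    return ordered


# Hypothetical docs tree; only the first two names appear in the nav above.
pages = [
    "serving/offline_inference.md",
    "serving/openai_compatible_server.md",
    "serving/example_page.md",
    "serving/integrations/nested.md",
]
entries = [
    "serving/offline_inference.md",
    "serving/openai_compatible_server.md",
    "serving/*",
]
print(resolve_section(entries, pages))
```

Under these assumptions the two pinned pages stay at the top, the glob contributes `serving/example_page.md`, and the nested page is left for a separate entry such as `serving/integrations`.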
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
clean:
@$(SPHINXBUILD) -M clean "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
rm -rf "$(SOURCEDIR)/getting_started/examples"
rm -rf "$(SOURCEDIR)/api/vllm"
# vLLM documents # Welcome to vLLM
## Build the docs <figure markdown="span">
![](./assets/logos/vllm-logo-text-light.png){ align="center" alt="vLLM" class="no-scaled-link" width="60%" }
- Make sure in `docs` directory </figure>
```bash <p style="text-align:center">
cd docs <strong>Easy, fast, and cheap LLM serving for everyone
``` </strong>
</p>
- Install the dependencies:
<p style="text-align:center">
```bash <script async defer src="https://buttons.github.io/buttons.js"></script>
pip install -r ../requirements/docs.txt <a class="github-button" href="https://github.com/vllm-project/vllm" data-show-count="true" data-size="large" aria-label="Star">Star</a>
``` <a class="github-button" href="https://github.com/vllm-project/vllm/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch">Watch</a>
<a class="github-button" href="https://github.com/vllm-project/vllm/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
- Clean the previous build (optional but recommended): </p>
```bash vLLM is a fast and easy-to-use library for LLM inference and serving.
make clean
``` Originally developed in the [Sky Computing Lab](https://sky.cs.berkeley.edu) at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.
- Generate the HTML documentation: vLLM is fast with:
```bash - State-of-the-art serving throughput
make html - Efficient management of attention key and value memory with [**PagedAttention**](https://blog.vllm.ai/2023/06/20/vllm.html)
``` - Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graph
## Open the docs with your browser - Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), INT4, INT8, and FP8
- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer.
- Serve the documentation locally: - Speculative decoding
- Chunked prefill
```bash
python -m http.server -d build/html/ vLLM is flexible and easy to use with:
```
- Seamless integration with popular HuggingFace models
This will start a local server at http://localhost:8000. You can now open your browser and view the documentation. - High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
- Tensor parallelism and pipeline parallelism support for distributed inference
If port 8000 is already in use, you can specify a different port, for example: - Streaming outputs
- OpenAI-compatible API server
```bash - Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs, Gaudi® accelerators and GPUs, IBM Power CPUs, TPU, and AWS Trainium and Inferentia Accelerators.
python -m http.server 3000 -d build/html/ - Prefix caching support
``` - Multi-lora support
For more information, check out the following:
- [vLLM announcing blog post](https://vllm.ai) (intro to PagedAttention)
- [vLLM paper](https://arxiv.org/abs/2309.06180) (SOSP 2023)
- [How continuous batching enables 23x throughput in LLM inference while reducing p50 latency](https://www.anyscale.com/blog/continuous-batching-llm-inference) by Cade Daniel et al.
- [vLLM Meetups][meetups]
# Summary
[](){ #configuration }
## Configuration
API documentation for vLLM's configuration classes.
- [vllm.config.ModelConfig][]
- [vllm.config.CacheConfig][]
- [vllm.config.TokenizerPoolConfig][]
- [vllm.config.LoadConfig][]
- [vllm.config.ParallelConfig][]
- [vllm.config.SchedulerConfig][]
- [vllm.config.DeviceConfig][]
- [vllm.config.SpeculativeConfig][]
- [vllm.config.LoRAConfig][]
- [vllm.config.PromptAdapterConfig][]
- [vllm.config.MultiModalConfig][]
- [vllm.config.PoolerConfig][]
- [vllm.config.DecodingConfig][]
- [vllm.config.ObservabilityConfig][]
- [vllm.config.KVTransferConfig][]
- [vllm.config.CompilationConfig][]
- [vllm.config.VllmConfig][]
[](){ #offline-inference-api }
## Offline Inference
LLM Class.
- [vllm.LLM][]
LLM Inputs.
- [vllm.inputs.PromptType][]
- [vllm.inputs.TextPrompt][]
- [vllm.inputs.TokensPrompt][]
## vLLM Engines
Engine classes for offline and online inference.
- [vllm.LLMEngine][]
- [vllm.AsyncLLMEngine][]
## Inference Parameters
Inference parameters for vLLM APIs.
[](){ #sampling-params }
[](){ #pooling-params }
- [vllm.SamplingParams][]
- [vllm.PoolingParams][]
[](){ #multi-modality }
## Multi-Modality
vLLM provides experimental support for multi-modal models through the [vllm.multimodal][] package.
Multi-modal inputs can be passed alongside text and token prompts to [supported models][supported-mm-models]
via the `multi_modal_data` field in [vllm.inputs.PromptType][].
Looking to add your own multi-modal model? Please follow the instructions listed [here][supports-multimodal].
- [vllm.multimodal.MULTIMODAL_REGISTRY][]
### Inputs
User-facing inputs.
- [vllm.multimodal.inputs.MultiModalDataDict][]
Internal data structures.
- [vllm.multimodal.inputs.PlaceholderRange][]
- [vllm.multimodal.inputs.NestedTensors][]
- [vllm.multimodal.inputs.MultiModalFieldElem][]
- [vllm.multimodal.inputs.MultiModalFieldConfig][]
- [vllm.multimodal.inputs.MultiModalKwargsItem][]
- [vllm.multimodal.inputs.MultiModalKwargs][]
- [vllm.multimodal.inputs.MultiModalInputs][]
### Data Parsing
- [vllm.multimodal.parse][]
### Data Processing
- [vllm.multimodal.processing][]
### Memory Profiling
- [vllm.multimodal.profiling][]
### Registry
- [vllm.multimodal.registry][]
## Model Development
- [vllm.model_executor.models.interfaces_base][]
- [vllm.model_executor.models.interfaces][]
- [vllm.model_executor.models.adapters][]
search:
boost: 0.5