Commit 82e2339b (unverified), authored by Cyrus Leung, committed by GitHub

[Doc] Move examples and further reorganize user guide (#18666)


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent 9553fdb4
@@ -6,11 +6,6 @@
 [tool.ruff]
 line-length = 88
-exclude = [
-# External file, leaving license intact
-"examples/other/fp8/quantizer/quantize.py",
-"vllm/vllm_flash_attn/flash_attn_interface.pyi"
-]
 [tool.ruff.lint.per-file-ignores]
 "vllm/third_party/**" = ["ALL"]
......
@@ -246,7 +246,7 @@ steps:
 - python3 offline_inference/vision_language.py --seed 0
 - python3 offline_inference/vision_language_embedding.py --seed 0
 - python3 offline_inference/vision_language_multi_image.py --seed 0
-- VLLM_USE_V1=0 python3 other/tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 other/tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
+- VLLM_USE_V1=0 python3 others/tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 others/tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
 - python3 offline_inference/encoder_decoder.py
 - python3 offline_inference/encoder_decoder_multimodal.py --model-type whisper --seed 0
 - python3 offline_inference/basic/classify.py
......
@@ -146,7 +146,7 @@ venv.bak/
 # mkdocs documentation
 /site
-docs/getting_started/examples
+docs/examples
 # mypy
 .mypy_cache/
......
@@ -6,11 +6,6 @@
 [tool.ruff]
 line-length = 88
-exclude = [
-# External file, leaving license intact
-"examples/other/fp8/quantizer/quantize.py",
-"vllm/vllm_flash_attn/flash_attn_interface.pyi"
-]
 [tool.ruff.lint.per-file-ignores]
 "vllm/third_party/**" = ["ALL"]
......
@@ -5,11 +5,9 @@ nav:
 - getting_started/quickstart.md
 - getting_started/installation
 - Examples:
-- Offline Inference: getting_started/examples/offline_inference
-- Online Serving: getting_started/examples/online_serving
-- Others:
-- LMCache: getting_started/examples/lmcache
-- getting_started/examples/other/*
+- Offline Inference: examples/offline_inference
+- Online Serving: examples/online_serving
+- Others: examples/others
 - Quick Links:
 - User Guide: usage/README.md
 - Developer Guide: contributing/README.md
@@ -19,6 +17,7 @@ nav:
 - Releases: https://github.com/vllm-project/vllm/releases
 - User Guide:
 - Summary: usage/README.md
+- usage/v1_guide.md
 - General:
 - usage/*
 - Inference and Serving:
......
 # Configuration Options
-This section lists the most common options for running the vLLM engine.
-For a full list, refer to the [configuration][configuration] page.
+This section lists the most common options for running vLLM.
+There are three main levels of configuration, from highest priority to lowest priority:
+- [Request parameters][completions-api] and [input arguments][sampling-params]
+- [Engine arguments](./engine_args.md)
+- [Environment variables](./env_vars.md)
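
The three-level priority order described in the added text can be illustrated with a generic precedence lookup. This is only a sketch of the idea, not vLLM code: the option names and values below are hypothetical, and a real vLLM option is not necessarily settable at all three levels.

```python
from collections import ChainMap


def resolve(request_params, engine_args, env_vars):
    """Merge three option sources; earlier arguments take priority."""
    # ChainMap looks a key up in its maps from left to right, so a value in
    # request_params shadows engine_args, which in turn shadows env_vars.
    return dict(ChainMap(request_params, engine_args, env_vars))


# Hypothetical option names and values, for illustration only.
env_vars = {"max_tokens": 16, "temperature": 1.0, "seed": 0}
engine_args = {"max_tokens": 64, "temperature": 0.7}
request_params = {"max_tokens": 128}

effective = resolve(request_params, engine_args, env_vars)
print(effective["max_tokens"])   # set at all three levels; the request wins
print(effective["temperature"])  # not in the request; the engine value wins
print(effective["seed"])         # only set at the environment level
```

An option left unset at a higher level falls through to the next level down, which is the behavior the priority list implies.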
@@ -61,7 +61,7 @@ These are documented under [Inferencing and Serving -> Production Metrics](../..
 ### Grafana Dashboard
-vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/getting_started/examples/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
+vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/examples/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
 The subset of metrics exposed in the Grafana dashboard gives us an indication of which metrics are especially important:
@@ -673,7 +673,7 @@ v0 has support for OpenTelemetry tracing:
 - [OpenTelemetry blog
 post](https://opentelemetry.io/blog/2024/llm-observability/)
 - [User-facing
-docs](https://docs.vllm.ai/en/latest/getting_started/examples/opentelemetry.html)
+docs](https://docs.vllm.ai/en/latest/examples/opentelemetry.html)
 - [Blog
 post](https://medium.com/@ronen.schaffer/follow-the-trail-supercharging-vllm-with-opentelemetry-distributed-tracing-aa655229b46f)
 - [IBM product
......
@@ -9,7 +9,7 @@ from typing import Literal
 ROOT_DIR = Path(__file__).parent.parent.parent.parent
 ROOT_DIR_RELATIVE = '../../../../..'
 EXAMPLE_DIR = ROOT_DIR / "examples"
-EXAMPLE_DOC_DIR = ROOT_DIR / "docs/getting_started/examples"
+EXAMPLE_DOC_DIR = ROOT_DIR / "docs/examples"
 print(ROOT_DIR.resolve())
 print(EXAMPLE_DIR.resolve())
 print(EXAMPLE_DOC_DIR.resolve())
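
For readers unfamiliar with the `.parent` chain in this hunk, the following sketch shows how four `.parent` steps walk up from a script to a repository root, and where the new `docs/examples` directory then lands. The script path used here is a made-up placeholder, since the diff does not show the file's real location.

```python
from pathlib import PurePosixPath

# Hypothetical script location; the real path is not shown in the diff.
script = PurePosixPath("/repo/a/b/c/script.py")

# Each .parent strips one trailing path component:
# /repo/a/b/c -> /repo/a/b -> /repo/a -> /repo
root_dir = script.parent.parent.parent.parent
example_dir = root_dir / "examples"
example_doc_dir = root_dir / "docs/examples"

print(root_dir)         # /repo
print(example_dir)      # /repo/examples
print(example_doc_dir)  # /repo/docs/examples
```

`PurePosixPath` is used so the sketch gives the same result on any platform; the code in the hunk uses `Path(__file__)`, which resolves against the real filesystem.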
......
@@ -10,7 +10,7 @@ shorter Pod startup times and CPU memory usage. Tensor encryption is also suppor
 For more information on CoreWeave's Tensorizer, please refer to
 [CoreWeave's Tensorizer documentation](https://github.com/coreweave/tensorizer). For more information on serializing a vLLM model, as well a general usage guide to using Tensorizer with vLLM, see
-the [vLLM example script](https://docs.vllm.ai/en/latest/getting_started/examples/tensorize_vllm_model.html).
+the [vLLM example script](https://docs.vllm.ai/en/latest/examples/tensorize_vllm_model.html).
 !!! note
 Note that to use this feature you will need to install `tensorizer` by running `pip install vllm[tensorizer]`.
@@ -6,6 +6,6 @@ vLLM can be used to generate the completions for RLHF. The best way to do this i
 See the following basic examples to get started if you don't want to use an existing library:
-- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf.html)
-- [Training and inference processes are colocated on the same GPUs using Ray](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html)
-- [Utilities for performing RLHF with vLLM](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_utils.html)
+- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](../examples/offline_inference/rlhf.md)
+- [Training and inference processes are colocated on the same GPUs using Ray](../examples/offline_inference/rlhf_colocate.md)
+- [Utilities for performing RLHF with vLLM](../examples/offline_inference/rlhf_utils.md)