Commit 371f7e4c authored by Cyrus Leung, committed by GitHub

[Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar (#18627)


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent 15b45ffb
@@ -9,6 +9,11 @@ nav:
 - getting_started/examples/offline_inference
 - getting_started/examples/online_serving
 - getting_started/examples/other
+- Quick Links:
+- User Guide: serving/offline_inference.md
+- Developer Guide: contributing/overview.md
+- API Reference: api/README.md
+- Timeline:
 - Roadmap: https://roadmap.vllm.ai
 - Releases: https://github.com/vllm-project/vllm/releases
 - User Guide:
@@ -38,7 +43,7 @@ nav:
 - contributing/overview.md
 - glob: contributing/*
 flatten_single_child_sections: true
-- contributing/model
+- Model Implementation: contributing/model
 - Design Documents:
 - V0: design
 - V1: design/v1
......
@@ -33,14 +33,14 @@ These tests compare the model outputs of vLLM against [HF Transformers](https://
 #### Generative models
-For [generative models][generative-models], there are two levels of correctness tests, as defined in <gh-file:tests/models/utils.py>:
+For [generative models](../../models/generative_models.md), there are two levels of correctness tests, as defined in <gh-file:tests/models/utils.py>:
 - Exact correctness (`check_outputs_equal`): The text outputted by vLLM should exactly match the text outputted by HF.
 - Logprobs similarity (`check_logprobs_close`): The logprobs outputted by vLLM should be in the top-k logprobs outputted by HF, and vice versa.
 #### Pooling models
-For [pooling models][pooling-models], we simply check the cosine similarity, as defined in <gh-file:tests/models/embedding/utils.py>.
+For [pooling models](../../models/pooling_models.md), we simply check the cosine similarity, as defined in <gh-file:tests/models/utils.py>.
 [](){ #mm-processing-tests }
......
@@ -170,7 +170,7 @@ A variety of speculative models of this type are available on HF hub:
 ## Speculating using EAGLE based draft models
 The following code configures vLLM to use speculative decoding where proposals are generated by
-an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](<gh-file:examples/offline_inference/eagle.py>).
+an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](gh-file:examples/offline_inference/eagle.py).
 ```python
 from vllm import LLM, SamplingParams
......
@@ -3,7 +3,7 @@ title: Supported Models
 ---
 [](){ #supported-models }
-vLLM supports [generative](generative-models) and [pooling](pooling-models) models across various tasks.
+vLLM supports [generative](./generative_models.md) and [pooling](./pooling_models.md) models across various tasks.
 If a model supports more than one task, you can set the task via the `--task` argument.
 For each task, we list the model architectures that have been implemented in vLLM.
@@ -376,7 +376,7 @@ Specified using `--task generate`.
 ### Pooling Models
-See [this page](pooling-models) for more information on how to use pooling models.
+See [this page](./pooling_models.md) for more information on how to use pooling models.
 !!! warning
 Since some model architectures support both generative and pooling tasks,
@@ -628,7 +628,7 @@ Specified using `--task generate`.
 ### Pooling Models
-See [this page](pooling-models) for more information on how to use pooling models.
+See [this page](./pooling_models.md) for more information on how to use pooling models.
 !!! warning
 Since some model architectures support both generative and pooling tasks,
......
@@ -5,7 +5,7 @@ title: OpenAI-Compatible Server
 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.
-In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)
+In your terminal, you can [install](../getting_started/installation/README.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)
 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
......
-# Seed Parameter Behavior in vLLM
+# Seed Parameter Behavior
 ## Overview
......