Unverified commit b4bab816 authored by Harry Mellor, committed by GitHub

Remove unnecessary explicit title anchors and use relative links instead (#20620)


Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent b91cb3fa
 ---
 title: Offline Inference
 ---
-[](){ #offline-inference }
 Offline inference is possible in your own code using vLLM's [`LLM`][vllm.LLM] class.
@@ -18,8 +17,8 @@ llm = LLM(model="facebook/opt-125m")
 After initializing the `LLM` instance, use the available APIs to perform model inference.
 The available APIs depend on the model type:
-- [Generative models][generative-models] output logprobs which are sampled from to obtain the final output text.
-- [Pooling models][pooling-models] output their hidden states directly.
+- [Generative models](../models/generative_models.md) output logprobs which are sampled from to obtain the final output text.
+- [Pooling models](../models/pooling_models.md) output their hidden states directly.
 !!! info
 [API Reference][offline-inference-api]
...
 ---
 title: OpenAI-Compatible Server
 ---
-[](){ #serving-openai-compatible-server }
 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.
-In your terminal, you can [install](../getting_started/installation/README.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)
+In your terminal, you can [install](../getting_started/installation/README.md) vLLM, then start the server with the [`vllm serve`](../configuration/serve_args.md) command. (You can also use our [Docker](../deployment/docker.md) image.)
 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct \
@@ -208,7 +207,7 @@ you can use the [official OpenAI Python client](https://github.com/openai/openai
 We support both [Vision](https://platform.openai.com/docs/guides/vision)- and
 [Audio](https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-in)-related parameters;
-see our [Multimodal Inputs][multimodal-inputs] guide for more information.
+see our [Multimodal Inputs](../features/multimodal_inputs.md) guide for more information.
 - *Note: `image_url.detail` parameter is not supported.*
 Code example: <gh-file:examples/online_serving/openai_chat_completion_client.py>
...
 ---
 title: Frequently Asked Questions
 ---
-[](){ #faq }
 > Q: How can I serve multiple models on a single port using the OpenAI API?
@@ -12,7 +11,7 @@ A: Assuming that you're referring to using OpenAI compatible server to serve mul
 > Q: Which model to use for offline inference embedding?
 A: You can try [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) and [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5);
-more are listed [here][supported-models].
+more are listed [here](../models/supported_models.md).
 By extracting hidden states, vLLM can automatically convert text generation models like [Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B),
 [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) into embedding models,
...
@@ -4,7 +4,7 @@ vLLM exposes a number of metrics that can be used to monitor the health of the
 system. These metrics are exposed via the `/metrics` endpoint on the vLLM
 OpenAI compatible API server.
-You can start the server using Python, or using [Docker][deployment-docker]:
+You can start the server using Python, or using [Docker](../deployment/docker.md):
 ```bash
 vllm serve unsloth/Llama-3.2-1B-Instruct
...
 ---
 title: Troubleshooting
 ---
-[](){ #troubleshooting }
 This document outlines some troubleshooting strategies you can consider. If you think you've discovered a bug, please [search existing issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [file a new issue](https://github.com/vllm-project/vllm/issues/new/choose), providing as much relevant information as possible.
@@ -267,7 +266,7 @@ or:
 ValueError: Model architectures ['<arch>'] are not supported for now. Supported architectures: [...]
 ```
-But you are sure that the model is in the [list of supported models][supported-models], there may be some issue with vLLM's model resolution. In that case, please follow [these steps](../configuration/model_resolution.md) to explicitly specify the vLLM implementation for the model.
+But you are sure that the model is in the [list of supported models](../models/supported_models.md), there may be some issue with vLLM's model resolution. In that case, please follow [these steps](../configuration/model_resolution.md) to explicitly specify the vLLM implementation for the model.
 ## Failed to infer device type
...
@@ -90,7 +90,7 @@ vLLM V1 currently excludes model architectures with the `SupportsV0Only` protoco
 !!! tip
-This corresponds to the V1 column in our [list of supported models][supported-models].
+This corresponds to the V1 column in our [list of supported models](../models/supported_models.md).
 See below for the status of models that are not yet supported or have more features planned in V1.
...
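Every hunk above applies one of two mechanical rewrites. The first deletes a standalone `[](){ #anchor }` title anchor. The second turns a `[text][anchor]` reference link into a `[text](relative/path.md)` link. The commit does not say how these edits were made. The Python below is only a minimal sketch of the transformation, under some assumptions:

- The names `relativize`, `ANCHOR_TO_PATH`, `EXPLICIT_ANCHOR` and `REF_LINK` are hypothetical.
- The anchor-to-path table is hand-maintained. It holds only pairs that appear in this diff.
- The paths are written relative to the edited pages, as in the diff. A real conversion would need paths computed from each page's own location.

```python
import re

# Hypothetical anchor -> relative-path table. The pairs are taken from this diff;
# the paths assume the edited page sits one directory below the docs root.
ANCHOR_TO_PATH = {
    "generative-models": "../models/generative_models.md",
    "pooling-models": "../models/pooling_models.md",
    "supported-models": "../models/supported_models.md",
    "deployment-docker": "../deployment/docker.md",
    "serve-args": "../configuration/serve_args.md",
    "multimodal-inputs": "../features/multimodal_inputs.md",
}

# A whole line holding an explicit title anchor, e.g. "[](){ #faq }\n".
EXPLICIT_ANCHOR = re.compile(r"^\[\]\(\)\{ #[\w-]+ \}\n", re.MULTILINE)
# A reference-style link, e.g. "[here][supported-models]".
REF_LINK = re.compile(r"\[([^\]]+)\]\[([\w.-]+)\]")


def relativize(markdown: str, mapping: dict[str, str]) -> str:
    """Drop explicit title anchors and rewrite known [text][anchor] links as relative links."""
    markdown = EXPLICIT_ANCHOR.sub("", markdown)

    def replace(match: re.Match) -> str:
        text, anchor = match.group(1), match.group(2)
        path = mapping.get(anchor)
        # Anchors missing from the table (e.g. API references) are left untouched.
        return f"[{text}]({path})" if path else match.group(0)

    return REF_LINK.sub(replace, markdown)


if __name__ == "__main__":
    before = "[](){ #faq }\nmore are listed [here][supported-models].\n"
    print(relativize(before, ANCHOR_TO_PATH), end="")
    # prints: more are listed [here](../models/supported_models.md).
```

Anchors that are absent from the table are left as they are. Examples are the `vllm.LLM` API reference link and `[API Reference][offline-inference-api]`. This matches those lines staying unchanged in the diff.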