Remove unnecessary explicit title anchors and use relative links instead (#20620)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Commit b4bab816 (unverified), authored Jul 08, 2025 by Harry Mellor; committed by GitHub Jul 08, 2025
6 changed files
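
For readers who want to apply the same kind of change elsewhere, the link rewrite in this commit can be sketched as a small script. This is only an illustration, not the tooling used for the commit: the `ANCHOR_TO_PAGE` mapping and the `to_relative_links` helper are hypothetical names, and the mapping simply restates the anchor-to-page pairs visible in the hunks below (paths relative to `docs/`). Anchors that are not in the mapping, such as the API reference `[`LLM`][vllm.LLM]`, are left untouched.

```python
import posixpath
import re

# Hypothetical mapping, reconstructed from the hunks in this commit:
# autoref anchor -> target page, relative to the docs/ root.
ANCHOR_TO_PAGE = {
    "generative-models": "models/generative_models.md",
    "pooling-models": "models/pooling_models.md",
    "supported-models": "models/supported_models.md",
    "deployment-docker": "deployment/docker.md",
    "serve-args": "configuration/serve_args.md",
    "multimodal-inputs": "features/multimodal_inputs.md",
}

# Matches reference-style links of the form [label][anchor].
REF_LINK = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")


def to_relative_links(text: str, source_page: str) -> str:
    """Rewrite [label][anchor] as [label](relative/path.md) for known anchors.

    `source_page` is the path of the page being edited, relative to docs/.
    """
    source_dir = posixpath.dirname(source_page)

    def replace(match: re.Match) -> str:
        label, anchor = match.group(1), match.group(2)
        target = ANCHOR_TO_PAGE.get(anchor)
        if target is None:
            # Unknown anchor (for example an API reference): keep as-is.
            return match.group(0)
        relative = posixpath.relpath(target, start=source_dir)
        return f"[{label}]({relative})"

    return REF_LINK.sub(replace, text)


if __name__ == "__main__":
    line = "- [Pooling models][pooling-models] output their hidden states directly."
    print(to_relative_links(line, "serving/offline_inference.md"))
```

Computing the target relative to the source page's directory is what yields the `../models/...` form seen in the hunks, since every edited page sits one level below `docs/`.
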
--- a/docs/serving/offline_inference.md
+++ b/docs/serving/offline_inference.md
 ---
 title: Offline Inference
 ---
-[](){ #offline-inference }

 Offline inference is possible in your own code using vLLM's [`LLM`][vllm.LLM] class.

@@ -18,8 +17,8 @@ llm = LLM(model="facebook/opt-125m")
 After initializing the `LLM` instance, use the available APIs to perform model inference.
 The available APIs depend on the model type:

- [Generative models][generative-models] output logprobs which are sampled from to obtain the final output text.
- [Pooling models][pooling-models] output their hidden states directly.
+- [Generative models](../models/generative_models.md) output logprobs which are sampled from to obtain the final output text.
+- [Pooling models](../models/pooling_models.md) output their hidden states directly.

 !!! info
    [API Reference][offline-inference-api]

--- a/docs/serving/openai_compatible_server.md
+++ b/docs/serving/openai_compatible_server.md
 ---
 title: OpenAI-Compatible Server
 ---
-[](){ #serving-openai-compatible-server }

 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.

-In your terminal, you can [install](../getting_started/installation/README.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)
+In your terminal, you can [install](../getting_started/installation/README.md) vLLM, then start the server with the [`vllm serve`](../configuration/serve_args.md) command. (You can also use our [Docker](../deployment/docker.md) image.)

 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct \
@@ -208,7 +207,7 @@ you can use the [official OpenAI Python client](https://github.com/openai/openai

 We support both [Vision](https://platform.openai.com/docs/guides/vision)- and
 [Audio](https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-in)-related parameters;
-see our [Multimodal Inputs][multimodal-inputs] guide for more information.
+see our [Multimodal Inputs](../features/multimodal_inputs.md) guide for more information.
 - *Note: `image_url.detail` parameter is not supported.*

 Code example: <gh-file:examples/online_serving/openai_chat_completion_client.py>

--- a/docs/usage/faq.md
+++ b/docs/usage/faq.md
 ---
 title: Frequently Asked Questions
 ---
-[](){ #faq }

 > Q: How can I serve multiple models on a single port using the OpenAI API?

@@ -12,7 +11,7 @@ A: Assuming that you're referring to using OpenAI compatible server to serve mul
 > Q: Which model to use for offline inference embedding?

 A: You can try [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) and [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5);
-more are listed [here][supported-models].
+more are listed [here](../models/supported_models.md).

 By extracting hidden states, vLLM can automatically convert text generation models like [Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B),
 [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) into embedding models,

--- a/docs/usage/metrics.md
+++ b/docs/usage/metrics.md
@@ -4,7 +4,7 @@ vLLM exposes a number of metrics that can be used to monitor the health of the
 system. These metrics are exposed via the `/metrics` endpoint on the vLLM
 OpenAI compatible API server.

-You can start the server using Python, or using [Docker][deployment-docker]:
+You can start the server using Python, or using [Docker](../deployment/docker.md):

 ```bash
 vllm serve unsloth/Llama-3.2-1B-Instruct

--- a/docs/usage/troubleshooting.md
+++ b/docs/usage/troubleshooting.md
 ---
 title: Troubleshooting
 ---
-[](){ #troubleshooting }

 This document outlines some troubleshooting strategies you can consider. If you think you've discovered a bug, please [search existing issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [file a new issue](https://github.com/vllm-project/vllm/issues/new/choose), providing as much relevant information as possible.

@@ -267,7 +266,7 @@ or:
 ValueError: Model architectures ['<arch>'] are not supported for now. Supported architectures: [...]
 ```

-But you are sure that the model is in the [list of supported models][supported-models], there may be some issue with vLLM's model resolution. In that case, please follow [these steps](../configuration/model_resolution.md) to explicitly specify the vLLM implementation for the model.
+But you are sure that the model is in the [list of supported models](../models/supported_models.md), there may be some issue with vLLM's model resolution. In that case, please follow [these steps](../configuration/model_resolution.md) to explicitly specify the vLLM implementation for the model.

 ## Failed to infer device type


--- a/docs/usage/v1_guide.md
+++ b/docs/usage/v1_guide.md
@@ -90,7 +90,7 @@ vLLM V1 currently excludes model architectures with the `SupportsV0Only` protoco

 !!! tip

-    This corresponds to the V1 column in our [list of supported models][supported-models].
+    This corresponds to the V1 column in our [list of supported models](../models/supported_models.md).

 See below for the status of models that are not yet supported or have more features planned in V1.