Unverified commit 43f3d9e6 authored by Rafael Vasquez, committed by GitHub

[CI/Build] Add markdown linter (#11857)


Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
parent b25cfab9
@@ -7,7 +7,7 @@ OpenAI compatible API server.
 You can start the server using Python, or using [Docker](#deployment-docker):
 ```console
-$ vllm serve unsloth/Llama-3.2-1B-Instruct
+vllm serve unsloth/Llama-3.2-1B-Instruct
 ```
 Then query the endpoint to get the latest metrics from the server:
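This hunk (and the later `--config config.yaml` hunk) drops the `$ ` prompt from a console block that shows a command but no output. That pattern is what markdownlint-style rule md014 ("commands-show-output") reports. The sketch below is a simplified re-implementation of the idea, not PyMarkdown's code: it assumes fences open and close with three backticks or tildes and ignores fence-length matching.

```python
import re

# Simplified idea behind md014 ("commands-show-output"); not PyMarkdown's implementation.
FENCE = re.compile(r"^\s*(```|~~~)")


def md014_violations(markdown: str) -> list:
    """Return 1-based start lines of fenced blocks whose non-blank lines all start with '$'."""
    violations = []
    in_block = False
    start = 0
    body = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if FENCE.match(line):
            if not in_block:
                # Opening fence: start collecting the block body.
                in_block, start, body = True, lineno, []
            else:
                # Closing fence: flag the block if every non-blank line is a "$ command".
                lines = [l for l in body if l.strip()]
                if lines and all(l.lstrip().startswith("$") for l in lines):
                    violations.append(start)
                in_block = False
        elif in_block:
            body.append(line)
    return violations


before = "```console\n$ vllm serve unsloth/Llama-3.2-1B-Instruct\n```\n"
after = "```console\nvllm serve unsloth/Llama-3.2-1B-Instruct\n```\n"
print(md014_violations(before))  # [1]
print(md014_violations(after))   # []
```

A block that also shows output (for example `$ echo hi` followed by `hi`) is not flagged, since the rule only targets prompts with nothing to separate them from.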
@@ -303,6 +303,7 @@ vllm serve llava-hf/llava-onevision-qwen2-0.5b-ov-hf --task generate --max-model
 ```
 Then, you can use the OpenAI client as follows:
 ```python
 from openai import OpenAI
@@ -64,7 +64,7 @@ Dynamic quantization is also supported via the `quantization` option -- see [her
 #### Context length and batch size
-You can further reduce memory usage by limit the context length of the model (`max_model_len` option)
+You can further reduce memory usage by limiting the context length of the model (`max_model_len` option)
 and the maximum batch size (`max_num_seqs` option).
 ```python
@@ -5,11 +5,13 @@
 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more!
 You can start the server via the [`vllm serve`](#vllm-serve) command, or through [Docker](#deployment-docker):
 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
 ```
 To call the server, you can use the [official OpenAI Python client](https://github.com/openai/openai-python), or any other HTTP client.
 ```python
 from openai import OpenAI
 client = OpenAI(
@@ -50,6 +52,7 @@ In addition, we have the following custom APIs:
 - Only applicable to [cross-encoder models](../models/pooling_models.md) (`--task score`).
 (chat-template)=
 ## Chat Template
 In order for the language model to support chat protocol, vLLM requires the model to include
@@ -71,6 +74,7 @@ vLLM community provides a set of chat templates for popular models. You can find
 With the inclusion of multi-modal chat APIs, the OpenAI spec now accepts chat messages in a new format which specifies
 both a `type` and a `text` field. An example is provided below:
 ```python
 completion = client.chat.completions.create(
 model="NousResearch/Meta-Llama-3-8B-Instruct",
@@ -80,7 +84,7 @@ completion = client.chat.completions.create(
 )
 ```
 Most chat templates for LLMs expect the `content` field to be a string, but there are some newer models like
 `meta-llama/Llama-Guard-3-1B` that expect the content to be formatted according to the OpenAI schema in the
 request. vLLM provides best-effort support to detect this automatically, which is logged as a string like
 *"Detected the chat template content format to be..."*, and internally converts incoming requests to match
@@ -115,12 +119,12 @@ completion = client.chat.completions.create(
 ## Extra HTTP Headers
 Only `X-Request-Id` HTTP request header is supported for now. It can be enabled
 with `--enable-request-id-headers`.
 > Note that enablement of the headers can impact performance significantly at high QPS
 > rates. We recommend implementing HTTP headers at the router level (e.g. via Istio),
 > rather than within the vLLM layer for this reason.
-> See https://github.com/vllm-project/vllm/pull/11529 for more details.
+> See [this PR](https://github.com/vllm-project/vllm/pull/11529) for more details.
 ```python
 completion = client.chat.completions.create(
@@ -147,6 +151,7 @@ print(completion._request_id)
 ## CLI Reference
 (vllm-serve)=
 ### `vllm serve`
 The `vllm serve` command is used to launch the OpenAI-compatible server.
@@ -175,7 +180,7 @@ uvicorn-log-level: "info"
 To use the above config file:
 ```bash
-$ vllm serve SOME_MODEL --config config.yaml
+vllm serve SOME_MODEL --config config.yaml
 ```
 ```{note}
@@ -186,6 +191,7 @@ The order of priorities is `command line > config file values > defaults`.
 ## API Reference
 (completions-api)=
 ### Completions API
 Our Completions API is compatible with [OpenAI's Completions API](https://platform.openai.com/docs/api-reference/completions);
@@ -212,6 +218,7 @@ The following extra parameters are supported:
 ```
 (chat-api)=
 ### Chat API
 Our Chat API is compatible with [OpenAI's Chat Completions API](https://platform.openai.com/docs/api-reference/chat);
@@ -243,6 +250,7 @@ The following extra parameters are supported:
 ```
 (embeddings-api)=
 ### Embeddings API
 Our Embeddings API is compatible with [OpenAI's Embeddings API](https://platform.openai.com/docs/api-reference/embeddings);
@@ -284,6 +292,7 @@ For chat-like input (i.e. if `messages` is passed), these extra parameters are s
 ```
 (tokenizer-api)=
 ### Tokenizer API
 Our Tokenizer API is a simple wrapper over [HuggingFace-style tokenizers](https://huggingface.co/docs/transformers/en/main_classes/tokenizer).
@@ -293,6 +302,7 @@ It consists of two endpoints:
 - `/detokenize` corresponds to calling `tokenizer.decode()`.
 (pooling-api)=
 ### Pooling API
 Our Pooling API encodes input prompts using a [pooling model](../models/pooling_models.md) and returns the corresponding hidden states.
@@ -302,6 +312,7 @@ The input format is the same as [Embeddings API](#embeddings-api), but the outpu
 Code example: <gh-file:examples/online_serving/openai_pooling_client.py>
 (score-api)=
 ### Score API
 Our Score API applies a cross-encoder model to predict scores for sentence pairs.
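In the Extra HTTP Headers hunk, the pull-request URL is rewritten from a bare URL into a Markdown link. Bare URLs are what markdownlint-style rule md034 ("no-bare-urls") reports. The sketch below is a simplified illustration of that check, not PyMarkdown's implementation: it assumes a URL counts as "not bare" only when it directly follows `(` (a link target) or `<` (an autolink), or sits inside an inline code span.

```python
import re

# Simplified take on md034 ("no-bare-urls"); not PyMarkdown's implementation.
# Stops a URL at whitespace, angle brackets, parentheses, square brackets, or backticks.
URL = re.compile(r"https?://[^\s<>()\[\]`]+")


def bare_urls(line: str) -> list:
    """Return http(s) URLs on a line that are not link targets, autolinks, or inline code."""
    found = []
    for match in URL.finditer(line):
        before = line[: match.start()]
        # An odd number of preceding backticks means we are inside an inline code span.
        if before.count("`") % 2 == 1:
            continue
        # "[text](URL)" link target or "<URL>" autolink: not bare.
        if before.endswith("(") or before.endswith("<"):
            continue
        found.append(match.group(0))
    return found


old = "> See https://github.com/vllm-project/vllm/pull/11529 for more details."
new = "> See [this PR](https://github.com/vllm-project/vllm/pull/11529) for more details."
print(bare_urls(old))  # ['https://github.com/vllm-project/vllm/pull/11529']
print(bare_urls(new))  # []
```

Converting to a link (rather than wrapping the URL in `<...>`) also gives the reader descriptive link text, which is the choice this commit makes.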
@@ -41,7 +41,7 @@ MYPY_VERSION=$(mypy --version | awk '{print $2}')
 CODESPELL_VERSION=$(codespell --version)
 ISORT_VERSION=$(isort --vn)
 CLANGFORMAT_VERSION=$(clang-format --version | awk '{print $3}')
-SPHINX_LINT_VERSION=$(sphinx-lint --version | awk '{print $2}')
+PYMARKDOWNLNT_VERSION=$(pymarkdownlnt version | awk '{print $1}')
 # # params: tool name, tool version, required version
 tool_version_check() {
@@ -58,7 +58,7 @@ tool_version_check "mypy" "$MYPY_VERSION"
 tool_version_check "isort" "$ISORT_VERSION"
 tool_version_check "codespell" "$CODESPELL_VERSION"
 tool_version_check "clang-format" "$CLANGFORMAT_VERSION"
-tool_version_check "sphinx-lint" "$SPHINX_LINT_VERSION"
+tool_version_check "pymarkdownlnt" "$PYMARKDOWNLNT_VERSION"
 YAPF_FLAGS=(
 '--recursive'
@@ -316,6 +316,6 @@ else
 echo "✨🎉 Format check passed! Congratulations! 🎉✨"
 fi
-echo 'vLLM sphinx-lint:'
-tools/sphinx-lint.sh
-echo 'vLLM sphinx-lint: Done'
+echo 'vLLM doc-lint:'
+tools/doc-lint.sh
+echo 'vLLM doc-lint: Done'
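The format script hunks swap sphinx-lint for pymarkdownlnt in the version checks, but the body of `tool_version_check` is not part of this diff. As a rough illustration only, here is a hypothetical Python analogue of pinning a lint tool's version. It assumes the required version comes from a `tool==X` line like those in this commit's lint requirements, and that the tool's version output carries the version in its first whitespace-separated field (the field `awk '{print $1}'` extracts). The names `pinned_version`, `first_field`, and this Python `tool_version_check` are hypothetical and are not the script's code.

```python
import re
from typing import Optional


def pinned_version(requirements: str, tool: str) -> Optional[str]:
    """Hypothetical helper: return X from a `tool==X` line, ignoring comments and blanks."""
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()
        match = re.fullmatch(re.escape(tool) + r"==(\S+)", line)
        if match:
            return match.group(1)
    return None


def first_field(output: str) -> str:
    """Like awk '{print $1}' on single-line output: the first whitespace-separated field."""
    fields = output.split()
    return fields[0] if fields else ""


def tool_version_check(tool: str, installed: str, requirements: str) -> bool:
    """Hypothetical analogue: True only if `installed` equals the pinned version."""
    required = pinned_version(requirements, tool)
    return required is not None and installed == required


REQUIREMENTS_LINT = """\
codespell==2.3.0
isort==5.13.2
clang-format==18.1.5
pymarkdownlnt==0.9.26
# type checking
mypy==1.11.1
"""

# "0.9.26\n" stands in for the tool's version output; the real output format is assumed.
print(tool_version_check("pymarkdownlnt", first_field("0.9.26\n"), REQUIREMENTS_LINT))  # True
```

Comparing against the pinned requirement keeps local runs and CI on the same linter version, so a rule change in a newer release cannot make the check pass locally and fail in CI.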
@@ -101,3 +101,9 @@ markers = [
 "skip_v1: do not run this test with v1",
 "optional: optional tests that are automatically skipped, include --optional to run them",
 ]
+[tool.pymarkdown]
+plugins.md013.enabled = false # line-length
+plugins.md041.enabled = false # first-line-h1
+plugins.md033.enabled = false # inline-html
+plugins.md024.allow_different_nesting = true # no-duplicate-headers
@@ -6,7 +6,7 @@ ruff==0.6.5
 codespell==2.3.0
 isort==5.13.2
 clang-format==18.1.5
-sphinx-lint==1.0.0
+pymarkdownlnt==0.9.26
 # type checking
 mypy==1.11.1
+#!/bin/bash
+pymarkdownlnt scan docs -r
-#!/bin/bash
-sphinx-lint --disable trailing-whitespace,missing-final-newline docs