Move cli args docs to its own page (#18228) (#18264)

Signed-off-by: Trevor Royer <troyer@redhat.com>

Unverified commit 55f1a468, authored May 16, 2025 by Trevor Royer and committed by GitHub on May 16, 2025.
4 changed files
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -117,6 +117,7 @@ training/rlhf.md
 serving/offline_inference
 serving/openai_compatible_server
+serving/serve_args
 serving/multimodal_inputs
 serving/distributed_serving
 serving/metrics

--- a/docs/source/serving/engine_args.md
+++ b/docs/source/serving/engine_args.md
@@ -7,6 +7,8 @@ Engine arguments control the behavior of the vLLM engine.
 - For [offline inference](#offline-inference), they are part of the arguments to `LLM` class.
 - For [online serving](#openai-compatible-server), they are part of the arguments to `vllm serve`.
+For a complete reference of the arguments accepted by `vllm serve`, see the [serve args](#serve-args) documentation.
 Below, you can find an explanation of every engine argument:
 <!--- pyml disable-num-lines 7 no-space-in-emphasis -->

--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -4,7 +4,7 @@
 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.
-In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`](#vllm-serve) command. (You can also use our [Docker](#deployment-docker) image.)
+In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`](#serve-args) command. (You can also use our [Docker](#deployment-docker) image.)
 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
@@ -168,54 +168,6 @@ completion = client.completions.create(
 print(completion._request_id)
 ```
-## CLI Reference
-(vllm-serve)=
-### `vllm serve`
-The `vllm serve` command is used to launch the OpenAI-compatible server.
-:::{tip}
-The vast majority of command-line arguments are based on those for offline inference.
-See [here](configuration-options) for some common options.
-:::
-:::{argparse}
-:module: vllm.entrypoints.openai.cli_args
-:func: create_parser_for_docs
-:prog: vllm serve
-:::
-#### Configuration file
-You can load CLI arguments via a [YAML](https://yaml.org/) config file.
-The argument names must be the long form of those outlined [above](#vllm-serve).
-For example:
-```yaml
-# config.yaml
-model: meta-llama/Llama-3.1-8B-Instruct
-host: "127.0.0.1"
-port: 6379
-uvicorn-log-level: "info"
-```
-To use the above config file:
-```bash
-vllm serve --config config.yaml
-```
-:::{note}
-In case an argument is supplied simultaneously using command line and the config file, the value from the command line will take precedence.
-The order of priorities is `command line > config file values > defaults`.
-e.g. `vllm serve SOME_MODEL --config config.yaml`, SOME_MODEL takes precedence over `model` in config file.
-:::
 ## API Reference
 (completions-api)=

--- a/docs/source/serving/serve_args.md
+++ b/docs/source/serving/serve_args.md
+(serve-args)=
+# Server Arguments
+The `vllm serve` command is used to launch the OpenAI-compatible server.
+## CLI Arguments
+The following are all arguments available from the `vllm serve` command:
+<!--- pyml disable-num-lines 7 no-space-in-emphasis -->
+```{eval-rst}
+.. argparse::
+    :module: vllm.entrypoints.openai.cli_args
+    :func: create_parser_for_docs
+    :prog: vllm serve
+    :nodefaultconst:
+    :markdownhelp:
+```
+## Configuration file
+You can load CLI arguments via a [YAML](https://yaml.org/) config file.
+The argument names must be the long form of those outlined [above](#serve-args).
+For example:
+```yaml
+# config.yaml
+model: meta-llama/Llama-3.1-8B-Instruct
+host: "127.0.0.1"
+port: 6379
+uvicorn-log-level: "info"
+```
+To use the above config file:
+```bash
+vllm serve --config config.yaml
+```
+:::{note}
+If an argument is supplied both on the command line and in the config file, the command-line value takes precedence.
+The order of priorities is `command line > config file values > defaults`.
+For example, in `vllm serve SOME_MODEL --config config.yaml`, `SOME_MODEL` takes precedence over `model` in the config file.
+:::