@@ -163,6 +163,7 @@ vLLM workers are configured through command-line arguments. Key parameters inclu
...
-`--model`: Model to serve (e.g., `Qwen/Qwen3-0.6B`)
-`--is-prefill-worker`: Enable prefill-only mode for disaggregated serving
-`--metrics-endpoint-port`: Port for publishing KV metrics to Dynamo
-`--connector`: Select which `kv_transfer_config` vLLM should use, one of `[nixl, lmcache, kvbm, none]`. This is a helper flag that overwrites the engine's `KVTransferConfig`.
See `args.py` for the full list of configuration options and their defaults.
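
The "helper flag that overwrites" behavior of `--connector` can be sketched as follows. This is a minimal illustration, not the code in `args.py`: the function names, the placeholder config dictionary, the treatment of an omitted flag, and the reading of `none` as "disable KV transfer" are all assumptions made for the example. Only the flag names and the four connector choices come from the list above.

```python
import argparse

# Choices copied from the flag description; everything else is hypothetical.
CONNECTOR_CHOICES = ["nixl", "lmcache", "kvbm", "none"]


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser (NOT the real args.py)."""
    parser = argparse.ArgumentParser(description="Illustrative vLLM worker CLI")
    parser.add_argument("--model", default="Qwen/Qwen3-0.6B")
    parser.add_argument("--is-prefill-worker", action="store_true")
    # Assumption: the flag is optional and defaults to "not specified".
    parser.add_argument("--connector", choices=CONNECTOR_CHOICES, default=None)
    return parser


def resolve_kv_transfer_config(connector, existing):
    """Decide which KV transfer config the engine ends up with.

    `existing` stands in for whatever config the engine arguments already
    carry. The returned dict is a placeholder, not a real KVTransferConfig.
    """
    if connector is None:
        # Flag omitted: leave the engine's own config untouched (assumption).
        return existing
    if connector == "none":
        # Assumption: "none" disables KV transfer entirely.
        return None
    # Helper flag given: it overwrites any previously supplied config.
    return {"connector": connector}


if __name__ == "__main__":
    args = build_parser().parse_args(["--connector", "nixl"])
    print(resolve_kv_transfer_config(args.connector, {"connector": "lmcache"}))
```

The point of the sketch is precedence: when the helper flag is present, it takes priority over a config passed through the engine arguments, which is why the description calls it an overwrite rather than a default.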