Unverified commit 1cb194a0 authored by Cyrus Leung, committed by GitHub

[Doc] Reorganize user guide (#18661)


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent 2cd4d58d
-# Security Guide
+# Security
## Inter-Node Communication
......
@@ -23,7 +23,7 @@ It'd be better to store the model in a local disk. Additionally, have a look at
## Out of memory
-If the model is too large to fit in a single GPU, you will get an out-of-memory (OOM) error. Consider adopting [these options][reducing-memory-usage] to reduce the memory consumption.
+If the model is too large to fit in a single GPU, you will get an out-of-memory (OOM) error. Consider adopting [these options](../configuration/conserving_memory.md) to reduce the memory consumption.
## Generation quality changed
@@ -159,7 +159,7 @@ If you have seen a warning in your logs like this:
WARNING 12-11 14:50:37 multiproc_worker_utils.py:281] CUDA was previously
initialized. We must use the `spawn` multiprocessing start method. Setting
VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See
-https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#python-multiprocessing
+https://docs.vllm.ai/en/latest/usage/troubleshooting.html#python-multiprocessing
for more information.
```
@@ -258,7 +258,7 @@ or:
ValueError: Model architectures ['<arch>'] are not supported for now. Supported architectures: [...]
```
-But you are sure that the model is in the [list of supported models][supported-models], there may be some issue with vLLM's model resolution. In that case, please follow [these steps][model-resolution] to explicitly specify the vLLM implementation for the model.
+But you are sure that the model is in the [list of supported models][supported-models], there may be some issue with vLLM's model resolution. In that case, please follow [these steps](../configuration/model_resolution.md) to explicitly specify the vLLM implementation for the model.
## Failed to infer device type
......
-# vLLM V1 User Guide
+# vLLM V1
V1 is now enabled by default for all supported use cases, and we will gradually enable it for every use case we plan to support. Please share any feedback on [GitHub](https://github.com/vllm-project/vllm) or in the [vLLM Slack](https://inviter.co/vllm-slack).
......
@@ -164,7 +164,7 @@ def get_vllm_port() -> Optional[int]:
raise ValueError(
f"VLLM_PORT '{port}' appears to be a URI. "
"This may be caused by a Kubernetes service discovery issue"
-"check the warning in: https://docs.vllm.ai/en/stable/serving/env_vars.html"
+"check the warning in: https://docs.vllm.ai/en/stable/usage/env_vars.html"
)
except Exception:
pass
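The check in this hunk guards against a URI such as `tcp://10.0.0.1:8000` appearing in `VLLM_PORT`. Kubernetes injects `<SERVICE>_PORT=tcp://<ip>:<port>` variables into pods for Services in the same namespace, so a Service named `vllm` yields exactly such a value. Below is a minimal standalone sketch of that validation, not vLLM's actual `get_vllm_port()`; the helper name `parse_port` and its exact error messages are hypothetical.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def parse_port(env_var: str = "VLLM_PORT") -> Optional[int]:
    """Hypothetical helper: read a port number from an environment variable.

    Returns None when the variable is unset, the integer port when the value
    is numeric, and raises ValueError otherwise.
    """
    value = os.environ.get(env_var)
    if value is None:
        return None
    try:
        return int(value)
    except ValueError as err:
        # Kubernetes service discovery can set e.g. VLLM_PORT=tcp://10.0.0.1:8000
        # for a Service named "vllm"; a non-empty scheme flags the URI form.
        if urlparse(value).scheme:
            raise ValueError(
                f"{env_var} '{value}' appears to be a URI. "
                "This may be caused by a Kubernetes service discovery issue."
            ) from err
        raise ValueError(f"{env_var} '{value}' must be a valid integer") from err
```

Reading the variable at call time rather than at import time lets a launcher or test set it after the module has been imported.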
......
@@ -2531,7 +2531,7 @@ def _maybe_force_spawn():
logger.warning(
"We must use the `spawn` multiprocessing start method. "
"Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. "
-"See https://docs.vllm.ai/en/latest/getting_started/"
+"See https://docs.vllm.ai/en/latest/usage/"
"troubleshooting.html#python-multiprocessing "
"for more information. Reason: %s", reason)
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
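This override exists because a process that has already initialized CUDA cannot safely `fork` children that also use CUDA, whereas `spawn` starts a fresh interpreter. The following is a minimal, stdlib-only sketch of the same pattern, not vLLM's `_maybe_force_spawn()`. The `cuda_initialized` flag is a hypothetical stand-in for whatever check the real function performs, `get_worker_context` is an illustrative name, and treating `"fork"` as the default when the variable is unset is an assumption.

```python
import logging
import multiprocessing
import os

logger = logging.getLogger(__name__)

ENV_VAR = "VLLM_WORKER_MULTIPROC_METHOD"


def maybe_force_spawn(cuda_initialized: bool,
                      reason: str = "CUDA was previously initialized") -> None:
    """Hypothetical sketch: switch the worker start method to 'spawn' when needed."""
    if not cuda_initialized or os.environ.get(ENV_VAR) == "spawn":
        return  # fork is safe here, or spawn is already selected
    logger.warning(
        "We must use the `spawn` multiprocessing start method. "
        "Overriding %s to 'spawn'. Reason: %s", ENV_VAR, reason)
    os.environ[ENV_VAR] = "spawn"


def get_worker_context() -> multiprocessing.context.BaseContext:
    """Return a multiprocessing context for the configured start method."""
    # Assumption: "fork" is used when the variable is unset.
    return multiprocessing.get_context(os.environ.get(ENV_VAR, "fork"))
```

With `spawn`, child processes re-import the main module, so any code that launches workers should sit under an `if __name__ == "__main__":` guard; that is the kind of issue the linked troubleshooting section on Python multiprocessing addresses.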
......