Commit bbc1b296 authored by cong-meta, committed by GitHub

Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage (#27069)


Signed-off-by: cong-meta <prowindy@hotmail.com>
parent acb1bfa6
@@ -38,7 +38,7 @@ If other strategies don't solve the problem, it's likely that the vLLM instance
 - `export VLLM_LOG_STATS_INTERVAL=1.` to get log statistics more frequently for tracking running queue, waiting queue and cache hit states.
 - `export CUDA_LAUNCH_BLOCKING=1` to identify which CUDA kernel is causing the problem.
 - `export NCCL_DEBUG=TRACE` to turn on more logging for NCCL.
-- `export VLLM_TRACE_FUNCTION=1` to record all function calls for inspection in the log files to tell which function crashes or hangs. Do not use this flag unless absolutely needed for debugging, it will cause significant delays in startup time.
+- `export VLLM_TRACE_FUNCTION=1` to record all function calls for inspection in the log files to tell which function crashes or hangs. (WARNING: This flag will slow down the token generation by **over 100x**. Do not use unless absolutely needed.)
 ## Breakpoints
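
A minimal sketch of how the variables in this hunk might be applied from a launcher script, keeping `VLLM_TRACE_FUNCTION` strictly opt-in because of the slowdown the new warning describes. The helper `build_debug_env` and the constant `BASE_DEBUG_ENV` are hypothetical names used for illustration; they are not part of vLLM. Only the variable names and values come from the documentation lines above.

```python
import os

# Debug variables quoted in the hunk above; values are copied as written there.
BASE_DEBUG_ENV = {
    "VLLM_LOG_STATS_INTERVAL": "1.",
    "CUDA_LAUNCH_BLOCKING": "1",
    "NCCL_DEBUG": "TRACE",
}


def build_debug_env(base=None, trace_functions=False):
    """Return a copy of `base` (default: os.environ) with the debug variables set.

    Hypothetical helper: VLLM_TRACE_FUNCTION is set only when explicitly
    requested, and any inherited value is removed otherwise.
    """
    env = dict(os.environ if base is None else base)
    env.update(BASE_DEBUG_ENV)
    if trace_functions:
        env["VLLM_TRACE_FUNCTION"] = "1"
    else:
        # Drop an inherited setting so tracing is never enabled by accident.
        env.pop("VLLM_TRACE_FUNCTION", None)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the process under investigation, so the tracing flag affects only that one debugging run.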