Unverified commit 1ae270c5 authored by Lianmin Zheng, committed by GitHub

[Doc] fix docs (#1949)

parent c77c1e05
@@ -36,17 +36,17 @@ The core features include:
 :caption: Frontend Tutorial
 frontend/frontend.md
+frontend/choices_methods.md
 .. toctree::
 :maxdepth: 1
 :caption: References
+references/supported_models.md
 references/sampling_params.md
 references/hyperparameter_tuning.md
-references/supported_models.md
 references/benchmark_and_profiling.md
-references/choices_methods.md
 references/custom_chat_template.md
 references/contributor_guide.md
 references/troubleshooting.md
......
@@ -26,9 +26,9 @@ Data parallelism is better for throughput. When there is enough GPU memory, alwa
 ### Avoid out-of-memory by Tuning `--chunked-prefill-size`, `--mem-fraction-static`, `--max-running-requests`
 If you see out of memory (OOM) errors, you can try to tune the following parameters.
-If OOM happens during prefill, try to decrease `--chunked-prefill-size` to `4096` or `2048`.
-If OOM happens during decoding, try to decrease `--max-running-requests`.
-You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
+- If OOM happens during prefill, try to decrease `--chunked-prefill-size` to `4096` or `2048`.
+- If OOM happens during decoding, try to decrease `--max-running-requests`.
+- You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
 ### Try Advanced Options
 - To enable the experimental overlapped scheduler, add `--enable-overlap-scheduler`. It overlaps CPU scheduler with GPU computation and can accelerate almost all workloads. This does not work for constrained decoding currenly.
......
@@ -4,9 +4,9 @@ This page lists some common errors and tips for fixing them.
 ## CUDA out of memory
 If you see out of memory (OOM) errors, you can try to tune the following parameters.
-If OOM happens during prefill, try to decrease `--chunked-prefill-size` to `4096` or `2048`.
-If OOM happens during decoding, try to decrease `--max-running-requests`.
-You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
+- If OOM happens during prefill, try to decrease `--chunked-prefill-size` to `4096` or `2048`.
+- If OOM happens during decoding, try to decrease `--max-running-requests`.
+- You can also try to decrease `--mem-fraction-static`, which reduces the memory usage of the KV cache memory pool and helps both prefill and decoding.
 ## CUDA error: an illegal memory access was encountered
 This error may be due to kernel errors or out-of-memory issues.
......
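The OOM advice that this commit turns into a bulleted list maps directly onto server launch flags. As a rough illustration only, the sketch below builds a launch command from that advice. It assumes the `python -m sglang.launch_server` entry point. The helper `oom_flags`, the `MODEL` placeholder, and the concrete step sizes for `--max-running-requests` and `--mem-fraction-static` are hypothetical. None of them come from this commit or from SGLang defaults.

```python
import shlex


# Hypothetical helper (not part of SGLang): maps the OOM advice to CLI flags.
# Flag names come from the docs; the step sizes are illustrative assumptions.
def oom_flags(phase: str,
              current_max_running_requests: int = 128,
              current_mem_fraction_static: float = 0.85,
              chunked_prefill_size: int = 4096) -> list[str]:
    if chunked_prefill_size not in (4096, 2048):
        raise ValueError("the docs suggest 4096 or 2048")
    flags: list[str] = []
    if phase == "prefill":
        # OOM during prefill: process prompts in smaller chunks.
        flags += ["--chunked-prefill-size", str(chunked_prefill_size)]
    elif phase == "decode":
        # OOM during decoding: admit fewer concurrent requests (halved here).
        flags += ["--max-running-requests",
                  str(max(1, current_max_running_requests // 2))]
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    # Optional for either phase: a lower --mem-fraction-static shrinks the
    # KV cache memory pool (lowered by an arbitrary 0.05 step here).
    flags += ["--mem-fraction-static",
              f"{current_mem_fraction_static - 0.05:.2f}"]
    return flags


# Usage: "MODEL" is a placeholder for a real model path.
cmd = ["python", "-m", "sglang.launch_server", "--model-path", "MODEL"]
cmd += oom_flags("prefill")
print(shlex.join(cmd))
```

The step sizes are arbitrary starting points. In practice you would lower these values gradually until the OOM stops.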