# Troubleshooting

This page lists some common errors and tips for fixing them.

## CUDA out of memory
If you see out-of-memory (OOM) errors, try tuning the following parameters.

- If OOM happens during prefill, try decreasing `--chunked-prefill-size` to `4096` or `2048`.
- If OOM happens during decoding, try decreasing `--max-running-requests`.
- You can also try decreasing `--mem-fraction-static`. This shrinks the KV cache memory pool, which helps during both prefill and decoding.
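
The tuning steps above can be sketched as a small Python helper that maps "where did the OOM happen" to a set of flags to retry with. This is only an illustration. The function name `suggest_oom_flags`, the halving of `--max-running-requests`, and the 0.05 step for `--mem-fraction-static` are assumptions, not recommended values. Only the three flag names and the 4096/2048 chunk sizes come from the text above. How you pass the resulting flags to the server depends on your deployment.

```python
from typing import Dict, List, Union

Number = Union[int, float]


def suggest_oom_flags(
    phase: str,
    current: Dict[str, Number],
    lower_mem_fraction: bool = False,
) -> List[str]:
    """Hypothetical helper: return flags to retry with after a CUDA OOM.

    phase: "prefill" or "decode", i.e. where the OOM happened.
    current: current flag values, keyed by flag name without leading dashes.
    lower_mem_fraction: also shrink the KV cache pool (helps both phases).
    """
    new = dict(current)
    if phase == "prefill":
        # Step the chunk size down to 4096, then to 2048 (never increase it).
        size = int(current["chunked-prefill-size"])
        new["chunked-prefill-size"] = 4096 if size > 4096 else min(size, 2048)
    elif phase == "decode":
        # Assumption: halve the running-request cap, keeping at least 1.
        new["max-running-requests"] = max(1, int(current["max-running-requests"]) // 2)
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    if lower_mem_fraction:
        # Assumption: lower the static memory fraction by an arbitrary 0.05 step.
        new["mem-fraction-static"] = round(float(current["mem-fraction-static"]) - 0.05, 2)

    # Flatten into ["--name", "value", ...], keeping the original key order.
    flags: List[str] = []
    for name, value in new.items():
        flags += [f"--{name}", str(value)]
    return flags


current = {"chunked-prefill-size": 8192, "max-running-requests": 64, "mem-fraction-static": 0.85}
print(suggest_oom_flags("prefill", current))
print(suggest_oom_flags("decode", current, lower_mem_fraction=True))
```

Passing `lower_mem_fraction=True` applies the third tip on top of the phase-specific one. Whether to combine the two or change one parameter at a time is a judgment call. Changing one at a time makes it easier to see which adjustment resolved the OOM.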

## CUDA error: an illegal memory access was encountered
This error may be caused by a kernel bug or by running out of GPU memory.

- If it is a kernel error, it is not easy to fix on your own. Please file an issue on GitHub.
- If it is an out-of-memory issue, it is sometimes reported as this error instead of an explicit out-of-memory error. See the section above for ways to avoid OOM.