- **Extensive Model Support**: Supports a wide range of generative models (Llama 3, Gemma 2, Mistral, QWen, DeepSeek, LLaVA, etc.) and embedding models (e5-mistral), with easy extensibility for integrating new models.
- **Active Community**: SGLang is open-source and backed by an active community with industry adoption.
@@ -194,7 +194,7 @@ Since we compute penalty algorithms through CUDA, the logic stores relevant para
Before relying on this, run your own benchmark with the desired parameters on your own hardware to confirm that it does not run out of memory (OOM).
Tuning `--mem-fraction-static` and/or `--max-running-requests` will help. See the [hyperparameter tuning guide](hyperparameter_tuning.md#minor-tune---max-prefill-tokens---mem-fraction-static---max-running-requests) for more information.
@@ -5,9 +5,9 @@ This page lists some common errors and tips for fixing them.
## CUDA error: an illegal memory access was encountered
This error may be due to kernel errors or out-of-memory issues.
- If it is a kernel error, it is not easy to fix.
- If it is out-of-memory, the server sometimes reports this error instead of "Out-of-memory." In this case, try setting a smaller value for `--mem-fraction-static`. The default value of `--mem-fraction-static` is around 0.8 to 0.9; see https://github.com/sgl-project/sglang/blob/1edd4e07d6ad52f4f63e7f6beaa5987c1e1cf621/python/sglang/srt/server_args.py#L92-L102
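
The out-of-memory advice above can be sketched as a small helper that assembles a launch command with a lower `--mem-fraction-static`. This is only an illustration: `build_launch_command` is a hypothetical helper, not part of SGLang, the model path is a placeholder, and the values 0.7 and 64 are arbitrary examples rather than recommendations. The accepted range check (strictly between 0 and 1) is also an assumption.

```python
import shlex


def build_launch_command(model_path, mem_fraction_static=None, max_running_requests=None):
    """Build an argv list for `python -m sglang.launch_server`.

    Hypothetical helper for illustration; it only assembles arguments
    and does not start a server.
    """
    cmd = ["python", "-m", "sglang.launch_server", "--model-path", model_path]
    if mem_fraction_static is not None:
        # Assumption: the value is a fraction strictly between 0 and 1.
        if not 0.0 < mem_fraction_static < 1.0:
            raise ValueError("mem_fraction_static must be between 0 and 1")
        cmd += ["--mem-fraction-static", str(mem_fraction_static)]
    if max_running_requests is not None:
        if max_running_requests < 1:
            raise ValueError("max_running_requests must be at least 1")
        cmd += ["--max-running-requests", str(max_running_requests)]
    return cmd


# Placeholder model path; 0.7 is below the 0.8 to 0.9 default range mentioned above.
command = build_launch_command(
    "path/to/your/model", mem_fraction_static=0.7, max_running_requests=64
)
print(shlex.join(command))
```

If the error persists with a smaller value, lowering it further in small steps and rerunning your benchmark is one way to find a setting that fits your hardware.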
## The server hangs
If the server hangs, try disabling some optimizations when launching the server.