"SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.\n",
"SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.\n",
"A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/embeddings).\n",
"A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/embeddings).\n",
"\n",
"\n",
"This tutorial covers the embedding APIs for embedding models, such as \n",
"This tutorial covers the embedding APIs for embedding models. For a list of the supported models see the [corresponding overview page](https://docs.sglang.ai/supported_models/embedding_models.html)\n"
"A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/vision).\n",
"A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/vision).\n",
"This tutorial covers the vision APIs for vision language models.\n",
"This tutorial covers the vision APIs for vision language models.\n",
"\n",
"\n",
"SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/supported_models/multimodal_language_models): \n",
"SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/supported_models/multimodal_language_models).\n",
"As an alternative to the OpenAI API, you can also use the [SGLang offline engine](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py)."
"As an alternative to the OpenAI API, you can also use the [SGLang offline engine](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py)."
SGLang provides many optimizations specifically designed for the DeepSeek models, making it the inference engine recommended by the official [DeepSeek team](https://github.com/deepseek-ai/DeepSeek-V3/tree/main?tab=readme-ov-file#62-inference-with-sglang-recommended) from Day 0.
SGLang provides many optimizations specifically designed for the DeepSeek models, making it the inference engine recommended by the official [DeepSeek team](https://github.com/deepseek-ai/DeepSeek-V3/tree/main?tab=readme-ov-file#62-inference-with-sglang-recommended) from Day 0.
This document outlines current optimizations for DeepSeek.
This document outlines current optimizations for DeepSeek.
Additionally, the SGLang team is actively developing enhancements following this[Roadmap](https://github.com/sgl-project/sglang/issues/2591).
For an overview of the implemented features see the completed[Roadmap](https://github.com/sgl-project/sglang/issues/2591).
## Launch DeepSeek V3 with SGLang
## Launch DeepSeek V3 with SGLang
...
@@ -221,6 +221,6 @@ Important Notes:
...
@@ -221,6 +221,6 @@ Important Notes:
## FAQ
## FAQ
1.**Question**: What should I do if model loading takes too long and NCCL timeout occurs?
**Q: Model loading is taking too long, and I'm encountering an NCCL timeout. What should I do?**
**Answer**: You can try to add `--dist-timeout 3600` when launching the model, this allows for 1-hour timeout.
A: If you're experiencing extended model loading times and an NCCL timeout, you can try increasing the timeout duration. Add the argument`--dist-timeout 3600` when launching your model. This will set the timeout to one hour, which often resolves the issue.
1. Use `export CUDA_HOME=/usr/local/cuda-<your-cuda-version>` to set the `CUDA_HOME` environment variable.
1. Use `export CUDA_HOME=/usr/local/cuda-<your-cuda-version>` to set the `CUDA_HOME` environment variable.
2. Install FlashInfer first following [FlashInfer installation doc](https://docs.flashinfer.ai/installation.html), then install SGLang as described above.
2. Install FlashInfer first following [FlashInfer installation doc](https://docs.flashinfer.ai/installation.html), then install SGLang as described above.
- If you encounter `ImportError; cannot import name 'is_valid_list_of_images' from 'transformers.models.llama.image_processing_llama'`, try to use the specified version of `transformers` in [pyproject.toml](https://github.com/sgl-project/sglang/blob/main/python/pyproject.toml). Currently, just running `pip install transformers==4.51.1`.