"vscode:/vscode.git/clone" did not exist on "b085224fb978c720c3f932c087e7b91a39fd34c3"
Unverified Commit e235be16 authored by simveit, committed by GitHub

Fix some issues with current docs. (#6588)

parent 5ccf8fe1
@@ -9,9 +9,7 @@
     "SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.\n",
     "A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/embeddings).\n",
     "\n",
-    "This tutorial covers the embedding APIs for embedding models, such as \n",
-    "- [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) \n",
-    "- [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) \n"
+    "This tutorial covers the embedding APIs for embedding models. For a list of the supported models, see the [corresponding overview page](https://docs.sglang.ai/supported_models/embedding_models.html).\n"
    ]
   },
   {
...
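The hunk above replaces the hard-coded model list with a link to the supported-models overview. For context, a minimal sketch of what an OpenAI-compatible embeddings request to a local SGLang server looks like; the server URL and port are assumptions, and the model name is taken from the removed list purely as an illustration:

```python
import json
from urllib.request import Request

# Sketch of an OpenAI-compatible /v1/embeddings request. The localhost URL
# and the model name are illustrative placeholders, not prescribed by the docs.
url = "http://localhost:30000/v1/embeddings"
payload = {
    "model": "intfloat/e5-mistral-7b-instruct",
    "input": "Once upon a time",
}
req = Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Sending this with urllib.request.urlopen(req) returns a JSON body whose
# data[0]["embedding"] field holds the embedding vector.
```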
@@ -10,13 +10,7 @@
     "A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/vision).\n",
     "This tutorial covers the vision APIs for vision language models.\n",
     "\n",
-    "SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/supported_models/multimodal_language_models): \n",
-    "- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) \n",
-    "- [lmms-lab/llava-onevision-qwen2-72b-ov-chat](https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov-chat) \n",
-    "- [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)\n",
-    "- [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)\n",
-    "- [openbmb/MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V)\n",
-    "- [deepseek-ai/deepseek-vl2](https://huggingface.co/deepseek-ai/deepseek-vl2)\n",
+    "SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/supported_models/multimodal_language_models).\n",
     "\n",
     "As an alternative to the OpenAI API, you can also use the [SGLang offline engine](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py)."
    ]
...
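The vision hunk above likewise points at the supported-models page instead of listing models. As a sketch of the vision API it describes, here is the shape of an OpenAI-compatible chat-completion request with an image input; the server URL, model name, and image URL are all placeholders:

```python
import json

# Sketch of an OpenAI-compatible /v1/chat/completions request with an image
# content part. URL, model, and image location are illustrative assumptions.
url = "http://localhost:30000/v1/chat/completions"
payload = {
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/demo.png"},
                },
            ],
        }
    ],
}
body = json.dumps(payload)  # POST this as the JSON request body
```

The same message structure works against the real OpenAI API, which is the point of the compatibility layer.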
@@ -28,6 +28,11 @@ The core features include:
    backend/openai_api_embeddings.ipynb
    backend/native_api.ipynb
    backend/offline_engine_api.ipynb
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Advanced Backend Configurations
+
    backend/server_arguments.md
    backend/sampling_params.md
    backend/hyperparameter_tuning.md
@@ -77,4 +82,4 @@ The core features include:
    references/general
    references/hardware
    references/advanced_deploy
-   references/performance_tuning
+   references/performance_analysis_and_optimization
@@ -3,7 +3,7 @@
 SGLang provides many optimizations specifically designed for the DeepSeek models, making it the inference engine recommended by the official [DeepSeek team](https://github.com/deepseek-ai/DeepSeek-V3/tree/main?tab=readme-ov-file#62-inference-with-sglang-recommended) from Day 0.
 
 This document outlines current optimizations for DeepSeek.
-Additionally, the SGLang team is actively developing enhancements following this [Roadmap](https://github.com/sgl-project/sglang/issues/2591).
+For an overview of the implemented features, see the completed [Roadmap](https://github.com/sgl-project/sglang/issues/2591).
 
 ## Launch DeepSeek V3 with SGLang
@@ -221,6 +221,6 @@ Important Notes:
 ## FAQ
 
-1. **Question**: What should I do if model loading takes too long and NCCL timeout occurs?
-
-   **Answer**: You can try to add `--dist-timeout 3600` when launching the model, this allows for 1-hour timeout.
+**Q: Model loading is taking too long, and I'm encountering an NCCL timeout. What should I do?**
+
+A: If you're experiencing extended model loading times and an NCCL timeout, you can try increasing the timeout duration. Add the argument `--dist-timeout 3600` when launching your model. This will set the timeout to one hour, which often resolves the issue.
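The FAQ's fix can be sketched as a concrete launch command. Only `--dist-timeout 3600` comes from the answer itself; the model path and tensor-parallel size below are illustrative placeholders:

```shell
# Build a launch command that includes the FAQ's one-hour distributed-init
# timeout. The model path and --tp value are placeholders for illustration.
LAUNCH_CMD="python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --trust-remote-code \
  --dist-timeout 3600"
echo "$LAUNCH_CMD"
```

The timeout only affects distributed initialization; it does not slow anything down when loading finishes sooner.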
-Performance Tuning
-====================
+Performance Analysis & Optimization
+===================================
 
 .. toctree::
    :maxdepth: 1
...
@@ -23,8 +23,6 @@ uv pip install "sglang[all]>=0.4.6.post5"
 1. Use `export CUDA_HOME=/usr/local/cuda-<your-cuda-version>` to set the `CUDA_HOME` environment variable.
 2. Install FlashInfer first following the [FlashInfer installation doc](https://docs.flashinfer.ai/installation.html), then install SGLang as described above.
-- If you encounter `ImportError: cannot import name 'is_valid_list_of_images' from 'transformers.models.llama.image_processing_llama'`, try to use the version of `transformers` specified in [pyproject.toml](https://github.com/sgl-project/sglang/blob/main/python/pyproject.toml). Currently, this means running `pip install transformers==4.51.1`.
 
 ## Method 2: From source
 
 ```bash
...