"src/vscode:/vscode.git/clone" did not exist on "eba7e7a6d7e0adecea4e8fd69c89deec646f401d"
Unverified commit 3c7bfd7e authored by Chayenne, committed by GitHub

Docs: Fix layout with sub-section (#3710)

parent bb121214
...
@@ -39,4 +39,6 @@ compile:
 	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

 clean:
-	rm -rf $(BUILDDIR)/* logs/timing.log
+	find . -name "*.ipynb" -exec nbstripout {} \;
+	rm -rf $(BUILDDIR)
+	rm -rf logs
...
@@ -20,19 +20,16 @@ Update your Jupyter notebooks in the appropriate subdirectories under `docs/`. I
 # 1) Compile all Jupyter notebooks
 make compile

-# 2) Generate static HTML
-make html
-
-# 3) Preview documentation locally
+# 2) Compile and preview documentation locally
 # Open your browser at the displayed port to view the docs
 bash serve.sh

-# 4) Clean notebook outputs
+# 3) Clean notebook outputs
 # nbstripout removes notebook outputs so your PR stays clean
 pip install nbstripout
 find . -name '*.ipynb' -exec nbstripout {} \;

-# 5) Pre-commit checks and create a PR
+# 4) Pre-commit checks and create a PR
 # After these checks pass, push your changes and open a PR on your branch
 pre-commit run --all-files
 ```
...
-# Custom Chat Template in SGLang Runtime
+# Custom Chat Template

 **NOTE**: There are two chat template systems in the SGLang project. This document is about setting a custom chat template for the OpenAI-compatible API server (defined at [conversation.py](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/conversation.py)). It is NOT related to the chat template used in the SGLang language frontend (defined at [chat_template.py](https://github.com/sgl-project/sglang/blob/main/python/sglang/lang/chat_template.py)).
...
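For reference, a minimal sketch of applying a custom template when launching the OpenAI-compatible server, assuming the `--chat-template` flag accepts a path to a template JSON file; `./my_template.json` is a placeholder and the schema follows the templates defined in conversation.py:

```python
# Sketch only: launch the server with a custom chat template.
# "./my_template.json" is a hypothetical file path.
import subprocess

server = subprocess.Popen([
    "python", "-m", "sglang.launch_server",
    "--model-path", "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "--chat-template", "./my_template.json",  # placeholder path
    "--host", "0.0.0.0",
])
```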
-# Guide on Hyperparameter Tuning
+# Hyperparameter Tuning

 ## Achieving Peak Throughput

 Achieving a large batch size is the most important thing for attaining high throughput.
...
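To make the batch-size advice concrete, a hedged sketch of launching with the flags that most directly bound the running batch (`--max-running-requests`, `--mem-fraction-static`); the values are illustrative, not recommendations:

```python
from sglang.utils import launch_server_cmd, wait_for_server

# Illustrative values: a higher --max-running-requests permits a larger
# running batch; --mem-fraction-static reserves more GPU memory for the
# KV cache, which is usually what caps the batch size.
server_process, port = launch_server_cmd(
    "python -m sglang.launch_server"
    " --model-path meta-llama/Meta-Llama-3.1-8B-Instruct"
    " --max-running-requests 512 --mem-fraction-static 0.85"
)
wait_for_server(f"http://localhost:{port}")
```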
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Native APIs\n",
+    "# SGLang Native APIs\n",
     "\n",
     "Apart from the OpenAI compatible APIs, the SGLang Runtime also provides its native server APIs. We introduce the following APIs:\n",
     "\n",
...
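As a concrete taste of the native APIs this notebook introduces, a small sketch against the `/generate` endpoint; a local server on port 30000 is assumed:

```python
import requests

# Assumes a server is already running locally on port 30000.
response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"temperature": 0, "max_new_tokens": 32},
    },
)
print(response.json()["text"])
```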
-# Sampling Parameters in SGLang Runtime
+# Sampling Parameters

 This doc describes the sampling parameters of the SGLang Runtime.
 It is the low-level endpoint of the runtime.
...
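For illustration, a sketch passing a fuller set of sampling parameters to the low-level `/generate` endpoint; the values are arbitrary and the server address is assumed:

```python
import requests

# Illustrative values; see the doc above for the full parameter list.
sampling_params = {
    "max_new_tokens": 64,  # stop after this many generated tokens
    "temperature": 0.7,    # softmax temperature
    "top_p": 0.9,          # nucleus sampling cutoff
    "top_k": 50,           # top-k filtering
    "stop": ["\n\n"],      # stop strings
}
response = requests.post(
    "http://localhost:30000/generate",
    json={"text": "List three prime numbers:", "sampling_params": sampling_params},
)
print(response.json()["text"])
```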
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Quick Start: Sending Requests\n",
+    "# Sending Requests\n",
     "This notebook provides a quick-start guide to using SGLang for chat completions after installation.\n",
     "\n",
     "- For Vision Language Models, see [OpenAI APIs - Vision](../backend/openai_api_vision.ipynb).\n",
@@ -16,16 +16,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Launch A Server\n",
-    "\n",
-    "This code block is equivalent to executing \n",
-    "\n",
-    "```bash\n",
-    "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n",
-    "    --host 0.0.0.0\n",
-    "```\n",
-    "\n",
-    "in your terminal and wait for the server to be ready. Once the server is running, you can send test requests using curl or requests. The server implements the [OpenAI-compatible APIs](https://platform.openai.com/docs/api-reference/chat)."
+    "## Launch A Server"
   ]
  },
  {
@@ -42,6 +33,9 @@
     "else:\n",
     "    from sglang.utils import launch_server_cmd\n",
     "\n",
+    "# This is equivalent to running the following command in your terminal\n",
+    "\n",
+    "# python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --host 0.0.0.0\n",
     "\n",
     "server_process, port = launch_server_cmd(\n",
     "    \"\"\"\n",
...
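Once the server from the hunk above is running, a minimal sketch of a chat-completions request; `port` is the value returned by `launch_server_cmd` above, and the model name must match the served model:

```python
import requests

# The server implements the OpenAI-compatible chat completions API.
url = f"http://localhost:{port}/v1/chat/completions"
payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "What is an LLM?"}],
    "max_tokens": 64,
}
response = requests.post(url, json=payload)
print(response.json()["choices"][0]["message"]["content"])
```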
-# Frontend: Structured Generation Language (SGLang)
+# Structured Generation Language

 The frontend language can be used with local models or API models. It is an alternative to the OpenAI API. You may find it easier to use for complex prompting workflows.

 ## Quick Start
...
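A minimal sketch of the frontend language, following the public `sglang` frontend API; the backend URL is assumed:

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

# Point the frontend at a running runtime endpoint (URL assumed).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = qa.run(question="What is the capital of France?")
print(state["answer"])
```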
@@ -12,7 +12,7 @@ The core features include:
 .. toctree::
    :maxdepth: 1
-   :caption: Getting Started
+   :caption: Installation

    start/install.md
@@ -26,10 +26,20 @@ The core features include:
    backend/openai_api_embeddings.ipynb
    backend/native_api.ipynb
    backend/offline_engine_api.ipynb
-   backend/structured_outputs.ipynb
+   backend/server_arguments.md
+   backend/sampling_params.md
+   backend/hyperparameter_tuning.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Advanced Features
+
    backend/speculative_decoding.ipynb
+   backend/structured_outputs.ipynb
    backend/function_calling.ipynb
-   backend/server_arguments.md
+   backend/custom_chat_template.md
+   backend/quantization.md

 .. toctree::
    :maxdepth: 1
@@ -44,48 +54,11 @@ The core features include:
    router/router.md

-References
-==========
-
-General
----------------------
-
-.. toctree::
-   :maxdepth: 1
-
-   references/supported_models.md
-   references/contribution_guide.md
-   references/troubleshooting.md
-   references/faq.md
-   references/learn_more.md
-
-Hardware
---------------------------
-
-.. toctree::
-   :maxdepth: 1
-
-   references/AMD.md
-   references/amd_configure.md
-   references/nvidia_jetson.md
-
-Advanced Models & Deployment
-------------------------------
-
-.. toctree::
-   :maxdepth: 1
-
-   references/deepseek.md
-   references/multi_node.md
-   references/multi_node_inference_k8s_lws.md
-   references/modelscope.md
-
-Performance & Tuning
---------------------
-
 .. toctree::
    :maxdepth: 1
+   :caption: References

-   references/sampling_params.md
-   references/hyperparameter_tuning.md
-   references/benchmark_and_profiling.md
-   references/accuracy_evaluation.md
-   references/custom_chat_template.md
-   references/quantization.md
+   references/general
+   references/hardware
+   references/advanced_deploy
+   references/performance_tuning
...
+Multi-Node Deployment
+==========================
+
+.. toctree::
+   :maxdepth: 1
+
+   deepseek.md
+   multi_node.md
+   k8s.md
...
+General Guidance
+================
+
+.. toctree::
+   :maxdepth: 1
+
+   supported_models.md
+   contribution_guide.md
+   troubleshooting.md
+   faq.md
+   learn_more.md
+   modelscope.md
...
+Hardware Supports
+=================
+
+.. toctree::
+   :maxdepth: 1
+
+   amd.md
+   nvidia_jetson.md
...
-# Deploying a RoCE Network-Based SGLANG Two-Node Inference Service on a Kubernetes (K8S) Cluster
+# Kubernetes
+
+This doc covers deploying a RoCE network-based SGLang two-node inference service on a Kubernetes (K8s) cluster.

 LeaderWorkerSet (LWS) is a Kubernetes API that aims to address common deployment patterns of AI/ML inference workloads. A major use case is multi-host/multi-node distributed inference.
...
-# Run Multi-Node Inference
+# Multi-Node Deployment

 ## Llama 3.1 405B
...
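For orientation, a hedged sketch of a two-node launch using the runtime's distributed flags (`--dist-init-addr`, `--nnodes`, `--node-rank`); the address, model path, and tensor-parallel size are illustrative:

```python
import subprocess

# Node 0 of a two-node deployment (all values illustrative). Node 1 runs
# the same command with --node-rank 1.
subprocess.run([
    "python", "-m", "sglang.launch_server",
    "--model-path", "meta-llama/Meta-Llama-3.1-405B-Instruct",
    "--tp", "16",                         # tensor parallelism spanning both nodes
    "--dist-init-addr", "10.0.0.1:5000",  # reachable address of node 0
    "--nnodes", "2",
    "--node-rank", "0",
])
```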
...
+Performance Tuning
+====================
+
+.. toctree::
+   :maxdepth: 1
+
+   benchmark_and_profiling.md
+   accuracy_evaluation.md
...
@@ -13,6 +13,7 @@ sphinx
 sphinx-book-theme
 sphinx-copybutton
 sphinx-tabs
+nbstripout
 sphinxcontrib-mermaid
 urllib3<2.0.0
 gguf>=0.10.0
...
+make clean
+make html
+python3 -m http.server -d _build/html