# SGLang Documentation
We recommend that new contributors start by writing documentation, which helps you quickly understand the SGLang codebase. Most documentation files are located under the `docs/` folder. We prefer **Jupyter Notebooks** over Markdown so that all examples can be executed and validated by our docs CI pipeline.

## Docs Workflow

### Install Dependency

```bash
pip install -r requirements.txt
```

### Update Documentation

Update your Jupyter notebooks in the appropriate subdirectories under `docs/`. If you add new files, remember to update `index.rst` (or relevant `.rst` files) accordingly.

- **`pre-commit run --all-files`** manually runs all configured checks, applying fixes if possible. If it fails the first time, re-run it to ensure lint errors are fully resolved. Make sure your code passes all checks **before** creating a Pull Request.
- **Do not commit** directly to the `main` branch. Always create a new branch (e.g., `feature/my-new-feature`), push your changes, and open a PR from that branch.

```bash
# 1) Compile all Jupyter notebooks
make compile

# 2) Generate static HTML
make html

# 3) Preview documentation locally
# Open your browser at the displayed port to view the docs
bash serve.sh

# 4) Clean notebook outputs
# nbstripout removes notebook outputs so your PR stays clean
pip install nbstripout
find . -name '*.ipynb' -exec nbstripout {} \;

# 5) Pre-commit checks and create a PR
# After these checks pass, push your changes and open a PR on your branch
pre-commit run --all-files
```
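
As a complement to step 4 above, the following is a minimal sketch of a check that reports notebooks which still contain outputs. It is not part of the SGLang tooling: the function names `cells_with_outputs` and `find_unstripped_notebooks` are hypothetical, and the check assumes the standard notebook JSON layout, where each code cell has an `outputs` list and an `execution_count` field.

```python
import json
from pathlib import Path


def cells_with_outputs(notebook_path):
    """Return indices of code cells that still carry outputs or an execution count."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    dirty = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # A stripped code cell has an empty outputs list and no execution count.
        if cell.get("outputs") or cell.get("execution_count") is not None:
            dirty.append(i)
    return dirty


def find_unstripped_notebooks(root="."):
    """Map each notebook under root that still has outputs to its dirty cell indices."""
    report = {}
    for path in sorted(Path(root).rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in path.parts:
            continue
        dirty = cells_with_outputs(path)
        if dirty:
            report[str(path)] = dirty
    return report
```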
---

### **Port Allocation and CI Efficiency**

**To launch and kill the server:**

```python
from sglang.test.test_utils import is_in_ci
from sglang.utils import wait_for_server, print_highlight, terminate_process

if is_in_ci():
    from patch import launch_server_cmd
else:
    from sglang.utils import launch_server_cmd

server_process, port = launch_server_cmd(
    """
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
 --host 0.0.0.0
"""
)

wait_for_server(f"http://localhost:{port}")

# Terminate Server
terminate_process(server_process)
```

**To launch and kill the engine:**

```python
# Launch Engine
import sglang as sgl
import asyncio
from sglang.test.test_utils import is_in_ci

if is_in_ci():
    import patch

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

# Terminate Engine
llm.shutdown()
```

### **Why this approach?**

- **Dynamic Port Allocation**: Avoids port conflicts by selecting an available port at runtime, enabling multiple server instances to run in parallel.
- **Optimized for CI**: In CI environments, the `patch` versions of `launch_server_cmd` and `sgl.Engine()` help manage GPU memory dynamically, preventing conflicts and improving test parallelism.
- **Better Parallel Execution**: Ensures smooth concurrent tests by avoiding fixed port collisions and optimizing memory usage.
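
To illustrate the idea behind dynamic port allocation, here is a minimal sketch of one common technique: binding to port 0 so the operating system picks an unused port. This is an assumption for illustration only; `find_free_port` is a hypothetical helper, and `launch_server_cmd` may select its port differently.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to assign any currently free port.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for an IPv4 socket.
        return sock.getsockname()[1]


port = find_free_port()
print(f"Selected port: {port}")
```

Note that the port is released when the socket closes, so another process could claim it before the server binds to it. A helper that both allocates the port and launches the server keeps that window short.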

### **Model Selection**

For demonstrations in the docs, **prefer smaller models** to reduce memory consumption and speed up inference. Running larger models in CI can lead to instability due to memory constraints.

### **Prompt Alignment Example**

When designing prompts, ensure they align with SGLang’s structured formatting. For example:

```python
prompt = """You are an AI assistant. Answer concisely and accurately.

User: What is the capital of France?
Assistant: The capital of France is Paris."""
```

Using a consistent prompt format keeps responses aligned with expected behavior and makes examples more reliable across different files.