# SRT Unit Tests

### Latency Alignment
Make sure your changes do not slow down the following benchmarks:
```
# single gpu
python -m sglang.bench_latency --model-path meta-llama/Llama-2-7b-chat-hf --mem-fraction-static 0.8 --batch 32 --input-len 512 --output-len 256
python -m sglang.bench_latency --model-path meta-llama/Llama-2-7b-chat-hf --mem-fraction-static 0.8 --batch 1 --input-len 512 --output-len 256

# multiple gpu
python -m sglang.bench_latency --model-path meta-llama/Meta-Llama-3-70B --tp 8 --mem-fraction-static 0.6 --batch 32 --input-len 8192 --output-len 1
python -m sglang.bench_latency --model-path meta-llama/Meta-Llama-3-70B --tp 8 --mem-fraction-static 0.6 --batch 1 --input-len 8100 --output-len 32

# moe model
python -m sglang.bench_latency --model-path databricks/dbrx-base --tp 8 --mem-fraction-static 0.6 --batch 4 --input-len 1024 --output-len 32
```
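
One way to make "do not slow down" concrete is to run each command a few times on the base branch and on your branch, then compare the median latencies. The snippet below is only a sketch under assumptions: `relative_slowdown`, the 3% threshold, and the sample numbers are hypothetical and not part of SGLang. Replace the sample lists with the latencies that `sglang.bench_latency` reports on your hardware.

```python
from statistics import median


def relative_slowdown(baseline_s, candidate_s):
    """Return the fractional change in median latency (positive means slower)."""
    base = median(baseline_s)
    cand = median(candidate_s)
    return (cand - base) / base


# Made-up latencies in seconds, for illustration only.
baseline = [10.0, 10.2, 9.8]     # runs on the base branch
candidate = [10.4, 10.6, 10.5]   # runs with your changes

THRESHOLD = 0.03  # assumed tolerance for run-to-run noise

slowdown = relative_slowdown(baseline, candidate)
print(f"median slowdown: {slowdown:+.1%}")
if slowdown > THRESHOLD:
    print("possible regression, investigate before merging")
```

Using the median rather than the mean keeps a single noisy run from dominating the comparison.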

### High-level API

```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
```
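
The server needs some time to start before the tests can connect to it. As a rough sketch, a script could poll the port before running the test file. The helper name `wait_for_port` and its defaults are assumptions made for this example, and an open port only shows that the process is listening, not necessarily that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For the command above, `wait_for_port("127.0.0.1", 30000)` would be the matching call; choose a timeout that suits your model's load time.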

```
cd test/lang
python3 test_srt_backend.py
```

### Performance

#### MMLU
```
cd benchmark/mmlu
```
Follow README.md to download the data.

```
python3 bench_sglang.py --nsub 3

# Expected performance on A10G
# Total latency: 8.200
# Average accuracy: 0.413
```

#### GSM-8K
```
cd benchmark/gsm8k
```
Follow README.md to download the data.

```
python3 bench_sglang.py --num-q 200

# Expected performance on A10G
# Latency: 32.103
# Accuracy: 0.250
```
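
The expected MMLU and GSM-8K numbers above can be turned into a simple automated check. The following is an illustrative sketch only: `check_result`, both tolerances, and the "measured" values are hypothetical, and the expected values apply only to an A10G as noted above.

```python
def check_result(name, measured_latency, measured_acc,
                 expected_latency, expected_acc,
                 latency_tol=0.10, acc_tol=0.02):
    """Return a list of problems found; an empty list means the run looks fine."""
    problems = []
    # Latency may exceed the reference by at most latency_tol (relative).
    if measured_latency > expected_latency * (1 + latency_tol):
        problems.append(
            f"{name}: latency {measured_latency:.3f} is more than "
            f"{latency_tol:.0%} above {expected_latency:.3f}"
        )
    # Accuracy may fall below the reference by at most acc_tol (absolute).
    if measured_acc < expected_acc - acc_tol:
        problems.append(
            f"{name}: accuracy {measured_acc:.3f} is more than "
            f"{acc_tol:.3f} below {expected_acc:.3f}"
        )
    return problems


# Expected values are the A10G numbers quoted in this document;
# the measured values are placeholders for your own run.
for problem in check_result("gsm8k", 33.0, 0.245, 32.103, 0.250):
    print(problem)
```

The tolerances are deliberately loose because latency varies between runs; tighten them once you know the noise level on your machine.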

#### More
Please also run the benchmarks in `benchmark/hellaswag` and `benchmark/latency_throughput`.

### More Models

#### LLaVA

```
python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.5-7b --tokenizer-path llava-hf/llava-1.5-7b-hf --port 30000
```

```
cd benchmark/llava_bench
python3 bench_sglang.py

# Expected performance on A10G
# Latency: 50.031
```

## SGLang Unit Tests
```
export ANTHROPIC_API_KEY=
export OPENAI_API_KEY=
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
```
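
The two `export` lines above are left blank and must be filled in with your own keys, presumably because some of the tests call the Anthropic and OpenAI backends. A small pre-flight check can catch an empty key before the test run starts. This is a minimal sketch; `missing_keys` is a hypothetical helper, not part of the test suite.

```python
import os

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty in env."""
    if env is None:
        env = os.environ
    return [key for key in required if not env.get(key)]


missing = missing_keys()
if missing:
    print("Set these variables before running the tests:", ", ".join(missing))
```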

```
cd test/lang
python3 run_all.py
```

## OpenAI API server
```
cd test/srt
python test_openai_server.py
```