test_process.md 1.29 KB
Lianmin Zheng committed
## SRT Unit Tests

### Low-level API
```
cd sglang/test/srt/model

python3 test_llama_low_api.py
python3 test_llama_extend.py
python3 test_llava_low_api.py
python3 bench_llama_low_api.py
```
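You may want to run all four scripts in one go without stopping at the first failure. A small bash function can loop over them and report which ones failed. This is a minimal sketch and not part of the repository. `run_scripts` is a hypothetical helper name, and it assumes each script signals failure through a non-zero exit code. The interpreter defaults to `python3` and can be overridden through a `PYTHON` variable, which is also an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SGLang): run each script in turn, keep
# going after failures, and return non-zero if any script failed.
# The interpreter defaults to python3; override it with PYTHON=... if needed.
run_scripts() {
  local failed=()
  local script
  for script in "$@"; do
    echo "=== ${script}"
    if ! "${PYTHON:-python3}" "$script"; then
      failed+=("$script")
    fi
  done
  if ((${#failed[@]} > 0)); then
    echo "FAILED: ${failed[*]}"
    return 1
  fi
  echo "All $# scripts passed"
  return 0
}

# Example (not run here), from sglang/test/srt/model:
#   run_scripts test_llama_low_api.py test_llama_extend.py \
#     test_llava_low_api.py bench_llama_low_api.py
```

Continuing past a failure means one broken script does not hide the results of the others.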

### High-level API

```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
```
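`test_srt_backend.py` needs the server to be up, and loading a 7B model can take a while. When scripting these steps, one option is to poll the port before starting the client. The sketch below uses a hypothetical bash helper, `wait_for_port`, which is not part of SGLang. It relies on bash's `/dev/tcp` pseudo-device. An open port is only a rough readiness signal and does not guarantee that the model has finished loading.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SGLang): try to connect to a local TCP
# port once per second until it succeeds or `timeout` attempts have failed.
# Requires bash, because /dev/tcp is a bash-specific redirection target.
wait_for_port() {
  local port="$1" timeout="${2:-300}" i
  for ((i = 0; i < timeout; i++)); do
    # The subshell contains the failed redirection; stderr is silenced so
    # "Connection refused" messages do not clutter the output.
    if ( : > "/dev/tcp/127.0.0.1/${port}" ) 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example (not run here); the 600-attempt limit is an arbitrary choice:
#   wait_for_port 30000 600 && (cd test/lang && python3 test_srt_backend.py)
```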

```
cd test/lang
python3 test_srt_backend.py
```

### Performance

#### MMLU
```
cd benchmark/mmlu
```
Follow the `README.md` in this directory to download the data.

```
python3 bench_sglang.py --nsub 3

# Expected performance on A10G
# Total latency: 8.200
# Average accuracy: 0.413
```
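To turn the reference numbers into a pass/fail check, you can save the benchmark output and compare the accuracy against a threshold. This is a minimal bash sketch. It assumes `bench_sglang.py` prints lines of the form `Average accuracy: 0.413`, matching the expected-output comments above. `get_metric` and `at_least` are hypothetical helper names. The comparison uses awk because bash arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the last "<label>: <number>" line
# in a saved benchmark log.
get_metric() {
  local label="$1" file="$2"
  awk -F': ' -v l="$label" '$1 == l { print $2 }' "$file" | tail -n 1
}

# Hypothetical helper: succeed if value >= threshold (floating-point compare).
at_least() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 >= t + 0) }'
}

# Example (not run here), from benchmark/mmlu. The 0.40 threshold is a
# hypothetical tolerance below the 0.413 reference value:
#   python3 bench_sglang.py --nsub 3 | tee mmlu.log
#   acc=$(get_metric "Average accuracy" mmlu.log)
#   at_least "$acc" 0.40 || echo "MMLU accuracy below threshold: $acc"
```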

#### GSM-8K
```
cd benchmark/gsm8k
```
Follow the `README.md` in this directory to download the data.

```
python3 bench_sglang.py --num-q 200

# Expected performance on A10G
# Latency: 32.103
# Accuracy: 0.250
```
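Latency numbers are only comparable on the same hardware, and the reference above is for an A10G. When you do run on an A10G, a relative-tolerance check is less brittle than exact matching. This is a minimal bash sketch. `no_slower_than` and the 20% tolerance in the usage comment are hypothetical choices. The parsing assumes the `Latency: 32.103` line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if `actual` is at most `pct` percent above
# `expected`. Faster runs always pass; only slowdowns beyond the tolerance fail.
no_slower_than() {
  awk -v a="$1" -v e="$2" -v p="$3" \
    'BEGIN { exit !(a + 0 <= (e + 0) * (1 + (p + 0) / 100)) }'
}

# Example (not run here), from benchmark/gsm8k on an A10G:
#   python3 bench_sglang.py --num-q 200 | tee gsm8k.log
#   lat=$(awk -F': ' '$1 == "Latency" { print $2 }' gsm8k.log | tail -n 1)
#   no_slower_than "$lat" 32.103 20 || echo "GSM-8K latency regressed: $lat"
```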

#### More
Please also run the benchmarks in `benchmark/hellaswag` and `benchmark/latency_throughput`.

### More Models

#### LLaVA

```
python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.5-7b --tokenizer-path llava-hf/llava-1.5-7b-hf --port 30000
```

```
cd benchmark/llava_bench
python3 bench_sglang.py

# Expected performance on A10G
# Latency: 50.031
```

## SGLang Unit Tests
```
export ANTHROPIC_API_KEY=
export OPENAI_API_KEY=
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
```
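The exports above leave both keys empty. Tests that call the Anthropic or OpenAI backends presumably need real keys, so a quick check before launching the suite can head off a confusing failure later. This is a minimal bash sketch. `check_env` is a hypothetical helper. It only checks that the variables are non-empty, not that the keys are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn about each named environment variable that is
# unset or empty, and return non-zero if any of them is missing.
check_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable whose
    # name is stored in $name; ":-" keeps it safe under `set -u`.
    if [[ -z "${!name:-}" ]]; then
      echo "warning: ${name} is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (not run here):
#   check_env ANTHROPIC_API_KEY OPENAI_API_KEY \
#     || echo "Set the API keys above before running the backend tests"
```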

```
cd test/lang
python3 run_all.py
```