# Benchmark

We provide several profiling tools to benchmark our models.

## profile with dataset

Download the ShareGPT dataset below, or create your own dataset in the same format.

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
```
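If you build your own dataset instead, it should follow the same ShareGPT-style JSON layout as the file above. A minimal sketch (the field names are taken from the public ShareGPT dump; the file name `my_dataset.json` is just an example):

```bash
cat > my_dataset.json << 'EOF'
[
  {
    "id": "sample-0",
    "conversations": [
      {"from": "human", "value": "What is the capital of France?"},
      {"from": "gpt", "value": "The capital of France is Paris."}
    ]
  }
]
EOF
```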

Profile your model with `profile_throughput.py`:

```bash
python profile_throughput.py \
 ShareGPT_V3_unfiltered_cleaned_split.json \
 /path/to/your/model \
 --concurrency 64
```
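To compare throughput at several load levels, you can wrap the command in a simple loop. A sketch, reusing only the arguments shown above; the concurrency values and log file names are illustrative:

```bash
# Sweep a few concurrency levels and keep a log per run
for c in 16 32 64 128; do
  python profile_throughput.py \
    ShareGPT_V3_unfiltered_cleaned_split.json \
    /path/to/your/model \
    --concurrency ${c} | tee "throughput_c${c}.log"
done
```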

## profile without dataset

`profile_generation.py` performs the benchmark with dummy data.

```bash
python profile_generation.py \
 /path/to/your/model \
 --concurrency 8 --input_seqlen 0 --output_seqlen 2048
```
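Because the input is synthetic, it is easy to sweep generation lengths as well. A sketch, reusing only the flags shown above; the sequence lengths are illustrative:

```bash
# Sweep output lengths at a fixed concurrency
for out_len in 512 1024 2048; do
  python profile_generation.py \
    /path/to/your/model \
    --concurrency 8 --input_seqlen 0 --output_seqlen ${out_len}
done
```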

## profile serving

The tools above profile models through the Python API. `profile_serving.py` benchmarks the serving deployment instead.

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

python profile_serving.py \
    ${TritonServerAddress} \
    /path/to/tokenizer \
    ShareGPT_V3_unfiltered_cleaned_split.json \
    --concurrency 64
```
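`${TritonServerAddress}` is a placeholder for the host and port of your running Triton inference server; set it to match your own deployment before running the script. A hypothetical example (the address below is an assumption, not a documented default):

```bash
# Hypothetical host:port of your Triton server
export TritonServerAddress=0.0.0.0:33337
```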