# Benchmark

We provide several profiling tools to benchmark our models.

## profile with dataset

Download the dataset below, or create your own (a sample record layout is shown after the download command).

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
```
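
If you build your own dataset, a safe bet is to mirror the ShareGPT layout of the file above: a JSON list of records, each holding a `conversations` list of alternating `human`/`gpt` turns. A trimmed, illustrative record (the values are made up):

```json
[
  {
    "id": "example-0",
    "conversations": [
      { "from": "human", "value": "What is the capital of France?" },
      { "from": "gpt", "value": "The capital of France is Paris." }
    ]
  }
]
```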

Profile your model with `profile_throughput.py`:

```bash
python profile_throughput.py \
    ShareGPT_V3_unfiltered_cleaned_split.json \
    /path/to/your/model \
    --concurrency 64
```

## profile without dataset

`profile_generation.py` performs the benchmark with dummy data. Install the `nvidia-ml-py` dependency first:

```bash
pip install nvidia-ml-py
```
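
`nvidia-ml-py` provides Python bindings for NVIDIA's NVML (imported as `pynvml`); the profiler presumably uses it to read GPU statistics such as memory usage during the run. A quick, optional sanity check that the bindings can see your GPUs:

```bash
# Optional check: confirm pynvml imports and detects at least one GPU.
python -c "import pynvml; pynvml.nvmlInit(); print(pynvml.nvmlDeviceGetCount(), 'GPU(s) visible'); pynvml.nvmlShutdown()"
```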

Then run the benchmark:

```bash
python profile_generation.py \
    --model-path /path/to/your/model \
    --concurrency 1 8 \
    --prompt-tokens 0 512 \
    --completion-tokens 2048 512
```

## profile serving

The tools above profile models through the Python API. `profile_serving.py` benchmarks a deployed Triton inference server instead.

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

python profile_serving.py \
    ${TritonServerAddress} \
    /path/to/tokenizer \
    ShareGPT_V3_unfiltered_cleaned_split.json \
    --concurrency 64
```
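
`${TritonServerAddress}` refers to the host and port of the Triton inference server under test; set it to your own deployment's address before running the script, e.g. (the address below is only a placeholder):

```bash
# Placeholder address; substitute your Triton server's actual host:port.
TritonServerAddress=0.0.0.0:33337
```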