# Benchmark

We provide several profiling tools to benchmark our models.

## profile with dataset

Download the dataset below or create your own dataset.

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
```

Profile your model with `profile_throughput.py`:

```bash
python profile_throughput.py \
 ShareGPT_V3_unfiltered_cleaned_split.json \
 /path/to/your/model \
 --concurrency 64
```
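The downloaded file is assumed to follow the usual ShareGPT schema: a JSON list of records, each carrying a `conversations` list of `{"from", "value"}` turns. A minimal sketch (with a hypothetical `usable_records` helper and a tiny inline sample standing in for the real file) of how a benchmark might drop records that are too short to use:

```python
# Illustrative only: the real script parses the downloaded JSON file;
# here a small inline sample mimics the assumed ShareGPT schema.
sample = [
    {"id": "a", "conversations": [
        {"from": "human", "value": "Hello"},
        {"from": "gpt", "value": "Hi, how can I help?"},
    ]},
    {"id": "b", "conversations": []},  # empty conversation, not usable
]

def usable_records(records, min_turns=2):
    """Keep records with at least `min_turns` turns (hypothetical filter)."""
    return [r for r in records if len(r["conversations"]) >= min_turns]

print(len(usable_records(sample)))  # 1
```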

## profile without dataset

`profile_generation.py` performs benchmarks with dummy data. Install `nvidia-ml-py` first:

```shell
pip install nvidia-ml-py
```

```bash
python profile_generation.py \
 --model-path /path/to/your/model \
 --concurrency 1 8 --prompt-tokens 1 512 --completion-tokens 2048 512
```
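For intuition, output-token throughput is just tokens generated divided by wall time. A minimal sketch with a hypothetical `throughput` helper (the script's own reported metrics may differ):

```python
def throughput(total_tokens: int, elapsed_s: float) -> float:
    """Tokens generated per second over the whole run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return total_tokens / elapsed_s

# e.g. 8 concurrent sessions each completing 2048 tokens in 16 s
print(throughput(8 * 2048, 16.0))  # 1024.0
```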

## profile serving

The tools above profile models via the Python API. `profile_serving.py` benchmarks a running Triton server instead.

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

python profile_serving.py \
    ${TritonServerAddress} \
    /path/to/tokenizer \
    ShareGPT_V3_unfiltered_cleaned_split.json \
    --concurrency 64
```

The tokenizer path ends with `.model` for most models. Otherwise, pass `model_path/triton_models/tokenizer`.
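`--concurrency` controls how many client sessions issue requests in parallel. The fan-out pattern can be sketched with a thread pool and a stubbed `send_request` (both names hypothetical; the real script talks to the Triton endpoint):

```python
from concurrent.futures import ThreadPoolExecutor

def send_request(prompt: str) -> int:
    # Stand-in for a real call to the serving endpoint; here it just
    # returns the prompt length as a fake "completion length".
    return len(prompt)

prompts = ["hello", "benchmarking", "triton"]

# Fan the prompts out over a pool of workers, mirroring --concurrency.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(send_request, prompts))

print(results)  # [5, 12, 6]
```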

## profile restful api

`profile_restful_api.py` benchmarks the RESTful API server.

```bash
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

python profile_restful_api.py \
    ${ServerAddress} \
    /path/to/tokenizer \
    ShareGPT_V3_unfiltered_cleaned_split.json \
    --concurrency 64
```

The tokenizer path ends with `.model` for most models. Otherwise, pass `model_path/triton_models/tokenizer`.
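Serving benchmarks typically summarize per-request latency with a mean and tail percentiles. A self-contained sketch of nearest-rank percentiles over made-up latencies (illustrative numbers only, not output of any script here):

```python
import math
import statistics

# Made-up per-request latencies in milliseconds, standing in for measured data.
latencies_ms = [120.0, 95.0, 210.0, 130.0, 101.0]

def percentile(values, p):
    """Nearest-rank percentile: smallest sample covering p% of the data."""
    ordered = sorted(values)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[max(0, k)]

print(f"mean: {statistics.mean(latencies_ms):.1f} ms")  # 131.2 ms
print(f"p50:  {percentile(latencies_ms, 50):.1f} ms")   # 120.0 ms
print(f"p90:  {percentile(latencies_ms, 90):.1f} ms")   # 210.0 ms
```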