# Profile Token Latency and Throughput

We profile the latency and throughput of generated tokens with a fixed batch size and fixed numbers of input/output tokens.

The profiling script is `profile_generation.py`. Before running it, please install the lmdeploy precompiled package and download the profiling script:

```shell
pip install lmdeploy
git clone --depth=1 https://github.com/InternLM/lmdeploy
```

## Metrics

LMDeploy records test results such as first token latency, token throughput (tokens/s), percentile data of each token's latency (P50, P75, P95, P99), GPU memory usage, etc.

`first_token_latency` is only reported in the case of streaming inference.

The formula for calculating `throughput` is:

$$
TokenThroughput = \frac{Number\ of\ generated\ tokens}{TotalTime}
$$

Total time includes prefill time.
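
For instance, with hypothetical numbers (a batch of 16 requests, each generating 512 output tokens, finishing in a total of 10 s including prefill), the throughput would be:

$$
TokenThroughput = \frac{16 \times 512}{10\ \text{s}} = 819.2\ \text{tokens/s}
$$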

During the test, no other programs should be running on any of the node's GPUs; otherwise, the GPU memory statistics will be inaccurate.
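
For example, you can confirm the GPUs are idle before starting a run with `nvidia-smi` (shown here only as a convenience check; it is not part of the profiling script):

```shell
# The "Processes" table should be empty and memory usage near zero on every GPU
nvidia-smi
```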

## Profile

In this section, we take [internlm/internlm-7b](https://huggingface.co/internlm/internlm-7b) as an example to show how to profile the inference engines of LMDeploy.

### Profile turbomind engine

```shell
cd lmdeploy/benchmark
python3 profile_generation.py internlm/internlm-7b
```

### Profile pytorch engine

```shell
cd lmdeploy/benchmark
python3 profile_generation.py internlm/internlm-7b --backend pytorch
```

For a detailed description of the arguments of `profile_generation.py`, such as batch size, input and output token numbers, and so on, please run the help command `python3 profile_generation.py -h`.
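
As a rough illustration, a sweep over several batch sizes and token lengths might look like the command below. The flag names (`--concurrency`, `--prompt-tokens`, `--completion-tokens`) are assumptions about the script's interface and may vary between versions, so verify the exact names against the `-h` output first.

```shell
# Hypothetical sweep -- confirm flag names with `python3 profile_generation.py -h`
cd lmdeploy/benchmark
python3 profile_generation.py internlm/internlm-7b \
    --concurrency 1 16 64 \
    --prompt-tokens 128 128 2048 \
    --completion-tokens 128 2048 128
```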