# Evaluation with LMDeploy

We now support evaluation of models accelerated by [LMDeploy](https://github.com/InternLM/lmdeploy). LMDeploy is a toolkit for compressing, deploying, and serving LLMs. **TurboMind** is an efficient inference engine proposed by LMDeploy, and OpenCompass is compatible with it. This guide shows how to evaluate a model in OpenCompass with the support of TurboMind.

## Setup

### Install OpenCompass

Please follow the [instructions](https://opencompass.readthedocs.io/en/latest/get_started.html) to install OpenCompass and prepare the evaluation datasets.

### Install LMDeploy

Install LMDeploy via pip (Python 3.8+):

```shell
pip install lmdeploy
```
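
Before installing, it can help to confirm that the active interpreter meets the Python 3.8+ requirement. The following is a minimal sketch, assuming `python3` is on your `PATH`; the `check_python` function name is only illustrative and is not part of LMDeploy or OpenCompass.

```shell
# Hypothetical helper: succeed only if the given interpreter is Python 3.8 or newer.
# Usage: check_python [interpreter]   (defaults to python3)
check_python() {
    "${1:-python3}" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'
}

if check_python python3; then
    echo "Python version OK"
else
    echo "Python 3.8+ is required" >&2
fi
```

After installation, `pip show lmdeploy` reports the installed version.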

## Evaluation

OpenCompass integrates both TurboMind's Python API and gRPC API for evaluation. The former is highly recommended.

We take InternLM-20B as an example. Please download it from Hugging Face and convert it to TurboMind's model format:

```shell
# 1. Download the InternLM model (or use the cached model's checkpoint)

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/internlm/internlm-20b /path/to/internlm-20b

# 2. Convert the InternLM model to TurboMind's format and save it in the home folder of OpenCompass
lmdeploy convert internlm /path/to/internlm-20b \
    --dst-path {/home/folder/of/opencompass}/turbomind
```

**Note**:

When evaluating the InternLM Chat model, make sure to pass `internlm-chat` instead of `internlm` as the model name when converting the model format. The specific command is:

```shell
lmdeploy convert internlm-chat /path/to/internlm-20b-chat \
    --dst-path {/home/folder/of/opencompass}/turbomind
```
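
The choice between the two model names can also be scripted. This is a minimal sketch that assumes chat checkpoints live in a directory whose name contains `chat` (as in `internlm-20b-chat`); `pick_model_name` is a hypothetical helper, not an LMDeploy command, and the script only prints the conversion command for review instead of running it.

```shell
# Hypothetical helper: map a checkpoint directory to the model name
# expected by `lmdeploy convert`.
# Assumption: chat checkpoints have "chat" in their directory name.
pick_model_name() {
    case "$(basename "$1")" in
        *chat*) echo "internlm-chat" ;;
        *)      echo "internlm" ;;
    esac
}

MODEL_DIR=/path/to/internlm-20b-chat
# Print the command rather than executing it, so it can be checked first.
echo "lmdeploy convert $(pick_model_name "$MODEL_DIR") $MODEL_DIR --dst-path {/home/folder/of/opencompass}/turbomind"
```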

### Evaluation with TurboMind Python API (recommended)

In the home folder of OpenCompass, start the evaluation with the following command:

```shell
python run.py configs/eval_internlm_turbomind.py -w outputs/turbomind/internlm-20b
```

The evaluation results are reported once inference and evaluation have finished.
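
To locate the output of the most recent run, something like the following may help. It is a sketch under one assumption: that each run creates a timestamped subdirectory under the work directory passed with `-w`, so the lexicographically last entry is the newest. The `latest_run` helper is hypothetical.

```shell
# Hypothetical helper: print the newest run directory under a work directory.
# Assumption: run directories are named by timestamp, so sorting by name
# also sorts by time.
latest_run() {
    ls -1d "$1"/*/ 2>/dev/null | sort | tail -n 1
}

WORK_DIR=outputs/turbomind/internlm-20b
RUN_DIR=$(latest_run "$WORK_DIR")
if [ -n "$RUN_DIR" ]; then
    echo "Latest run: $RUN_DIR"
    ls "$RUN_DIR"
else
    echo "No runs found under $WORK_DIR" >&2
fi
```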

**Note**:

- If you evaluate the InternLM Chat model, please use the configuration file `eval_internlm_chat_turbomind.py`.
- If you evaluate the InternLM 7B model, please modify `eval_internlm_turbomind.py` or `eval_internlm_chat_turbomind.py` by setting `models = [internlm_7b]` in the last line.
- If you want to evaluate other chat models such as Llama2, Qwen-7B, or Baichuan2-7B, change the `models` setting in `eval_internlm_chat_turbomind.py`.

### Evaluation with TurboMind gRPC API (optional)

In the home folder of OpenCompass, launch the Triton Inference Server:

```shell
bash turbomind/service_docker_up.sh
```

Then start the evaluation with the following command:

```shell
python run.py configs/eval_internlm_turbomind_tis.py -w outputs/turbomind-tis/internlm-20b
```

**Note**:

- To evaluate the InternLM Chat model, please use the config file `eval_internlm_chat_turbomind_tis.py`.
- In `eval_internlm_turbomind_tis.py`, the configured Triton Inference Server (TIS) address is `tis_addr='0.0.0.0:33337'`. Please modify `tis_addr` to the IP address of the machine where the server is launched.
- If evaluating the InternLM 7B model, please modify the `models` configuration in `eval_internlm_xxx_turbomind_tis.py`.
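
The `tis_addr` change can also be made from the command line. A minimal sketch, assuming the config contains the setting in the literal form `tis_addr='...'` as quoted above; the `set_tis_addr` helper is hypothetical and `10.0.0.5` is a placeholder address to be replaced with your server's IP.

```shell
# Hypothetical helper: replace the tis_addr value in a config file.
# Usage: set_tis_addr <config-file> <host:port>
set_tis_addr() {
    # -i.bak edits in place and keeps a backup copy; this form is accepted
    # by both GNU and BSD sed.
    sed -i.bak "s/tis_addr='[^']*'/tis_addr='$2'/" "$1"
}

CONFIG=configs/eval_internlm_turbomind_tis.py
if [ -f "$CONFIG" ]; then
    set_tis_addr "$CONFIG" "10.0.0.5:33337"   # placeholder address
    grep -n "tis_addr" "$CONFIG"              # show the updated line
fi
```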