# Evaluation with LMDeploy

We now support evaluation of models accelerated by [LMDeploy](https://github.com/InternLM/lmdeploy). LMDeploy is a toolkit designed for compressing, deploying, and serving LLMs. **TurboMind** is an efficient inference engine provided by LMDeploy, and OpenCompass is compatible with it. This guide illustrates how to evaluate a model with TurboMind support in OpenCompass.

## Setup

### Install OpenCompass

Please follow the [instructions](https://opencompass.readthedocs.io/en/latest/get_started.html) to install OpenCompass and prepare the evaluation datasets.
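
A typical installation flow looks like the sketch below; refer to the linked instructions for the authoritative steps and for how to download the datasets (the repository URL is an assumption based on the current GitHub organization):

```shell
# Install OpenCompass from source
git clone https://github.com/open-compass/opencompass.git
cd opencompass
pip install -e .
# Then prepare the evaluation datasets as described in the documentation
```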

### Install LMDeploy

Install lmdeploy via pip (Python 3.8+):

```shell
pip install lmdeploy
```
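
To confirm the installation, you can check the package version (this assumes lmdeploy exposes `__version__`, as recent releases do):

```shell
python3 -c "import lmdeploy; print(lmdeploy.__version__)"
```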

## Evaluation

We take InternLM as an example.

### Step-1: Get the InternLM Model

```shell
# 1. Download the InternLM model (or use a cached checkpoint)

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/internlm/internlm-chat-7b /path/to/internlm-chat-7b

# If you want to clone without large files (just their pointers),
# prepend the following env var to the git clone command:
# GIT_LFS_SKIP_SMUDGE=1

# 2. Convert the InternLM model to turbomind's format, which is written to "./workspace" by default
python3 -m lmdeploy.serve.turbomind.deploy internlm-chat-7b /path/to/internlm-chat-7b
```
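
As an optional sanity check, you can list the conversion output. With the default settings the artifacts land in `./workspace`, which should contain, among other files, the launch script used in Step-2:

```shell
ls ./workspace
# expect to see service_docker_up.sh alongside the converted model files
```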

### Step-2: Launch Triton Inference Server

```shell
bash ./workspace/service_docker_up.sh
```
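
Optionally, verify that the server is up. The sketch below assumes the container maps Triton's default HTTP port (8000); adjust the host and port to match your Docker setup:

```shell
curl -v localhost:8000/v2/health/ready
# an HTTP 200 response means the server is ready to accept requests
```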

**Note:** In the implementation of TurboMind, inference is "persistent", and the "destroy" operation can lead to unexpected issues. Therefore, we temporarily use the service interfaces for model evaluation, and we will integrate the Python API into OpenCompass once TurboMind supports "destroy".

### Step-3: Evaluate the Converted Model

In the home folder of OpenCompass, run:

```shell
python run.py configs/eval_internlm_chat_7b_turbomind.py -w outputs/turbomind
```

You should get the evaluation results once inference and evaluation finish; they are written to the work directory specified by `-w` (here `outputs/turbomind`).

**Note:** In `eval_internlm_chat_7b_turbomind.py`, the Triton Inference Server (TIS) address is configured as `tis_addr='0.0.0.0:33337'`. Please change `tis_addr` to the IP address of the machine where the server is launched.
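
For example, assuming the server runs on a (hypothetical) host `10.1.2.3`, the config can be updated in place:

```shell
sed -i "s/tis_addr='0.0.0.0:33337'/tis_addr='10.1.2.3:33337'/" \
    configs/eval_internlm_chat_7b_turbomind.py
```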