# Task Execution and Monitoring

## Launching an Evaluation Task

The program entry for the evaluation task is `run.py`. The usage is as follows:

```shell
python run.py $EXP {--slurm | --dlc | None} [-p PARTITION] [-q QUOTATYPE] [--debug] [-m MODE] [-r [REUSE]] [-w WORKDIR] [-l] [--dry-run] [--dump-eval-details]
```
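
Before launching a full experiment, it can be helpful to verify the setup with the bundled demo configuration in `--debug` mode, which runs everything in a single process and echoes output in real time:

```bash
# Quick sanity check using the demo config shipped with the repository
python run.py configs/eval_demo.py --debug
```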

Task Configuration (`$EXP`):

- `run.py` accepts a `.py` configuration file as the task parameter; the file must contain the `datasets` and `models` fields (a minimal sketch of such a file follows this list):

  ```bash
  python run.py configs/eval_demo.py
  ```

- If no configuration file is provided, users can also specify models and datasets using `--models MODEL1 MODEL2 ...` and `--datasets DATASET1 DATASET2 ...`:

  ```bash
  python run.py --models hf_opt_350m hf_opt_125m --datasets siqa_gen winograd_ppl
  ```

- For HuggingFace-related models, users can also quickly define a model on the command line via HuggingFace parameters, then specify datasets using `--datasets DATASET1 DATASET2 ...`:

  ```bash
  python run.py --datasets siqa_gen winograd_ppl \
  --hf-path huggyllama/llama-7b \  # HuggingFace model path
  --model-kwargs device_map='auto' \  # Parameters for constructing the model
  --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \  # Parameters for constructing the tokenizer
  --max-out-len 100 \  # Maximum generated token count
  --max-seq-len 2048 \  # Maximum sequence length the model can accept
  --batch-size 8 \  # Batch size
  --no-batch-padding \  # Disable batch padding and infer through a for loop to avoid accuracy loss
  --num-gpus 1  # Minimum number of GPUs required for this model
  ```

  Complete HuggingFace parameter descriptions:

  - `--hf-path`: HuggingFace model path
  - `--peft-path`: PEFT model path
  - `--tokenizer-path`: HuggingFace tokenizer path (if it's the same as the model path, it can be omitted)
  - `--model-kwargs`: Parameters for constructing the model
  - `--tokenizer-kwargs`: Parameters for constructing the tokenizer
  - `--max-out-len`: Maximum generated token count
  - `--max-seq-len`: Maximum sequence length the model can accept
  - `--no-batch-padding`: Disable batch padding and infer through a for loop to avoid accuracy loss
  - `--batch-size`: Batch size
  - `--num-gpus`: The number of GPUs required to run the model. Note that this parameter only declares how many GPUs the model itself needs; it does not affect the actual number of GPUs used for the task. Refer to [Efficient Evaluation](./evaluation.md) for more details.
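
To make the `$EXP` requirement concrete, here is a minimal sketch of a configuration file. The dataset and model import paths are illustrative (they assume corresponding config fragments exist under `configs/`); the only hard requirement is that the file defines the `datasets` and `models` fields:

```bash
# A hypothetical minimal $EXP config; the import paths are illustrative
cat > configs/my_eval.py << 'EOF'
from mmengine.config import read_base

with read_base():
    # Compose datasets and models from existing config fragments
    from .datasets.siqa.siqa_gen import siqa_datasets
    from .models.opt.hf_opt_125m import opt125m

datasets = [*siqa_datasets]
models = [opt125m]
EOF

python run.py configs/my_eval.py --debug
```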

Starting Methods (examples follow this list):

- Running on a local machine: `run.py $EXP`.
- Running with slurm: `run.py $EXP --slurm -p $PARTITION_name`.
- Running with dlc: `run.py $EXP --dlc --aliyun-cfg $AliYun_Cfg`.
- Customized starting: `run.py $EXP`. Here, `$EXP` is a configuration file that includes the `eval` and `infer` fields. For detailed configurations, please refer to [Efficient Evaluation](./evaluation.md).
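
For instance (the partition name and the Aliyun config path below are placeholders):

```bash
# The same experiment launched through different backends (names are placeholders)
python run.py configs/eval_demo.py                                    # local machine
python run.py configs/eval_demo.py --slurm -p my_partition -q auto    # slurm cluster
python run.py configs/eval_demo.py --dlc --aliyun-cfg ~/aliyun.cfg    # Aliyun DLC
```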

The parameters are explained as follows:

- `-p`: Specify the slurm partition.
- `-q`: Specify the slurm quotatype (default is None); optional values are `reserved`, `auto`, and `spot`. This parameter may only be available in some slurm variants.
- `--debug`: When enabled, inference and evaluation tasks will run in single-process mode, and output will be echoed in real time for debugging.
- `-m`: Running mode; default is `all`. It can be set to `infer` to run only inference and obtain output results; if model outputs already exist in `{WORKDIR}`, it can be set to `eval` to run only evaluation and obtain evaluation results; if the evaluation results are ready, it can be set to `viz` to run only visualization, which summarizes the results in tables; `all` performs a full run, including inference, evaluation, and visualization (see the workflow sketch after this list).
- `-r`: Reuse existing inference results and skip finished tasks. If followed by a timestamp, the results under that timestamp in the work directory will be reused; otherwise, the latest results in the specified work directory will be reused.
- `-w`: Specify the work directory; default is `./outputs/default`.
- `-l`: Enable status reporting via Lark bot.
- `--dry-run`: When enabled, inference and evaluation tasks will be dispatched but not actually executed, which is useful for debugging.
- `--dump-eval-details`: When enabled, the evaluation results under the `results` folder will include more details, such as the correctness of each sample.
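
Taken together, `-m`, `-r`, and `-w` support a staged workflow such as the sketch below (the work directory and timestamp are illustrative):

```bash
# Stage 1: run inference only, writing into a custom work directory
python run.py configs/eval_demo.py -m infer -w outputs/demo
# Stage 2: evaluate the finished predictions, reusing the latest run
python run.py configs/eval_demo.py -m eval -r -w outputs/demo
# Stage 3: regenerate the summary table from a specific timestamped run
python run.py configs/eval_demo.py -m viz -r 20230220_183030 -w outputs/demo
```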

Using run mode `-m all` as an example, the overall execution flow is as follows:

1. Read the configuration file and parse out the model, dataset, evaluator, and other configuration information.
2. The evaluation task mainly includes three stages: inference (`infer`), evaluation (`eval`), and visualization (`viz`). After the Partitioner divides the work into tasks, they are handed over to the Runner for parallel execution. Individual inference and evaluation tasks are abstracted into `OpenICLInferTask` and `OpenICLEvalTask` respectively.
3. After each stage ends, the visualization stage reads the evaluation results in `results/` to generate a summary table.

## Task Monitoring: Lark Bot

Users can enable real-time monitoring of task status by setting up a Lark bot. Please refer to [this document](https://open.feishu.cn/document/ukTMukTMukTM/ucTM5YjL3ETO24yNxkjN?lang=zh-CN#7a28964d) for setting up the Lark bot.

Configuration method:

1. Open the `configs/lark.py` file, and add the following line:

   ```python
   lark_bot_url = 'YOUR_WEBHOOK_URL'
   ```

   Typically, the Webhook URL is formatted like this: `https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxxxxxxxxx`.

2. Inherit this file in the complete evaluation configuration:

   ```python
   from mmengine.config import read_base

   with read_base():
       from .lark import lark_bot_url
   ```

3. To avoid frequent messages from the bot becoming a nuisance, status updates are not automatically reported by default. You can start status reporting using `-l` or `--lark` when needed:

   ```bash
   python run.py configs/eval_demo.py -p {PARTITION} -l
   ```

## Run Results

All run results are placed in the `outputs/default/` directory by default; the directory structure is shown below:

```
outputs/default/
├── 20200220_120000
├── ...
├── 20230220_183030
│   ├── configs
│   ├── logs
│   │   ├── eval
│   │   └── infer
│   ├── predictions
│   │   └── MODEL1
│   └── results
│       └── MODEL1
```

Each timestamp contains the following content:

- `configs` folder, which stores the configuration files corresponding to each run that used this timestamp as its output directory;
- `logs` folder, which stores the output log files of the inference and evaluation stages; each stage folder stores logs in per-model subfolders;
- `predictions` folder, which stores the inferred JSON results, in a per-model subfolder;
- `results` folder, which stores the evaluated JSON results, in a per-model subfolder.

In addition, whenever `-r` is used without a timestamp, the newest folder (by sorted name) is selected as the output directory.
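
For example, a finished run can be inspected directly from the shell (the timestamp and model name are illustrative):

```bash
# Inspect a finished run; the timestamp and MODEL1 are placeholders
ls outputs/default/20230220_183030/results/MODEL1/        # evaluated JSON results
ls outputs/default/20230220_183030/predictions/MODEL1/    # inferred JSON results
tail -n 20 outputs/default/20230220_183030/logs/infer/MODEL1/*  # inference logs for one model
```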

## Introduction of the Summarizer (to be updated)