# Task Execution and Monitoring

## Launching an Evaluation Task

The entry point for an evaluation task is `run.py`. Its usage is as follows:

```shell
python run.py $Config {--slurm | --dlc | None} [-p PARTITION] [-q QUOTATYPE] [--debug] [-m MODE] [-r [REUSE]] [-w WORKDIR] [-l] [--dry-run]
```

Here are some examples for launching the task in different environments:

- Running locally: `run.py $Config`, where `$Config` does not contain the `eval` and `infer` fields.
- Running with Slurm: `run.py $Config --slurm -p $PARTITION_name`.
- Running on Aliyun DLC: `run.py $Config --dlc --aliyun-cfg $AliYun_Cfg`. A tutorial will be added later.
- Customized run: `run.py $Config`, where `$Config` contains the `eval` and `infer` fields, which let you customize how each task is split and launched. See the [Evaluation document](./evaluation.md).

The parameters are explained below:

- `-p`: Specify the Slurm partition.
- `-q`: Specify the Slurm quotatype (default is None). Valid values are `reserved`, `auto`, and `spot`. This parameter is only available in some Slurm variants.
- `--debug`: When enabled, inference and evaluation tasks run in single-process mode, and output is echoed in real time for debugging.
- `-m`: Running mode, default is `all`. Use `infer` to run only inference and obtain the model outputs. If model outputs already exist in `{WORKDIR}`, use `eval` to run only evaluation and obtain the evaluation results. If the evaluation results are ready, use `viz` to run only visualization, which summarizes the results in tables. With `all`, a full run is performed: inference, evaluation, and visualization.
- `-r`: Reuse existing inference results and skip the finished tasks. If followed by a timestamp, the results under that timestamp in the workspace path are reused; otherwise, the latest results in the specified workspace path are reused.
- `-w`: Specify the working path, default is `./outputs/default`.
- `-l`: Enable status reporting via Lark bot.
- `--dry-run`: When enabled, inference and evaluation tasks are dispatched but not actually executed, which is useful for debugging.
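
To make the flag semantics above concrete, here is a minimal `argparse` sketch that mirrors the documented options. It is an illustration only, not the actual `run.py` source: the long option names `--partition`, `--quotatype`, `--mode`, `--reuse`, and `--work-dir`, and the `"latest"` placeholder value, are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flags (not the real run.py)."""
    parser = argparse.ArgumentParser(description="Sketch of the run.py CLI")
    parser.add_argument("config", help="Path to the evaluation config file")
    # --slurm and --dlc select a launcher; giving neither means a local run.
    launcher = parser.add_mutually_exclusive_group()
    launcher.add_argument("--slurm", action="store_true")
    launcher.add_argument("--dlc", action="store_true")
    # Long option names below are hypothetical; only the short forms are documented.
    parser.add_argument("-p", "--partition", default=None)
    parser.add_argument("-q", "--quotatype", default=None,
                        choices=["reserved", "auto", "spot"])
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("-m", "--mode", default="all",
                        choices=["all", "infer", "eval", "viz"])
    # nargs="?" lets -r appear alone (reuse the latest run) or with a timestamp.
    # The "latest" sentinel is an assumption of this sketch.
    parser.add_argument("-r", "--reuse", nargs="?", const="latest", default=None)
    parser.add_argument("-w", "--work-dir", default="./outputs/default")
    parser.add_argument("-l", "--lark", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser
```

With this sketch, `-r` on its own yields the sentinel value, `-r 20230220_183030` yields that timestamp, and omitting `-r` leaves the value as `None`, which matches the three cases described for reuse.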

Using run mode `-m all` as an example, the overall execution flow is as follows:

1. Read the configuration file and parse out the model, dataset, evaluator, and other configuration information.
2. The evaluation task consists of three main stages: inference (`infer`), evaluation (`eval`), and visualization (`viz`). The Partitioner divides the inference and evaluation work into tasks, which are handed to the Runner for parallel execution. Individual inference and evaluation tasks are abstracted as `OpenICLInferTask` and `OpenICLEvalTask` respectively.
3. Once the preceding stages have finished, the visualization stage reads the evaluation results in `results/` and generates a table.
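
The flow above can be pictured with a toy pipeline. This is a conceptual sketch only: `partition`, `run_parallel`, and `run_pipeline` are hypothetical names invented for the example and do not correspond to the real Partitioner, Runner, `OpenICLInferTask`, or `OpenICLEvalTask` classes. Models are plain callables and datasets are lists of `(input, reference)` pairs.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(models, datasets):
    """Split the workload into one task per (model, dataset) pair."""
    return [(m, d) for m in models for d in datasets]


def run_parallel(task_fn, tasks, max_workers=4):
    """Run every task in a worker pool; results keep the order of `tasks`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task_fn, tasks))


def run_pipeline(models, datasets):
    tasks = partition(models, datasets)

    def infer(task):
        model_name, dataset_name = task
        predict = models[model_name]
        return task, [predict(x) for x, _ in datasets[dataset_name]]

    def evaluate(item):
        (model_name, dataset_name), preds = item
        refs = [ref for _, ref in datasets[dataset_name]]
        accuracy = sum(p == r for p, r in zip(preds, refs)) / len(refs)
        return model_name, dataset_name, accuracy

    predictions = run_parallel(infer, tasks)       # stage 1: infer
    results = run_parallel(evaluate, predictions)  # stage 2: eval
    # stage 3: viz, summarized here as tab-separated table rows
    return [f"{m}\t{d}\t{acc:.2f}" for m, d, acc in results]
```

The point of the sketch is the ordering: evaluation consumes the outputs of inference, and the table is built only from finished evaluation results.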

## Task Monitoring: Lark Bot

Users can enable real-time monitoring of task status by setting up a Lark bot. Please refer to [this document](https://open.feishu.cn/document/ukTMukTMukTM/ucTM5YjL3ETO24yNxkjN?lang=zh-CN#7a28964d) for setting up the Lark bot.

Configuration method:

1. Open the `configs/lark.py` file, and add the following line:

   ```python
   lark_bot_url = 'YOUR_WEBHOOK_URL'
   ```

   Typically, the Webhook URL is formatted like this: https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxxxxxxxxx .

2. Inherit this file in the complete evaluation configuration:

   ```python
   from mmengine.config import read_base

   with read_base():
       from .lark import lark_bot_url
   ```

3. To keep the bot from flooding the chat, status reporting is off by default. Enable it with `-l` or `--lark` when needed:

   ```bash
   python run.py configs/eval_demo.py -p {PARTITION} -l
   ```
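
For readers curious what a status report to such a webhook might look like, the sketch below builds a plain-text message request using only the standard library. It assumes the custom-bot webhook accepts a JSON body with `msg_type` and `content.text`; the helper name `build_lark_request` is hypothetical, and this is not necessarily how the framework sends its own reports.

```python
import json
import urllib.request


def build_lark_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a plain-text message."""
    # Assumed payload shape for a custom-bot text message.
    payload = {"msg_type": "text", "content": {"text": text}}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (requires network access and a valid webhook URL):
# urllib.request.urlopen(build_lark_request(lark_bot_url, "infer finished"))
```

Sending a test message this way is a quick check that the webhook URL in `configs/lark.py` is valid before starting a long run.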

## Run Results

By default, all run results are placed in the `outputs/default/` directory, which has the following structure:

```
outputs/default/
├── 20200220_120000
├── ...
├── 20230220_183030
│   ├── configs
│   ├── logs
│   │   ├── eval
│   │   └── infer
│   ├── predictions
│   │   └── MODEL1
│   └── results
│       └── MODEL1
```

Each timestamp folder contains the following:

- `configs` folder, which stores the configuration files for each run that used this timestamp as its output directory;
- `logs` folder, which stores the log files of the inference and evaluation phases; each phase folder stores its logs in subfolders by model;
- `predictions` folder, which stores the inference results as JSON, with one subfolder per model;
- `results` folder, which stores the evaluation results as JSON, with one subfolder per model.

In addition, when `-r` is given without a timestamp, the newest folder (the last one when sorted by name) is selected as the output directory.
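
The selection rule can be sketched as follows. This is an illustration under stated assumptions, not the framework's code: the helper name `resolve_run_dir` and the `YYYYMMDD_HHMMSS` pattern check are introduced here for the example. Because the timestamp format is fixed-width, sorting the names as strings also sorts them chronologically.

```python
import os
import re
from typing import Optional

# Folder names such as 20230220_183030 (assumed fixed-width format).
TIMESTAMP_RE = re.compile(r"^\d{8}_\d{6}$")


def resolve_run_dir(work_dir: str, reuse: Optional[str] = None) -> Optional[str]:
    """Return the run folder to reuse, or None if there is nothing to reuse."""
    if reuse is not None:
        # An explicit timestamp was given: use it only if that folder exists.
        path = os.path.join(work_dir, reuse)
        return path if os.path.isdir(path) else None
    if not os.path.isdir(work_dir):
        return None
    # No timestamp: pick the last timestamp-named folder in sorted order.
    stamps = sorted(
        name for name in os.listdir(work_dir)
        if TIMESTAMP_RE.match(name) and os.path.isdir(os.path.join(work_dir, name))
    )
    return os.path.join(work_dir, stamps[-1]) if stamps else None
```

For the directory tree shown earlier, calling `resolve_run_dir("outputs/default")` would pick the `20230220_183030` folder if it is the last timestamp present.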

## Introduction to Summarizer (to be updated)