## Hydra

Hydra is an open-source Python framework that simplifies the development of research and other complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line. The name Hydra comes from its ability to run multiple similar jobs - much like a Hydra with multiple heads.

## Train models with the hydra interface

#### Provide parameters in `.yaml` files
For example, if we'd like to train a language model with a transformer architecture, we can provide the parameters in yaml files. Note that the modules used in training (task, model, criterion, optimizer, lr scheduler) must already have been migrated to the hydra interface (see the section below).

- Provide the top-level choices of which generic parameter file and which modules to use in `config/config.yaml`. For example, this could look like:

```
defaults:
  - params: training_params
  - task: language_modeling
  - model: transformer_lm
  - criterion: cross_entropy
  - optimizer: adam
  - lr_scheduler: inverse_sqrt
```

- Provide generic parameters common across different training jobs: `config/params/training_params.yaml`
- Provide task parameters: `config/task/language_modeling.yaml` (a sketch of such a file is shown after this list)
- Provide model parameters: `config/model/transformer_lm.yaml`
- Provide criterion parameters: `config/criterion/cross_entropy.yaml`
- Provide optimizer parameters: `config/optimizer/adam.yaml`
- Provide lr_scheduler parameters: `config/lr_scheduler/inverse_sqrt.yaml`
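
As an illustration, here is a minimal sketch of what a task config such as `config/task/language_modeling.yaml` could contain, using only fields that appear in the command-line examples below (the real file in fairseq may define more options; packaging directives are omitted here):

```
# illustrative sketch only -- not the full fairseq config file
data: ???                  # required field, marked with `???`
tokens_per_sample: 512
sample_break_mode: none
```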

#### Command-line overriding
`train_hydra.py` is the main entry point for training with the hydra interface. If we specify all the parameters we want in `.yaml` files, we can simply run:

```
# task.data is a required field, marked with `???` in the yaml
python fairseq_cli/train_hydra.py \
task.data=/private/home/abaevski/data/wiki103
```

Alternatively, if we need to override certain parameters from the command line, we can do so as below (note which config group each parameter belongs to):

```
python fairseq_cli/train_hydra.py \
params=training_params \
task=language_modeling \
task.data=/private/home/abaevski/data/wiki103 \
task.tokens_per_sample=512 \
task.sample_break_mode=none \
model=transformer_lm \
model.share_decoder_input_output_embed=true \
model.dropout=0.1 \
optimizer=adam \
optimizer.adam_betas="'(0.9, 0.98)'" \
optimizer.weight_decay=0.01 \
lr_scheduler=inverse_sqrt \
lr_scheduler.warmup_updates=4000 \
lr_scheduler.warmup_init_lr=1e-07 \
criterion=cross_entropy \
params.common.fp16=true \
params.common.log_format=json \
params.common.log_interval=1 \
params.dataset.max_tokens=1024 \
params.dataset.num_workers=4 \
params.optimization.update_freq=[16] \
params.optimization.max_update=50000 \
params.optimization.clip_norm=0.0 \
params.optimization.lr=[0.0005] \
params.checkpoint.save_dir=/checkpoint/mtian/transformer_wikitext-103-hydra-args-cli \
params.checkpoint.save_interval_updates=10
```

## Migrate existing or create new modules with the hydra interface

In each module that we want to migrate to (or create with) the hydra interface, we fundamentally need to:

- Provide a dataclass that lays out the parameters used by the module.

- Modify the builder and/or constructor that previously took an `argparse.Namespace` argument `args` so that it takes an `omegaconf.DictConfig` config object instead. For now we allow `Union[omegaconf.DictConfig, argparse.Namespace]` to keep backward compatibility.

- In `add_args()`, extract the arguments from the dataclass defined in the same file and append them to `parser`, also for backward compatibility. This is supported by the `gen_parser_from_dataclass` API; see the example files below and the sketch after this list.
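
Below is a minimal sketch of this pattern (illustrative only, not fairseq's actual code; the `MyCriterion`/`MyCriterionConfig` names are made up, and import paths or helper signatures may differ between fairseq versions):

```
# illustrative sketch of a module migrated to the hydra interface
from argparse import Namespace
from dataclasses import dataclass, field
from typing import Union

from omegaconf import DictConfig

from fairseq.dataclass import FairseqDataclass
from fairseq.dataclass.utils import gen_parser_from_dataclass


@dataclass
class MyCriterionConfig(FairseqDataclass):
    # 1) a dataclass that lays out the parameters used by the module
    sentence_avg: bool = field(
        default=False,
        metadata={"help": "normalize gradients by the number of sentences"},
    )


class MyCriterion:
    def __init__(self, cfg: Union[DictConfig, Namespace]):
        # 2) accept both hydra DictConfig objects and legacy argparse namespaces
        self.sentence_avg = getattr(cfg, "sentence_avg", False)

    @classmethod
    def add_args(cls, parser):
        # 3) regenerate the legacy command-line flags from the dataclass,
        #    so `--sentence-avg` style arguments keep working
        gen_parser_from_dataclass(parser, MyCriterionConfig())
```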

#### Migrated examples:

- Task: `fairseq/tasks/language_modeling.py`

- Model: `fairseq/models/transformer_lm.py`

- Criterion: `fairseq/criterions/adaptive_loss.py` and `fairseq/criterions/cross_entropy.py`

- Optimizer: `fairseq/optim/adam.py` and `fairseq/optim/nag.py`

- LR scheduler: `fairseq/optim/lr_scheduler/cosine_lr_scheduler.py` and `fairseq/optim/lr_scheduler/inverse_square_root_schedule.py`


## Interpolate parameters across different places
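
Hydra configs support OmegaConf-style interpolation: a value defined in one config group can be referenced from another with the `${...}` syntax instead of being duplicated. A small hypothetical sketch (the key names are illustrative, not fairseq's actual config):

```
params:
  common:
    seed: 1
task:
  # resolves to 1 at runtime; the value only needs to be defined once
  seed: ${params.common.seed}
```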

## Support of legacy interface
If you would still like to pass legacy-style arguments on the command line, `fairseq_cli/train.py` supports this. Internally it converts `args` into hydra config objects for whichever modules have already been migrated.

```
python fairseq_cli/train.py --task language_modeling \
/private/home/abaevski/data/wiki103 \
--save-dir /checkpoint/mtian/transformer_wikitext-103-hydra-args-cli \
--arch transformer_lm --share-decoder-input-output-embed \
--dropout 0.1 \
--optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0 \
--lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
--tokens-per-sample 512 --sample-break-mode none \
--max-tokens 1024 --update-freq 16 \
--fp16 \
--max-update 50000 --log-format json --log-interval 1 --num-workers 4 \
--save-interval-updates 10
```