Unverified Commit 57eb1cb6 authored by Sam Shleifer, committed by GitHub

[s2s] Document better mbart finetuning command (#6229)

* Document better MT command

* improve multigpu command
parent 0513f8d2
@@ -113,22 +113,20 @@ Best performing command:
 # optionally
 export ENRO_DIR='wmt_en_ro' # Download instructions above
 # export WANDB_PROJECT="MT" # optional
-export MAX_LEN=200
+export MAX_LEN=128
 export BS=4
-export GAS=8 # gradient accumulation steps
 ./train_mbart_cc25_enro.sh --output_dir enro_finetune_baseline --label_smoothing 0.1 --fp16_opt_level=O1 --logger_name wandb --sortish_sampler
 ```
-This should take < 6h/epoch on a 16GB v100 and achieve val_avg_ BLEU score above 25. (you can see metrics in wandb or metrics.json).
-To get results in line with fairseq, you need to do some postprocessing.
+This should take < 6h/epoch on a 16GB v100 and achieve test BLEU above 26
+To get results in line with fairseq, you need to do some postprocessing. (see `romanian_postprocessing.md`)
 MultiGPU command
 (using 8 GPUS as an example)
 ```bash
 export ENRO_DIR='wmt_en_ro' # Download instructions above
 # export WANDB_PROJECT="MT" # optional
-export MAX_LEN=200
+export MAX_LEN=128
 export BS=4
-export GAS=1 # gradient accumulation steps
 ./train_mbart_cc25_enro.sh --output_dir enro_finetune_baseline --gpus 8 --logger_name wandb
 ```
 ### Finetuning Outputs
...
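The postprocessing the README change points to mirrors the fairseq WMT16 English-Romanian evaluation recipe (normalize Romanian orthography, strip diacritics, then score). Below is a rough sketch of that flow, not the contents of `romanian_postprocessing.md` itself; it assumes `sacrebleu` is installed, uses rsennrich's `wmt16-scripts`, and the file names `test_generations.txt` / `ref.txt` are placeholders.

```bash
# Rough sketch only -- see romanian_postprocessing.md for the authoritative steps.
# Assumes: pip install sacrebleu, and file names below are placeholders.
git clone https://github.com/rsennrich/wmt16-scripts.git

for f in test_generations.txt ref.txt; do
  # Normalize Romanian, then strip diacritics, as the fairseq eval does.
  python wmt16-scripts/preprocess/normalise-romanian.py < "$f" \
    | python wmt16-scripts/preprocess/remove-diacritics.py > "${f%.txt}.post.txt"
done

# Score postprocessed hypotheses against the postprocessed reference.
sacrebleu ref.post.txt < test_generations.post.txt
```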
@@ -10,7 +10,7 @@ python finetune.py \
 --num_train_epochs 6 --src_lang en_XX --tgt_lang ro_RO \
 --data_dir $ENRO_DIR \
 --max_source_length $MAX_LEN --max_target_length $MAX_LEN --val_max_target_length $MAX_LEN --test_max_target_length $MAX_LEN \
---train_batch_size=$BS --eval_batch_size=$BS --gradient_accumulation_steps=$GAS \
+--train_batch_size=$BS --eval_batch_size=$BS \
 --task translation \
 --warmup_steps 500 \
 --freeze_embeds \
...
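Since both commands drop `$GAS`, gradient accumulation defaults to 1 and each optimizer step processes `$BS` examples per GPU: 4 on the single 16GB V100, or 4 x 8 = 32 across devices in the multi-GPU command. As a minimal sketch, the wrapper's `finetune.py` call after this change looks roughly like the following; only the flags visible in the hunk come from the diff, while `--model_name_or_path`, `--output_dir`, `--do_train`, and `--gpus` are assumptions added to make the example self-contained.

```bash
# Sketch of a direct finetune.py invocation after this change. Flags not shown
# in the hunk (--model_name_or_path, --output_dir, --do_train, --gpus) are
# assumptions, not part of the diff.
export ENRO_DIR='wmt_en_ro'
export MAX_LEN=128
export BS=4  # per-GPU batch; no accumulation, so 4/step (4 x 8 = 32 with --gpus 8)

python finetune.py \
  --model_name_or_path facebook/mbart-large-cc25 \
  --num_train_epochs 6 --src_lang en_XX --tgt_lang ro_RO \
  --data_dir $ENRO_DIR \
  --max_source_length $MAX_LEN --max_target_length $MAX_LEN \
  --val_max_target_length $MAX_LEN --test_max_target_length $MAX_LEN \
  --train_batch_size=$BS --eval_batch_size=$BS \
  --task translation \
  --warmup_steps 500 \
  --freeze_embeds \
  --do_train --gpus 1 \
  --output_dir enro_finetune_baseline
```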