# APPS finetuning
In this folder we show how to fine-tune an autoregressive language model on the APPS dataset, since a common way to evaluate on this benchmark is to first fine-tune the model on its training split.
We use the Hugging Face [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer), which supports distributed training on multiple GPUs.
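For reference, the sketch below outlines what such a Trainer-based fine-tuning setup can look like. It is a minimal sketch, not the exact contents of `apps_train.py`: the dataset id (`codeparrot/apps`), its field names, and the way problem statements and solutions are concatenated are assumptions.

```python
# Minimal sketch of Trainer-based causal-LM fine-tuning on the APPS training split.
# The dataset id and field names below are assumptions, not the exact apps_train.py logic.
import json

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_ckpt = "BigCode/gpt_345_python_any_license"  # example checkpoint from this README
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
tokenizer.pad_token = tokenizer.eos_token  # GPT-style tokenizers usually have no pad token
model = AutoModelForCausalLM.from_pretrained(model_ckpt)

# APPS training split; each example holds a problem statement and reference solutions.
dataset = load_dataset("codeparrot/apps", split="train")

def tokenize(example):
    # Assumed schema: "question" is the problem statement and "solutions" a
    # JSON-encoded list of reference solutions; we train on question + first solution.
    solutions = json.loads(example["solutions"]) if example["solutions"] else [""]
    text = example["question"] + "\nANSWER:\n" + solutions[0]
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)
splits = tokenized.train_test_split(test_size=0.05, seed=0)

args = TrainingArguments(
    output_dir="apps-finetuned",
    num_train_epochs=10,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=5e-5,
    fp16=True,
    evaluation_strategy="steps",
    eval_steps=250,
    logging_steps=50,
    report_to="wandb",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```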

## Setup

First, log in to Weights & Biases:
```bash
wandb login
```
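
Optionally, you can choose which Weights & Biases project runs are logged to via the `WANDB_PROJECT` environment variable (the project name below is only an example):
```bash
# example project name; pick any project in your W&B account
export WANDB_PROJECT=apps-finetuning
```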

You can fine-tune a model, for example `gpt_345_python_any_license`, by running:
```bash
# global batch size of 256 = 8 (GPUs) * 2 (batch size per device) * 16 (gradient accumulation steps)
python apps_train.py \
        --model_ckpt BigCode/gpt_345_python_any_license \
        --num_epochs 10 \
        --batch_size 2 \
        --gradient_accumulation_steps 16 \
        --learning_rate 5e-5 \
        --eval_freq 250 \
        --fp16
```
The fine-tuning takes 11 hours on 4 A100 GPUs.
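
To use the Trainer's distributed training across multiple GPUs, the same arguments can be passed through a distributed launcher. The command below is a sketch using `torchrun` with 8 processes; the launcher and process count are assumptions, not a prescribed setup.
```bash
# sketch: one process per GPU; the flags mirror the single-process command above
torchrun --nproc_per_node=8 apps_train.py \
        --model_ckpt BigCode/gpt_345_python_any_license \
        --num_epochs 10 \
        --batch_size 2 \
        --gradient_accumulation_steps 16 \
        --learning_rate 5e-5 \
        --eval_freq 250 \
        --fp16
```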

## Acknowledgments

This script is adapted from the [APPS repository](https://github.com/hendrycks/apps).