# APPS finetuning
In this folder we show how to fine-tune an autoregressive language model on the APPS dataset, since a common way to evaluate on this benchmark is to first fine-tune the model on its training split.
We use the Hugging Face [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer), which supports distributed training on multiple GPUs.
## Setup
First, log in to Weights & Biases:
```bash
wandb login
```
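In a non-interactive environment (e.g. a batch job), you can instead provide the key through the `WANDB_API_KEY` environment variable:
```bash
# avoids the interactive prompt; the key is available from your W&B account settings
export WANDB_API_KEY=<your-api-key>
```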
You can fine-tune a model, for example `gpt_345_python_any_license`, by running:
```bash
# global batch size of 256 = 8 (GPUs) * 2 (batch size per device) * 16 (gradient accumulation steps)
python apps_train.py \
--model_ckpt BigCode/gpt_345_python_any_license \
--num_epochs 10 \
--batch_size 2 \
--gradient_accumulation_steps 16 \
--learning_rate 5e-5 \
--eval_freq 250 \
--fp16
```
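The comment above assumes 8 GPUs. Since the Trainer handles distributed data parallelism, one way to launch one process per GPU is `torchrun`; this is a sketch, assuming `apps_train.py` needs no special launcher:
```bash
# one process per GPU; per-device batch size and accumulation steps stay the same
torchrun --nproc_per_node=8 apps_train.py \
--model_ckpt BigCode/gpt_345_python_any_license \
--num_epochs 10 \
--batch_size 2 \
--gradient_accumulation_steps 16 \
--learning_rate 5e-5 \
--eval_freq 250 \
--fp16
```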
Fine-tuning takes about 11 hours on 4 A100 GPUs.
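For reference, the sketch below shows how flags like these typically map onto Hugging Face `TrainingArguments`. It is not the actual `apps_train.py`; the dataset id `codeparrot/apps`, the question-only preprocessing, and the output path are assumptions:
```python
# A minimal Trainer sketch, not the actual apps_train.py.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

ckpt = "BigCode/gpt_345_python_any_license"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
tokenizer.pad_token = tokenizer.eos_token  # GPT-style checkpoints have no pad token
model = AutoModelForCausalLM.from_pretrained(ckpt)

def tokenize(batch):
    # simplified: tokenize the problem statements only; the real script
    # presumably concatenates each problem with one of its reference solutions
    return tokenizer(batch["question"], truncation=True, max_length=1024)

train_ds = load_dataset("codeparrot/apps", split="train")  # assumed dataset id
eval_ds = load_dataset("codeparrot/apps", split="test").select(range(200))
train_ds = train_ds.map(tokenize, batched=True, remove_columns=train_ds.column_names)
eval_ds = eval_ds.map(tokenize, batched=True, remove_columns=eval_ds.column_names)

args = TrainingArguments(
    output_dir="apps-finetuned",        # hypothetical output path
    num_train_epochs=10,                # --num_epochs
    per_device_train_batch_size=2,      # --batch_size
    gradient_accumulation_steps=16,     # --gradient_accumulation_steps
    learning_rate=5e-5,                 # --learning_rate
    evaluation_strategy="steps",
    eval_steps=250,                     # --eval_freq
    fp16=True,                          # --fp16
    report_to="wandb",                  # log to the W&B run from `wandb login`
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    # causal LM objective: the collator builds labels from the input ids
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```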
## Acknowledgments
This script is adapted from the [APPS repository](https://github.com/hendrycks/apps).