Commit 0876b77f authored by Grégory Châtel

Change to the README file to add SWAG results.

parent 150f3cd9
@@ -441,13 +441,25 @@
```
python run_swag.py \
  --do_train \
  --do_eval \
  --data_dir $SWAG_DIR/data \
  --train_batch_size 4 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 80 \
  --output_dir /tmp/swag_output/
```
Training with these hyper-parameters gave us the following results:
```
eval_accuracy = 0.7776167149855043
eval_loss = 1.006812262735175
global_step = 55161
loss = 0.282251750624779
```
The difference from the `81.6%` accuracy reported in the BERT paper
is probably due to the different `train_batch_size` (4 here versus 16
in the paper).
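If GPU memory is what forced the smaller batch size, the paper's setting can usually be emulated with gradient accumulation. The command below is only a sketch: it assumes `run_swag.py` accepts the same `--gradient_accumulation_steps` argument as the other fine-tuning scripts in this repository (where `--train_batch_size` is the effective batch size, split across accumulation steps), and it assumes `bert-base-uncased` and `$SWAG_DIR` as used earlier in the README.

```
# Sketch: emulate the paper's batch size of 16 while only holding 4 examples
# in memory per forward pass (--gradient_accumulation_steps is assumed here).
python run_swag.py \
  --bert_model bert-base-uncased \
  --do_train \
  --do_eval \
  --data_dir $SWAG_DIR/data \
  --train_batch_size 16 \
  --gradient_accumulation_steps 4 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 80 \
  --output_dir /tmp/swag_output/
```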
## Fine-tuning BERT-large on GPUs

The options we list above allow you to fine-tune BERT-large rather easily on GPU(s) instead of the TPU used by the original implementation.