Commit 0876b77f authored by Grégory Châtel

Change to the README file to add SWAG results.

parent 150f3cd9
@@ -441,13 +441,25 @@ python run_swag.py \
  --do_train \
  --do_eval \
  --data_dir $SWAG_DIR/data \
-  --train_batch_size 10 \
+  --train_batch_size 4 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 80 \
  --output_dir /tmp/swag_output/
```
Training with the previous hyper-parameters gave us the following results:
```
eval_accuracy = 0.7776167149855043
eval_loss = 1.006812262735175
global_step = 55161
loss = 0.282251750624779
```
The difference from the `81.6%` accuracy reported in the BERT paper is
probably due to the different `train_batch_size` (4 here versus 16 in
the paper).
## Fine-tuning BERT-large on GPUs

The options we list above allow you to fine-tune BERT-large rather easily on GPU(s) instead of the TPU used by the original implementation.
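The command in the diff above assumes that `$SWAG_DIR` points at a local copy of the SWAG dataset. A minimal setup sketch, assuming the data is obtained by cloning the `rowanz/swagaf` repository (the clone location `/tmp/swagaf` is an arbitrary choice):

```shell
# Hypothetical location; adjust to your environment.
# The SWAG csv files live in the repository's data/ folder, which is what
# --data_dir $SWAG_DIR/data in the command above expects.
git clone https://github.com/rowanz/swagaf.git /tmp/swagaf
export SWAG_DIR=/tmp/swagaf
```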
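The accuracy gap is attributed above to the smaller batch size. If GPU memory is what forces the smaller batch, one possible way to get back to the paper's effective batch size of 16 is gradient accumulation. This is a sketch only, assuming your checkout of `run_swag.py` exposes the `--gradient_accumulation_steps` flag present in the repository's other example scripts, and that the `--bert_model bert-base-uncased` flag (not visible in the truncated hunk) matches the full command in the README:

```shell
# Sketch: in these example scripts, train_batch_size is typically divided by
# gradient_accumulation_steps, so this keeps roughly 4 examples per forward
# pass while accumulating gradients to an effective batch size of 16.
python run_swag.py \
  --bert_model bert-base-uncased \
  --do_train \
  --do_eval \
  --data_dir $SWAG_DIR/data \
  --train_batch_size 16 \
  --gradient_accumulation_steps 4 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 80 \
  --output_dir /tmp/swag_output/
```

Whether this closes the gap to `81.6%` would of course have to be verified by rerunning the evaluation.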