Commit 0876b77f, authored Dec 10, 2018 by Grégory Châtel (parent 150f3cd9)

Change to the README file to add SWAG results.
Showing 1 changed file with 13 additions and 1 deletion.

README.md (+13 / -1)
````diff
@@ -441,13 +441,25 @@ python run_swag.py \
   --do_train \
   --do_eval \
   --data_dir $SWAG_DIR/data \
-  --train_batch_size 10 \
+  --train_batch_size 4 \
   --learning_rate 2e-5 \
   --num_train_epochs 3.0 \
   --max_seq_length 80 \
   --output_dir /tmp/swag_output/
 ```
+Training with the previous hyper-parameters gave us the following results:
+```
+eval_accuracy = 0.7776167149855043
+eval_loss = 1.006812262735175
+global_step = 55161
+loss = 0.282251750624779
+```
+
+The difference from the `81.6%` accuracy reported in the BERT article is probably due to the different `train_batch_size` (4 here versus 16 in the article).
 
 ## Fine-tuning BERT-large on GPUs
 
 The options we list above make it possible to fine-tune BERT-large rather easily on GPU(s) instead of the TPU used by the original implementation.
````
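To approximate the paper's batch size of 16 without running out of GPU memory, gradient accumulation is the usual workaround. The following is a minimal sketch, not taken from this commit: it assumes `run_swag.py` exposes a `--gradient_accumulation_steps` flag with the same convention as the repository's other example scripts (the requested `--train_batch_size` is divided by the accumulation steps, so each forward pass still sees mini-batches of 4), and it assumes the model flag elided above the hunk is `--bert_model bert-base-uncased`. Check `python run_swag.py --help` to confirm both.

```
# Hedged sketch: effective batch size 16 via 4 accumulation steps of 4.
# Assumes run_swag.py follows the run_classifier.py convention of dividing
# --train_batch_size by --gradient_accumulation_steps internally.
export SWAG_DIR=/path/to/SWAG   # placeholder: directory with the SWAG data

python run_swag.py \
  --bert_model bert-base-uncased \
  --do_train \
  --do_eval \
  --data_dir $SWAG_DIR/data \
  --train_batch_size 16 \
  --gradient_accumulation_steps 4 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 80 \
  --output_dir /tmp/swag_output/
```

Under these assumptions the optimizer sees the same effective batch size as in the article while memory use stays at the batch-size-4 level; whether this actually recovers the reported `81.6%` is not something the commit verifies.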
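The options referred to in the BERT-large section are truncated in this view. As one concrete illustration, here is a hedged sketch of multi-GPU data-parallel fine-tuning with PyTorch's `torch.distributed.launch` helper. It assumes `run_swag.py` accepts a `--local_rank` argument, as the other example scripts in this repository do, and uses the checkpoint name `bert-large-uncased`; none of it is taken from the commit itself.

```
# Hedged sketch: one process per GPU; torch.distributed.launch passes
# --local_rank to each process, which the script is assumed to honor.
# Batch size and accumulation steps are illustrative; tune them to fit
# BERT-large into your GPUs' memory.
python -m torch.distributed.launch --nproc_per_node=4 run_swag.py \
  --bert_model bert-large-uncased \
  --do_train \
  --do_eval \
  --data_dir $SWAG_DIR/data \
  --train_batch_size 4 \
  --gradient_accumulation_steps 2 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 80 \
  --output_dir /tmp/swag_large_output/
```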