update readme - fix SQuAD model on multi-GPU

Commit 2c5d993b authored Nov 08, 2018 by thomwolf

Showing with 10 additions and 3 deletions

README.md +5 -0

modeling.py +5 -3

--- a/README.md
+++ b/README.md
@@ -194,3 +194,8 @@ python run_squad.py \
  --doc_stride 128 \
  --output_dir ../debug_squad/
 ```
+
+Training with the previous hyper-parameters and a batch size 32 (on 4 GPUs) for 2 epochs gave us the following results:
+```bash
+{"f1": 88.19829549714827, "exact_match": 80.75685903500474}
+```
--- a/modeling.py
+++ b/modeling.py
@@ -455,8 +455,10 @@ class BertForQuestionAnswering(nn.Module):
        end_logits = end_logits.squeeze(-1)

        if start_positions is not None and end_positions is not None:
-            # If we are on multi-GPU, split add a dimension - if not this is a no-op
-            start_positions = start_positions.squeeze(-1)
-            end_positions = end_positions.squeeze(-1)
+            # If we are on multi-GPU, split add a dimension
+            if len(start_positions.size()) > 1:
+                start_positions = start_positions.squeeze(-1)
+            if len(end_positions.size()) > 1:
+                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.size(1)
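
The guarded squeeze in the `modeling.py` hunk can be tried in isolation. The snippet below is only a minimal sketch: the helper name `squeeze_positions` and the sample tensors are hypothetical and not part of the commit. It shows that the guard removes a trailing singleton dimension from a 2-D tensor and leaves a 1-D tensor untouched, whereas an unguarded `squeeze(-1)` on a 1-D tensor holding a single element collapses it to a 0-dimensional tensor, which is one case the guard avoids.

```python
import torch


def squeeze_positions(positions: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper mirroring the guard added in this commit."""
    # Only squeeze when there is an extra dimension to remove.
    if len(positions.size()) > 1:
        positions = positions.squeeze(-1)
    return positions


# Illustrative inputs (assumed shapes, not taken from the repository).
batched = torch.tensor([[3], [7]])  # shape (2, 1): extra trailing dimension
flat = torch.tensor([5])            # shape (1,): already 1-D, single example

print(squeeze_positions(batched).shape)  # trailing dimension removed
print(squeeze_positions(flat).shape)     # left unchanged by the guard
print(flat.squeeze(-1).shape)            # unguarded: collapses to 0 dimensions
```

Keeping the positions 1-D matters because they are later used as per-example targets alongside `start_logits` and `end_logits`.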