- `bert-base-german-cased`: Trained on German data only, 12-layer, 768-hidden, 12-heads, 110M parameters [Performance Evaluation](https://deepset.ai/german-bert)
- `bert-large-uncased-whole-word-masking`: 24-layer, 1024-hidden, 16-heads, 340M parameters - Trained with Whole Word Masking (mask all of the tokens corresponding to a word at once)
- `bert-large-cased-whole-word-masking`: 24-layer, 1024-hidden, 16-heads, 340M parameters - Trained with Whole Word Masking (mask all of the tokens corresponding to a word at once)
- `bert-large-uncased-whole-word-masking-finetuned-squad`: The `bert-large-uncased-whole-word-masking` model finetuned on SQuAD (using the `run_squad.py` examples). Results: *exact_match: 86.91579943235573, f1: 93.1532499015869*
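These shortcut names can be passed directly to the `from_pretrained()` method. As a minimal sketch (assuming the `pytorch_pretrained_bert` package used throughout this README and an internet connection for the first download), loading the whole-word-masking model fine-tuned on SQuAD could look like this:

```python
from pytorch_pretrained_bert import BertTokenizer, BertForQuestionAnswering

# Shortcut name of one of the pre-trained checkpoints listed above.
model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'

# Download (and cache) the matching vocabulary and the fine-tuned weights.
tokenizer = BertTokenizer.from_pretrained(model_name, do_lower_case=True)
model = BertForQuestionAnswering.from_pretrained(model_name)
model.eval()  # switch to inference mode for question answering
```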
@@ -608,13 +609,15 @@ There are three types of files you need to save to be able to reload a fine-tune
...
- the configuration file of the model which is saved as a JSON file, and
- the vocabulary (and the merges for the BPE-based models GPT and GPT-2).
The *default filenames* of these files are as follows:
- the model weights file: `pytorch_model.bin`,
- the configuration file: `config.json`,
- the vocabulary file: `vocab.txt` for BERT and Transformer-XL, `vocab.json` for GPT/GPT-2 (BPE vocabulary),
- for GPT/GPT-2 (BPE vocabulary) the additional merges file: `merges.txt`.
**If you save a model using these *default filenames*, you can then re-load the model and tokenizer using the `from_pretrained()` method.**
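As an illustration, here is a minimal sketch of this save/reload round trip, assuming a fine-tuned BERT `model`, its `tokenizer`, and an existing `output_dir` (the variable names and the sequence-classification head are placeholders); the repository's recommended snippet follows below:

```python
import os
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

# Save the fine-tuned weights, the configuration and the vocabulary
# under the default filenames expected by from_pretrained().
model_to_save = model.module if hasattr(model, 'module') else model  # unwrap (Distributed)DataParallel
torch.save(model_to_save.state_dict(), os.path.join(output_dir, "pytorch_model.bin"))
with open(os.path.join(output_dir, "config.json"), "w") as f:
    f.write(model_to_save.config.to_json_string())
# If the tokenizer class does not provide save_vocabulary(), copy the original
# vocab file(s) (vocab.txt, or vocab.json/merges.txt for GPT/GPT-2) into output_dir instead.
tokenizer.save_vocabulary(output_dir)

# Re-load the model and tokenizer from that directory (pass the same head
# hyper-parameters, e.g. num_labels, that were used at fine-tuning time).
model = BertForSequenceClassification.from_pretrained(output_dir)
tokenizer = BertTokenizer.from_pretrained(output_dir, do_lower_case=True)
```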
Here is the recommended way of saving the model, configuration and vocabulary to an `output_dir` directory and reloading the model and tokenizer afterwards:
```python
...
@@ -1268,6 +1271,30 @@ python run_classifier.py \
...
--fp16
```
**Distributed training**
Here is an example using distributed training on 8 V100 GPUs and the BERT Whole Word Masking model to reach an F1 > 93 on SQuAD: