"git@developer.sourcefind.cn:wuxk1/megatron-lm.git" did not exist on "ab09d819d685b4861098d27e19230abebed8830e"
Commit 793dcd23 authored by thomwolf

Merge branch 'master' of https://github.com/huggingface/pytorch-pretrained-BERT into fifth-release

parents 93f563b8 2e4db64c
@@ -71,11 +71,12 @@ This package comprises the following classes that can be imported in Python and
 The repository further comprises:
-- Four examples on how to use Bert (in the [`examples` folder](./examples)):
+- Five examples on how to use Bert (in the [`examples` folder](./examples)):
   - [`extract_features.py`](./examples/extract_features.py) - Show how to extract hidden states from an instance of `BertModel`,
   - [`run_classifier.py`](./examples/run_classifier.py) - Show how to fine-tune an instance of `BertForSequenceClassification` on GLUE's MRPC task,
   - [`run_squad.py`](./examples/run_squad.py) - Show how to fine-tune an instance of `BertForQuestionAnswering` on SQuAD v1.0 task.
   - [`run_swag.py`](./examples/run_swag.py) - Show how to fine-tune an instance of `BertForMultipleChoice` on Swag task.
+  - [`run_lm_finetuning.py`](./examples/run_lm_finetuning.py) - Show how to fine-tune an instance of `BertForPreTraining` on a target text corpus.
 These examples are detailed in the [Examples](#examples) section of this readme.
@@ -249,6 +250,9 @@ An example on how to use this class is given in the [`extract_features.py`](./ex
 - the masked language modeling logits, and
 - the next sentence classification logits.
+An example on how to use this class is given in the [`run_lm_finetuning.py`](./examples/run_lm_finetuning.py) script, which can be used to fine-tune the BERT language model on your own text corpus. This should improve model performance if the language style is different from the original BERT training corpus (Wiki + BookCorpus).
 #### 3. `BertForMaskedLM`
 `BertForMaskedLM` includes the `BertModel` Transformer followed by the (possibly) pre-trained masked language modeling head.
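As a rough illustration of the two outputs described in the hunk above, here is a minimal sketch (not taken from the diff) of calling the pre-training head class, assuming the `pytorch_pretrained_bert` package exposes `BertForPreTraining` and `BertTokenizer` and that a `bert-base-uncased` checkpoint can be downloaded:

```python
# Hedged sketch: getting the masked-LM logits and next-sentence logits from
# BertForPreTraining, as described in the README hunk above.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForPreTraining

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForPreTraining.from_pretrained('bert-base-uncased')
model.eval()

# Two sentences, already tokenized (same example text as the README uses elsewhere).
tokens = ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]',
          'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
segment_ids = torch.tensor([[0] * 7 + [1] * 7])  # sentence A / sentence B markers

with torch.no_grad():
    # With no labels passed, the forward call returns both heads' logits.
    masked_lm_logits, next_sentence_logits = model(input_ids, segment_ids)

print(masked_lm_logits.shape)     # [1, 14, vocab_size]
print(next_sentence_logits.shape) # [1, 2]
```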
@@ -349,7 +353,7 @@ The optimizer accepts the following arguments:
 | Sub-section | Description |
 |-|-|
 | [Training large models: introduction, tools and examples](#Training-large-models-introduction,-tools-and-examples) | How to use gradient-accumulation, multi-gpu training, distributed training, optimize on CPU and 16-bits training to train Bert models |
-| [Fine-tuning with BERT: running the examples](#Fine-tuning-with-BERT-running-the-examples) | Running the examples in [`./examples`](./examples/): `extract_classif.py`, `run_classifier.py` and `run_squad.py` |
+| [Fine-tuning with BERT: running the examples](#Fine-tuning-with-BERT-running-the-examples) | Running the examples in [`./examples`](./examples/): `extract_classif.py`, `run_classifier.py`, `run_squad.py` and `run_lm_finetuning.py` |
 | [Fine-tuning BERT-large on GPUs](#Fine-tuning-BERT-large-on-GPUs) | How to fine tune `BERT large`|
### Training large models: introduction, tools and examples ### Training large models: introduction, tools and examples
...@@ -380,6 +384,7 @@ We showcase several fine-tuning examples based on (and extended from) [the origi ...@@ -380,6 +384,7 @@ We showcase several fine-tuning examples based on (and extended from) [the origi
- a *sequence-level classifier* on the MRPC classification corpus, - a *sequence-level classifier* on the MRPC classification corpus,
- a *token-level classifier* on the question answering dataset SQuAD, and - a *token-level classifier* on the question answering dataset SQuAD, and
- a *sequence-level multiple-choice classifier* on the SWAG classification corpus. - a *sequence-level multiple-choice classifier* on the SWAG classification corpus.
- a *BERT language model* on another target corpus
#### MRPC #### MRPC
@@ -492,6 +497,25 @@ global_step = 13788
 loss = 0.06423990014260186
 ```
+#### LM Fine-tuning
+The data should be a text file in the same format as [sample_text.txt](./samples/sample_text.txt) (one sentence per line, documents separated by an empty line).
+You can download an [example training corpus](https://ext-bert-sample.obs.eu-de.otc.t-systems.com/small_wiki_sentence_corpus.txt) generated from Wikipedia articles and split into ~500k sentences with spaCy.
+Training one epoch on this corpus takes about 1:20h on 4 x NVIDIA Tesla P100 with `train_batch_size=200` and `max_seq_length=128`:
+```shell
+python run_lm_finetuning.py \
+  --bert_model bert-base-cased \
+  --do_train \
+  --train_file samples/sample_text.txt \
+  --output_dir models \
+  --num_train_epochs 5.0 \
+  --learning_rate 3e-5 \
+  --train_batch_size 32 \
+  --max_seq_length 128
+```
 ## Fine-tuning BERT-large on GPUs
 The options we list above allow to fine-tune BERT-large rather easily on GPU(s) instead of the TPU used by the original implementation.
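As a follow-up to the command above, here is a hedged sketch of reloading the resulting checkpoint, assuming `run_lm_finetuning.py` writes `pytorch_model.bin` into `--output_dir` (here `models/`) the same way the other example scripts do:

```python
# Hedged sketch (not part of the diff): reload the LM-fine-tuned weights produced
# by the command above. Assumes the script saved models/pytorch_model.bin.
import torch
from pytorch_pretrained_bert import BertForPreTraining

state_dict = torch.load("models/pytorch_model.bin", map_location='cpu')
model = BertForPreTraining.from_pretrained("bert-base-cased", state_dict=state_dict)
model.eval()
```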
...
@@ -199,7 +199,7 @@ def main():
                         "bert-large-uncased, bert-base-cased, bert-base-multilingual, bert-base-chinese.")
     ## Other parameters
-    parser.add_argument("--do_lower_case", default=False, action='store_true', help="Set this flag if you are using an uncased model.")
+    parser.add_argument("--do_lower_case", action='store_true', help="Set this flag if you are using an uncased model.")
     parser.add_argument("--layers", default="-1,-2,-3,-4", type=str)
     parser.add_argument("--max_seq_length", default=128, type=int,
                         help="The maximum total input sequence length after WordPiece tokenization. Sequences longer "
@@ -210,7 +210,6 @@ def main():
                         default=-1,
                         help = "local_rank for distributed training on gpus")
     parser.add_argument("--no_cuda",
-                        default=False,
                         action='store_true',
                         help="Whether not to use CUDA when available")
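Several of the hunks below repeat the same cleanup: dropping an explicit `default=False` (or, in one case, `default=True`) from flags declared with `action='store_true'`. A small standalone sketch of why the explicit `False` default is redundant and why a `True` default is actively misleading for such a flag:

```python
# Standalone illustration of the argparse cleanup in these hunks: with
# action='store_true' the default is already False, so spelling it out is redundant,
# and default=True would make the flag impossible to turn off from the command line.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--no_cuda", action='store_true',
                    help="Whether not to use CUDA when available")

print(parser.parse_args([]).no_cuda)             # False (implicit default)
print(parser.parse_args(["--no_cuda"]).no_cuda)  # True
```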
...
@@ -312,7 +312,8 @@ def main():
                         help="The input data dir. Should contain the .tsv files (or other data files) for the task.")
     parser.add_argument("--bert_model", default=None, type=str, required=True,
                         help="Bert pre-trained model selected in the list: bert-base-uncased, "
-                        "bert-large-uncased, bert-base-cased, bert-base-multilingual, bert-base-chinese.")
+                        "bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, "
+                        "bert-base-multilingual-cased, bert-base-chinese.")
     parser.add_argument("--task_name",
                         default=None,
                         type=str,
@@ -332,15 +333,12 @@ def main():
                         "Sequences longer than this will be truncated, and sequences shorter \n"
                         "than this will be padded.")
     parser.add_argument("--do_train",
-                        default=False,
                         action='store_true',
                         help="Whether to run training.")
     parser.add_argument("--do_eval",
-                        default=False,
                         action='store_true',
                         help="Whether to run eval on the dev set.")
     parser.add_argument("--do_lower_case",
-                        default=False,
                         action='store_true',
                         help="Set this flag if you are using an uncased model.")
     parser.add_argument("--train_batch_size",
@@ -365,7 +363,6 @@ def main():
                         help="Proportion of training to perform linear learning rate warmup for. "
                         "E.g., 0.1 = 10%% of training.")
     parser.add_argument("--no_cuda",
-                        default=False,
                         action='store_true',
                         help="Whether not to use CUDA when available")
     parser.add_argument("--local_rank",
@@ -381,7 +378,6 @@ def main():
                         default=1,
                         help="Number of updates steps to accumulate before performing a backward/update pass.")
     parser.add_argument('--fp16',
-                        default=False,
                         action='store_true',
                         help="Whether to use 16-bit float precision instead of 32-bit")
     parser.add_argument('--loss_scale',
@@ -431,7 +427,7 @@ def main():
     if not args.do_train and not args.do_eval:
         raise ValueError("At least one of `do_train` or `do_eval` must be True.")
-    if os.path.exists(args.output_dir) and os.listdir(args.output_dir):
+    if os.path.exists(args.output_dir) and os.listdir(args.output_dir) and args.do_train:
         raise ValueError("Output directory ({}) already exists and is not empty.".format(args.output_dir))
     os.makedirs(args.output_dir, exist_ok=True)
@@ -503,6 +499,8 @@ def main():
                          t_total=t_total)
     global_step = 0
+    nb_tr_steps = 0
+    tr_loss = 0
     if args.do_train:
         train_features = convert_examples_to_features(
             train_examples, label_list, args.max_seq_length, tokenizer)
@@ -554,11 +552,12 @@ def main():
     # Save a trained model
     model_to_save = model.module if hasattr(model, 'module') else model  # Only save the model it-self
     output_model_file = os.path.join(args.output_dir, "pytorch_model.bin")
-    torch.save(model_to_save.state_dict(), output_model_file)
+    if args.do_train:
+        torch.save(model_to_save.state_dict(), output_model_file)
     # Load a trained model that you have fine-tuned
     model_state_dict = torch.load(output_model_file)
-    model = BertForSequenceClassification.from_pretrained(args.bert_model, state_dict=model_state_dict)
+    model = BertForSequenceClassification.from_pretrained(args.bert_model, state_dict=model_state_dict, num_labels=num_labels)
     model.to(device)
     if args.do_eval and (args.local_rank == -1 or torch.distributed.get_rank() == 0):
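The hunk above guards the save with `args.do_train` and passes `num_labels` when the fine-tuned weights are reloaded. A condensed, hedged sketch of the resulting save/reload pattern (the path and the `num_labels` value are illustrative, not taken from the diff):

```python
# Condensed version of the save/reload pattern in the hunk above (illustrative values).
import os
import torch
from pytorch_pretrained_bert import BertForSequenceClassification

output_dir = "/tmp/mrpc_output"   # stands in for args.output_dir
num_labels = 2                    # e.g. a binary task such as MRPC
os.makedirs(output_dir, exist_ok=True)
output_model_file = os.path.join(output_dir, "pytorch_model.bin")

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=num_labels)

# Save only the underlying model, unwrapping DataParallel/DistributedDataParallel if present.
model_to_save = model.module if hasattr(model, 'module') else model
torch.save(model_to_save.state_dict(), output_model_file)

# Reload for evaluation: the classifier head size must be passed again via num_labels.
state_dict = torch.load(output_model_file)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                       state_dict=state_dict,
                                                       num_labels=num_labels)
```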
@@ -580,7 +579,8 @@ def main():
         model.eval()
         eval_loss, eval_accuracy = 0, 0
         nb_eval_steps, nb_eval_examples = 0, 0
-        for input_ids, input_mask, segment_ids, label_ids in eval_dataloader:
+        for input_ids, input_mask, segment_ids, label_ids in tqdm(eval_dataloader, desc="Evaluating"):
             input_ids = input_ids.to(device)
             input_mask = input_mask.to(device)
             segment_ids = segment_ids.to(device)
@@ -602,11 +602,11 @@ def main():
         eval_loss = eval_loss / nb_eval_steps
         eval_accuracy = eval_accuracy / nb_eval_examples
+        loss = tr_loss/nb_tr_steps if args.do_train else None
         result = {'eval_loss': eval_loss,
                   'eval_accuracy': eval_accuracy,
                   'global_step': global_step,
-                  'loss': tr_loss/nb_tr_steps}
+                  'loss': loss}
         output_eval_file = os.path.join(args.output_dir, "eval_results.txt")
         with open(output_eval_file, "w") as writer:
...
This diff is collapsed.
@@ -681,7 +681,8 @@ def main():
     ## Required parameters
     parser.add_argument("--bert_model", default=None, type=str, required=True,
                         help="Bert pre-trained model selected in the list: bert-base-uncased, "
-                        "bert-large-uncased, bert-base-cased, bert-base-multilingual, bert-base-chinese.")
+                        "bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, "
+                        "bert-base-multilingual-cased, bert-base-chinese.")
     parser.add_argument("--output_dir", default=None, type=str, required=True,
                         help="The output directory where the model checkpoints and predictions will be written.")
@@ -697,8 +698,8 @@ def main():
     parser.add_argument("--max_query_length", default=64, type=int,
                         help="The maximum number of tokens for the question. Questions longer than this will "
                         "be truncated to this length.")
-    parser.add_argument("--do_train", default=False, action='store_true', help="Whether to run training.")
-    parser.add_argument("--do_predict", default=False, action='store_true', help="Whether to run eval on the dev set.")
+    parser.add_argument("--do_train", action='store_true', help="Whether to run training.")
+    parser.add_argument("--do_predict", action='store_true', help="Whether to run eval on the dev set.")
     parser.add_argument("--train_batch_size", default=32, type=int, help="Total batch size for training.")
     parser.add_argument("--predict_batch_size", default=8, type=int, help="Total batch size for predictions.")
     parser.add_argument("--learning_rate", default=5e-5, type=float, help="The initial learning rate for Adam.")
@@ -713,11 +714,10 @@ def main():
     parser.add_argument("--max_answer_length", default=30, type=int,
                         help="The maximum length of an answer that can be generated. This is needed because the start "
                         "and end predictions are not conditioned on one another.")
-    parser.add_argument("--verbose_logging", default=False, action='store_true',
+    parser.add_argument("--verbose_logging", action='store_true',
                         help="If true, all of the warnings related to data processing will be printed. "
                         "A number of warnings are expected for a normal SQuAD evaluation.")
     parser.add_argument("--no_cuda",
-                        default=False,
                         action='store_true',
                         help="Whether not to use CUDA when available")
     parser.add_argument('--seed',
@@ -729,7 +729,6 @@ def main():
                         default=1,
                         help="Number of updates steps to accumulate before performing a backward/update pass.")
     parser.add_argument("--do_lower_case",
-                        default=True,
                         action='store_true',
                         help="Whether to lower case the input text. True for uncased models, False for cased models.")
     parser.add_argument("--local_rank",
@@ -737,7 +736,6 @@ def main():
                         default=-1,
                         help="local_rank for distributed training on gpus")
     parser.add_argument('--fp16',
-                        default=False,
                         action='store_true',
                         help="Whether to use 16-bit float precision instead of 32-bit")
     parser.add_argument('--loss_scale',
@@ -788,7 +786,7 @@ def main():
         raise ValueError("Output directory () already exists and is not empty.")
     os.makedirs(args.output_dir, exist_ok=True)
-    tokenizer = BertTokenizer.from_pretrained(args.bert_model)
+    tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
     train_examples = None
     num_train_steps = None
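The change above threads the `--do_lower_case` flag through to the tokenizer. A quick hedged sketch of why that matters for cased checkpoints (illustrative input string, not from the diff):

```python
# Why passing do_lower_case through matters (illustrative): with a cased checkpoint,
# lower-casing at tokenization time discards the case information the model was
# pre-trained with.
from pytorch_pretrained_bert import BertTokenizer

cased = BertTokenizer.from_pretrained("bert-base-cased", do_lower_case=False)
lowered = BertTokenizer.from_pretrained("bert-base-cased", do_lower_case=True)

print(cased.tokenize("Who was Jim Henson?"))    # keeps capitalised word pieces
print(lowered.tokenize("Who was Jim Henson?"))  # everything lower-cased first
```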
@@ -855,7 +853,7 @@ def main():
     global_step = 0
     if args.do_train:
         cached_train_features_file = args.train_file+'_{0}_{1}_{2}_{3}'.format(
-            args.bert_model, str(args.max_seq_length), str(args.doc_stride), str(args.max_query_length))
+            list(filter(None, args.bert_model.split('/'))).pop(), str(args.max_seq_length), str(args.doc_stride), str(args.max_query_length))
         train_features = None
         try:
             with open(cached_train_features_file, "rb") as reader:
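The new cache name uses `list(filter(None, args.bert_model.split('/'))).pop()` so that a local model path contributes only its last component instead of embedding slashes in the cache file name. A tiny standalone check of that expression (the example paths are illustrative):

```python
# Standalone check of the expression used for the cache file name above: it keeps
# only the last non-empty path component, so local paths don't inject '/' into the name.
def last_component(bert_model: str) -> str:
    return list(filter(None, bert_model.split('/'))).pop()

assert last_component("bert-large-uncased") == "bert-large-uncased"
assert last_component("/data/models/my-bert/") == "my-bert"
```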
...
@@ -249,7 +249,8 @@ def main():
                         help="The input data dir. Should contain the .csv files (or other data files) for the task.")
     parser.add_argument("--bert_model", default=None, type=str, required=True,
                         help="Bert pre-trained model selected in the list: bert-base-uncased, "
-                        "bert-large-uncased, bert-base-cased, bert-base-multilingual, bert-base-chinese.")
+                        "bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, "
+                        "bert-base-multilingual-cased, bert-base-chinese.")
     parser.add_argument("--output_dir",
                         default=None,
                         type=str,
@@ -264,15 +265,12 @@ def main():
                         "Sequences longer than this will be truncated, and sequences shorter \n"
                         "than this will be padded.")
     parser.add_argument("--do_train",
-                        default=False,
                         action='store_true',
                         help="Whether to run training.")
     parser.add_argument("--do_eval",
-                        default=False,
                         action='store_true',
                         help="Whether to run eval on the dev set.")
     parser.add_argument("--do_lower_case",
-                        default=False,
                         action='store_true',
                         help="Set this flag if you are using an uncased model.")
     parser.add_argument("--train_batch_size",
@@ -297,7 +295,6 @@ def main():
                         help="Proportion of training to perform linear learning rate warmup for. "
                         "E.g., 0.1 = 10%% of training.")
     parser.add_argument("--no_cuda",
-                        default=False,
                         action='store_true',
                         help="Whether not to use CUDA when available")
     parser.add_argument("--local_rank",
@@ -313,7 +310,6 @@ def main():
                         default=1,
                         help="Number of updates steps to accumulate before performing a backward/update pass.")
     parser.add_argument('--fp16',
-                        default=False,
                         action='store_true',
                         help="Whether to use 16-bit float precision instead of 32-bit")
     parser.add_argument('--loss_scale',
...
@@ -439,8 +439,8 @@ class PreTrainedModel(nn.Module):
             # cf https://github.com/pytorch/pytorch/pull/5617
             module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
         elif isinstance(module, BertLayerNorm):
-            module.bias.data.normal_(mean=0.0, std=self.config.initializer_range)
-            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
+            module.bias.data.zero_()
+            module.weight.data.fill_(1.0)
         if isinstance(module, nn.Linear) and module.bias is not None:
             module.bias.data.zero_()
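The fix above initializes LayerNorm the standard way (weight = 1, bias = 0) instead of drawing both from the truncated-normal used for other weights. A minimal standalone sketch of the corrected branch, using plain `nn.LayerNorm` as a stand-in for `BertLayerNorm` and an illustrative `initializer_range`:

```python
# Minimal sketch of the corrected initialisation above: LayerNorm starts as the
# identity transform (weight=1, bias=0); only Linear/Embedding weights get the
# normal init. nn.LayerNorm stands in for BertLayerNorm here.
import torch.nn as nn

initializer_range = 0.02  # stands in for config.initializer_range

def init_weights(module: nn.Module) -> None:
    if isinstance(module, (nn.Linear, nn.Embedding)):
        module.weight.data.normal_(mean=0.0, std=initializer_range)
    elif isinstance(module, nn.LayerNorm):
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)
    if isinstance(module, nn.Linear) and module.bias is not None:
        module.bias.data.zero_()

layer = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))
layer.apply(init_weights)
```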
@@ -456,7 +456,9 @@ class PreTrainedModel(nn.Module):
                 . `bert-base-uncased`
                 . `bert-large-uncased`
                 . `bert-base-cased`
-                . `bert-base-multilingual`
+                . `bert-large-cased`
+                . `bert-base-multilingual-uncased`
+                . `bert-base-multilingual-cased`
                 . `bert-base-chinese`
             - a path or url to a pretrained model archive containing:
                 . `bert_config.json` a configuration file for the model
@@ -728,7 +730,7 @@ class BertForMaskedLM(PreTrainedModel):
             is only computed for the labels set in [0, ..., vocab_size]
     Outputs:
-        if `masked_lm_labels` is `None`:
+        if `masked_lm_labels` is not `None`:
             Outputs the masked language modeling loss.
         if `masked_lm_labels` is `None`:
             Outputs the masked language modeling logits of shape [batch_size, sequence_length, vocab_size].
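The corrected docstring now matches the forward behaviour: with `masked_lm_labels` the call returns the loss, without them it returns the logits. A short hedged sketch of the two return modes (example sentence is illustrative):

```python
# Hedged sketch of the two BertForMaskedLM return modes documented above: with
# masked_lm_labels the forward pass returns the cross-entropy loss, without them it
# returns [batch_size, sequence_length, vocab_size] logits.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

tokens = ['[CLS]', 'the', 'capital', 'of', 'france', 'is', '[MASK]', '.', '[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    logits = model(input_ids)                            # masked_lm_labels is None -> logits
    loss = model(input_ids, masked_lm_labels=input_ids)  # labels given -> scalar loss

print(logits.shape, loss.item())
```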
@@ -1035,15 +1037,7 @@ class BertForQuestionAnswering(PreTrainedModel):
     the sequence output that computes start_logits and end_logits
     Params:
-        `config`: either
-            - a BertConfig class instance with the configuration to build a new model, or
-            - a str with the name of a pre-trained model to load selected in the list of:
-                . `bert-base-uncased`
-                . `bert-large-uncased`
-                . `bert-base-cased`
-                . `bert-base-multilingual`
-                . `bert-base-chinese`
-            The pre-trained model will be downloaded and cached if needed.
+        `config`: a BertConfig class instance with the configuration to build a new model.
     Inputs:
         `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length]
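Since the docstring now restricts `config` to a `BertConfig` instance, building from scratch versus loading pre-trained weights are done as sketched below (hedged: the tiny config values are illustrative, and the `BertConfig` keyword names are assumed from the package at this time):

```python
# Hedged sketch of the two ways to obtain a BertForQuestionAnswering model after this
# docstring change: construct a fresh (deliberately tiny, illustrative) model from a
# BertConfig instance, or load pre-trained weights via from_pretrained.
from pytorch_pretrained_bert.modeling import BertConfig, BertForQuestionAnswering

config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=128,
                    num_hidden_layers=2, num_attention_heads=2, intermediate_size=512)
scratch_model = BertForQuestionAnswering(config)

pretrained_model = BertForQuestionAnswering.from_pretrained('bert-base-uncased')
```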
...