Commit 2d51e04d authored by Louis MARTIN, committed by Facebook Github Bot

Rename "loaded {} batches" to "loaded {} blocks" (#1279)

Summary:
Very small change.
The previous message was misleading: the length of TokenBlockDataset is the number of "blocks" (or "streams"), not, strictly speaking, the number of batches, if I am not mistaken. I use the notion of "batch" from the RoBERTa pretraining README: https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md.
It took me some time to understand what was going on, so I hope this saves some time for others.
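To make the distinction concrete, here is a minimal, self-contained sketch. It is not the actual fairseq TokenBlockDataset implementation; the class name ToyTokenBlockDataset, the block size, and the token stream are made up for illustration. The point is that the dataset's length counts contiguous blocks of the token stream, while the number of batches per epoch is determined later by batching options such as --max-tokens or --max-sentences.

```python
import torch


class ToyTokenBlockDataset:
    """Toy stand-in (not fairseq's TokenBlockDataset): splits one long
    token stream into fixed-size contiguous blocks."""

    def __init__(self, tokens, block_size):
        self.block_size = block_size
        # Chop the flat token stream into contiguous blocks.
        self.blocks = [
            tokens[i:i + block_size]
            for i in range(0, len(tokens), block_size)
        ]

    def __len__(self):
        # Number of blocks, NOT the number of training batches.
        return len(self.blocks)

    def __getitem__(self, index):
        return torch.tensor(self.blocks[index], dtype=torch.long)


stream = list(range(10_000))  # pretend this is a tokenized corpus
dataset = ToyTokenBlockDataset(stream, block_size=512)
# 20 blocks here, regardless of whatever batch size is used later
print('| loaded {} blocks'.format(len(dataset)))
```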
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1279

Differential Revision: D18051476

fbshipit-source-id: 71fa35f21b9dbc8d6bde28cd3a487723690aadee
parent 34e6a5e8
@@ -100,7 +100,7 @@ class MaskedLMTask(FairseqTask):
             eos=self.source_dictionary.eos(),
             break_mode=self.args.sample_break_mode,
         )
-        print('| loaded {} batches from: {}'.format(len(dataset), split_path))
+        print('| loaded {} blocks from: {}'.format(len(dataset), split_path))
         # prepend beginning-of-sentence token (<s>, equiv. to [CLS] in BERT)
         dataset = PrependTokenDataset(dataset, self.source_dictionary.bos())