Commit 2d51e04d authored by Louis MARTIN, committed by Facebook Github Bot

Rename "loaded {} batches" to "loaded {} blocks" (#1279)

Summary:
Very small change.
The previous message was misleading: the length of TokenBlockDataset is the number of "blocks" (or "streams"), not, strictly speaking, the number of batches, if I am not mistaken. I use the notion of "batch" from the RoBERTa pretraining README: https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md.
It took me some time to understand what was going on, so I hope this saves some time for others.
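To make the distinction concrete, here is a minimal, self-contained sketch. It is not the actual fairseq TokenBlockDataset implementation; the class name ToyTokenBlockDataset, the block size, and the token stream are made up for illustration. The point is that the dataset's length counts contiguous blocks of the token stream, while the number of batches per epoch is determined later by batching options such as --max-tokens or --max-sentences.

```python
import torch


class ToyTokenBlockDataset:
    """Toy stand-in (not fairseq's TokenBlockDataset): splits one long
    token stream into fixed-size contiguous blocks."""

    def __init__(self, tokens, block_size):
        self.block_size = block_size
        # Chop the flat token stream into contiguous blocks.
        self.blocks = [
            tokens[i:i + block_size]
            for i in range(0, len(tokens), block_size)
        ]

    def __len__(self):
        # Number of blocks, NOT the number of training batches.
        return len(self.blocks)

    def __getitem__(self, index):
        return torch.tensor(self.blocks[index], dtype=torch.long)


stream = list(range(10_000))  # pretend this is a tokenized corpus
dataset = ToyTokenBlockDataset(stream, block_size=512)
# 20 blocks here, regardless of whatever batch size is used later
print('| loaded {} blocks'.format(len(dataset)))
```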
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1279

Differential Revision: D18051476

fbshipit-source-id: 71fa35f21b9dbc8d6bde28cd3a487723690aadee
parent 34e6a5e8
@@ -100,7 +100,7 @@ class MaskedLMTask(FairseqTask):
             eos=self.source_dictionary.eos(),
             break_mode=self.args.sample_break_mode,
         )
-        print('| loaded {} batches from: {}'.format(len(dataset), split_path))
+        print('| loaded {} blocks from: {}'.format(len(dataset), split_path))
         # prepend beginning-of-sentence token (<s>, equiv. to [CLS] in BERT)
         dataset = PrependTokenDataset(dataset, self.source_dictionary.bos())