Reduced memory usage for pregenerating the data a lot by writing it out on the fly without shuffling - the Sampler in the finetuning script will shuffle for us.

a8a577ba · Matthew Carrigan · 0ae59e66 · a8a577ba
Commit a8a577ba authored Mar 21, 2019 by Matthew Carrigan
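
The idea in the commit message can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the repository's actual pregeneration code: `generate_examples`, `write_examples_on_the_fly`, `shuffled_order`, and the file name `epoch_0.json` are hypothetical names introduced here, and the seeded permutation merely stands in for the random Sampler that the finetuning script is said to use.

```python
import json
import random
import tempfile
from pathlib import Path


def generate_examples(n):
    # Hypothetical stand-in for the real instance generation step.
    for i in range(n):
        yield {"id": i, "tokens": ["tok%d" % i]}


def write_examples_on_the_fly(examples, out_path):
    """Write each example as one JSON line as soon as it is produced.

    Nothing is accumulated in a list, so memory use stays roughly constant
    regardless of how many examples are generated.
    """
    count = 0
    with out_path.open("w") as f:
        for example in examples:  # `examples` may be a lazy generator
            f.write(json.dumps(example) + "\n")
            count += 1
    return count


def shuffled_order(num_samples, seed):
    # Stand-in for a random sampler: a permutation of dataset indices
    # drawn at training time instead of shuffling the file on disk.
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)
    return order


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "epoch_0.json"  # hypothetical file name
    num_samples = write_examples_on_the_fly(generate_examples(5), data_file)
    with data_file.open() as f:
        stored = [json.loads(line) for line in f]

# The file keeps generation order; randomness comes from the index order.
order = shuffled_order(num_samples, seed=0)
visited = [stored[i]["id"] for i in order]
```

The file on disk stays in generation order, and every example is still visited exactly once per pass because the index permutation covers the whole dataset.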

Showing 1 changed file with 0 additions and 2 deletions

examples/lm_finetuning/finetune_on_pregenerated.py +0 -2

--- a/examples/lm_finetuning/finetune_on_pregenerated.py
+++ b/examples/lm_finetuning/finetune_on_pregenerated.py
@@ -74,8 +74,6 @@ class PregeneratedDataset(Dataset):
        with data_file.open() as f:
            for i, line in enumerate(tqdm(f, total=num_samples, desc="Training examples")):
                line = line.strip()
-                if not line:
-                    continue  # Skip trailing blank lines etc.
                example = json.loads(line)
                features = convert_example_to_features(example, tokenizer, seq_len)
                input_ids[i] = features.input_ids
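
Reading only the hunk above, one consequence of dropping the blank-line guard is that every line of the data file now reaches `json.loads`, so the file is assumed to contain exactly one JSON object per line with no empty lines. The short sketch below illustrates that behaviour of the standard `json` module; the sample lines and the helper `parse_line` are made up for the illustration and are not part of the script.

```python
import json

# Two sample lines: one valid JSON object and one blank line.
lines = ['{"tokens": ["a", "b"]}\n', "\n"]


def parse_line(line):
    # Mirrors the strip-then-parse steps visible in the diff context.
    return json.loads(line.strip())


first = parse_line(lines[0])

try:
    parse_line(lines[1])
    blank_line_failed = False
except json.JSONDecodeError:
    # json.loads("") raises because an empty string is not valid JSON.
    blank_line_failed = True
```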