    Memory-Mapped IndexedDataset implementation (#589) · a1c997bd
    Davide Caroselli authored
    Summary:
    Following discussion in https://github.com/pytorch/fairseq/issues/574:
    
    - Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder
    - Updated scripts/read_binarized.py to support the new MMapIndexedDataset
    - Replaced the '--raw-text' and '--lazy-load' options with '--dataset-impl', and moved the option definition from the custom task args to the higher-level options.add_dataset_args() (more appropriate)
    - Also implemented utility functions in indexed_dataset: make_dataset(), dataset_exists()
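
To illustrate the general idea behind a memory-mapped indexed dataset, here is a minimal sketch using only the Python standard library. It is not the MMapIndexedDataset code from this commit: the names `TinyMMapDataset` and `build_tiny_dataset`, and the on-disk layout (int32 tokens in `<prefix>.bin`, int64 offsets in `<prefix>.idx`), are assumptions made for the example. It also assumes the data file is non-empty, since an empty file cannot be memory-mapped.

```python
import mmap
import struct

# Hypothetical layout: every token is one little-endian int32.
ITEM = struct.Struct("<i")


def build_tiny_dataset(prefix, sequences):
    """Write all tokens to <prefix>.bin and n+1 offsets to <prefix>.idx."""
    offsets = [0]
    with open(prefix + ".bin", "wb") as data_file:
        for seq in sequences:
            for token in seq:
                data_file.write(ITEM.pack(token))
            # Offsets are counted in tokens, not bytes.
            offsets.append(offsets[-1] + len(seq))
    with open(prefix + ".idx", "wb") as index_file:
        index_file.write(struct.pack("<%dq" % len(offsets), *offsets))


class TinyMMapDataset:
    """Read sequences lazily through a read-only memory map."""

    def __init__(self, prefix):
        # The small offsets table is loaded eagerly.
        with open(prefix + ".idx", "rb") as index_file:
            raw = index_file.read()
        self._offsets = struct.unpack("<%dq" % (len(raw) // 8), raw)
        # The token data is mapped, so pages are read only when touched.
        self._file = open(prefix + ".bin", "rb")
        self._buffer = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self._offsets[i], self._offsets[i + 1]
        fmt = "<%di" % (end - start)
        # Decode only the requested slice of the mapped file.
        return list(struct.unpack_from(fmt, self._buffer, start * ITEM.size))

    def close(self):
        self._buffer.close()
        self._file.close()
```

In this sketch only the offsets table is held in memory, and the operating system pages token data in on demand, which is the usual motivation for a memory-mapped implementation over one that loads the whole file up front.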
    Pull Request resolved: https://github.com/pytorch/fairseq/pull/589
    
    Differential Revision: D14597128
    
    Pulled By: myleott
    
    fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
read_binarized.py 1.47 KB