    Replace unk with original string · 42a0150c
    Louis Martin authored
    * Add <eos> for unk replacement
    * Add IndexedRawTextDataset to load raw text files
    * Replace unk with original string
    * Add load_raw_text_dataset() and --output-format
    * Move has_binary_files to data.py
preprocess.py 7.3 KB