    pipeline for LM training · 880e7cd4
    Ruty Rinott authored
    Summary:
    Step 2 of the pipeline for LM training.
    Assumes tokenized text data as input, splits it into train/validation/test sets, and runs binarization
    (step a_ii in https://fb.quip.com/kazzAxvZHBj9)
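
The step described above can be sketched as follows. This is a minimal illustration and not the code in preprocess.py: the function names (`split_lines`, `build_vocab`, `binarize`), the 10%/10% split fractions, and the `<unk>`/`</s>` symbols are all assumptions made for the example. It assumes each input line is already tokenized and whitespace-separated, and "binarization" here simply means mapping tokens to integer ids.

```python
import random
from collections import Counter


def split_lines(lines, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle tokenized lines and split them into train/valid/test.

    The fractions and seed are hypothetical defaults, not values from the commit.
    """
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # deterministic shuffle for a given seed
    n_valid = int(len(lines) * valid_frac)
    n_test = int(len(lines) * test_frac)
    return {
        "valid": lines[:n_valid],
        "test": lines[n_valid:n_valid + n_test],
        "train": lines[n_valid + n_test:],
    }


def build_vocab(train_lines, unk="<unk>", eos="</s>"):
    """Build a token -> id map from the training split only."""
    counts = Counter(tok for line in train_lines for tok in line.split())
    vocab = {unk: 0, eos: 1}
    # Most frequent tokens get the smallest ids; ties are broken alphabetically.
    for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def binarize(lines, vocab, unk="<unk>", eos="</s>"):
    """Map each line to a list of ids, appending the end-of-sentence id."""
    return [
        [vocab.get(tok, vocab[unk]) for tok in line.split()] + [vocab[eos]]
        for line in lines
    ]
```

A possible usage: call `split_lines` on the tokenized corpus, build the vocabulary from `splits["train"]` only, then call `binarize` on each split with that same vocabulary so that unseen validation and test tokens map to the unknown id.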
    
    Reviewed By: borguz
    
    Differential Revision: D10454705
    
    fbshipit-source-id: 74e8679041f5507c4e404c1b719547c2ae9ed983
preprocess.py 13 KB