1. 08 Apr, 2020 1 commit
    • Updates to preprocess_data.py and indexed_dataset. · da0562fc
      Jared Casper authored
      preprocess_data:
      - Adds ability to not split sentences. This is used for gpt2 datasets.
      
      - Adds ability to create multiple datasets from different JSON keys;
      this is currently untested.
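
A rough illustration of the two preprocess_data options described above, not the code from this commit: each JSON key yields its own stream of token lists, and sentence splitting can be switched off so a whole document stays one unit (as for GPT-2 style data). The names `encode_line`, `naive_split_sentences`, and `split_sentences` are hypothetical, and the period-based splitter and whitespace tokenizer are stand-ins for a real sentence splitter and tokenizer.

```python
import json


def naive_split_sentences(text):
    """Hypothetical stand-in splitter: breaks on '. ' only."""
    parts = [part.strip() for part in text.split(". ")]
    return [part for part in parts if part]


def encode_line(json_line, keys, split_sentences=True):
    """Return {key: [token list per unit]} for one JSON line.

    With split_sentences=False the whole field is kept as a single unit.
    """
    record = json.loads(json_line)
    encoded = {}
    for key in keys:
        text = record[key]
        units = naive_split_sentences(text) if split_sentences else [text]
        # Whitespace tokenization is a placeholder for a real tokenizer.
        encoded[key] = [unit.split() for unit in units]
    return encoded


line = json.dumps({"text": "First sentence. Second sentence.", "title": "A title"})
split = encode_line(line, ["text", "title"], split_sentences=True)
whole = encode_line(line, ["text", "title"], split_sentences=False)
```

Here `split["text"]` holds two units and `whole["text"]` holds one; each key could then be written to its own dataset.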
      
      indexed_dataset:
      - Adds a new "get" function that returns a portion of an entry.
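
To make the idea of getting a portion of an entry concrete, here is a minimal in-memory sketch. It is an assumption about the general shape of such a function, not the implementation in this commit: `ToyIndexedDataset` is a hypothetical class, and the `offset` and `length` parameter names are chosen for illustration.

```python
from array import array
from itertools import accumulate


class ToyIndexedDataset:
    """Hypothetical dataset: all entries stored in one flat buffer."""

    def __init__(self, entries):
        self._sizes = [len(entry) for entry in entries]
        # Start position of each entry inside the flat buffer.
        self._starts = [0] + list(accumulate(self._sizes))[:-1]
        self._data = array("q", (tok for entry in entries for tok in entry))

    def __len__(self):
        return len(self._sizes)

    def __getitem__(self, idx):
        # Whole-entry access is the special case of get() with defaults.
        return self.get(idx)

    def get(self, idx, offset=0, length=None):
        """Return `length` items of entry `idx`, starting at `offset`."""
        size = self._sizes[idx]
        if length is None:
            length = size - offset
        if offset < 0 or length < 0 or offset + length > size:
            raise IndexError("requested slice falls outside the entry")
        start = self._starts[idx] + offset
        return self._data[start:start + length]


ds = ToyIndexedDataset([[1, 2, 3, 4, 5], [6, 7], [8, 9, 10]])
middle = list(ds.get(0, offset=1, length=3))
tail = list(ds.get(2, offset=1))
```

Reading only the requested slice avoids materializing a long entry when a caller needs just a window of it, which is presumably the motivation for such a function.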