  1. 21 Aug, 2019 2 commits
  2. 20 Aug, 2019 2 commits
  3. 19 Aug, 2019 6 commits
  4. 17 Aug, 2019 1 commit
  5. 16 Aug, 2019 2 commits
  6. 15 Aug, 2019 5 commits
  7. 14 Aug, 2019 5 commits
  8. 13 Aug, 2019 4 commits
  9. 12 Aug, 2019 5 commits
  10. 10 Aug, 2019 3 commits
  11. 09 Aug, 2019 3 commits
  12. 08 Aug, 2019 2 commits
    • replace 'mkdir' with 'mkdir -p' (#997) · 6398aa9e
      Hafiz Shafruddin authored
      Summary:
      Allow the shell script to create nested subdirectories with the -p flag. Also amends the README.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/997
      
      Differential Revision: D16710813
      
      Pulled By: myleott
      
      fbshipit-source-id: 89abefa27e8fac99d212fc9b7b0dbc3690043ba0
    • Integrate with Apache Arrow/Plasma in-memory store for large datasets (#995) · 439ead5a
      Myle Ott authored
      Summary:
      Datasets with many examples can generate very large indexes in TokenBlockDataset (and possibly elsewhere). When using `--num-workers > 0`, these indexes are pickled and transferred via a multiprocessing pipe, which is slow and can fail if an index grows beyond 4GB (~0.5B examples). Apache Arrow provides an in-memory object store called Plasma that can offload these arrays to shared memory, which both reduces duplication of the data across worker processes and avoids the need to pickle it.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/995
      
      Differential Revision: D16697219
      
      Pulled By: myleott
      
      fbshipit-source-id: 1b679ee5b3d2726af54ff418f6159a3671173fb8
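The shell script touched by #997 is not shown here. As a hedged illustration in Python, `os.makedirs(path, exist_ok=True)` behaves like `mkdir -p`: it creates any missing intermediate directories and does not fail if the target already exists. By contrast, `os.mkdir` behaves like plain `mkdir` and fails when a parent is missing. The helper `make_dirs` and the directory names below are hypothetical.

```python
import os
import tempfile


def make_dirs(path):
    """Hypothetical helper, a Python analogue of `mkdir -p`: create any missing
    parent directories, and do nothing if `path` already exists as a directory."""
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "data-bin", "example")  # hypothetical layout
    try:
        os.mkdir(nested)  # like plain `mkdir`: fails because "data-bin" is missing
    except FileNotFoundError:
        print("plain mkdir failed: parent directory does not exist")
    make_dirs(nested)
    make_dirs(nested)  # a repeated call also succeeds, like re-running `mkdir -p`
    print(os.path.isdir(nested))  # True
```

The `exist_ok=True` argument is what makes repeated runs safe, which matters for setup scripts that may be re-executed.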
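The idea behind #995 can be sketched without a running Plasma store. This minimal sketch swaps in Python's standard-library `multiprocessing.shared_memory` (Python 3.8+) with NumPy in place of Plasma. It is not fairseq's implementation. The array is copied once into a named shared-memory segment, and only a small handle (segment name, shape, dtype) needs to be pickled and sent to a worker, which then attaches to the same memory without copying. The helpers `put_in_shared_memory` and `attach` are hypothetical and are neither fairseq nor Arrow APIs.

```python
import pickle
from multiprocessing import shared_memory

import numpy as np


def put_in_shared_memory(arr):
    """Hypothetical helper: copy `arr` into a new shared-memory segment.

    Returns the segment (the creator keeps it alive and later unlinks it)
    and a small picklable handle a worker can use to attach by name.
    """
    shm = shared_memory.SharedMemory(create=True, size=max(arr.nbytes, 1))
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[...] = arr  # the single copy of the data
    handle = (shm.name, arr.shape, arr.dtype.str)
    return shm, handle


def attach(handle):
    """Hypothetical helper: what a worker would do with the handle it received."""
    name, shape, dtype = handle
    shm = shared_memory.SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=np.dtype(dtype), buffer=shm.buf)  # zero-copy view
    return shm, arr


index = np.arange(1_000_000, dtype=np.int64)  # stand-in for a large dataset index
shm, handle = put_in_shared_memory(index)
try:
    # Pickling the array ships all of its bytes; pickling the handle ships only a tiny tuple.
    print(len(pickle.dumps(index)), len(pickle.dumps(handle)))
    worker_shm, shared_index = attach(handle)
    assert np.array_equal(shared_index, index)
    del shared_index  # release the exported buffer before closing the segment
    worker_shm.close()
finally:
    shm.close()
    shm.unlink()  # only the creator removes the segment
```

Every process that attaches maps the same memory, so the data is not duplicated per worker. This is the property the commit relies on Plasma to provide.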