1. 18 May, 2021 1 commit
  2. 11 May, 2021 1 commit
  3. 04 May, 2021 1 commit
    • Sylvain Gugger's avatar
      Reproducible checkpoint (#11582) · 6b241e0e
      Sylvain Gugger authored
      * Set generator in dataloader
      
      * Use generator in all random samplers
      
      * Checkpoint all RNG states
      
      * Final version
      
      * Quality
      
      * Test
      
      * Address review comments
      
      * Quality
      
      * Remove debug util
      
      * Add python and numpy RNGs
      
      * Split states in different files in distributed
      
      * Quality
      
      * local_rank for TPUs
      
      * Only use generator when accepted
      
      * Add test
      
      * Set seed to avoid flakiness
      
      * Make test less flaky
      
      * Quality
      6b241e0e
  4. 26 Apr, 2021 2 commits
  5. 23 Apr, 2021 1 commit
  6. 20 Apr, 2021 1 commit
  7. 16 Apr, 2021 1 commit
  8. 14 Apr, 2021 1 commit
  9. 08 Apr, 2021 1 commit
    • Stas Bekman's avatar
      [DeepSpeed] ZeRO Stage 3 (#10753) · c6d66484
      Stas Bekman authored
      
      
      * synced gpus
      
      * fix
      
      * fix
      
      * need to use t5-small for quality tests
      
      * notes
      
      * complete merge
      
      * fix a disappearing std stream problem
      
      * start zero3 tests
      
      * wip
      
      * tune params
      
      * sorting out the pre-trained model loading
      
      * reworking generate loop wip
      
      * wip
      
      * style
      
      * fix tests
      
      * split the tests
      
      * refactor tests
      
      * wip
      
      * parameterized
      
      * fix
      
      * workout the resume from non-ds checkpoint pass + test
      
      * cleanup
      
      * remove no longer needed code
      
      * split getter/setter functions
      
      * complete the docs
      
      * suggestions
      
      * gpus and their compute capabilities link
      
      * Apply suggestions from code review
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * style
      
      * remove invalid paramgd
      
      * automatically configure zero3 params that rely on hidden size
      
      * make _get_resized_embeddings zero3-aware
      
      * add test exercising resize_token_embeddings()
      
      * add docstring
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      c6d66484
  10. 17 Mar, 2021 2 commits
  11. 08 Mar, 2021 6 commits
  12. 04 Mar, 2021 1 commit
  13. 28 Feb, 2021 1 commit
  14. 26 Feb, 2021 1 commit
  15. 19 Feb, 2021 2 commits
  16. 18 Feb, 2021 1 commit
  17. 16 Feb, 2021 1 commit
  18. 15 Feb, 2021 1 commit
  19. 28 Jan, 2021 1 commit
  20. 27 Jan, 2021 1 commit
  21. 14 Jan, 2021 1 commit
  22. 13 Jan, 2021 1 commit
  23. 06 Jan, 2021 1 commit
  24. 18 Dec, 2020 1 commit
  25. 15 Dec, 2020 1 commit
  26. 01 Dec, 2020 1 commit
  27. 23 Nov, 2020 1 commit
    • Colin Brochtrup's avatar
      Add early stopping callback to pytorch trainer (#8581) · 8ffc01a7
      Colin Brochtrup authored
      * Add early stopping patience and minimum threshold metric must improve to prevent early stopping to pytorch trainer
      
      * Add early stopping test
      
      * Set patience counter to 0 if best metric not defined yet
      
      * Make early stopping a callback. Add callback event for updating the best metric for early stopping callback to trigger on.
      
      * Run make style
      
      * make funciton name sensible
      
      * Improve new argument docstring wording and hope that flakey CI test passes.
      
      * Use on_evaluation callback instead of custom. Remove some debug printing
      
      * Move early stopping arguments and state into early stopping callback
      
      * Run make style
      
      * Remove old code
      
      * Fix docs formatting. make style went rogue on me.
      
      * Remove copied attributes and fix variable
      
      * Add assertions on training arguments instead of mutating them. Move comment out of public docs.
      
      * Make separate test for early stopping callback. Add test of invalid arguments.
      
      * Run make style... I remembered before CI this time!
      
      * appease flake8
      
      * Add EarlyStoppingCallback to callback docs
      
      * Make docstring EarlyStoppingCallabck match other callbacks.
      
      * Fix typo in docs
      8ffc01a7
  28. 19 Nov, 2020 1 commit
  29. 18 Nov, 2020 1 commit
  30. 05 Nov, 2020 1 commit
  31. 03 Nov, 2020 1 commit
  32. 21 Oct, 2020 1 commit