1. 10 Oct, 2022 1 commit
  2. 07 Oct, 2022 1 commit
  3. 06 Oct, 2022 1 commit
  4. 05 Oct, 2022 1 commit
  5. 03 Oct, 2022 4 commits
  6. 28 Sep, 2022 2 commits
  7. 27 Sep, 2022 1 commit
  8. 26 Sep, 2022 2 commits
  9. 23 Sep, 2022 1 commit
  10. 22 Sep, 2022 2 commits
  11. 21 Sep, 2022 1 commit
  12. 20 Sep, 2022 2 commits
  13. 16 Sep, 2022 1 commit
  14. 14 Sep, 2022 1 commit
  15. 13 Sep, 2022 1 commit
  16. 09 Sep, 2022 2 commits
  17. 07 Sep, 2022 1 commit
  18. 06 Sep, 2022 1 commit
  19. 01 Sep, 2022 1 commit
  20. 25 Aug, 2022 1 commit
  21. 24 Aug, 2022 3 commits
  22. 22 Aug, 2022 1 commit
  23. 18 Aug, 2022 3 commits
  24. 17 Aug, 2022 1 commit
  25. 16 Aug, 2022 1 commit
    • Update run_translation_no_trainer.py (#18637) · 25e651a2
      zhoutang776 authored
      * Update run_translation_no_trainer.py
      
      Found an error in how the `no_decay` parameters are selected, and made some small modifications for when the user continues training from a checkpoint
      
      * fixes `no_decay` and `resume_step` issue
      
      1. change `no_decay` list
      2. if users continue training their model from a provided checkpoint, `resume_step` is not initialized properly when `args.gradient_accumulation_steps != 1`
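The two issues named in the commit message above can be illustrated with a minimal sketch. It does not reproduce the actual diff: the parameter names, the extra `layer_norm.weight` pattern, and the helper functions `split_by_weight_decay` and `resume_position` are all hypothetical, and the resume logic assumes the checkpoint records completed optimizer steps while the training loop skips individual batches.

```python
# Hypothetical parameter names, standing in for model.named_parameters().
param_names = [
    "encoder.layers.0.fc1.weight",
    "encoder.layers.0.fc1.bias",
    "encoder.layers.0.LayerNorm.weight",
    "encoder.layers.0.final_layer_norm.weight",
]


def split_by_weight_decay(names, no_decay):
    """Split names into (decayed, not_decayed) by case-sensitive substring match."""
    decayed = [n for n in names if not any(nd in n for nd in no_decay)]
    not_decayed = [n for n in names if any(nd in n for nd in no_decay)]
    return decayed, not_decayed


def resume_position(completed_optimizer_steps, gradient_accumulation_steps, batches_per_epoch):
    """Convert a checkpointed optimizer-step count into (epoch, batches to skip).

    Assumption: one optimizer step consumes `gradient_accumulation_steps` batches.
    """
    batches_seen = completed_optimizer_steps * gradient_accumulation_steps
    starting_epoch = batches_seen // batches_per_epoch
    resume_step = batches_seen - starting_epoch * batches_per_epoch
    return starting_epoch, resume_step


# A pattern list that misses the lowercase layer-norm name lets it receive weight decay.
narrow_decayed, _ = split_by_weight_decay(param_names, ["bias", "LayerNorm.weight"])
# An assumed wider list that also covers the lowercase naming convention.
wide_decayed, _ = split_by_weight_decay(
    param_names, ["bias", "LayerNorm.weight", "layer_norm.weight"]
)

# 30 optimizer steps with 4 accumulation steps correspond to 120 batches.
epoch, skip = resume_position(30, 4, batches_per_epoch=100)
```

With `gradient_accumulation_steps == 1` the step count and the batch count coincide, which is why a mismatch of this kind would only show up when accumulation is enabled.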
  26. 14 Aug, 2022 1 commit
    • Flax Remat for LongT5 (#17994) · d6eeb871
      Karim Foda authored
      * [Flax] Add remat (gradient checkpointing)
      
      * fix variable naming in test
      
      * flip: checkpoint using a method
      
      * fix naming
      
      * fix class naming
      
      * apply PVP's suggestions from code review
      
      * add gradient_checkpointing to examples
      
      * Add gradient_checkpointing to run_mlm_flax
      
      * Add remat to longt5
      
      * Add gradient checkpointing test longt5
      
      * Fix args errors
      
      * Fix remaining tests
      
      * Make fixup & quality fixes
      
      * replace kwargs
      
      * remove unnecessary kwargs
      
      * Make fixup changes
      
      * revert long_t5_flax changes
      
      * Remove return_dict and copy to LongT5
      
      * Remove test_gradient_checkpointing
      Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
  27. 11 Aug, 2022 2 commits