1. 21 Nov, 2023 1 commit
  2. 17 Nov, 2023 1 commit
  3. 16 Nov, 2023 1 commit
  4. 15 Nov, 2023 1 commit
  5. 10 Nov, 2023 1 commit
  6. 08 Nov, 2023 1 commit
  7. 06 Nov, 2023 1 commit
  8. 02 Nov, 2023 1 commit
  9. 31 Oct, 2023 1 commit
  10. 25 Oct, 2023 1 commit
  11. 20 Oct, 2023 1 commit
  12. 16 Oct, 2023 1 commit
  13. 11 Oct, 2023 1 commit
  14. 03 Oct, 2023 1 commit
    • [AMD] Add initial version for run_tests_multi_gpu (#26346) · 3632fb3c
      Funtowicz Morgan authored
      
      
      * Add initial version for run_tests_multi_gpu
      
      * Trigger change in BERT
      
      * fix typo setup -> setup_gpu
      
      * Add tag mi210
      
      * Enable multi-gpu jobs
      
      * One more
      
      * Use dynamic device allocation
      
      * Attempt to fix syntax for docker create
      
      * fix script path
      
      * fix
      
      * temp machine type
      
      * fix label
      
      * Enable multi-gpu tests
      
      * Rename multi-amd-gpu to multi-gpu
      
      * Let's not be lazy dude
      
      * Update rocm-smi output
      
      * Add gpu_flavour in the matrix
      
      * Fix typos
      
      * merge single/multi dispatch into the matrix
      
      * Format.
      
      * Revert BERT's change
      
      ---------
      Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
  15. 27 Sep, 2023 1 commit
  16. 20 Sep, 2023 1 commit
    • Integrate AMD GPU in CI/CD environment (#26007) · 2d71307d
      Funtowicz Morgan authored
      
      
      * Add a Dockerfile for PyTorch + ROCm based on official AMD released artifact
      
      * Add a new artifact single-amdgpu testing on main
      
      * Attempt to test the workflow without merging.
      
      * Changed BERT to check if things are triggered
      
      * Meet the dependencies graph on workflow
      
      * Revert BERT changes
      
      * Add check_runners_amdgpu to correctly mount and check availability
      
      * Rename setup to setup_gpu for CUDA and add setup_amdgpu for AMD
      
      * Fix all the needs.setup -> needs.setup_[gpu|amdgpu] dependencies
      
      * Fix setup dependency graph to use check_runner_amdgpu
      
      * Let's do the runner status check only on AMDGPU target
      
      * Update the Dockerfile.amd to put ourselves in / rather than /var/lib
      
      * Restore the whole setup for CUDA too.
      
      * Let's redisable them
      
      * Change BERT to trigger tests
      
      * Restore BERT
      
      * Add torchaudio with rocm 5.6 to AMD Dockerfile (#26050)
      
      fix dockerfile
      Co-authored-by: Felix Marty <felix@hf.co>
      
      * Place AMD GPU tests in a separate workflow (correct branch) (#26105)
      
      AMDGPU CI lives in another workflow
      
      * Fix invalid job name in dependencies.
      
      * Remove tests multi-amdgpu for now.
      
      * Use single-amdgpu
      
      * Use --net=host for now.
      
      * Remove host networking.
      
      * Removed duplicated check_runners_amdgpu step
      
      * Let's tag machine-types with mi210 for now.
      
      * Machine type should be only mi210
      
      * Remove unnecessary push.branches item
      
      * Apply review suggestions moving from `x-amdgpu` to `x-gpu` introducing `amd-gpu` and `miXXX` labels.
      
      * Remove amdgpu from step names.
      
      * finalize
      
      * delete
      
      ---------
      Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
      Co-authored-by: Felix Marty <felix@hf.co>
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
  17. 18 Sep, 2023 1 commit
  18. 30 Aug, 2023 2 commits
  19. 24 Aug, 2023 1 commit
  20. 23 Aug, 2023 2 commits
  21. 17 Aug, 2023 1 commit
    • More utils doc (#25457) · 2defb6b0
      Sylvain Gugger authored
      * Document and clean more utils.
      
      * More documentation and fixes
      
      * Switch to Lysandre's token
      
      * Address review comments
      
      * Actually put else
  22. 17 Jul, 2023 1 commit
  23. 28 Jun, 2023 2 commits
  24. 20 Jun, 2023 1 commit
  25. 19 Jun, 2023 1 commit
  26. 16 Jun, 2023 1 commit
  27. 07 Jun, 2023 2 commits
  28. 05 Jun, 2023 1 commit
  29. 01 Jun, 2023 2 commits
  30. 31 May, 2023 1 commit
    • accelerate deepspeed and gradient accumulation integrate (#23236) · a73b1d59
      Sourab Mangrulkar authored
      * mixed precision support via accelerate
      
      * fix issues
      
      * fix for the sharded ddp case
      
      * fix flax and tf failing tests
      
      * refactor the place to create `Accelerator` object
      
      * move ddp prep to accelerate
      
      * fix 😅
      
      * resolving comments
      
      * move fsdp handling to accelerate
      
      * fixes
      
      * fix saving
      
      * shift torch dynamo handling to accelerate
      
      * shift deepspeed integration and save & load utils to accelerate
      
      * fix accelerate launcher support
      
      * oops
      
      * fix 🐛
      
      * save ckpt fix
      
      * Trigger CI
      
      * nasty 🐛 😅
      
      * as deepspeed needs grad_acc fixes, transfer grad_acc to accelerate
      
      * make tests happy
      
      * quality 
      
      * loss tracked needs to account for grad_acc
      
      * fixing the deepspeed tests
      
      * quality 
      
      * 😅😅😅
      
      * tests 😡
      
      * quality 
      
      
      
      * Trigger CI
      
      * resolve comments and fix the issue with the previous merge from branch
      
      * Trigger CI
      
      * accelerate took over deepspeed integration
      
      ---------
      Co-authored-by: Stas Bekman <stas@stason.org>
  31. 22 May, 2023 1 commit
  32. 19 May, 2023 2 commits
  33. 17 May, 2023 1 commit
  34. 11 May, 2023 1 commit