1. 20 Sep, 2023 8 commits
    • include changes from llama (#26260) · f94c9b3d
      Arthur authored
      * include changes from llama
      
      * add a test
    • add bbox input validation (#26294) · 00247ea0
      Jinho Park authored
    • fix deepspeed available detection (#26252) · 24553206
      fxmarty authored
    • Rewrite for custom code warning messages (#26291) · f29fe745
      Matt authored
      Quick britpicking for some warning messages!
    • Integrate AMD GPU in CI/CD environment (#26007) · 2d71307d
      Funtowicz Morgan authored
      * Add a Dockerfile for PyTorch + ROCm based on official AMD released artifact
      
      * Add a new artifact single-amdgpu testing on main
      
      * Attempt to test the workflow without merging.
      
      * Changed BERT to check if things are triggered
      
      * Meet the dependencies graph on workflow
      
      * Revert BERT changes
      
      * Add check_runners_amdgpu to correctly mount and check availability
      
      * Rename setup to setup_gpu for CUDA and add setup_amdgpu for AMD
      
      * Fix all the needs.setup -> needs.setup_[gpu|amdgpu] dependencies
      
      * Fix setup dependency graph to use check_runner_amdgpu
      
      * Let's do the runner status check only on AMDGPU target
      
      * Update the Dockerfile.amd to put ourselves in / rather than /var/lib
      
      * Restore the whole setup for CUDA too.
      
      * Let's redisable them
      
      * Change BERT to trigger tests
      
      * Restore BERT
      
      * Add torchaudio with rocm 5.6 to AMD Dockerfile (#26050)
      
      fix dockerfile
      Co-authored-by: Felix Marty <felix@hf.co>
      
      * Place AMD GPU tests in a separate workflow (correct branch) (#26105)
      
      AMDGPU CI lives in another workflow
      
      * Fix invalid job name in dependencies.
      
      * Remove tests multi-amdgpu for now.
      
      * Use single-amdgpu
      
      * Use --net=host for now.
      
      * Remove host networking.
      
      * Removed duplicated check_runners_amdgpu step
      
      * Let's tag machine-types with mi210 for now.
      
      * Machine type should be only mi210
      
      * Remove unnecessary push.branches item
      
      * Apply review suggestions moving from `x-amdgpu` to `x-gpu` introducing `amd-gpu` and `miXXX` labels.
      
      * Remove amdgpu from step names.
      
      * finalize
      
      * delete
      
      ---------
      Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
      Co-authored-by: Felix Marty <felix@hf.co>
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • Update bros checkpoint (#26277) · 37c205eb
      Jinho Park authored
      * fix bros integration test
      
      * update bros checkpoint
    • fix name error when accelerate is not available (#26278) · 86ffd5ff
      Sourab Mangrulkar authored
      * fix name error when accelerate is not available
      
      * fix `is_fsdp_available`
    • FSDP tests and checkpointing fixes (#26180) · 382ba670
      Sourab Mangrulkar authored
      * add fsdp tests
      
      * Update test_fsdp.py
      
      * Update test_fsdp.py
      
      * fixes
      
      * checks
      
      * Update trainer.py
      
      * fix
      
      * fixes for saving/resuming checkpoints
      
      * fixes
      
      * add tests and delete debug statements
      
      * fixing tests
      
      * Update test_fsdp.py
      
      * fix tests
      
      * fix tests
      
      * minor nits
      
      * fix code style and quality
      
      * refactor and modularize test code
      
      * reduce the time of tests
      
      * reduce the test time
      
      * fix test
      
      * reduce test time
      
      * reduce test time
      
      * fix failing tests
      
      * fix
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * resolve comments
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
  2. 19 Sep, 2023 6 commits
  3. 18 Sep, 2023 18 commits
  4. 15 Sep, 2023 7 commits
  5. 14 Sep, 2023 1 commit