1. 23 Jul, 2024 1 commit
  2. 14 Jul, 2024 1 commit
  3. 11 Jul, 2024 1 commit
    • Refactor flash attention implementation in transformers (#31446) · e3143952
      Arthur authored
      
      
      * dumb commit
      
      * nit
      
      * update
      
      * something like this
      
      * unpack in modeling utils
      
      * safe import
      
* oops
      
      * update
      
      * nits
      
      * diff convert gemma
      
      * update
      
      * start propagating
      
* update other modeling code as well
      
      * update for sliding window models
      
      * nits
      
      * more init cleanups
      
      * styling
      
      * fixup
      
      * noice
      
      * pass fixup
      
      * typo typing_extension -> typing_extensions
      
      * torch.nn.functionnal -> torch.nn.functional
      
      * add to import structure
      
      * unpack
      
      * simplify a bit more for this first version
      
      * nut
      
      * update
      
      * update
      
      * nit
      
      * ease the import of `Unpack`
      
      * remove useless `use_sliding_window`
      
      * no qua please
      
      * protect import?
      
      * style
      
      * [run-slow]
      
      * [run slow] llama,gemma,mistral,mixtral
      
      * remove extra kwargs
      
      * fix llama
      
      * address review comments
      
      * apply diff_model_converter to modeling_gemma.py
      
      * remove cache_position 1
      
      * remove cache_position 2
      
      * some cleaning
      
      * refactor gemma2 as well
      
      * apply review comments
      
      * rename file to modeling_flash_attention_utils.py
      
      * siglip refactor
      
      * remove dead code
      
      * is the hub down?
      
      * still down?
      
      * fix siglip
      
      * fix gemma2
      
      * fatal: Could not read from remote repository.
      
      * fix typo in softcap implem
      
* flaky
      
      * Failed: Timeout >120.0s
      
      ---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
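      A minimal sketch of the typed-kwargs pattern the bullets above point at
      ("unpack in modeling utils", "ease the import of `Unpack`"): the accepted
      keyword arguments are declared once in a TypedDict and checked through
      typing_extensions.Unpack instead of an untyped **kwargs. FlashKwargs and
      flash_forward are illustrative names, not the identifiers introduced in
      modeling_flash_attention_utils.py, and the body is a plain eager-attention
      stand-in rather than a flash kernel.

          from typing import Optional

          import torch
          from typing_extensions import TypedDict, Unpack  # typing_extensions, not typing_extension

          class FlashKwargs(TypedDict, total=False):
              # Hypothetical subset of flash-attention options.
              causal: bool
              softcap: Optional[float]       # logit soft-capping (see the softcap fix above)
              sliding_window: Optional[int]  # declared for illustration; unused below

          def flash_forward(
              q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
              **kwargs: Unpack[FlashKwargs],
          ) -> torch.Tensor:
              # Eager reference math standing in for the real kernel dispatch.
              scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
              softcap = kwargs.get("softcap")
              if softcap is not None:
                  # Soft-capping squashes logits into (-softcap, softcap) before softmax.
                  scores = softcap * torch.tanh(scores / softcap)
              if kwargs.get("causal", False):
                  upper = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
                  scores = scores.masked_fill(upper, torch.finfo(scores.dtype).min)
              return torch.softmax(scores, dim=-1) @ v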
  4. 07 Jun, 2024 1 commit
  5. 23 May, 2024 1 commit
  6. 16 May, 2024 1 commit
  7. 15 May, 2024 1 commit
  8. 14 May, 2024 1 commit
  9. 13 May, 2024 1 commit
    • Llama: fix custom 4D masks, v2 (#30348) · a0779b9e
      Poedator authored
      
      
      * 4d mask fixes
      
      * Update custom 4D mask logic
      
      * test moved to mixin
      
      * extra tests 4d mask
      
      * upd 4d mask and StaticCache handling
      
      * added Mask4DTestHard to mistral tests
      
      * post-rebase fixes
      
      * test fixes for StaticCache
      
      * make fix-copies
      
      * upd 1 after #30476
      
      * fix common tests
      
      * rm elif attention_mask.dim() == 4:
      
      * tests combined, fixed, mixtral supported
      
      * bigbird style chg reverted
      
      * rm if attention_mask.dim() == 2
      
      * modeling_llama formatting chg
      
      ---------
Co-authored-by: Joao Gante <joao@huggingface.co>
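      A minimal sketch of the custom 4D mask idea this commit fixes, assuming
      the additive convention transformers uses for float masks (0.0 = attend,
      dtype-min = blocked) and a shape of (batch, 1, query_len, kv_len);
      packed_causal_mask and seq_ids are illustrative names. The typical use
      case is packing several independent sequences into one row while keeping
      them from attending across boundaries.

          import torch

          def packed_causal_mask(seq_ids: torch.Tensor, dtype=torch.float32) -> torch.Tensor:
              # seq_ids: (batch, length) integers labelling which packed sequence
              # each token belongs to, e.g. [0, 0, 0, 1, 1] for a 3- and a 2-token pack.
              b, n = seq_ids.shape
              same_seq = seq_ids[:, :, None] == seq_ids[:, None, :]    # block-diagonal (b, n, n)
              causal = torch.tril(torch.ones(n, n, dtype=torch.bool))  # lower triangle
              allowed = same_seq & causal
              mask = torch.zeros(b, 1, n, n, dtype=dtype)
              return mask.masked_fill(~allowed[:, None], torch.finfo(dtype).min)

          # Hypothetical usage with a Llama-family model:
          #   out = model(input_ids, attention_mask=packed_causal_mask(seq_ids))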
  10. 09 May, 2024 1 commit
  11. 08 May, 2024 1 commit
  12. 03 May, 2024 1 commit
  13. 02 May, 2024 1 commit
  14. 30 Apr, 2024 1 commit
  15. 29 Apr, 2024 1 commit
  16. 22 Apr, 2024 1 commit
  17. 18 Apr, 2024 2 commits
  18. 17 Apr, 2024 1 commit
  19. 05 Apr, 2024 1 commit
  20. 30 Mar, 2024 1 commit
  21. 28 Mar, 2024 1 commit
  22. 21 Mar, 2024 1 commit
  23. 20 Mar, 2024 1 commit
    • [`BC 4.37 -> 4.38`] for Llama family, memory and speed (#29753) · ff841900
      Arthur authored
      * attempt to fix
      
      * the actual fix that works with compilation!
      
      * this?
      
      * temporary update
      
      * nit?
      
* dispatch to memory efficient?
      
      * update both models that have static cache support
      
      * fix copies fix compile
      
      * make sure fix
      
      * fix cohere and gemma
      
      * fix beams?
      
      * nit
      
      * slipped through the cracks
      
      * nit
      
      * nits
      
      * update
      
      * fix-copies
      
      * skip failing tests
      
      * nits
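      A minimal sketch of the static-cache-plus-compile path these bullets
      circle around, assuming a transformers release (>= 4.38) where the Llama
      family supports StaticCache; the exact knobs have shifted between
      releases, so treat this as illustrative rather than the canonical recipe.

          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer

          tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
          model = AutoModelForCausalLM.from_pretrained(
              "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
          )

          # A fixed-size ("static") KV cache keeps tensor shapes stable, which is
          # what lets torch.compile capture the decoding step without recompiles.
          model.generation_config.cache_implementation = "static"
          model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

          inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
          print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))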
  24. 19 Mar, 2024 1 commit
  25. 14 Mar, 2024 1 commit
  26. 13 Mar, 2024 1 commit
  27. 08 Mar, 2024 1 commit
  28. 07 Mar, 2024 1 commit
  29. 06 Mar, 2024 2 commits
  30. 01 Mar, 2024 2 commits
  31. 28 Feb, 2024 4 commits
  32. 27 Feb, 2024 1 commit
  33. 26 Feb, 2024 1 commit
  34. 23 Feb, 2024 1 commit