  1. 11 Jul, 2024 1 commit
    • Refactor flash attention implementation in transformers (#31446) · e3143952
      Arthur authored
      
      
      * dumb commit
      
      * nit
      
      * update
      
      * something like this
      
      * unpack in modeling utils
      
      * safe import
      
      * oups
      
      * update
      
      * nits
      
      * diff convert gemma
      
      * update
      
      * start propagating
      
      * update other modeling code as well
      
      * update for sliding window models
      
      * nits
      
      * more init cleanups
      
      * styling
      
      * fixup
      
      * noice
      
      * pass fixup
      
      * typo typing_extension -> typing_extensions
      
      * torch.nn.functionnal -> torch.nn.functional
      
      * add to import structure
      
      * unpack
      
      * simplify a bit more for this first version
      
      * nut
      
      * update
      
      * update
      
      * nit
      
      * ease the import of `Unpack`
      
      * remove useless `use_sliding_window`
      
      * no qua please
      
      * protect import?
      
      * style
      
      * [run-slow]
      
      * [run slow] llama,gemma,mistral,mixtral
      
      * remove extra kwargs
      
      * fix llama
      
      * address review comments
      
      * apply diff_model_converter to modeling_gemma.py
      
      * remove cache_position 1
      
      * remove cache_position 2
      
      * some cleaning
      
      * refactor gemma2 as well
      
      * apply review comments
      
      * rename file to modeling_flash_attention_utils.py
      
      * siglip refactor
      
      * remove dead code
      
      * is the hub down?
      
      * still down?
      
      * fix siglip
      
      * fix gemma2
      
      * fatal: Could not read from remote repository.
      
      * fix typo in softcap implem
      
      * flaky
      
      * Failed: Timeout >120.0s
      
      ---------
      Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
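
Taken together, these commits move per-model flash-attention plumbing into a shared `modeling_flash_attention_utils.py`. A minimal sketch of the resulting call pattern, assuming the `_flash_attention_forward` helper and guarded import that the commit messages describe (verify the exact signature against the module itself):

```python
# Sketch only: the centralized flash-attention helper pattern from this PR.
# `_flash_attention_forward` and its argument names are assumptions drawn from
# the commit messages ("safe import", "protect import?", the rename to
# modeling_flash_attention_utils.py).
import torch

try:
    from transformers.modeling_flash_attention_utils import _flash_attention_forward
except ImportError:  # flash-attn not installed; fall back below
    _flash_attention_forward = None

def attn(query, key, value, attention_mask, dropout: float = 0.0):
    # query/key/value: (batch, seq_len, num_heads, head_dim)
    if _flash_attention_forward is not None:
        return _flash_attention_forward(
            query, key, value, attention_mask,
            query_length=query.shape[1],
            is_causal=True,
            dropout=dropout,
        )
    # Fallback: plain causal SDPA, which expects (batch, heads, seq, dim)
    q, k, v = (t.transpose(1, 2) for t in (query, key, value))
    out = torch.nn.functional.scaled_dot_product_attention(
        q, k, v, dropout_p=dropout, is_causal=True
    )
    return out.transpose(1, 2)
```
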
  2. 05 Jun, 2024 1 commit
    • Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 (#30536) · bd5091df
      Cyril Vallez authored
      * Fix contrastive_search for new cache structure, and improve performance by removing inefficient torch.stack(torch.split(x, top_k, dim=0))
      
      * Fix _contrastive_search for non-standard cache using ellipsis slicing
      
      * Fix all outputs.logits memory leaks for all decoding strategies!
      
      * Fix small error in _contrastive_search()
      
      * Make all necessary changes and revert for the new class
      
      * Apply coding style
      
      * Remove pipes in type hints for compatibility
      
      * correct type hint
      
      * apply style
      
      * Use DynamicCache by default and solve conflicts
      
      * Fix rebase issues
      
      * Add `_supports_dynamic_cache_class` in models for models that support DynamicCache but not other caches to make DynamicCache the default for more models
      
      * Create generation config to return legacy format by default, or to choose not to
      
      * style
      
      * Fix case when use_cache is False
      
      * Remove default DynamicCache in assisted_decoding if assistant_model does not support it + fix _seen_tokens when cropping cache
      
      * Update prepare_inputs_for_generation() for case with empty DynamicCache
      
      * Correct return of args in _assisted_decoding
      
      * Remove EfficientDynamicCache as it is no longer needed
      
      * Correct mistake in generation config
      
      * Move cache logic of assisted decoding to AssistedCandidateGenerator.__init__
      
      * change DynamicCache function names from "split" to "batch_split" for readability + apply coding style
      
      * Remove `_supports_dynamic_cache_class` attribute after rebase
      
      * Correct missing line lost in conflict resolution during rebasing
      
      * Add special case for Jamba
      
      * Fix jamba test
      
      * Coding style
      
      * coding style
      
      * Correct missing import in rebasing
      
      * Simplify _validate_model_kwargs based on removal of _supports_dynamic_cache attribute
      
      * Simplify code paths in _contrastive_search
      
      * coding style
      
      * Update docstrings of cache methods
      
      * Update prepare_inputs_for_generation() -> past_key_values are always Cache objects
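
After this change, `past_key_values` inside `generate()` is a `Cache` object (a `DynamicCache` by default) rather than the legacy tuple of tuples. A hedged sketch of the compatibility surface; the checkpoint is illustrative, and `to_legacy_cache()`/`from_legacy_cache()` are the conversion helpers on `DynamicCache`:

```python
# Sketch: generate() now carries Cache objects; legacy tuples are a view.
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.cache_utils import DynamicCache

tok = AutoTokenizer.from_pretrained("gpt2")            # illustrative model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8, return_dict_in_generate=True)

pkv = out.past_key_values
if isinstance(pkv, DynamicCache):
    legacy = pkv.to_legacy_cache()    # tuple-of-tuples for older call sites
    rebuilt = DynamicCache.from_legacy_cache(legacy)
```
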
  3. 31 May, 2024 1 commit
    • Diff converter v2 (#30868) · 96eb0628
      Arthur authored
      * current working example!
      
      * commit regex and result file
      
      * update
      
      * nit
      
      * push the conversion file
      
      * oups
      
      * roadmap and nits
      
      * attempt diffs for 3 files
      
      * persimmon
      
      * nit
      
      * add diff file that is the same as the modeling_llama.py
      
      * fix rope nits
      
      * updates
      
      * updates with converted versions
      
      * give some breathing space to the code
      
      * delete
      
      * update
      
      * update
      
      * push the actual result
      
      * update regex patterns
      
      * update regex patterns
      
      * fix some issues
      
      * fix some issues
      
      * fix some issues
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * revert changes done to llama
      
      * updates
      
      * update gemma
      
      * updates
      
      * oups
      
      * current state
      
      * current state
      
      * update
      
      * ouiiii
      
      * nit
      
      * clear diffs
      
      * nit
      
      * fixup
      
      * update
      
      * doc 🚀
      
      * 🔥
      
      * for now use gemma
      
      * deal with comments
      
      * style
      
      * handle functions
      
      * deal with assigns
      
      * todos
      
      * process inheritance
      
      * keep decorators?
      
      * 🤗
      
      * deal with duplicates
      
      * fixup
      
      * correctly remove duplicate code
      
      * run ruff post script
      
      * ruff deals pretty well with imports, let's leave it to him
      
      * ah maybe not lol
      
      * for now remove all imports from child.
      
      * nit
      
      * conversion of llama
      
      * okay
      
      * convert starcoder2
      
      * synch with main
      
      * update llama diff
      
      * updates
      
      * https://docs.astral.sh/ruff/rules/redefined-while-unused/ fixes the imports, but needs a later version of ruff
      
      * updates
      
      * okay actual state
      
      * non zero exit
      
      * update!
      
      * revert unrelated
      
      * remove other diff files
      
      * updates
      
      * cleanup
      
      * update
      
      * less diff!
      
      * stash
      
      * current updates
      
      * updates
      
      * No need for call
      
      * finished finding deps
      
      * update
      
      * current changes
      
      * current state
      
      * current state
      
      * new status
      
      * nit
      
      * finally
      
      * fixes
      
      * nits
      
      * order is now expected
      
      * use logger info instead of prints
      
      * fixup
      
      * up
      
      * nit
      
      * update
      
      * nits
      
      * update
      
      * correct merge
      
      * update
      
      * update
      
      * update
      
      * add warning
      
      * update caution message
      
      * update
      
      * better merging strategy
      
      * copy class statements :wink:
      
      * fixups
      
      * nits
      
      * update
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * nits
      
      * smaller header
      
      * do cleanup some stuff
      
      * even simpler header?
      
      * fixup
      
      * updates
      
      * ruff
      
      * update examples
      
      * nit
      
      * TODO
      
      * state
      
      * OUUUUUUF
      
      * current state
      
      * nits
      
      * final state
      
      * add a readme
      
      * fixup
      
      * remove diff llama
      
      * fix
      
      * nit
      
      * dummy not funny
      
      * ruff format tests src utils --check
      
      * even less diffs
      
      * less diffs and fix test
      
      * fixes
      
      * naming nit?
      
      * update converter and add super example
      
      * nits
      
      * updated for function signatures
      
      * update
      
      * update
      
      * add converted dummies
      
      * autoformat
      
      * single target assign fix
      
      * fixup
      
      * fix some imports
      
      * fixes
      
      * don't push them
      
      * `# noqa: F841`
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
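
For context, the converter consumes a small diff file that subclasses an existing model and states only the differences, then expands it into a complete `modeling_*.py`. An illustrative sketch of the input format; the class names below are hypothetical, and the real diff files from the PR (e.g. the gemma one) are the authoritative examples:

```python
# diff_mymodel.py -- hypothetical input for the diff converter.
# Anything not overridden is copied from the parent model when the tool
# generates modeling_mymodel.py.
from transformers.models.llama.modeling_llama import (
    LlamaForCausalLM,
    LlamaMLP,
    LlamaModel,
)

class MyModelMLP(LlamaMLP):
    pass  # unchanged: the converter inlines the Llama implementation

class MyModelModel(LlamaModel):
    # only what differs from Llama is written out; the generated file
    # contains the fully expanded class
    def forward(self, *args, **kwargs):
        return super().forward(*args, **kwargs)

class MyModelForCausalLM(LlamaForCausalLM):
    pass
```
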
  4. 20 May, 2024 3 commits
    • Add torch.compile for Mistral (#30642) · 616bb11d
      Longjie Zheng authored
      * first version
      
      * fix sliding window
      
      * fix style
      
      * add sliding window cache
      
      * fix style
      
      * address comments
      
      * fix test
      
      * fix style
      
      * move sliding window check inside cache init
      
      * revert changes on irrelevant files & add comment on SlidingWindowCache
      
      * address comments & fix style
      
      fix style
      
      * update causal mask
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] llama
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * revert CI from a10 to t4
      
      * wrap up
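
The user-visible piece here is a sliding-window-aware static cache so Mistral can run under `torch.compile` without shape-driven recompilation. A hedged usage sketch; `cache_implementation = "sliding_window"` is my reading of the SlidingWindowCache commits, so check the knob name for your release:

```python
# Sketch: compiled Mistral decoding with a fixed-size sliding-window cache.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
).eval()

# A cache bounded by the model's sliding window keeps tensor shapes static,
# which is what lets torch.compile reuse one compiled graph while decoding.
model.generation_config.cache_implementation = "sliding_window"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

out = model.generate(**tok("Hello,", return_tensors="pt"), max_new_tokens=16)
print(tok.decode(out[0], skip_special_tokens=True))
```
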
    • Add support for torch.compile dynamic shapes (#30560) · cd6bd0af
      Benjamin Warner authored
      * add torch.compile dynamic support
      
      * Add SDPA dynamic shapes compile test & improve SDPA comment
      
      * comment consistency
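
In other words, the SDPA attention path can now be traced without specializing on sequence length. A minimal hedged sketch (the model name is illustrative; any SDPA-capable checkpoint works):

```python
# Sketch: dynamic-shape compilation over the SDPA attention path.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="sdpa")
# dynamic=True asks torch.compile not to bake the sequence length into the
# graph, which is what this PR's SDPA changes make traceable.
compiled = torch.compile(model, dynamic=True)

ids = torch.randint(0, model.config.vocab_size, (1, 12))
logits = compiled(input_ids=ids).logits  # re-runs at other lengths without recompiling
```
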
    • Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878) · 07bf2dff
      Joseph Enguehard authored
      
      
      * Add MistralForTokenClassification
      
      * Add tests and docs
      
      * Add token classification for Mixtral and Qwen2
      
      * Save llama for token classification draft
      
      * Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2
      
      * Formatting
      
      * Add token classification support for Qwen2Moe model
      
      * Add dropout layer to each ForTokenClassification model
      
      * Add copied from in tests
      
      * Update src/transformers/models/llama/modeling_llama.py
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      
      * Propagate suggested changes
      
      * Style
      
      ---------
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
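
The new heads are reachable through the usual auto classes. A short hedged sketch; the checkpoint and label count are illustrative:

```python
# Sketch: loading one of the new token-classification heads via the auto class.
from transformers import AutoModelForTokenClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForTokenClassification.from_pretrained(
    "mistralai/Mistral-7B-v0.1", num_labels=5  # e.g. a small NER tag set
)

inputs = tok("Hugging Face is based in New York City", return_tensors="pt")
logits = model(**inputs).logits      # (batch, seq_len, num_labels)
predicted_tags = logits.argmax(-1)   # one tag id per token
```
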
  5. 17 May, 2024 1 commit
    • Remove deprecated logic and warnings (#30743) · 57c965a8
      amyeroberts authored
      * Remove deprecated logic and warnings
      
      * Add back some code that seems to be important...
      
      * Let's just add all the nllb stuff back; removing it is a bit more involved
      
      * Remove kwargs
      
      * Remove more kwargs
  6. 13 May, 2024 1 commit
    • Llama: fix custom 4D masks, v2 (#30348) · a0779b9e
      Poedator authored
      
      
      * 4d mask fixes
      
      * Update custom 4D mask logic
      
      * test moved to mixin
      
      * extra tests 4d mask
      
      * upd 4d mask and StaticCache handling
      
      * added Mask4DTestHard to mistral tests
      
      * post-rebase fixes
      
      * test fixes for StaticCache
      
      * make fix-copies
      
      * upd 1 after #30476
      
      * fix common tests
      
      * rm elif attention_mask.dim() == 4:
      
      * tests combined, fixed, mixtral supported
      
      * bigbird style chg reverted
      
      * rm if attention_mask.dim() == 2
      
      * modeling_llama formatting chg
      
      ---------
      Co-authored-by: Joao Gante <joao@huggingface.co>
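
For reference, the feature being fixed lets a caller hand the model a full 4D additive mask instead of a 2D padding mask (useful for packed sequences). A hedged sketch of the shape and value conventions as I read them from this PR's tests; the checkpoint is illustrative:

```python
# Sketch: passing a custom 4D attention mask of shape (batch, 1, q_len, kv_len).
# Assumed convention: additive float mask, 0.0 where attention is allowed and
# a large negative value where it is blocked.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
seq = 6
input_ids = torch.randint(0, model.config.vocab_size, (1, seq))

min_val = torch.finfo(torch.float32).min
causal = torch.triu(torch.full((seq, seq), min_val), diagonal=1)
mask_4d = causal[None, None, :, :]  # broadcasts over batch and heads

out = model(input_ids=input_ids, attention_mask=mask_4d)
```
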
  7. 20 Mar, 2024 1 commit
    • [`BC 4.37 -> 4.38`] for Llama family, memory and speed (#29753) · ff841900
      Arthur authored
      * attempt to fix
      
      * the actual fix that works with compilation!
      
      * this?
      
      * temporary update
      
      * nit?
      
      * dispatch to memory efficient?
      
      * update both models that have static cache support
      
      * fix copies fix compile
      
      * make sure fix
      
      * fix cohere and gemma
      
      * fix beams?
      
      * nit
      
      * slipped through the cracks
      
      * nit
      
      * nits
      
      * update
      
      * fix-copies
      
      * skip failing tests
      
      * nits
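
The "dispatch to memory efficient" bullet refers to steering PyTorch's SDPA toward the memory-efficient kernel on the compiled static-cache path. A hedged sketch of that dispatch at the raw PyTorch level (this shows the backend mechanism, not the transformers-internal code):

```python
# Sketch: forcing SDPA onto the memory-efficient backend (PyTorch >= 2.3;
# older releases expose torch.backends.cuda.sdp_kernel instead).
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
```
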