1. 31 Jul, 2024 1 commit
  2. 30 Jul, 2024 1 commit
  3. 26 Jul, 2024 1 commit
  4. 24 Jul, 2024 1 commit
  5. 23 Jul, 2024 2 commits
  6. 14 Jul, 2024 1 commit
  7. 11 Jul, 2024 1 commit
    • Refactor flash attention implementation in transformers (#31446) · e3143952
      Arthur authored
      
      
      * dumb commit
      
      * nit
      
      * update
      
      * something like this
      
      * unpack in modeling utils
      
      * safe import
      
      * oups
      
      * update
      
      * nits
      
      * diff convert gemma
      
      * update
      
      * start propagating
      
      * update other modeling code as well
      
      * update for sliding window models
      
      * nits
      
      * more init cleanups
      
      * styling
      
      * fixup
      
      * noice
      
      * pass fixup
      
      * typo typing_extension -> typing_extensions
      
      * torch.nn.functionnal -> torch.nn.functional
      
      * add to import structure
      
      * unpack
      
      * simplify a bit more for this first version
      
      * nit
      
      * update
      
      * update
      
      * nit
      
      * ease the import of `Unpack`
      
      * remove useless `use_sliding_window`
      
      * no qua please
      
      * protect import?
      
      * style
      
      * [run-slow]
      
      * [run slow] llama,gemma,mistral,mixtral
      
      * remove extra kwargs
      
      * fix llama
      
      * address review comments
      
      * apply diff_model_converter to modeling_gemma.py
      
      * remove cache_position 1
      
      * remove cache_position 2
      
      * some cleaning
      
      * refactor gemma2 as well
      
      * apply review comments
      
      * rename file to modeling_flash_attention_utils.py
      
      * siglip refactor
      
      * remove dead code
      
      * is the hub down?
      
      * still down?
      
      * fix siglip
      
      * fix gemma2
      
      * fatal: Could not read from remote repository.
      
      * fix typo in softcap implem
      
      * flaky
      
      * Failed: Timeout >120.0s
      
      ---------
      Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
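      A minimal sketch of the `Unpack` typing pattern this refactor leans on: attention keyword arguments are gathered in a TypedDict so they can be forwarded through the model with static type checking. `FlashAttentionKwargs` and its fields below are illustrative assumptions, not necessarily the exact names merged in #31446.

      ```python
      from typing_extensions import TypedDict, Unpack


      class FlashAttentionKwargs(TypedDict, total=False):
          # Hypothetical fields for illustration; the PR's TypedDict may differ.
          sliding_window: int
          softcap: float


      def attention_forward(hidden_states, **kwargs: Unpack[FlashAttentionKwargs]):
          # Type checkers can now validate the kwargs threaded down to the shared
          # flash-attention helper instead of treating them as untyped.
          sliding_window = kwargs.get("sliding_window")
          softcap = kwargs.get("softcap")
          return hidden_states, sliding_window, softcap
      ```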
  8. 26 Jun, 2024 1 commit
  9. 21 Jun, 2024 1 commit
  10. 18 Jun, 2024 1 commit
  11. 05 Jun, 2024 1 commit
    • Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 (#30536) · bd5091df
      Cyril Vallez authored
      * Fix contrastive_search for new cache structure, and improve performance by removing inefficient torch.stack(torch.split(x, top_k, dim=0))
      
      * Fix _contrastive_search for non-standard cache using ellipsis slicing
      
      * Fix all outputs.logits memory leaks for all decoding strategies!
      
      * Fix small error in _contrastive_search()
      
      * Make all necessary change and revert for the new class
      
      * Apply coding style
      
      * Remove pipes in type hints for compatibility
      
      * correct type hint
      
      * apply style
      
      * Use DynamicCache by default and solve conflicts
      
      * Fix rebase issues
      
      * Add `_supports_dynamic_cache_class` in models for models that support DynamicCache but not other caches to make DynamicCache the default for more models
      
      * Create generation config to return legacy format by default, or to choose not to
      
      * style
      
      * Fix case when use_cache is False
      
      * Remove default DynamicCache in assisted_decoding if assistant_model does not support it + fix _seen_tokens when cropping cache
      
      * Update prepare_inputs_for_generation() for case with empty DynamicCache
      
      * Correct return of args in _assisted_decoding
      
      * Remove EfficientDynamicCache as it is no longer needed
      
      * Correct mistake in generation config
      
      * Move cache logic of assisted decoding to AssistedCandidateGenerator.__init__
      
      * change DynamicCache function names from "split" to "batch_split" for readability + apply coding style
      
      * Remove `_supports_dynamic_cache_class` attribute after rebase
      
      * Correct missing line lost in conflict resolution during rebasing
      
      * Add special case for Jamba
      
      * Fix jamba test
      
      * Coding style
      
      * coding style
      
      * Correct missing import in rebasing
      
      * Simplify _validate_model_kwargs based on removal of _supports_dynamic_cache attribute
      
      * Simplify code paths in _contrastive_search
      
      * coding style
      
      * Update docstrings of cache methods
      
      * Update prepare_inputs_for_generation() -> past_key_values are always Cache objects
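      A hedged sketch of the cache handling this PR centers on: `DynamicCache` becomes the default in `generate()`, and the legacy tuple-of-tuples format can still be converted back and forth. Tensor shapes below are made up for illustration; the generation-config flag that selects the returned format is not shown because its exact name may differ from this description.

      ```python
      import torch
      from transformers import DynamicCache

      # Fake legacy cache: one layer of (key, value) tensors shaped
      # (batch, num_heads, seq_len, head_dim).
      legacy = ((torch.zeros(1, 8, 4, 64), torch.zeros(1, 8, 4, 64)),)

      cache = DynamicCache.from_legacy_cache(legacy)  # Cache object used internally
      roundtrip = cache.to_legacy_cache()             # back to tuple-of-tuples
      assert torch.equal(roundtrip[0][0], legacy[0][0])
      ```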
  12. 22 May, 2024 1 commit
  13. 20 May, 2024 3 commits
    • Add torch.compile for Mistral (#30642) · 616bb11d
      Longjie Zheng authored
      * first version
      
      * fix sliding window
      
      * fix style
      
      * add sliding window cache
      
      * fix style
      
      * address comments
      
      * fix test
      
      * fix style
      
      * move sliding window check inside cache init
      
      * revert changes on irrelevant files & add comment on SlidingWindowCache
      
      * address comments & fix style
      
      * fix style
      
      * update causal mask
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] llama
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * revert CI from a10 to t4
      
      * wrap up
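      A hedged sketch of what this commit enables: compiling Mistral's forward pass once the sliding-window cache keeps tensor shapes stable across decoding steps. The checkpoint name and compile options are illustrative, not the exact configuration merged in #30642.

      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
      model = AutoModelForCausalLM.from_pretrained(
          "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, device_map="auto"
      )

      # With a fixed-size (sliding-window) cache, decode steps can reuse one
      # compiled graph instead of recompiling for every new sequence length.
      model.forward = torch.compile(model.forward, mode="reduce-overhead")

      inputs = tok("Compiling Mistral:", return_tensors="pt").to(model.device)
      print(tok.decode(model.generate(**inputs, max_new_tokens=16)[0]))
      ```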
    • Add support for torch.compile dynamic shapes (#30560) · cd6bd0af
      Benjamin Warner authored
      * add torch.compile dynamic support
      
      * Add SDPA dynamic shapes compile test & improve SDPA comment
      
      * comment consistency
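      A small sketch of the `dynamic=True` mode this commit adds an SDPA test for: the compiled kernel tolerates varying sequence lengths instead of triggering a recompilation per shape. Shapes here are arbitrary.

      ```python
      import torch
      import torch.nn.functional as F

      def sdpa_block(q, k, v):
          return F.scaled_dot_product_attention(q, k, v)

      compiled = torch.compile(sdpa_block, dynamic=True)

      for seq_len in (16, 32, 57):  # different lengths, one compiled artifact
          q = k = v = torch.randn(1, 8, seq_len, 64)
          print(seq_len, compiled(q, k, v).shape)
      ```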
    • Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878) · 07bf2dff
      Joseph Enguehard authored
      
      
      * Add MistralForTokenClassification
      
      * Add tests and docs
      
      * Add token classification for Mixtral and Qwen2
      
      * Save llama for token classification draft
      
      * Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2
      
      * Formatting
      
      * Add token classification support for Qwen2Moe model
      
      * Add dropout layer to each ForTokenClassification model
      
      * Add copied from in tests
      
      * Update src/transformers/models/llama/modeling_llama.py
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      
      * Propagate suggested changes
      
      * Style
      
      ---------
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
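      A hedged usage sketch of the new heads: after this PR, decoder-only checkpoints such as Mistral can be loaded through `AutoModelForTokenClassification`; the classification head (sitting behind the dropout layer mentioned above) is freshly initialized and still needs fine-tuning. The checkpoint and label count are illustrative.

      ```python
      from transformers import AutoModelForTokenClassification, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
      model = AutoModelForTokenClassification.from_pretrained(
          "mistralai/Mistral-7B-v0.1", num_labels=5
      )

      inputs = tok("Label each token in this sentence", return_tensors="pt")
      logits = model(**inputs).logits  # (batch, seq_len, num_labels)
      ```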
  14. 17 May, 2024 1 commit
    • Remove deprecated logic and warnings (#30743) · 57c965a8
      amyeroberts authored
      * Remove deprecated logic and warnings
      
      * Add back some code that seems to be important...
      
      * Let's just add all the nllb stuff back; removing it is a bit more involved
      
      * Remove kwargs
      
      * Remove more kwargs
  15. 16 May, 2024 1 commit
  16. 14 May, 2024 1 commit
  17. 30 Apr, 2024 1 commit
  18. 17 Apr, 2024 2 commits
  19. 05 Apr, 2024 1 commit
  20. 27 Mar, 2024 1 commit
  21. 08 Mar, 2024 1 commit
  22. 04 Mar, 2024 1 commit
  23. 28 Feb, 2024 1 commit
  24. 14 Feb, 2024 1 commit
  25. 08 Feb, 2024 2 commits
  26. 31 Jan, 2024 1 commit
  27. 29 Jan, 2024 1 commit
  28. 24 Jan, 2024 1 commit
    • Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517) · c5c69096
      Khai Mai authored
      * fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask
      
      * format code using black and ruff
      
      * skip computing mask if attention_mask=None
      
      * add tests for load balancing loss Mixtral-Moe
      
      * fix assert loss is different in mixtral_test
      
      * fix pad_leng
      
      * use assertNotAlmostEqual and print to debug
      
      * remove print for debug
      
      * minor updates
      
      * reduce rtol and atol
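      A simplified sketch of the idea behind this fix, not the exact Mixtral implementation: padding positions are dropped from the expert-usage statistics via `attention_mask` before the auxiliary loss is averaged, so heavily padded batches no longer skew the router penalty.

      ```python
      import torch
      import torch.nn.functional as F

      def load_balancing_loss(router_logits, attention_mask, num_experts, top_k=2):
          # router_logits: (batch * seq_len, num_experts); attention_mask: (batch, seq_len)
          probs = torch.softmax(router_logits, dim=-1)
          _, selected = torch.topk(probs, top_k, dim=-1)
          expert_mask = F.one_hot(selected, num_experts).float()   # (tokens, top_k, experts)

          keep = attention_mask.reshape(-1).bool()                  # drop padded positions
          tokens_per_expert = expert_mask[keep].mean(dim=(0, 1))    # fraction routed per expert
          prob_per_expert = probs[keep].mean(dim=0)                 # mean router prob per expert
          return num_experts * torch.sum(tokens_per_expert * prob_per_expert)

      # Example: batch of 2 sequences of length 3, the first padded by one token.
      loss = load_balancing_loss(torch.randn(6, 8), torch.tensor([[1, 1, 0], [1, 1, 1]]), num_experts=8)
      ```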
  29. 15 Jan, 2024 1 commit
  30. 12 Jan, 2024 1 commit
  31. 11 Jan, 2024 1 commit
  32. 05 Jan, 2024 1 commit
  33. 26 Dec, 2023 1 commit
  34. 22 Dec, 2023 1 commit
  35. 21 Dec, 2023 1 commit