"docs/source/en/model_summary.md" did not exist on "b0c7d2ec58a8080780fb21327cc7cd7e1a177780"
  1. 28 Mar, 2024 1 commit
  2. 20 Mar, 2024 1 commit
  3. 13 Mar, 2024 1 commit
  4. 12 Mar, 2024 1 commit
  5. 26 Feb, 2024 2 commits
  6. 20 Feb, 2024 1 commit
  7. 14 Feb, 2024 1 commit
    • JB (Don)
      Add tie_weights() to LM heads and set bias in set_output_embeddings() (#28948) · 725f4ad1
      JB (Don) authored
      * Add tie_weights() to LM heads and set bias in set_output_embeddings()
      
      The biases were not tied correctly in some LM heads, and this change should fix that.
      
      * Moving test_save_and_load_low_cpu_mem_usage to ModelTesterMixin
      
      * Adding _tie_weights() to MPNet and Vilt
      
      * Skip test for low cpu mem usage for Deta/DeformableDetr since they cannot init on meta device
      
      * Rename the test to save_load to match the convention
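To make the fix concrete, here is a minimal sketch of the tying pattern the commit describes, using a hypothetical `ToyLMHead` rather than any actual transformers class: `_tie_weights()` re-points the decoder's bias at the head's own `bias` parameter, so a decoder swapped in by `set_output_embeddings()` cannot keep a stale, untied bias.

```python
# Hedged sketch: a hypothetical LM head illustrating the bias-tying pattern.
import torch
from torch import nn


class ToyLMHead(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size)
        self.bias = nn.Parameter(torch.zeros(vocab_size))
        self.decoder.bias = self.bias  # tied at construction

    def _tie_weights(self):
        # Re-run the tie after set_output_embeddings() replaces self.decoder;
        # without this the new decoder would carry its own, untied bias.
        self.decoder.bias = self.bias
```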
  8. 12 Feb, 2024 1 commit
  9. 01 Feb, 2024 1 commit
  10. 31 Jan, 2024 1 commit
  11. 17 Jan, 2024 2 commits
  12. 10 Jan, 2024 1 commit
  13. 20 Dec, 2023 1 commit
  14. 11 Dec, 2023 2 commits
  15. 08 Dec, 2023 2 commits
    • fxmarty
      F.scaled_dot_product_attention support (#26572) · 80377eb0
      fxmarty authored
      
      
      * add sdpa
      
      * wip
      
      * cleaning
      
      * add ref
      
      * yet more cleaning
      
      * and more :)
      
      * wip llama
      
      * working llama
      
      * add output_attentions=True support
      
      * bigcode sdpa support
      
      * fixes
      
      * gpt-bigcode support, require torch>=2.1.1
      
      * add falcon support
      
      * fix conflicts falcon
      
      * style
      
      * fix attention_mask definition
      
      * remove output_attentions from attnmaskconverter
      
      * support whisper without removing any Copied from statement
      
      * fix mbart default to eager renaming
      
      * fix typo in falcon
      
      * fix is_causal in SDPA
      
      * check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
      
      * add warnings when falling back on the manual implementation
      
      * precise doc
      
      * wip replace _flash_attn_enabled by config.attn_implementation
      
      * fix typo
      
      * add tests
      
      * style
      
      * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
      
      * obey config.attn_implementation if a config is passed to from_pretrained
      
      * fix is_torch_sdpa_available when torch is not installed
      
      * remove dead code
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bart/modeling_bart.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove duplicate pretraining_tp code
      
      * add dropout in llama
      
      * precise comment on attn_mask
      
      * add fmt: off for _unmask_unattended docstring
      
      * precise num_masks comment
      
      * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
      
      * cleanup modeling_utils
      
      * backward compatibility
      
      * fix style as requested
      
      * style
      
      * improve documentation
      
      * test pass
      
      * style
      
      * add _unmask_unattended tests
      
      * skip meaningless tests for idefics
      
      * hard_check SDPA requirements when specifically requested
      
      * standardize the use of XXX_ATTENTION_CLASSES
      
      * fix SDPA bug with mem-efficient backend on CUDA when using fp32
      
      * fix test
      
      * rely on SDPA is_causal parameter to handle the causal mask in some cases
      
      * fix FALCON_ATTENTION_CLASSES
      
      * remove _flash_attn_2_enabled occurrences
      
      * fix test
      
      * add OPT to the list of supported flash models
      
      * improve test
      
      * properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
      
      * remove remaining _flash_attn_2_enabled occurrence
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/perf_infer_gpu_one.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove use_attn_implementation
      
      * fix docstring & slight bug
      
      * make attn_implementation internal (_attn_implementation)
      
      * typos
      
      * fix tests
      
      * deprecate use_flash_attention_2=True
      
      * fix test
      
      * add back llama that was removed by mistake
      
      * fix tests
      
      * remove _flash_attn_2_enabled occurrences (again)
      
      * add check & test that passed attn_implementation is valid
      
      * fix falcon torchscript export
      
      * fix device of mask in tests
      
      * add tip about torch.jit.trace and move bt doc below sdpa
      
      * fix parameterized.expand order
      
      * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
      
      * update sdpaattention class with the new cache
      
      * Update src/transformers/configuration_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bark/modeling_bark.py
      
      * address review comments
      
      * WIP torch.jit.trace fix. left: test both eager & sdpa
      
      * add test for torch.jit.trace for both eager/sdpa
      
      * fix falcon with torch==2.0 that needs to use sdpa
      
      * fix doc
      
      * hopefully last fix
      
      * fix key_value_length that has no default now in mask converter
      
      * is it flaky?
      
      * fix speculative decoding bug
      
      * tests do pass
      
      * fix following #27907
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
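Per the commit messages above (`attn_implementation` made internal as `_attn_implementation`, `use_flash_attention_2=True` deprecated), the user-facing result is roughly the following; the checkpoint name is a placeholder and SDPA needs a recent PyTorch (the log mentions torch>=2.1.1 for gpt-bigcode).

```python
# Sketch of the resulting API: choose the attention backend at load time.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder checkpoint
    torch_dtype=torch.float16,
    attn_implementation="sdpa",   # "eager" keeps the manual implementation
)
```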
    • Tom Aarsen
      Generate: New `Cache` abstraction and Attention Sinks support (#26681) · 633215ba
      Tom Aarsen authored
      * Draft version of new KV Caching
      
      This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
      / StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
      in a third-party library or in transformers directly
      
      * Address numerous PR suggestions
      
      1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
      2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
      3. Remove __bool__ and __getitem__ magic as they're confusing.
      4. past_key_values.update(key, value, idx) now returns key, value.
      5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
      6. Separate key_cache and value_cache.
      
      Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
      
      * Implement the SinkCache through backward+forward rotations
      
      * Integrate (Sink)Cache with Llama FA2
      
      * Set use_legacy_cache=True as default, allows for test passes
      
      * Move from/to_legacy_cache to ...Model class
      
      * Undo unnecessary newline change
      
      * Remove copy utility from deprecated OpenLlama
      
      * Match import style
      
      * manual rebase with main
      
      * Cache class working with generate (#1)
      
      * Draft version of new KV Caching
      
      This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
      / StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
      in a third-party library or in transformers directly
      
      * Address numerous PR suggestions
      
      1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
      2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
      3. Remove __bool__ and __getitem__ magic as they're confusing.
      4. past_key_values.update(key, value, idx) now returns key, value.
      5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
      6. Separate key_cache and value_cache.
      
      Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
      
      * Integrate (Sink)Cache with Llama FA2
      
      * Move from/to_legacy_cache to ...Model class
      
      * Undo unnecessary newline change
      
      * Match import style
      
      * working generate
      
      * Add tests; Simplify code; Apply changes to Mistral and Persimmon
      
      * fix rebase mess
      
      * a few more manual fixes
      
      * last manual fix
      
      * propagate changes to phi
      
      * upgrade test
      
      * add use_legacy_cache docstring; beef up tests
      
      * reintroduce unwanted deletes
      
      ---------
      Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
      
      * move import
      
      * add default to model_kwargs.get('use_legacy_cache')
      
      * correct failing test
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * apply PR suggestions
      
      * fix failing test
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
      
      * PR comments
      
      * tmp commit
      
      * add docstrings
      
      * more tests, more docstrings, add to docs
      
      * derp
      
      * tmp commit
      
      * tmp dbg
      
      * more dbg
      
      * fix beam search bug
      
      * cache can be a list of tuples in some models
      
      * fix group beam search
      
      * all but sinkcache integration tests
      
      * fix sink cache and add hard integration test
      
      * now also compatible with input_embeds input
      
      * PR comments
      
      * add Cache support to Phi+FA2
      
      * make fixup
      
      ---------
      Co-authored-by: Joao Gante <joao@huggingface.co>
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
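As a usage sketch of the new abstraction (assuming the `SinkCache` name and constructor as merged; the window and sink sizes below are illustrative), a cache instance is passed straight to `generate()`:

```python
# Sketch: enable Attention Sinks by passing a Cache instance to generate().
from transformers import AutoModelForCausalLM, AutoTokenizer, SinkCache

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("An ever-growing conversation ...", return_tensors="pt")
cache = SinkCache(window_length=256, num_sink_tokens=4)  # illustrative sizes
output = model.generate(**inputs, past_key_values=cache, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```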
  16. 04 Dec, 2023 2 commits
  17. 27 Nov, 2023 1 commit
  18. 24 Nov, 2023 1 commit
  19. 21 Nov, 2023 2 commits
  20. 20 Nov, 2023 1 commit
  21. 16 Nov, 2023 1 commit
    • Arthur
      [`Styling`] stylify using ruff (#27144) · 651408a0
      Arthur authored
      
      
      * try to stylify using ruff
      
      * might need to remove these changes?
      
      * use ruff format and ruff check
      
      * use isinstance instead of type comparison
      
      * use # fmt: skip
      
      * use # fmt: skip
      
      * nits
      
      * some styling changes
      
      * update ci job
      
      * nits isinstance
      
      * more files update
      
      * nits
      
      * more nits
      
      * small nits
      
      * check and format
      
      * revert wrong changes
      
      * actually use formatter instead of checker
      
      * nits
      
      * well docbuilder is overwriting this commit
      
      * revert notebook changes
      
      * try to nuke docbuilder
      
      * style
      
      * fix feature extraction test
      
      * remove `indent-width = 4`
      
      * fixup
      
      * more nits
      
      * update the ruff version that we use
      
      * style
      
      * nuke docbuilder styling
      
      * leave the print for detected changes
      
      * nits
      
      * Remove file I/O
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
      
      * style
      
      * nits
      
      * revert notebook changes
      
      * Add # fmt skip when possible
      
      * Add # fmt skip when possible
      
      * Fix
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * Nits
      
      * more fixes
      
      * fix tapas
      
      * Another way to skip
      
      * Recommended way
      
      * Fix two more files
      
      * Remove asynch
      
      ---------
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
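Since several commits above revolve around the `# fmt: skip` marker and the isinstance lint, here is a short generic illustration (not a specific transformers file):

```python
# `# fmt: skip` tells the ruff formatter to leave a line exactly as written,
# e.g. to preserve manual alignment the formatter would otherwise collapse.
ID_MAP = {"bert": 0,  "gpt2": 1,  "t5": 2}  # fmt: skip


def is_config_dict(obj) -> bool:
    # ruff (E721) flags `type(obj) == dict`; isinstance is the preferred form.
    return isinstance(obj, dict)
```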
  22. 13 Nov, 2023 1 commit
  23. 02 Nov, 2023 1 commit
  24. 01 Nov, 2023 2 commits
  25. 31 Oct, 2023 1 commit
  26. 30 Oct, 2023 1 commit
  27. 25 Oct, 2023 1 commit
    • Younes Belkada
      [`core`] Refactor of `gradient_checkpointing` (#27020) · 06e782da
      Younes Belkada authored
      * v1
      
      * fix
      
      * remove `create_custom_forward`
      
      * fixup
      
      * fixup
      
      * add test and fix all failing GC tests
      
      * remove all remaining `create_custom_forward` methods
      
      * fix idefics bug
      
      * fixup
      
      * replace with `__call__`
      
      * add comment
      
      * quality
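A simplified before/after sketch of the refactor (the `layer` argument and function names are illustrative; in the real change every model routes through a shared `self._gradient_checkpointing_func`):

```python
from torch.utils.checkpoint import checkpoint


# Before: each model built a `create_custom_forward` closure around the layer.
def forward_before(layer, hidden_states):
    def create_custom_forward(module):
        def custom_forward(*inputs):
            return module(*inputs)
        return custom_forward
    return checkpoint(create_custom_forward(layer), hidden_states)


# After: the closure is gone; the layer's __call__ is checkpointed directly.
def forward_after(layer, hidden_states, gradient_checkpointing_func=checkpoint):
    return gradient_checkpointing_func(layer.__call__, hidden_states)
```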
  28. 18 Oct, 2023 1 commit
  29. 17 Oct, 2023 1 commit
  30. 05 Oct, 2023 1 commit
  31. 03 Oct, 2023 1 commit
  32. 29 Sep, 2023 1 commit
  33. 27 Sep, 2023 1 commit