1. 19 Jul, 2024 2 commits
  2. 17 Jul, 2024 1 commit
  3. 15 Jul, 2024 1 commit
  4. 02 Jul, 2024 1 commit
    • [whisper] static kv cache (#31166) · a9701953
      Sanchit Gandhi authored
      
      
      * make work with cache abstraction
      
      * correct for static cache
      
      * hacks for compile
      
      * make fast
      
      * fix
      
      * fix pos ids
      
      * generate
      
      * fix sdpa
      
      * fix sdpa cache pos
      
      * fix fa2
      
      * clean fa2
      
      * integrate cache into generate
      
      * make style
      
      * copies
      
      * more copies
      
      * update eager
      
      * update sdpa
      
      * update fa2
      
      * simplify
      
      * use cache pos
      
      * always compute cross-cache for debug
      
      * avoid recompiles
      Co-authored-by: Arthur Zucker <arthur@huggingface.co>
      
      * fix fix
      
      * fix fix fix
      
      * more fix
      
      * try encoder-decoder cache (too messy)
      
      * revert encoder-decoder cache
      
      * check cross-attn cache
      
      * use enc-dec dataclass
      
      * use richer enc-dec dataclass
      
      * clean-up
      
      * revert static cache changes
      
      * small fixes
      
      * revert to cpu flag
      
      * fix copies
      
      * add static slow test
      
      * past k/v docstring
      
      * more docstrings
      
      * cache_position docstrings
      
      * add to docs
      
      * add enc-dec cache to docs
      
      * make style
      
      * fix after rebase
      
      * fix beam
      
      * style
      
      * fix generation strategies
      
      * fix most decoder-only tests
      
      * style
      
      * skip test
      
      * more clean up
      
      * small docstrings
      
      * Apply suggestions from code review
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * add todo
      
      * only crop self-attn
      
      * check cache in mixin
      
      * style
      
      * fix re-compile after rebase
      
      * move `is_updated` logic to enc-dec wrapper
      
      * revert back
      
      * revert cache back
      
      * finalise design
      
      * fix
      
      * fix fix
      
      * style
      
      * Update src/transformers/cache_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * deprecate
      
      * updates
      
      * final updates
      
      * style
      
      * style
      
      ---------
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
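
      For context, a minimal sketch of what this change enables: Whisper generation with a static-shape KV cache so the decoder forward pass can be torch.compile'd without recompiles. The checkpoint and compile flags below are illustrative, not taken from the PR.

      ```python
      # Sketch: Whisper short-form generation with a static KV cache + torch.compile.
      # Assumes a transformers version that includes PR #31166.
      import torch
      from datasets import load_dataset
      from transformers import WhisperForConditionalGeneration, WhisperProcessor

      processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
      model = WhisperForConditionalGeneration.from_pretrained(
          "openai/whisper-tiny.en", torch_dtype=torch.float16
      ).to("cuda")

      # Ask generate() to allocate a static-shape self-attention cache, then compile
      # the forward pass; fixed cache shapes are what avoid recompiles.
      model.generation_config.cache_implementation = "static"
      model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

      ds = load_dataset(
          "hf-internal-testing/librispeech_asr_dummy", "clean",
          split="validation", trust_remote_code=True,
      )
      inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
      # The first call triggers compilation; later calls reuse the compiled graph.
      generated = model.generate(inputs.input_features.to("cuda", torch.float16))
      print(processor.batch_decode(generated, skip_special_tokens=True))
      ```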
  5. 26 Jun, 2024 1 commit
  6. 17 Jun, 2024 1 commit
    • Pass datasets trust_remote_code (#31406) · a14b055b
      Albert Villanova del Moral authored
      * Pass datasets trust_remote_code
      
      * Pass trust_remote_code in more tests
      
      * Add trust_remote_dataset_code arg to some tests
      
      * Revert "Temporarily pin datasets upper version to fix CI"
      
      This reverts commit b7672826.
      
      * Pass trust_remote_code in librispeech_asr_dummy docstrings
      
      * Revert "Pin datasets<2.20.0 for examples"
      
      This reverts commit 833fc17a.
      
      * Pass trust_remote_code to all examples
      
      * Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
      
      * Pass trust_remote_code to tests
      
      * Pass trust_remote_code to docstrings
      
      * Fix flax examples tests requirements
      
      * Pass trust_remote_dataset_code arg to tests
      
      * Replace trust_remote_dataset_code with trust_remote_code in one example
      
      * Fix duplicate trust_remote_code
      
      * Replace args.trust_remote_dataset_code with args.trust_remote_code
      
      * Replace trust_remote_dataset_code with trust_remote_code in parser
      
      * Replace trust_remote_dataset_code with trust_remote_code in dataclasses
      
      * Replace trust_remote_dataset_code with trust_remote_code arg
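
      For reference, the shape of this change: datasets defined by a loading script now require an explicit opt-in, so load_dataset calls across tests, examples, and docstrings gain a trust_remote_code argument. A minimal before/after sketch, using the dataset name that appears in the PR's docstring updates:

      ```python
      from datasets import load_dataset

      # Before: loading a script-based dataset executed its loading script implicitly.
      # ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

      # After: the opt-in is explicit.
      ds = load_dataset(
          "hf-internal-testing/librispeech_asr_dummy",
          "clean",
          split="validation",
          trust_remote_code=True,  # acknowledge that the dataset's loading script will run
      )
      ```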
  7. 07 Jun, 2024 1 commit
  8. 27 May, 2024 1 commit
  9. 22 May, 2024 1 commit
  10. 20 May, 2024 1 commit
  11. 15 May, 2024 1 commit
  12. 09 May, 2024 1 commit
  13. 19 Apr, 2024 2 commits
  14. 09 Apr, 2024 1 commit
  15. 03 Apr, 2024 2 commits
  16. 01 Apr, 2024 1 commit
  17. 12 Mar, 2024 1 commit
  18. 08 Mar, 2024 1 commit
  19. 27 Feb, 2024 1 commit
  20. 31 Jan, 2024 1 commit
  21. 19 Jan, 2024 1 commit
  22. 18 Jan, 2024 1 commit
  23. 10 Jan, 2024 1 commit
  24. 22 Dec, 2023 1 commit
  25. 08 Dec, 2023 1 commit
    • F.scaled_dot_product_attention support (#26572) · 80377eb0
      fxmarty authored
      
      
      * add sdpa
      
      * wip
      
      * cleaning
      
      * add ref
      
      * yet more cleaning
      
      * and more :)
      
      * wip llama
      
      * working llama
      
      * add output_attentions=True support
      
      * bigcode sdpa support
      
      * fixes
      
      * gpt-bigcode support, require torch>=2.1.1
      
      * add falcon support
      
      * fix conflicts falcon
      
      * style
      
      * fix attention_mask definition
      
      * remove output_attentions from attnmaskconverter
      
      * support whisper without removing any Copied from statement
      
      * fix mbart default to eager renaming
      
      * fix typo in falcon
      
      * fix is_causal in SDPA
      
      * check is_flash_attn_2_available in the model's init as well, in case the model is not initialized through from_pretrained
      
      * add warnings when falling back on the manual implementation
      
      * precise doc
      
      * wip replace _flash_attn_enabled by config.attn_implementation
      
      * fix typo
      
      * add tests
      
      * style
      
      * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
      
      * obey config.attn_implementation if a config is passed to from_pretrained
      
      * fix is_torch_sdpa_available when torch is not installed
      
      * remove dead code
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bart/modeling_bart.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove duplicate pretraining_tp code
      
      * add dropout in llama
      
      * precise comment on attn_mask
      
      * add fmt: off for _unmask_unattended docstring
      
      * precise num_masks comment
      
      * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
      
      * cleanup modeling_utils
      
      * backward compatibility
      
      * fix style as requested
      
      * style
      
      * improve documentation
      
      * test pass
      
      * style
      
      * add _unmask_unattended tests
      
      * skip meaningless tests for idefics
      
      * hard_check SDPA requirements when specifically requested
      
      * standardize the use of XXX_ATTENTION_CLASSES
      
      * fix SDPA bug with mem-efficient backend on CUDA when using fp32
      
      * fix test
      
      * rely on SDPA is_causal parameter to handle the causal mask in some cases
      
      * fix FALCON_ATTENTION_CLASSES
      
      * remove _flash_attn_2_enabled occurrences
      
      * fix test
      
      * add OPT to the list of supported flash models
      
      * improve test
      
      * properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
      
      * remove remaining _flash_attn_2_enabled occurrence
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/perf_infer_gpu_one.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove use_attn_implementation
      
      * fix docstring & slight bug
      
      * make attn_implementation internal (_attn_implementation)
      
      * typos
      
      * fix tests
      
      * deprecate use_flash_attention_2=True
      
      * fix test
      
      * add back llama that was removed by mistake
      
      * fix tests
      
      * remove _flash_attn_2_enabled occurrences (bis)
      
      * add check & test that passed attn_implementation is valid
      
      * fix falcon torchscript export
      
      * fix device of mask in tests
      
      * add tip about torch.jit.trace and move bt doc below sdpa
      
      * fix parameterized.expand order
      
      * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
      
      * update sdpaattention class with the new cache
      
      * Update src/transformers/configuration_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bark/modeling_bark.py
      
      * address review comments
      
      * WIP torch.jit.trace fix. left: test both eager & sdpa
      
      * add test for torch.jit.trace for both eager/sdpa
      
      * fix falcon with torch==2.0 that needs to use sdpa
      
      * fix doc
      
      * hopefully last fix
      
      * fix key_value_length that has no default now in mask converter
      
      * is it flaky?
      
      * fix speculative decoding bug
      
      * tests do pass
      
      * fix following #27907
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
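
      For context, the user-facing API this PR converges on: the attention backend is selected per model via an attn_implementation argument to from_pretrained (stored internally as config._attn_implementation), replacing the deprecated use_flash_attention_2=True. A hedged sketch; the checkpoint and dtype are illustrative:

      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
      # attn_implementation selects the backend: "eager" (manual attention),
      # "sdpa" (F.scaled_dot_product_attention, torch >= 2.1.1 for some models),
      # or "flash_attention_2".
      model = AutoModelForCausalLM.from_pretrained(
          "meta-llama/Llama-2-7b-hf",
          torch_dtype=torch.float16,
          attn_implementation="sdpa",
      ).to("cuda")

      inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
      # Note: requesting output_attentions=True falls back to the eager path with a
      # warning, since SDPA does not return attention weights.
      out = model.generate(**inputs, max_new_tokens=20)
      print(tokenizer.decode(out[0], skip_special_tokens=True))
      ```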
  26. 23 Nov, 2023 1 commit
  27. 22 Nov, 2023 1 commit
    • [Whisper] Add sequential longform decoding (#27492) · 4151fbb4
      Patrick von Platen authored
      * [Whisper] Add seq gen
      
      * [Whisper] Add seq gen
      
      * more debug
      
      * Fix whisper logit processor
      
      * Improve whisper code further
      
      * Fix more
      
      * more debug
      
      * more debug
      
      * Improve further
      
      * Add tests
      
      * Prep for batch size > 1
      
      * Get batch_size>1 working
      
      * Correct more
      
      * Add extensive tests
      
      * more debug
      
      * more debug
      
      * more debug
      
      * add more tests
      
      * more debug
      
      * Apply suggestions from code review
      
      * more debug
      
      * add comments to explain the code better
      
      * add comments to explain the code better
      
      * add comments to explain the code better
      
      * Add more examples
      
      * add comments to explain the code better
      
      * fix more
      
      * add comments to explain the code better
      
      * add comments to explain the code better
      
      * correct
      
      * correct
      
      * finalize
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
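
      For context, what the feature looks like from the outside: audio longer than Whisper's 30-second window can now be handed to generate() in one call and is transcribed window by window, each segment conditioned on the previous one. A minimal sketch; the key detail is truncation=False so the full-length features reach generate():

      ```python
      from transformers import WhisperForConditionalGeneration, WhisperProcessor

      processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
      model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

      # `long_audio` stands in for a 1-D float array of >30 s of 16 kHz speech.
      # truncation=False keeps features beyond the usual 30 s window, which is what
      # routes generate() into the sequential long-form loop.
      inputs = processor(
          long_audio,
          sampling_rate=16_000,
          return_tensors="pt",
          truncation=False,
          padding="longest",
          return_attention_mask=True,
      )
      generated = model.generate(**inputs, return_timestamps=True)
      print(processor.batch_decode(generated, skip_special_tokens=True)[0])
      ```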
  28. 16 Nov, 2023 2 commits
    • [`Styling`] stylify using ruff (#27144) · 651408a0
      Arthur authored
      
      
      * try to stylify using ruff
      
      * might need to remove these changes?
      
      * use ruff format and ruff check
      
      * use isinstance instead of type comparison
      
      * use # fmt: skip
      
      * use # fmt: skip
      
      * nits
      
      * some styling changes
      
      * update ci job
      
      * nits isinstance
      
      * more files update
      
      * nits
      
      * more nits
      
      * small nits
      
      * check and format
      
      * revert wrong changes
      
      * actually use formatter instead of checker
      
      * nits
      
      * well docbuilder is overwriting this commit
      
      * revert notebook changes
      
      * try to nuke docbuilder
      
      * style
      
      * fix feature extraction test
      
      * remove `indent-width = 4`
      
      * fixup
      
      * more nits
      
      * update the ruff version that we use
      
      * style
      
      * nuke docbuilder styling
      
      * leave the print for detected changes
      
      * nits
      
      * Remove file I/O
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
      
      * style
      
      * nits
      
      * revert notebook changes
      
      * Add # fmt skip when possible
      
      * Add # fmt skip when possible
      
      * Fix
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * NIts
      
      * more fixes
      
      * fix tapas
      
      * Another way to skip
      
      * Recommended way
      
      * Fix two more files
      
      * Remove asynch
      
      ---------
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
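
      As a side note on the `# fmt: skip` marker this PR applies throughout: a trailing `# fmt: skip` exempts a single statement from ruff's formatter, which keeps hand-aligned literals intact where reflowing would hurt readability. An illustrative example, not taken from the diff:

      ```python
      # ruff format would normally reflow this list; the trailing "# fmt: skip"
      # preserves the manual grid layout of the literal.
      POWERS_OF_TWO = [
          1,  2,  4,
          8, 16, 32,
      ]  # fmt: skip
      ```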
    • Revert "add attention_mask and position_ids in assisted model" (#27523) · 5603fad2
      Patrick von Platen authored
      * Revert "add attention_mask and position_ids in assisted model (#26892)"
      
      This reverts commit 184f60dc.
      
      * more debug
  29. 09 Nov, 2023 1 commit
  30. 01 Nov, 2023 3 commits
  31. 31 Oct, 2023 1 commit
    • device agnostic models testing (#27146) · 50378cbf
      Hz, Ji authored
      * device agnostic models testing
      
      * add decorator `require_torch_fp16`
      
      * make style
      
      * apply review suggestion
      
      * Oops, the fp16 decorator was misused
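
      For context, a sketch of the testing pattern this commit moves toward: tests target transformers.testing_utils.torch_device instead of hard-coding "cuda", and capability decorators such as the new require_torch_fp16 skip a test on backends that cannot run it. The test body below is hypothetical:

      ```python
      import torch
      from transformers.testing_utils import require_torch_fp16, torch_device


      @require_torch_fp16  # skipped on devices/backends without float16 support
      def test_forward_fp16():
          # `TinyModel` is a placeholder for whatever model the test exercises.
          model = TinyModel().to(torch_device).half()
          inputs = torch.ones(1, 8, dtype=torch.float16, device=torch_device)
          with torch.no_grad():
              out = model(inputs)
          assert out.dtype == torch.float16
      ```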
  32. 30 Oct, 2023 1 commit
  33. 11 Oct, 2023 1 commit
  34. 15 Sep, 2023 1 commit