1. 15 Jan, 2024 1 commit
  2. 12 Jan, 2024 1 commit
  3. 17 Dec, 2023 1 commit
  4. 15 Dec, 2023 1 commit
  5. 08 Dec, 2023 2 commits
    • F.scaled_dot_product_attention support (#26572) · 80377eb0
      fxmarty authored
      
      
      * add sdpa
      
      * wip
      
      * cleaning
      
      * add ref
      
      * yet more cleaning
      
      * and more :)
      
      * wip llama
      
      * working llama
      
      * add output_attentions=True support
      
      * bigcode sdpa support
      
      * fixes
      
      * gpt-bigcode support, require torch>=2.1.1
      
      * add falcon support
      
      * fix conflicts falcon
      
      * style
      
      * fix attention_mask definition
      
      * remove output_attentions from attnmaskconverter
      
      * support whisper without removing any Copied from statement
      
      * fix mbart default to eager renaming
      
      * fix typo in falcon
      
      * fix is_causal in SDPA
      
      * check is_flash_attn_2_available in the models' init as well, in case the model is not initialized through from_pretrained
      
      * add warnings when falling back on the manual implementation
      
      * clarify doc
      
      * wip replace _flash_attn_enabled by config.attn_implementation
      
      * fix typo
      
      * add tests
      
      * style
      
      * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
      
      * obey config.attn_implementation if a config is passed in from_pretrained
      
      * fix is_torch_sdpa_available when torch is not installed
      
      * remove dead code
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

      * Update src/transformers/models/bart/modeling_bart.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove duplicate pretraining_tp code
      
      * add dropout in llama
      
      * clarify comment on attn_mask
      
      * add fmt: off for _unmask_unattended docstring
      
      * clarify num_masks comment
      
      * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
      
      * cleanup modeling_utils
      
      * backward compatibility
      
      * fix style as requested
      
      * style
      
      * improve documentation
      
      * test pass
      
      * style
      
      * add _unmask_unattended tests
      
      * skip meaningless tests for idefics
      
      * hard_check SDPA requirements when specifically requested
      
      * standardize the use of XXX_ATTENTION_CLASSES
      
      * fix SDPA bug with mem-efficient backend on CUDA when using fp32
      
      * fix test
      
      * rely on SDPA is_causal parameter to handle the causal mask in some cases
      
      * fix FALCON_ATTENTION_CLASSES
      
      * remove _flash_attn_2_enabled occurrences
      
      * fix test
      
      * add OPT to the list of supported flash models
      
      * improve test
      
      * properly test on different SDPA backends and different dtypes & properly handle the pad tokens separately in the test
      
      * remove remaining _flash_attn_2_enabled occurrence
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

      * Update docs/source/en/perf_infer_gpu_one.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove use_attn_implementation
      
      * fix docstring & slight bug
      
      * make attn_implementation internal (_attn_implementation)
      
      * typos
      
      * fix tests
      
      * deprecate use_flash_attention_2=True
      
      * fix test
      
      * add back llama that was removed by mistake
      
      * fix tests
      
      * remove _flash_attn_2_enabled occurrences bis
      
      * add check & test that passed attn_implementation is valid
      
      * fix falcon torchscript export
      
      * fix device of mask in tests
      
      * add tip about torch.jit.trace and move bt doc below sdpa
      
      * fix parameterized.expand order
      
      * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
      
      * update sdpaattention class with the new cache
      
      * Update src/transformers/configuration_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bark/modeling_bark.py
      
      * address review comments
      
      * WIP torch.jit.trace fix. left: test both eager & sdpa
      
      * add test for torch.jit.trace for both eager/sdpa
      
      * fix falcon with torch==2.0 that needs to use sdpa
      
      * fix doc
      
      * hopefully last fix
      
      * fix key_value_length that has no default now in mask converter
      
      * is it flaky?
      
      * fix speculative decoding bug
      
      * tests do pass
      
      * fix following #27907
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      80377eb0
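
      A minimal usage sketch of the attn_implementation option this PR introduces, assuming a transformers version that includes it and torch>=2.1.1 (the Llama checkpoint name is only an example; device placement is omitted for brevity):

      ```python
      # Select the attention backend at load time. "sdpa" routes attention through
      # torch.nn.functional.scaled_dot_product_attention, "eager" keeps the manual
      # implementation, and "flash_attention_2" replaces the now-deprecated
      # use_flash_attention_2=True flag.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "meta-llama/Llama-2-7b-hf"  # example checkpoint only
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          torch_dtype=torch.float16,
          attn_implementation="sdpa",
      )

      inputs = tokenizer("Hello, my name is", return_tensors="pt")
      with torch.no_grad():
          output_ids = model.generate(**inputs, max_new_tokens=20)
      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
      ```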
    • [⚠️ removed a default argument] Make `AttentionMaskConverter` compatible with... · 307a7d0b
      fxmarty authored
      [⚠️ removed a default argument] Make `AttentionMaskConverter` compatible with `torch.compile(..., fullgraph=True)` (#27868)
      
      * remove bugged torch.float32 default
      
      * add test
      
      * fix tests
      
      * fix test
      
      * fix doc
      307a7d0b
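
      A minimal sketch of the behaviour change: the converter's dtype must now be passed explicitly instead of silently defaulting to torch.float32. AttentionMaskConverter is an internal helper, so treat the exact call below as an assumption rather than a stable public API:

      ```python
      import torch
      from transformers.modeling_attn_mask_utils import AttentionMaskConverter

      converter = AttentionMaskConverter(is_causal=True)

      # Build a 4D causal mask; dtype is an explicit argument rather than a default.
      causal_mask = converter.to_causal_4d(
          batch_size=1,
          query_length=4,
          key_value_length=4,
          dtype=torch.float16,
          device="cpu",
      )
      print(causal_mask.shape)  # torch.Size([1, 1, 4, 4])
      ```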
  6. 04 Dec, 2023 1 commit
  7. 01 Dec, 2023 1 commit
  8. 15 Nov, 2023 1 commit
  9. 13 Nov, 2023 1 commit
  10. 10 Nov, 2023 1 commit
  11. 31 Oct, 2023 2 commits
  12. 27 Oct, 2023 1 commit
  13. 26 Oct, 2023 1 commit
  14. 04 Oct, 2023 1 commit
  15. 08 Sep, 2023 1 commit
  16. 10 Aug, 2023 1 commit
  17. 30 Jun, 2023 1 commit
    • Show a warning for missing attention masks when pad_token_id is not None (#24510) · 78a2b19f
      JB (Don) authored
      
      
      * Adding warning messages to BERT for missing attention masks
      
      These warning messages appear when there are pad tokens within the input ids
      and no attention masks are given. The warning message should only show up once.
      
      * Adding warning messages to BERT for missing attention masks
      
      These warning messages are shown when the pad_token_id is not None
      and no attention masks are given. The warning message should only
      show up once.
      
      * Ran fix copies to copy over the changes to some of the other models
      
      * Add logger.warning_once.cache_clear() to the test
      
      * Shows warning when there are no attention masks and input_ids start/end with pad tokens
      
      * Using warning_once() instead and fix indexing in input_ids check
      
      ---------
      Co-authored-by: JB Lau <hckyn@voyager2.local>
      78a2b19f
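
      A minimal sketch of the case this warning targets (bert-base-uncased is only an example checkpoint):

      ```python
      from transformers import AutoModel, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
      model = AutoModel.from_pretrained("bert-base-uncased")

      # Padding a batch of different lengths inserts pad_token_id into input_ids.
      batch = tokenizer(["short", "a slightly longer sentence"], padding=True, return_tensors="pt")

      # Passing padded input_ids without an attention_mask is what triggers the
      # one-time warning: padded positions would otherwise be attended to.
      _ = model(input_ids=batch["input_ids"])

      # Passing the attention mask avoids the warning and masks the pad positions.
      _ = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
      ```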
  18. 27 Jun, 2023 1 commit
    • Clean load keys (#24505) · 8e5d1619
      Sylvain Gugger authored
      * Preliminary work on some models
      
      * Fix test load missing and make sure nonpersistent buffers are tested
      
      * Always ignore nonpersistent buffers if in state_dict
      
      * Treat models
      
      * More models
      
      * Treat remaining models
      
      * Fix quality
      
      * Fix tests
      
      * Remove draft
      
      * This test is not needed anymore
      
      * Fix copies
      
      * Fix last test
      
      * Newly added models
      
      * Fix last tests
      
      * Address review comments
      8e5d1619
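
      For context, a minimal plain-PyTorch sketch of the non-persistent buffer behaviour this PR builds on (TinyModule and rotary_cache are made-up names, not code from the PR):

      ```python
      import torch
      from torch import nn

      class TinyModule(nn.Module):
          def __init__(self):
              super().__init__()
              self.linear = nn.Linear(4, 4)
              # persistent=False keeps the buffer out of state_dict, so a checkpoint
              # neither contains it nor should report it as a missing key on load.
              self.register_buffer("rotary_cache", torch.zeros(4), persistent=False)

      module = TinyModule()
      print("rotary_cache" in module.state_dict())  # False: not saved

      # Loading a state_dict that lacks the buffer raises no missing-key error.
      module.load_state_dict(TinyModule().state_dict())
      ```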
  19. 16 Jun, 2023 1 commit
    • Tied weights load (#24310) · 096f2cf1
      Sylvain Gugger authored
      * Use tied weight keys
      
      * More
      
      * Fix tied weight missing warning
      
      * Only give info on unexpected keys with different classes
      
      * Deal with empty archs
      
      * Fix tests
      
      * Refine test
      096f2cf1
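
      A minimal sketch of what tied weight keys mean in practice (gpt2 is only an example of a checkpoint whose input and output embeddings are tied):

      ```python
      from transformers import AutoModelForCausalLM

      model = AutoModelForCausalLM.from_pretrained("gpt2")

      # With tied weights the output projection shares storage with the input
      # embedding, so its key is expected to be absent from the checkpoint rather
      # than reported as a missing key when loading.
      tied = (
          model.get_input_embeddings().weight.data_ptr()
          == model.get_output_embeddings().weight.data_ptr()
      )
      print(tied)  # True for tied checkpoints such as GPT-2
      ```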
  20. 15 Jun, 2023 1 commit