1. 08 Apr, 2024 1 commit
  2. 29 Mar, 2024 1 commit
  3. 28 Mar, 2024 1 commit
  4. 08 Dec, 2023 1 commit
    • fxmarty's avatar
      F.scaled_dot_product_attention support (#26572) · 80377eb0
      fxmarty authored
      
      
      * add sdpa
      
      * wip
      
      * cleaning
      
      * add ref
      
      * yet more cleaning
      
      * and more :)
      
      * wip llama
      
      * working llama
      
      * add output_attentions=True support
      
      * bigcode sdpa support
      
      * fixes
      
      * gpt-bigcode support, require torch>=2.1.1
      
      * add falcon support
      
      * fix conflicts falcon
      
      * style
      
      * fix attention_mask definition
      
      * remove output_attentions from attnmaskconverter
      
      * support whisper without removing any Copied from statement
      
      * fix mbart default to eager renaming
      
      * fix typo in falcon
      
      * fix is_causal in SDPA
      
      * check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
      
      * add warnings when falling back on the manual implementation
      
      * precise doc
      
      * wip replace _flash_attn_enabled by config.attn_implementation
      
      * fix typo
      
      * add tests
      
      * style
      
      * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
      
      * obey to config.attn_implementation if a config is passed in from_pretrained
      
      * fix is_torch_sdpa_available when torch is not installed
      
      * remove dead code
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bart/modeling_bart.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove duplicate pretraining_tp code
      
      * add dropout in llama
      
      * precise comment on attn_mask
      
      * add fmt: off for _unmask_unattended docstring
      
      * precise num_masks comment
      
      * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
      
      * cleanup modeling_utils
      
      * backward compatibility
      
      * fix style as requested
      
      * style
      
      * improve documentation
      
      * test pass
      
      * style
      
      * add _unmask_unattended tests
      
      * skip meaningless tests for idefics
      
      * hard_check SDPA requirements when specifically requested
      
      * standardize the use if XXX_ATTENTION_CLASSES
      
      * fix SDPA bug with mem-efficient backend on CUDA when using fp32
      
      * fix test
      
      * rely on SDPA is_causal parameter to handle the causal mask in some cases
      
      * fix FALCON_ATTENTION_CLASSES
      
      * remove _flash_attn_2_enabled occurences
      
      * fix test
      
      * add OPT to the list of supported flash models
      
      * improve test
      
      * properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
      
      * remove remaining _flash_attn_2_enabled occurence
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/perf_infer_gpu_one.md
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove use_attn_implementation
      
      * fix docstring & slight bug
      
      * make attn_implementation internal (_attn_implementation)
      
      * typos
      
      * fix tests
      
      * deprecate use_flash_attention_2=True
      
      * fix test
      
      * add back llama that was removed by mistake
      
      * fix tests
      
      * remove _flash_attn_2_enabled occurences bis
      
      * add check & test that passed attn_implementation is valid
      
      * fix falcon torchscript export
      
      * fix device of mask in tests
      
      * add tip about torch.jit.trace and move bt doc below sdpa
      
      * fix parameterized.expand order
      
      * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
      
      * update sdpaattention class with the new cache
      
      * Update src/transformers/configuration_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bark/modeling_bark.py
      
      * address review comments
      
      * WIP torch.jit.trace fix. left: test both eager & sdpa
      
      * add test for torch.jit.trace for both eager/sdpa
      
      * fix falcon with torch==2.0 that needs to use sdpa
      
      * fix doc
      
      * hopefully last fix
      
      * fix key_value_length that has no default now in mask converter
      
      * is it flacky?
      
      * fix speculative decoding bug
      
      * tests do pass
      
      * fix following #27907
      
      ---------
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      80377eb0
  5. 24 Nov, 2023 1 commit
  6. 03 Nov, 2023 1 commit
    • Tom Aarsen's avatar
      Refactor: Use Llama RoPE implementation for Falcon (#26933) · 05ea7b79
      Tom Aarsen authored
      * Use Llama RoPE implementation for Falcon
      
      + Add copy functionalities
      
      * Use standard cache format for Falcon
      
      * Simplify apply_rotary_pos_emb, copy from Llama
      
      * Remove unnecessary cache conversion test
      
      We don't need to convert any caches anymore!
      
      * Resolve copy complaint
      05ea7b79
  7. 13 Oct, 2023 1 commit
  8. 02 Oct, 2023 1 commit
    • Lysandre Debut's avatar
      Revert falcon exception (#26472) · 67239f73
      Lysandre Debut authored
      * Revert "Falcon: fix revision propagation (#26006)"
      
      This reverts commit 118c676ef3124423e5d062b665f05cde55bc9a90.
      
      * Revert "Put Falcon back (#25960)"
      
      This reverts commit 22a69f1d.
      67239f73
  9. 13 Sep, 2023 1 commit
  10. 04 Sep, 2023 1 commit
  11. 01 Sep, 2023 1 commit
  12. 02 Aug, 2023 1 commit
  13. 31 Jul, 2023 1 commit
  14. 11 Jul, 2023 1 commit
    • Matt's avatar
      Falcon port (#24523) · b3ab3fac
      Matt authored
      
      
      * Initial commit
      
      * Update src/transformers/models/falcon/configuration_falcon.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/falcon/configuration_falcon.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Cleanup config docstring
      
      * Update src/transformers/models/falcon/configuration_falcon.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Convert to relative imports
      
      * Remove torch < 1.8 warning
      
      * Restructure cos_sin header
      
      * qkv -> query, key, value
      
      * Refactor attention calculation
      
      * Add a couple of config variables to account for the different checkpoints
      
      * Successful merging of the code paths!
      
      * Fix misplaced line in the non-parallel attention path
      
      * Update config and tests
      
      * Add a pad_token_id when testing
      
      * Support output_attentions when alibi is None
      
      * make fixup
      
      * Skip KV cache shape test
      
      * No more _keys_to_ignore_on_load_missing
      
      * Simplify self attention a bit
      
      * Simplify self attention a bit
      
      * make fixup
      
      * stash commit
      
      * Some more attention mask updates
      
      * Should pass all tests except assisted generation!
      
      * Add big model generation test
      
      * make fixup
      
      * Add temporary workaround for test
      
      * Test overrides for assisted generation
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update tests/models/falcon/test_modeling_falcon.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Test overrides for assisted generation
      
      * Add generation demo
      
      * Update copyright
      
      * Make the docstring model actually small
      
      * Add module-level docstring
      
      * Remove all assertions
      
      * Add copied from bloom
      
      * Reformat the QKV layer
      
      * Add copied from bloom
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Remove unused line and reformat
      
      * No single letter variables
      
      * Cleanup return names
      
      * Add copied from line
      
      * Remove the deprecated arguments blocks
      
      * Change the embeddings test to an alibi on/off test
      
      * Remove position_ids from FalconForQA
      
      * Remove old check for token type IDs
      
      * Fix the alibi path when multi_query is False
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/falcon/test_modeling_falcon.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update config naming
      
      * Fix typo for new_decoder_architecture
      
      * Add some comments
      
      * Fix docstring
      
      * Fix docstring
      
      * Create range in the right dtype from the start
      
      * Review comment cleanup
      
      * n_head_kv -> num_kv_heads
      
      * self.alibi -> self.use_alibi
      
      * self.num_kv -> self.num_kv_heads
      
      * Reorder config args
      
      * Made alibi arguments Optional
      
      * Add all model docstrings
      
      * Add extra checkpoints
      
      * Add author info for Falcon
      
      * Stop removing token_type_ids because our checkpoints shouldn't return it anymore
      
      * Add one hopeful comment for the future
      
      * Fix typo
      
      * Update tests, fix cache issue for generation
      
      * Use -1e9 instead of -inf to avoid float overflow
      
      * Recompute the rotary embeddings much less often
      
      * Re-enable disabled tests
      
      * One final fix to attention mask calculation, and update tests
      
      * Cleanup targeting falcon-40b equivalency
      
      * Post-rebase docs update
      
      * Update docstrings, especially in the config
      
      * More descriptive variable names, and comments where we can't rename them
      
      ---------
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      b3ab3fac