1. 02 Jul, 2024 1 commit
  2. 27 Jun, 2024 1 commit
  3. 26 Jun, 2024 1 commit
  4. 12 Jun, 2024 1 commit
  5. 07 Jun, 2024 1 commit
    • Extend save_pretrained to offloaded models (#27412) · ff689f57
      Benjamin Badger authored
      
      
      * added hidden subset
      
      * debugged hidden subset contrastive search
      
      * added contrastive search compression
      
      * debugged compressed contrastive search
      
      * memory reduction for contrastive search
      
      * debugged mem red
      
      * added low memory option feature
      
      * debugged mem optimization output stack
      
      * debugged mem optimization output stack
      
      * debugged low mem
      
      * added low mem cache
      
      * fixed 2047 tensor view
      
      * debugged 2042 past key val inputs
      
      * reformatted tensors
      
      * changed low mem output
      
      * final clean
      
      * removed subset hidden csearch
      
      * fixed hidden device
      
      * fixed hidden device
      
      * changed compressor dtype
      
      * removed hstate compression
      
      * integrated csearch in generate
      
      * test csearch integration into generation
      
      exit()
      
      * fixed csearch kwarg integration with generation
      
      * final wrap and added doc
      
      * Update src/transformers/generation/utils.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update src/transformers/generation/utils.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update src/transformers/generation/utils.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * added debug print
      
      * direct hstate cat
      
      * direct hstate cat
      
      * direct hstate cat debug
      
      * direct hstate cat debug
      
      * expanded full hidden state stack
      
      * expanded full hidden state stack
      
      * matched dims for hstates
      
      * matched dims for hstates
      
      * logits fix
      
      * equality test
      
      * equality hidden debug
      
      * debug
      
      * added prints for debug
      
      * added prints for debug
      
      * equality check
      
      * switched squeeze dim
      
      * input format debug
      
      * tracing top_k_ids
      
      * removed trace
      
      * added test context
      
      * added jitter
      
      * added jitter
      
      * added jitter
      
      * returned state
      
      * rebuilt past key value reconstruction
      
      * debugged
      
      * cleaned traces
      
      * added selection for pkv
      
      * changed output to dict
      
      * cleaned
      
      * cleaned
      
      * cleaned up contrastive search test
      
      * moved low_memory kwarg
      
      * debugged
      
      * changed low mem test batch size to 1
      
      * removed output
      
      * debugged test input shape
      
      * reformatted csearch test
      
      * added trace
      
      * removed unsqueeze on final forward pass
      
      * replaced unsqueeze with view
      
      * removed traces
      
      * cleaned
      
      * debugged model kwargs
      
      * removed special models from test
      
      * ran make quality
      
      * Update src/transformers/generation/configuration_utils.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update src/transformers/generation/configuration_utils.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * refactored
      
      * refactored
      
      * refactored
      
      * make fixup
      
      * renamed flag sequential
      
      * renamed flag sequential
      
      * iterative onloading
      
      * black style and test utils
      
      * added traces for integrated test
      
      * debugged
      
      * added traces
      
      * make style
      
      * removed traces, make style
      
      * included suggestions and added test
      
      * debugged test
      
      * added offload module check and make style
      
      * is_accelerate_available and make style
      
      * added test decorator
      
      * changed test model and config spec
      
      * added offload condition
      
      * added lazy loading for each shard
      
      * debugged
      
      * modified sharding
      
      * debugged
      
      * added traces
      
      * removed safe serialization
      
      * no index overload
      
      * trace on safe save ptrs
      
      * added ptr condition
      
      * debugged
      
      * debugged ptr
      
      * moved module map init
      
      * remake shard only for offloaded modules
      
      * refactored
      
      * debugged
      
      * refactored
      
      * debugged
      
      * cleaned and make style
      
      * cleaned and make style
      
      * added trace
      
      * sparse module map
      
      * debugged
      
      * removed module map conditional
      
      * refactored
      
      * debug
      
      * debugged
      
      * added traces
      
      * added shard mem trace
      
      * added shard mem trace
      
      * removed underlying storage check
      
      * refactored
      
      * memory leak removal and make style
      
      * cleaned
      
      * swapped test decs and make style
      
      * added mem checks and make style
      
      * added free mem warning
      
      * implemented some suggestions
      
      * moved onloading to accelerate
      
      * refactored for accelerate integration
      
      * cleaned test
      
      * make style
      
      * debugged offload map name
      
      * cleaned and make style
      
      * replaced meta device check for sharding
      
      * cleaned and make style
      
      * implemented some suggestions
      
      * more suggestions
      
      * update warning
      Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
      
      * more suggestions
      
      * make style
      
      * new make style
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
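      
      A minimal usage sketch of the workflow this change enables: calling
      save_pretrained on a model whose weights accelerate has offloaded to CPU or
      disk. The checkpoint name and folder paths are placeholders.
      
      ```python
      from transformers import AutoModelForCausalLM
      
      model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
      
      # device_map="auto" lets accelerate place modules on GPU/CPU/disk;
      # offload_folder holds the weights that spill to disk.
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          device_map="auto",
          offload_folder="offload",  # placeholder path
      )
      
      # Per the commit messages above, save_pretrained now reloads offloaded
      # modules shard by shard instead of materializing the whole model in memory.
      model.save_pretrained("saved_model")  # placeholder output directory
      ```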
  6. 28 May, 2024 1 commit
  7. 13 May, 2024 1 commit
    • Llama: fix custom 4D masks, v2 (#30348) · a0779b9e
      Poedator authored
      
      
      * 4d mask fixes
      
      * Update custom 4D mask logic
      
      * test moved to mixin
      
      * extra tests 4d mask
      
      * upd 4d mask and StaticCache handling
      
      * added Mask4DTestHard to mistral tests
      
      * post-rebase fixes
      
      * test fixes for StaticCache
      
      * make fix-copies
      
      * upd 1 after #30476
      
      * fix common tests
      
      * rm elif attention_mask.dim() == 4:
      
      * tests combined, fixed, mixtral supported
      
      * bigbird style chg reverted
      
      * rm if attention_mask.dim() == 2
      
      * modeling_llama formatting chg
      
      ---------
      Co-authored-by: Joao Gante <joao@huggingface.co>
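      
      A minimal sketch of passing a custom 4D attention mask to a Llama model, as
      exercised by the tests mentioned above. The additive (0 / large-negative)
      mask convention shown here is an assumption, and the checkpoint name is a
      placeholder.
      
      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer
      
      model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(model_id)
      
      input_ids = tokenizer("Hello world", return_tensors="pt").input_ids
      seq_len = input_ids.shape[1]
      
      # Shape (batch, 1, query_len, key_len): 0.0 where attention is allowed,
      # the dtype minimum where it is blocked (here, an ordinary causal mask).
      min_value = torch.finfo(model.dtype).min
      mask_4d = torch.triu(
          torch.full((1, 1, seq_len, seq_len), min_value, dtype=model.dtype),
          diagonal=1,
      )
      
      with torch.no_grad():
          out = model(input_ids, attention_mask=mask_4d)
      ```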
  8. 07 May, 2024 1 commit
  9. 26 Apr, 2024 1 commit
  10. 24 Apr, 2024 1 commit
  11. 19 Apr, 2024 1 commit
  12. 10 Apr, 2024 1 commit
  13. 02 Apr, 2024 1 commit
    • Hard error when ignoring tensors. (#27484) (#29906) · 9b0a8ea7
      Nicolas Patry authored
      
      
      * Hard error when ignoring tensors. (#27484)
      
      * [WIP] Hard error when ignoring tensors.
      
      * Better selection/error when saving a checkpoint.
      
      - Find all names we should normally drop (those are in the transformers
        config)
      - Find all disjoint tensors (for those we can safely trigger a copy to
        get rid of the sharing before saving)
      - Clone those disjoint tensors, getting rid of the issue
      - Find all identical names (those should be declared in the config
        but we try to find them all anyway.)
      - For all identical names:
        - If they are in the config, just ignore them; everything is fine
        - If they are not, warn about them.
      - For all remaining tensors which are shared yet neither identical nor
        disjoint, raise a hard error (a minimal sketch of this detection logic
        follows this commit entry).
      
      * Adding a failing test on `main` that passes here.
      
      * We don't need to keep the subfolder logic in this test.
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Add small tests.
      
      * Dead variable.
      
      * Fixup.
      
      * Fixing tied_weights_keys on generic models.
      
      * Fixup + T5 encoder/decoder tying (with different layers)
      
      * Code quality.
      
      * Dynamic member.
      
      * trigger
      
      * Fixing encoder name for other types of encoder/decoder combos.
      
      * Fix scoping.
      
      * Update .github/workflows/self-scheduled.yml
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Fixing the tied_weights after the call.
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
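      
      A minimal sketch, not the library's exact implementation, of the shared-tensor
      classification described in this commit message: group tensors by underlying
      storage, clone disjoint views, warn about undeclared identical views, and
      error on anything else. The helper name and the expected_shared argument are
      illustrative assumptions.
      
      ```python
      from collections import defaultdict
      
      def classify_shared_tensors(state_dict, expected_shared=frozenset()):
          """Group torch tensors by storage and flag problematic weight sharing."""
          groups = defaultdict(list)
          for name, tensor in state_dict.items():
              groups[tensor.untyped_storage().data_ptr()].append((name, tensor))
      
          to_clone, to_warn, errors = [], [], []
          for shared in groups.values():
              if len(shared) == 1:
                  continue
              # Byte range each view covers inside the shared storage
              # (assumes contiguous views for simplicity).
              spans = {
                  name: (
                      t.storage_offset() * t.element_size(),
                      (t.storage_offset() + t.nelement()) * t.element_size(),
                  )
                  for name, t in shared
              }
              names = sorted(spans)
              ranges = sorted(spans.values())
              if all(end <= start for (_, end), (start, _) in zip(ranges, ranges[1:])):
                  # Disjoint views: safe to clone each one before saving.
                  to_clone.extend(names)
              elif len(set(spans.values())) == 1:
                  # Identical views: fine if declared as tied weights, warn otherwise.
                  if not set(names) <= expected_shared:
                      to_warn.extend(names)
              else:
                  # Shared yet neither identical nor disjoint: hard-error territory.
                  errors.append(names)
          return to_clone, to_warn, errors
      
      # usage (hypothetical): classify_shared_tensors(model.state_dict())
      ```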
  14. 27 Mar, 2024 1 commit
  15. 25 Mar, 2024 1 commit
  16. 19 Mar, 2024 1 commit
  17. 15 Mar, 2024 1 commit
  18. 13 Mar, 2024 1 commit
  19. 11 Mar, 2024 1 commit
  20. 08 Mar, 2024 1 commit
  21. 07 Mar, 2024 2 commits
  22. 06 Mar, 2024 1 commit
  23. 05 Mar, 2024 1 commit
  24. 16 Feb, 2024 1 commit
  25. 06 Feb, 2024 1 commit
  26. 05 Feb, 2024 1 commit
    • [WIP] Hard error when ignoring tensors. (#27484) · 2da28c4b
      Nicolas Patry authored
      
      
      * [WIP] Hard error when ignoring tensors.
      
      * Better selection/error when saving a checkpoint.
      
      - Find all names we should normally drop (those are in the transformers
        config)
      - Find all disjoint tensors (for those we can safely trigger a copy to
        get rid of the sharing before saving)
      - Clone those disjoint tensors, getting rid of the issue
      - Find all identical names (those should be declared in the config
        but we try to find them all anyway.)
      - For all identical names:
        - If they are in the config, just ignore them; everything is fine
        - If they are not, warn about them.
      - For all remaining tensors which are shared yet neither identical nor
        disjoint, raise a hard error.
      
      * Adding a failing test on `main` that passes here.
      
      * We don't need to keep the subfolder logic in this test.
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
  27. 02 Feb, 2024 1 commit
  28. 18 Jan, 2024 1 commit
  29. 16 Jan, 2024 2 commits
  30. 15 Jan, 2024 1 commit
  31. 12 Jan, 2024 1 commit
  32. 17 Dec, 2023 1 commit
  33. 15 Dec, 2023 1 commit
  34. 08 Dec, 2023 2 commits
    • F.scaled_dot_product_attention support (#26572) · 80377eb0
      fxmarty authored
      
      
      * add sdpa
      
      * wip
      
      * cleaning
      
      * add ref
      
      * yet more cleaning
      
      * and more :)
      
      * wip llama
      
      * working llama
      
      * add output_attentions=True support
      
      * bigcode sdpa support
      
      * fixes
      
      * gpt-bigcode support, require torch>=2.1.1
      
      * add falcon support
      
      * fix conflicts falcon
      
      * style
      
      * fix attention_mask definition
      
      * remove output_attentions from attnmaskconverter
      
      * support whisper without removing any Copied from statement
      
      * fix mbart default to eager renaming
      
      * fix typo in falcon
      
      * fix is_causal in SDPA
      
      * check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
      
      * add warnings when falling back on the manual implementation
      
      * precise doc
      
      * wip replace _flash_attn_enabled by config.attn_implementation
      
      * fix typo
      
      * add tests
      
      * style
      
      * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
      
      * obey to config.attn_implementation if a config is passed in from_pretrained
      
      * fix is_torch_sdpa_available when torch is not installed
      
      * remove dead code
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bart/modeling_bart.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove duplicate pretraining_tp code
      
      * add dropout in llama
      
      * precise comment on attn_mask
      
      * add fmt: off for _unmask_unattended docstring
      
      * precise num_masks comment
      
      * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
      
      * cleanup modeling_utils
      
      * backward compatibility
      
      * fix style as requested
      
      * style
      
      * improve documentation
      
      * test pass
      
      * style
      
      * add _unmask_unattended tests
      
      * skip meaningless tests for idefics
      
      * hard_check SDPA requirements when specifically requested
      
      * standardize the use of XXX_ATTENTION_CLASSES
      
      * fix SDPA bug with mem-efficient backend on CUDA when using fp32
      
      * fix test
      
      * rely on SDPA is_causal parameter to handle the causal mask in some cases
      
      * fix FALCON_ATTENTION_CLASSES
      
      * remove _flash_attn_2_enabled occurrences
      
      * fix test
      
      * add OPT to the list of supported flash models
      
      * improve test
      
      * properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
      
      * remove remaining _flash_attn_2_enabled occurrence
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/perf_infer_gpu_one.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove use_attn_implementation
      
      * fix docstring & slight bug
      
      * make attn_implementation internal (_attn_implementation)
      
      * typos
      
      * fix tests
      
      * deprecate use_flash_attention_2=True
      
      * fix test
      
      * add back llama that was removed by mistake
      
      * fix tests
      
      * remove _flash_attn_2_enabled occurrences, bis
      
      * add check & test that passed attn_implementation is valid
      
      * fix falcon torchscript export
      
      * fix device of mask in tests
      
      * add tip about torch.jit.trace and move bt doc below sdpa
      
      * fix parameterized.expand order
      
      * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
      
      * update sdpaattention class with the new cache
      
      * Update src/transformers/configuration_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bark/modeling_bark.py
      
      * address review comments
      
      * WIP torch.jit.trace fix. left: test both eager & sdpa
      
      * add test for torch.jit.trace for both eager/sdpa
      
      * fix falcon with torch==2.0 that needs to use sdpa
      
      * fix doc
      
      * hopefully last fix
      
      * fix key_value_length that has no default now in mask converter
      
      * is it flaky?
      
      * fix speculative decoding bug
      
      * tests do pass
      
      * fix following #27907
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
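      
      A short usage sketch of selecting the SDPA attention implementation added in
      this PR when loading a model; the checkpoint name is a placeholder, and the
      installed torch version is assumed to satisfy the requirements noted above.
      
      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer
      
      model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      
      # Route attention through torch.nn.functional.scaled_dot_product_attention;
      # use attn_implementation="eager" to force the manual implementation instead.
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          attn_implementation="sdpa",
      )
      
      inputs = tokenizer("Hello", return_tensors="pt")
      with torch.no_grad():
          logits = model(**inputs).logits
      ```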
    • [⚠️ removed a default argument] Make `AttentionMaskConverter` compatible with `torch.compile(..., fullgraph=True)` (#27868) · 307a7d0b
      fxmarty authored
      
      * remove bugged torch.float32 default
      
      * add test
      
      * fix tests
      
      * fix test
      
      * fix doc
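      
      A minimal sketch of the usage this fix targets: compiling a model forward
      with fullgraph=True, which raises on any graph break instead of silently
      falling back to eager. The checkpoint name is a placeholder.
      
      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer
      
      model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(model_id)
      
      # fullgraph=True requires the whole forward, including the attention-mask
      # handling, to be captured as a single graph.
      compiled_forward = torch.compile(model.forward, fullgraph=True)
      
      inputs = tokenizer("Hello", return_tensors="pt")
      with torch.no_grad():
          logits = compiled_forward(**inputs).logits
      ```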
  35. 04 Dec, 2023 1 commit
  36. 01 Dec, 2023 1 commit
  37. 15 Nov, 2023 1 commit