1. 06 Feb, 2024 2 commits
  2. 05 Feb, 2024 1 commit
  3. 02 Feb, 2024 1 commit
    • [Docs] Fix spelling and grammar mistakes (#28825) · 721ee783
      Klaus Hipp authored
      * Fix typos and grammar mistakes in docs and examples
      
      * Fix typos in docstrings and comments
      
      * Fix spelling of `tokenizer` in model tests
      
      * Remove erroneous spaces in decorators
      
      * Remove extra spaces in Markdown link texts
  4. 01 Feb, 2024 1 commit
  5. 30 Jan, 2024 1 commit
    • Add tf_keras imports to prepare for Keras 3 (#28588) · 415e9a09
      Matt authored
      * Port core files + ESM (because ESM code is odd)
      
      * Search-replace in modelling code
      
      * Fix up transfo_xl as well
      
      * Fix other core files + tests (still need to add correct import to tests)
      
      * Fix cookiecutter
      
      * make fixup, fix imports in some more core files
      
      * Auto-add imports to tests
      
      * Cleanup, add imports to sagemaker tests
      
      * Use correct exception for importing tf_keras
      
      * Fixes in modeling_tf_utils
      
      * make fixup
      
      * Correct version parsing code
      
      * Ensure the pipeline tests correctly revert to float32 after each test
      
      * More tf.keras -> keras
      
      * Add dtype cast
      
      * Better imports of tf_keras
      
      * Add a cast for tf.assign, just in case
      
      * Fix callback imports
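In practice the port boils down to one guarded import, repeated across the TF modeling files: prefer the standalone `tf_keras` package (which preserves the Keras 2 API) and fall back to `keras` only while that still resolves to a 2.x release. A minimal sketch of that guard, assuming `packaging` is installed; the exact check in `modeling_tf_utils.py` may differ in detail:

```python
# Sketch of the guarded import this PR rolls out; not the verbatim
# transformers code, which lives in modeling_tf_utils.py.
try:
    import tf_keras as keras  # standalone package keeping the Keras 2 API
except (ModuleNotFoundError, ImportError):
    import keras

    from packaging import version

    # Keras 3 changed the API surface, so a plain `keras` import is only
    # acceptable while it still resolves to a 2.x release.
    if version.parse(keras.__version__).major > 2:
        raise ValueError(
            "Your installed Keras is Keras 3, but the TensorFlow modeling "
            "code needs the Keras 2 API. Install the backwards-compatible "
            "`tf_keras` package instead."
        )
```

The search-replace items above then swap direct `tf.keras.X` references for the `keras` alias this import establishes.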
  6. 29 Jan, 2024 1 commit
  7. 15 Jan, 2024 1 commit
  8. 12 Jan, 2024 1 commit
  9. 04 Jan, 2024 1 commit
  10. 13 Dec, 2023 1 commit
  11. 11 Dec, 2023 1 commit
  12. 08 Dec, 2023 1 commit
    • F.scaled_dot_product_attention support (#26572) · 80377eb0
      fxmarty authored
      * add sdpa
      
      * wip
      
      * cleaning
      
      * add ref
      
      * yet more cleaning
      
      * and more :)
      
      * wip llama
      
      * working llama
      
      * add output_attentions=True support
      
      * bigcode sdpa support
      
      * fixes
      
      * gpt-bigcode support, require torch>=2.1.1
      
      * add falcon support
      
      * fix conflicts falcon
      
      * style
      
      * fix attention_mask definition
      
      * remove output_attentions from attnmaskconverter
      
      * support whisper without removing any Copied from statement
      
      * fix mbart default to eager renaming
      
      * fix typo in falcon
      
      * fix is_causal in SDPA
      
      * check is_flash_attn_2_available in the model's init as well, in case the model is not initialized through from_pretrained
      
      * add warnings when falling back on the manual implementation
      
      * make the doc more precise
      
      * wip replace _flash_attn_enabled by config.attn_implementation
      
      * fix typo
      
      * add tests
      
      * style
      
      * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it in place
      
      * obey config.attn_implementation if a config is passed to from_pretrained
      
      * fix is_torch_sdpa_available when torch is not installed
      
      * remove dead code
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bart/modeling_bart.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove duplicate pretraining_tp code
      
      * add dropout in llama
      
      * make the comment on attn_mask more precise
      
      * add fmt: off for _unmask_unattended docstring
      
      * make the num_masks comment more precise
      
      * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
      
      * cleanup modeling_utils
      
      * backward compatibility
      
      * fix style as requested
      
      * style
      
      * improve documentation
      
      * test pass
      
      * style
      
      * add _unmask_unattended tests
      
      * skip meaningless tests for idefics
      
      * hard_check SDPA requirements when specifically requested
      
      * standardize the use of XXX_ATTENTION_CLASSES
      
      * fix SDPA bug with mem-efficient backend on CUDA when using fp32
      
      * fix test
      
      * rely on SDPA is_causal parameter to handle the causal mask in some cases
      
      * fix FALCON_ATTENTION_CLASSES
      
      * remove _flash_attn_2_enabled occurrences
      
      * fix test
      
      * add OPT to the list of supported flash models
      
      * improve test
      
      * properly test on different SDPA backends and different dtypes, and properly handle the pad tokens separately in the test
      
      * remove remaining _flash_attn_2_enabled occurrence
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/perf_infer_gpu_one.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove use_attn_implementation
      
      * fix docstring & slight bug
      
      * make attn_implementation internal (_attn_implementation)
      
      * typos
      
      * fix tests
      
      * deprecate use_flash_attention_2=True
      
      * fix test
      
      * add back llama that was removed by mistake
      
      * fix tests
      
      * remove _flash_attn_2_enabled occurrences, again
      
      * add check & test that passed attn_implementation is valid
      
      * fix falcon torchscript export
      
      * fix device of mask in tests
      
      * add tip about torch.jit.trace and move bt doc below sdpa
      
      * fix parameterized.expand order
      
      * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
      
      * update sdpaattention class with the new cache
      
      * Update src/transformers/configuration_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bark/modeling_bark.py
      
      * address review comments
      
      * WIP torch.jit.trace fix. left: test both eager & sdpa
      
      * add test for torch.jit.trace for both eager/sdpa
      
      * fix falcon with torch==2.0 that needs to use sdpa
      
      * fix doc
      
      * hopefully last fix
      
      * fix key_value_length that has no default now in mask converter
      
      * is it flaky?
      
      * fix speculative decoding bug
      
      * tests do pass
      
      * fix following #27907
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
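The user-facing result of all this is compact: `from_pretrained` accepts an `attn_implementation` argument ("eager", "sdpa", or "flash_attention_2", the latter replacing the now-deprecated `use_flash_attention_2=True`), and the SDPA attention classes dispatch to `torch.nn.functional.scaled_dot_product_attention`. A hedged sketch; the checkpoint name is illustrative only:

```python
import torch
import torch.nn.functional as F

# What the new SDPA attention classes ultimately call: PyTorch's fused
# attention kernel (torch >= 2.0; the PR pins torch >= 2.1.1 for gpt-bigcode).
# Shapes are (batch, num_heads, seq_len, head_dim); is_causal=True lets the
# kernel apply the causal mask itself instead of materializing one.
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Opting into the SDPA path at load time; the checkpoint is an example.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
)
```

Per the bullets above, requesting output_attentions=True makes the SDPA path fall back to the manual (eager) implementation with a warning, since the fused kernel never materializes the attention weights.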
  13. 07 Dec, 2023 1 commit
  14. 06 Dec, 2023 1 commit
  15. 04 Dec, 2023 1 commit
  16. 28 Nov, 2023 1 commit
  17. 27 Nov, 2023 2 commits
  18. 20 Nov, 2023 1 commit
  19. 17 Nov, 2023 1 commit
  20. 15 Nov, 2023 1 commit
  21. 13 Nov, 2023 1 commit
  22. 09 Nov, 2023 1 commit
  23. 06 Nov, 2023 1 commit
  24. 30 Oct, 2023 1 commit
  25. 18 Oct, 2023 1 commit
  26. 17 Oct, 2023 1 commit
  27. 11 Oct, 2023 1 commit
  28. 20 Jun, 2023 1 commit
  29. 09 May, 2023 1 commit
  30. 04 Apr, 2023 1 commit
    • Flax Regnet (#21867) · 90067748
      Shubhamai authored
      * initial commit
      
      * review changes
      
      * post model PR merge
      
      * updating doc
  31. 24 Mar, 2023 1 commit
    • Resnet flax (#21472) · a0cbbba3
      Shubhamai authored
      * [WIP] flax resnet
      
      * added pretrained flax models, results reproducible
      
      * working on tests
      
      * no real code change, just some comments
      
      * [flax] adding support for batch norm layers
      
      * fixing bugs related to pt+flax integration
      
      * removing loss from modeling flax output class
      
      * fixing classifier tests
      
      * fixing comments, model output
      
      * cleaning comments
      
      * review changes
      
      * review changes
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * renaming Flax to PyTorch
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
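The batch-norm bullet is the non-obvious part of a Flax port: unlike PyTorch's stateful `nn.BatchNorm2d`, `flax.linen.BatchNorm` keeps its running statistics in a separate `batch_stats` variable collection that callers must mark as mutable during training. A minimal, self-contained sketch (not the FlaxResNet code itself):

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class ConvBnBlock(nn.Module):
    features: int

    @nn.compact
    def __call__(self, x, train: bool = False):
        x = nn.Conv(self.features, kernel_size=(3, 3), use_bias=False)(x)
        # At eval time BatchNorm reads its running averages; at train time
        # it normalizes with batch statistics and updates the averages.
        x = nn.BatchNorm(use_running_average=not train)(x)
        return nn.relu(x)

block = ConvBnBlock(features=8)
x = jnp.ones((1, 32, 32, 3))  # Flax convolutions are NHWC
variables = block.init(jax.random.PRNGKey(0), x)

# Training step: "batch_stats" must be listed as mutable so the updated
# running averages are returned alongside the outputs.
y, updates = block.apply(variables, x, train=True, mutable=["batch_stats"])

# Eval step: no mutation needed, just read the stored statistics.
y_eval = block.apply(variables, x, train=False)
```

This split is presumably what the pt+flax integration fixes deal with: a PyTorch checkpoint's running_mean/running_var have to be routed into the Flax batch_stats collection rather than into params.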
  32. 14 Mar, 2023 1 commit
  33. 20 Feb, 2023 1 commit
  34. 23 Jan, 2023 1 commit
  35. 19 Jan, 2023 1 commit
  36. 18 Jan, 2023 1 commit