1. 15 Mar, 2024 1 commit
  2. 16 Feb, 2024 1 commit
  3. 16 Jan, 2024 1 commit
  4. 08 Dec, 2023 1 commit
    • F.scaled_dot_product_attention support (#26572) · 80377eb0
      fxmarty authored
      
      
      * add sdpa
      
      * wip
      
      * cleaning
      
      * add ref
      
      * yet more cleaning
      
      * and more :)
      
      * wip llama
      
      * working llama
      
      * add output_attentions=True support
      
      * bigcode sdpa support
      
      * fixes
      
      * gpt-bigcode support, require torch>=2.1.1
      
      * add falcon support
      
      * fix conflicts falcon
      
      * style
      
      * fix attention_mask definition
      
      * remove output_attentions from attnmaskconverter
      
      * support whisper without removing any Copied from statement
      
      * fix mbart default to eager renaming
      
      * fix typo in falcon
      
      * fix is_causal in SDPA
      
      * check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
      
      * add warnings when falling back on the manual implementation
      
      * precise doc
      
      * wip replace _flash_attn_enabled by config.attn_implementation
      
      * fix typo
      
      * add tests
      
      * style
      
      * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
      
      * obey to config.attn_implementation if a config is passed in from_pretrained
      
      * fix is_torch_sdpa_available when torch is not installed
      
      * remove dead code
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bart/modeling_bart.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove duplicate pretraining_tp code
      
      * add dropout in llama
      
      * precise comment on attn_mask
      
      * add fmt: off for _unmask_unattended docstring
      
      * precise num_masks comment
      
      * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
      
      * cleanup modeling_utils
      
      * backward compatibility
      
      * fix style as requested
      
      * style
      
      * improve documentation
      
      * test pass
      
      * style
      
      * add _unmask_unattended tests
      
      * skip meaningless tests for idefics
      
      * hard_check SDPA requirements when specifically requested
      
      * standardize the use of XXX_ATTENTION_CLASSES
      
      * fix SDPA bug with mem-efficient backend on CUDA when using fp32
      
      * fix test
      
      * rely on SDPA is_causal parameter to handle the causal mask in some cases
      
      * fix FALCON_ATTENTION_CLASSES
      
      * remove _flash_attn_2_enabled occurrences
      
      * fix test
      
      * add OPT to the list of supported flash models
      
      * improve test
      
      * properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
      
      * remove remaining _flash_attn_2_enabled occurrence
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/modeling_attn_mask_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/perf_infer_gpu_one.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove use_attn_implementation
      
      * fix docstring & slight bug
      
      * make attn_implementation internal (_attn_implementation)
      
      * typos
      
      * fix tests
      
      * deprecate use_flash_attention_2=True
      
      * fix test
      
      * add back llama that was removed by mistake
      
      * fix tests
      
      * remove _flash_attn_2_enabled occurrences bis
      
      * add check & test that passed attn_implementation is valid
      
      * fix falcon torchscript export
      
      * fix device of mask in tests
      
      * add tip about torch.jit.trace and move bt doc below sdpa
      
      * fix parameterized.expand order
      
      * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
      
      * update sdpaattention class with the new cache
      
      * Update src/transformers/configuration_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/bark/modeling_bark.py
      
      * address review comments
      
      * WIP torch.jit.trace fix. left: test both eager & sdpa
      
      * add test for torch.jit.trace for both eager/sdpa
      
      * fix falcon with torch==2.0 that needs to use sdpa
      
      * fix doc
      
      * hopefully last fix
      
      * fix key_value_length that has no default now in mask converter
      
      * is it flaky?
      
      * fix speculative decoding bug
      
      * tests do pass
      
      * fix following #27907
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
  5. 13 Nov, 2023 1 commit
  6. 04 Aug, 2023 1 commit
  7. 26 Jul, 2023 1 commit
  8. 15 Jun, 2023 1 commit
  9. 09 May, 2023 1 commit
    • Add RWKV-4 (#22797) · b4d4d6fe
      Sylvain Gugger authored
      
      
      * First draft of RWKV-4
      
      * Add support for generate
      
      * Style post-rebase
      
      * Properly use state
      
      * Write doc
      
      * Fix doc
      
      * More math
      
      * Add model to README, dummies and clean config
      
      * Fix init
      
      * multiple fixes:
      
      - fix common tests
      - fix configuration default values
      - add CI test for checking state computation
      - fix some CI tests
      
      * correct tokenizer
      
      * some tweaks
      
      - fix config docstring
      - fix failing tests
      
      * fix CI tests
      
      - add output_attention / output_hidden_states
      - override test_initialization
      - fix failing CIs
      
      * fix conversion script
      
      - fix sharded case
      - add new arguments
      
      * add slow tests + more fixes on conversion script
      
      * add another test
      
      * final fixes
      
      * change single name variable
      
      * add mock attention mask for pipeline to work
      
      * correct eos token id
      
      * fix nits
      
      * add checkpoints
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add `tie_word_embeddings` in docstring
      
      * change tensor name
      
      * fix final nits
      
      * Trigger CI
      
      ---------
      Co-authored-by: younesbelkada <younesbelkada@gmail.com>
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
  10. 24 Apr, 2023 1 commit
  11. 09 Mar, 2023 1 commit
  12. 06 Feb, 2023 1 commit
    • Update quality tooling for formatting (#21480) · 6f79d264
      Sylvain Gugger authored
      * Result of black 23.1
      
      * Update target to Python 3.7
      
      * Switch flake8 to ruff
      
      * Configure isort
      
      * Configure isort
      
      * Apply isort with line limit
      
      * Put the right black version
      
      * adapt black in check copies
      
      * Fix copies
  13. 05 Oct, 2022 1 commit
    • Add WhisperModel to transformers (#19166) · 45e14038
      Arthur authored
      
      
      * simplify loop
      
      * add feature extractor
      
      * add model
      
      * start conversion
      
      * add dropout
      
      * initial commit of test files
      
      * conversion for all models
      
      * update processor for correct padding
      
      * update feature extraction
      
      * update integration test logits match
      
      * fmt: off for the logits
      
      * on the fly mel bank
      
      * small nit
      
      * update test
      
      * update tokenizer
      
      * nit feature extraction
      
      * update
      
      * update tokenizer test
      
      * adds logit processor and update tokenizer to get suppress tokens
      
      * style
      
      * clean convert
      
      * revert to original modeling tf utils
      
      * Update
      
      * update
      
      * nit
      
      * clean convert file
      
      * update tests and nits
      
      * quality
      
      * slow generation test
      
      * ffn_dim to allow customization
      
      * update readme
      
      * add to toctree
      
      * start fixing integration tests
      
      * update tests and code
      
      * fix feature extractor
      
      * fix config tests common
      
      * update code to fix tests
      
      * fix feature extractor
      
      * nit feature extraction
      
      * update test for new feature extractor
      
      * style
      
      * add abstract
      
      * large logits with custom decoder input ids
      
      * wrap around is torch available
      
      * fix feature extractor
      
      * correct logits for whisper small.en
      
      * nit
      
      * fix encoder_attention_mask
      
      * some fixes
      
      * remove unnecessary inputs
      
      * nits
      
      * add normalizer file
      
      * update test tokenization
      
      * fix attention mask not defined
      
      * Add model to README
      
      * Fix doc tests
      
      * fix generate
      
      * remove useless encoder attention mask
      
      * update test modeling whisper
      
      * update config to add second non suppress tokens
      
      * nits on feature extractor
      
      * nit for test tokenizers
      
      * update tests
      
      * update tests
      
      * update tokenization test
      
      * fixup
      
      * invalidated hf token. Clean convert openai to whisper
      
      * fix logit tests
      
      * fixup
      
      * clean merge
      
      * revert toc_tree changes
      
      * remove useless LogitProcessor
      
      * Update whisper .mdx
      
      * update config file doc
      
      * update configuration docstring
      
      * update test tokenization
      
      * update test tokenization
      
      * update tokenization whisper
      Added copied from where needed
      
      * update feature extraction
      
      * nit test name
      
      * style
      
      * quality
      
      * remove get suppress tokens and update non_speech tokens global variables
      
      * Update src/transformers/models/whisper/feature_extraction_whisper.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * clean modeling whisper and test
      Removed the attention mask arguments that are deprecated
      
      * fix large test
      
      * Add multilingual audio test, and translate test
      
      * style
      
      * fix large multilingual test
      
      * nits
      
      * Update docs/source/en/model_doc/whisper.mdx
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * add copied from for attention layer
      
      * remove attention masks in doc
      
      * add english normalizer
      
      * update tokenization test
      
      * remove copied from in whisper attention : no bias in k_proj only
      
      * wrap around dependencies in english normalizer
      
      * style
      
      * correct import generation logits
      
      * for now, wrap feature extractor with torch
      
      * Update src/transformers/models/whisper/convert_openai_whisper_to_tfms.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/whisper/configuration_whisper.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/whisper.mdx
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * remove torch dependencies for feature extraction and style
      
      * fixup
      
      * nit
      
      * update logits
      
      * style
      
      * nit
      
      * nits and fix final tests
      
      * add `is_more_itertools_available` to utils
      
      * quality
      
      * add begin suppress tokens, suppress tokens to generate args and config
      
      * clean suppressTokensLogitProcessor in generation logits
      
      * Nit naming
      
      * add suppressTokensAtBegin
      
      * update tests, suppress tokens to None or correct values
      
      * nit and style
      
      * update RAG to fit test and generate_logit
      
      * add copy pasted statement on english normalizer
      
      * add arguments to config_common_kwargs
      
      * Update src/transformers/generation_utils.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/generation_logits_process.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/whisper/configuration_whisper.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * revert changes based on reviews
      
      * update doc and nits
      
      * more nits
      
      * last nits
      
      * update test configuration common
      
      * add BART name in decoder attention mask documentation
      
      * Update src/transformers/models/whisper/modeling_whisper.py
      Co-authored-by: default avatarNielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * style
      
      * nit
      
      * nit
      
      * add english.json file to git
      
      * nits on documentation
      
      * nit
      
      * nits
      
      * last styling
      
      * add main toctree file
      
      * remove sentence piece dependency
      
      * clean init file
      
      * fix tokenizer that has no dependencies on sentencepiece
      
      * update whisper init file, nit
      
      * remove english.json file
      
      * add get decoder prompt id
      
      * revert changes and add forced logit processor
      
      * nit
      
      * clean normalizer
      
      * remove protected
      
      * update
      
      * Update src/transformers/models/whisper/configuration_whisper.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * update based on review
      
      * Update src/transformers/models/whisper/configuration_whisper.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * add batched tests
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: NielsRogge <niels.rogge1@gmail.com>
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  14. 16 Sep, 2022 1 commit
  15. 29 Aug, 2022 1 commit
  16. 10 Aug, 2022 1 commit
  17. 05 Aug, 2022 1 commit
    • Use new huggingface_hub tools for download models (#18438) · 5cd40323
      Sylvain Gugger authored
      * Draft new cached_file
      
      * Initial draft for config and model
      
      * Small fixes
      
      * Fix first batch of tests
      
      * Look in cache when internet is down
      
      * Fix last tests
      
      * Bad black, not fixing all quality errors
      
      * Make diff less
      
      * Implement change for TF and Flax models
      
      * Add tokenizer and feature extractor
      
      * For compatibility with main
      
      * Add utils to move the cache and auto-do it at first use.
      
      * Quality
      
      * Deal with empty commit shas
      
      * Deal with empty etag
      
      * Address review comments
  18. 01 Aug, 2022 1 commit
  19. 19 Jul, 2022 1 commit
  20. 01 Jul, 2022 1 commit
    • XLA train step fixes (#17973) · d6cec458
      Matt authored
      * Copy inputs to train and test step before modifying them, as this breaks things
      
      * Add XLA tests, fix our loss functions to be XLA-compatible
      
      * make fixup
      
      * Update loss computation test to expect vector of per-sample losses
      
      * Patch loss for TFLED
      
      * Patch loss for TFAlbert
      
      * Add a tf_legacy_loss config flag that enables old loss functions
      
      * Stop using config.get() because it's not a dict
      
      * Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it
      
      * make fixup
      
      * Add XLA-compatible RAG loss
      
      * Fix dtype of loss mask for TFAlbert
      
      * Fix test for XLNet too because it overrides the default one
      
      * make fixup
      
      * Fix config test
      
      * No more depending on GPU NaN behaviour
      
      * Add test, avoid potential zero division
      
      * Fix test item assignment
      
      * Fix loss computation masking test
      
      * make fixup
      
      * Fix dtype bugs
  21. 21 Jun, 2022 1 commit
  22. 12 May, 2022 1 commit
  23. 04 May, 2022 1 commit
  24. 23 Mar, 2022 1 commit
  25. 11 Mar, 2022 1 commit
    • Add soft length regulation for sequence generation (#15245) · 9442b3ce
      Kevin Bondzio authored
      
      
      * add possibility to softly regulate length when using sampling method in model.generate() function
      
      * fix test config, fix formatting
      
      * fix rag integration, fix docstyling
      
      * fix wrong docstring
      
      * change param to tuple, add test
      
      * fix old param in rag_model, remove unused import
      
      * change test according to new param
      
      * fix formatting
      
      * fix test case
      
      * fix doc style
      
      * move start_length calculation to Logitprocessor
      
      * add possibility to softly regulate length when using sampling method in model.generate() function
      
      * fix rag integration, fix docstyling
      
      * fix test config, fix formatting
      
      * change param to tuple, add test
      
      * fix old param in rag_model, remove unused import
      
      * add possibility to softly regulate length when using sampling method in model.generate() function
      
      * change param to tuple, add test
      
      * fix old param in rag_model, remove unused import
      
      * remove unused import
      
      * fix small errors
      
      * fix test
      
      * add possibility to softly regulate length when using sampling method in model.generate() function
      
      * fix test config, fix formatting
      
      * fix rag integration, fix docstyling
      
      * change param to tuple, add test
      
      * fix old param in rag_model, remove unused import
      
      * change test according to new param
      
      * fix test case
      
      * move start_length calculation to Logitprocessor
      
      * add possibility to softly regulate length when using sampling method in model.generate() function
      
      * fix rag integration, fix docstyling
      
      * fix test config, fix formatting
      
      * change param to tuple, add test
      
      * fix old param in rag_model, remove unused import
      
      * add possibility to softly regulate length when using sampling method in model.generate() function
      
      * fix test config, fix formatting
      
      * fix rag integration, fix docstyling
      
      * add possibility to softly regulate length when using sampling method in model.generate() function
      
      * fix rag integration, fix docstyling
      
      * change param to tuple, add test
      
      * fix old param in rag_model, remove unused import
      
      * fix small errors
      
      * Update src/transformers/generation_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/generation_utils.py
      
      * Update src/transformers/generation_utils.py
      
      * fix docstring, add type in model rag
      
      * fix docstrings
      
      * introduce seq_length variable for cleaner code
      
      * fix black formatting
      
      * add input_ids_seq_length to modeling_rag
      
      * add input_ids_seq_length to test
      
      * retrigger checks
      
      * retrigger checks
      Co-authored-by: Kevin Bondzio <kev@AIM-LAP-02.local>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Kevin Bondzio <kev@AIM-LAP-02.fritz.box>
  26. 11 Feb, 2022 1 commit
  27. 09 Feb, 2022 1 commit
  28. 02 Feb, 2022 1 commit
  29. 25 Jan, 2022 1 commit
    • Avoid using get_list_of_files (#15287) · e6954707
      Sylvain Gugger authored
      * Avoid using get_list_of_files in config
      
      * Wip, change tokenizer file getter
      
      * Remove call in tokenizer files
      
      * Remove last call to get_list_model_files
      
      * Better tests
      
      * Unit tests for new function
      
      * Document bad API
  30. 14 Jan, 2022 1 commit
  31. 15 Nov, 2021 1 commit
  32. 08 Nov, 2021 1 commit
  33. 02 Nov, 2021 1 commit
  34. 14 Oct, 2021 1 commit
  35. 06 Sep, 2021 1 commit
    • Update model configs - Allow setters for common properties (#13026) · c8be8a9a
      Nils Reimers authored
      * refactor GPT Config to allow dyn. properties
      
      * make attribute_map a class attribute
      
      * remove old code
      
      * update unit test to test config: Add test for common properties setter
      
      * update unit test to test config: Add test for common properties passed as parameters to __init__
      
      * update to black code format
      
      * Allow that setters are not defined for certain config classes
      
      * update config classes to implement attribute_map
      
      * bugfix lxmert config - id2labels was not defined when num_labels was set
      
      * update broken configs - add attribute_maps
      
      * update bart config
      
      * update black codestyle
      
      * update documentation on common config attributes
      
      * update GPTJ config to new attribute map
      
      * update docs on common attributes
      
      * gptj config: add max_position_embeddings
      
      * gptj config: format with black
      
      * update speech to text 2 config
      
      * format doc file to max_len 119
      
      * update config template
  36. 23 Jun, 2021 1 commit
  37. 25 May, 2021 1 commit
  38. 26 Apr, 2021 1 commit
  39. 23 Apr, 2021 1 commit
  40. 12 Apr, 2021 1 commit
    • Add DeiT (PyTorch) (#11056) · 9f126097
      NielsRogge authored
      * First draft of deit
      
      * More improvements
      
      * Remove DeiTTokenizerFast from init
      
      * Conversion script works
      
      * Add DeiT to ViT conversion script
      
      * Add tests, add head model, add support for deit in vit conversion script
      
      * Update model checkpoint names
      
      * Update image_mean and image_std, set resample to bicubic
      
      * Improve docs
      
      * Docs improvements
      
      * Add DeiTForImageClassificationWithTeacher to init
      
      * Address comments by @sgugger
      
      * Improve feature extractors
      
      * Make fix-copies
      
      * Minor fixes
      
      * Address comments by @patil-suraj
      
      * All models uploaded
      
      * Fix tests
      
      * Remove labels argument from DeiTForImageClassificationWithTeacher
      
      * Fix-copies, style and quality
      
      * Fix tests
      
      * Fix typo
      
      * Multiple docs improvements
      
      * More docs fixes