1. 31 Jan, 2024 4 commits
    • DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760) · beb2a096
      Joao Gante authored
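
      A minimal sketch of the pattern behind this fix (module layout and variable names are illustrative, not the library's exact code): build the range in an integer dtype first, then cast, so dtype-patching hooks such as DeepSpeed's zero-init cannot downcast it.

      ```python
      import torch

      dim, base = 64, 10000.0

      # Fragile: without an explicit dtype, frameworks that patch tensor creation
      # (e.g. DeepSpeed zero-init with bf16) can return a low-precision range,
      # and integers above 256 are not exactly representable in bfloat16.
      inv_freq_fragile = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

      # Robust: hardcode the integer dtype, then cast to float explicitly.
      inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.int64).float() / dim))
      ```
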
    • Flax mistral (#26943) · f7076cd3
      Kian Sierra McGettigan authored
      * direct copy from llama work
      
      * mistral modules forward pass working
      
      * flax mistral forward pass with sliding window
      
      * added tests
      
      * added layer collection approach
      
      * Revert "added layer collection approach"
      
      This reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f.
      
      * Revert "Revert "added layer collection approach""
      
      This reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43.
      
      * fixed attention outputs
      
      * added mistral to init and auto
      
      * fixed import name
      
      * fixed layernorm weight dtype
      
      * freeze initialized weights
      
      * make sure conversion considers bfloat16
      
      * added backend
      
      * added docstrings
      
      * added cache
      
      * fixed sliding window causal mask
      
      * passes cache tests
      
      * passed all tests
      
      * applied make style
      
      * removed commented out code
      
      * applied fix-copies ignored other model changes
      
      * applied make fix-copies
      
      * removed unused functions
      
      * passed generation integration test
      
      * slow tests pass
      
      * fixed slow tests
      
      * changed default dtype from jax.numpy.float32 to float32 for docstring check
      
      * skip cache test for FlaxMistralForSequenceClassification since, if pad_token_id is in input_ids, it doesn't score previous input_ids
      
      * updated checkpoint since from_pt not included
      
      * applied black style
      
      * removed unused args
      
      * Applied styling and fixup
      
      * changed checkpoint for doc back
      
      * fixed ref after adding it to hf hub
      
      * Add dummy ckpt
      
      * applied styling
      
      * added tokenizer to new ckpt
      
      * fixed slice format
      
      * fix init and slice
      
      * changed ref for placeholder TODO
      
      * added copies from Llama
      
      * applied styling
      
      * applied fix-copies
      
      * fixed docs
      
      * update weight dtype reconversion for sharded weights
      
      * removed Nullable input ids
      
      * Removed unnecessary output attentions in Module
      
      * added embedding weight initialization
      
      * removed unused past_key_values
      
      * fixed deterministic
      
      * Fixed RMS Norm and added copied from
      
      * removed input_embeds
      
      * applied make style
      
      * removed nullable input ids from sequence classification model
      
      * added copied from GPTJ
      
      * added copied from Llama on FlaxMistralDecoderLayer
      
      * added copied from to FlaxMistralPreTrainedModel methods
      
      * fix test deprecation warning
      
      * freeze gpt neox random_params and fix copies
      
      * applied make style
      
      * fixed doc issue
      
      * skipped docstring test to align # copied from
      
      * applied make style
      
      * removed FlaxMistralForSequenceClassification
      
      * removed unused padding_idx
      
      * removed more sequence classification
      
      * removed sequence classification
      
      * applied styling and consistency
      
      * added copied from in tests
      
      * removed sequence classification test logic
      
      * applied styling
      
      * applied make style
      
      * removed freeze and fixed copies
      
      * undo test change
      
      * changed repeat_kv to tile
      
      * fixed to key value groups
      
      * updated copyright year
      
      * split causal_mask
      
      * empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
      
      * went back to 2023 for tests_pr_documentation_tests
      
      * went back to 2024
      
      * changed tile to repeat
      
      * applied make style
      
      * empty for retry on Wav2Vec2
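
      The `repeat_kv`/`tile`/`repeat` bullets above concern grouped-query attention: key/value heads must be expanded to match the number of query heads. A standalone sketch (axis layout and names are assumptions, not the merged code):

      ```python
      import jax.numpy as jnp

      def repeat_kv(hidden_states: jnp.ndarray, n_rep: int) -> jnp.ndarray:
          # hidden_states: (batch, seq_len, num_kv_heads, head_dim)
          if n_rep == 1:
              return hidden_states
          # jnp.repeat duplicates each kv head in place along the head axis
          # ([h0, h0, h1, h1]), so the queries in a group line up with their
          # shared key/value head; jnp.tile would interleave ([h0, h1, h0, h1]).
          return jnp.repeat(hidden_states, repeats=n_rep, axis=2)
      ```
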
    • [Whisper] Refactor forced_decoder_ids & prompt ids (#28687) · 65a926e8
      Patrick von Platen authored
      * up
      
      * Fix more
      
      * Correct more
      
      * Fix more tests
      
      * fix fast tests
      
      * Fix more
      
      * fix more
      
      * push all files
      
      * finish all
      
      * make style
      
      * Fix timestamp wrap
      
      * make style
      
      * make style
      
      * up
      
      * up
      
      * up
      
      * Fix lang detection behavior
      
      * Fix lang detection behavior
      
      * Add lang detection test
      
      * Fix lang detection behavior
      
      * make style
      
      * Update src/transformers/models/whisper/generation_whisper.py
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * better error message
      
      * make style tests
      
      * add warning
      
      ---------
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
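
      After this refactor, language and task are ordinary `generate` arguments instead of something callers wire in through `forced_decoder_ids`. A usage sketch (dummy features; tiny checkpoint chosen purely for illustration):

      ```python
      import torch
      from transformers import WhisperForConditionalGeneration

      model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

      # Dummy log-mel features with Whisper's expected (batch, n_mels, frames)
      # shape; in practice these come from WhisperProcessor on real audio.
      input_features = torch.randn(1, 80, 3000)

      # language/task are forwarded directly; omitting `language` triggers the
      # language-detection behavior the bullets above keep fixing.
      generated_ids = model.generate(input_features, language="en", task="transcribe")
      ```
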
    • don't initialize the output embeddings if we're going to tie them to input embeddings (#28192) · ae0c27ad
      tom-p-reichel authored
      * test that tied output embeddings aren't initialized on load
      
      * don't initialize the output embeddings if we're going to tie them to the input embeddings
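
      The idea, as a toy sketch (not the library's actual `_init_weights` machinery): once output embeddings are tied to input embeddings, both modules share one tensor, so separately initializing the output side is wasted work that this PR skips.

      ```python
      import torch.nn as nn

      class TinyLM(nn.Module):
          def __init__(self, vocab_size=100, hidden=32, tie_word_embeddings=True):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, hidden)
              self.lm_head = nn.Linear(hidden, vocab_size, bias=False)
              nn.init.normal_(self.embed.weight, std=0.02)
              if tie_word_embeddings:
                  # Tie instead of initializing: any separate init of lm_head
                  # would immediately be overwritten by the shared tensor.
                  self.lm_head.weight = self.embed.weight
              else:
                  nn.init.normal_(self.lm_head.weight, std=0.02)
      ```
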
  2. 30 Jan, 2024 4 commits
  3. 29 Jan, 2024 3 commits
  4. 26 Jan, 2024 1 commit
  5. 25 Jan, 2024 1 commit
    • Add Depth Anything (#28654) · 963db81a
      NielsRogge authored
      * First draft
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * Add docs
      
      * Remove file
      
      * Add copied from
      
      * Address comments
      
      * Address comments
      
      * Address comments
      
      * Fix style
      
      * Update docs
      
      * Convert all checkpoints, add integration test
      
      * Rename checkpoints
      
      * Add pretrained backbone attributes
      
      * Fix default config
      
      * Address comment
      
      * Add figure to docs
      
      * Fix bug thanks to @xenova
      
      * Update conversion script
      
      * Fix integration test
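
      A quick usage sketch through the existing depth-estimation pipeline; the checkpoint name reflects the renaming mentioned above and is worth verifying on the Hub:

      ```python
      from transformers import pipeline

      depth_estimator = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")
      result = depth_estimator("http://images.cocodataset.org/val2017/000000039769.jpg")
      result["depth"].save("depth.png")  # PIL image; raw tensor in result["predicted_depth"]
      ```
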
  6. 24 Jan, 2024 1 commit
    • Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517) · c5c69096
      Khai Mai authored
      * fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask
      
      * format code using black and ruff
      
      * skip computing mask if attention_mask=None
      
      * add tests for load balancing loss Mixtral-Moe
      
      * fix assert loss is different in mixtral_test
      
      * fix pad_leng
      
      * use assertNotAlmostEqual and print to debug
      
      * remove print for debug
      
      * minor updates
      
      * reduce rtol and atol
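
      The gist of the fix, in simplified form (shapes and names are illustrative; the library's `load_balancing_loss_func` differs in detail): padding positions are excluded from both the expert-usage counts and the mean router probabilities, and the masking is skipped entirely when `attention_mask=None`.

      ```python
      import torch
      import torch.nn.functional as F

      def load_balancing_loss_sketch(router_logits, attention_mask=None, num_experts=8, top_k=2):
          # router_logits: (batch * seq_len, num_experts); attention_mask: (batch, seq_len)
          routing_weights = F.softmax(router_logits, dim=-1)
          _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
          expert_mask = F.one_hot(selected_experts, num_experts).float()  # (tokens, top_k, experts)
          if attention_mask is None:
              tokens_per_expert = expert_mask.mean(dim=0)
              router_prob_per_expert = routing_weights.mean(dim=0)
          else:
              keep = attention_mask.reshape(-1).float()  # 1 = real token, 0 = padding
              tokens_per_expert = (expert_mask * keep[:, None, None]).sum(dim=0) / keep.sum()
              router_prob_per_expert = (routing_weights * keep[:, None]).sum(dim=0) / keep.sum()
          return num_experts * torch.sum(tokens_per_expert.mean(dim=0) * router_prob_per_expert)
      ```
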
  7. 23 Jan, 2024 2 commits
  8. 22 Jan, 2024 1 commit
  9. 21 Jan, 2024 1 commit
  10. 19 Jan, 2024 7 commits
  11. 18 Jan, 2024 5 commits
    • [Whisper] Fix audio classification with weighted layer sum (#28563) · 186aa6be
      Sanchit Gandhi authored
      * fix
      
      * tests
      
      * fix test
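
      "Weighted layer sum" here means the classifier consumes a learned softmax-weighted mix of all encoder layers rather than only the last one; a standalone sketch of that pooling (names and sizes are illustrative):

      ```python
      import torch

      num_layers, batch, seq_len, hidden = 4, 2, 10, 8
      # One hidden-state tensor per encoder layer, stacked on a new leading axis.
      hidden_states = torch.randn(num_layers, batch, seq_len, hidden)
      layer_weights = torch.nn.Parameter(torch.zeros(num_layers))  # learned in the model
      norm_weights = torch.softmax(layer_weights, dim=-1)
      # The softmax-weighted sum over layers is what feeds the classification head.
      pooled_input = (hidden_states * norm_weights.view(-1, 1, 1, 1)).sum(dim=0)
      ```
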
    • Use `LoggingLevel` context manager in 3 tests (#28575) · 0754217c
      Yih-Dar authored
      * inside with LoggingLevel
      
      * remove is_flaky
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
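
      What the context manager buys, in sketch form (the helper lives in `transformers.testing_utils`; the exact import path here is an assumption):

      ```python
      from transformers.testing_utils import LoggingLevel
      from transformers.utils import logging

      # Force the transformers root logger to INFO only inside the block, so
      # log-capture assertions become deterministic instead of flaky.
      with LoggingLevel(logging.INFO):
          logger = logging.get_logger("transformers")
          logger.info("this record is reliably emitted while the context is active")
      ```
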
    • Add new meta w2v2-conformer BERT-like model (#28165) · d2cdefb9
      Yoach Lacombe authored
      * first commit
      
      * correct default value for non-causal
      
      * update config and modeling code
      
      * update converting checkpoint
      
      * clean modeling and fix tests
      
      * make style
      
      * add new config parameters to docstring
      
      * fix copied from statements
      
      * Apply suggestions from code review
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * make position_embeddings_type docstrings clearer
      
      * clean converting script
      
      * remove function not used
      
      * clean modeling file
      
      * apply suggestion for test file + add convert script to not_doctested
      
      * modify tests according to review - cleaner logic and more tests
      
      * Apply nit suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add checker of valid position embeddings type
      
      * instantiate new layer norm layer with the right eps
      
      * fix freeze_feature_encoder since it can be None in some cases
      
      * add test same output in convert script
      
      * restore wav2vec2conformer and add new model
      
      * create processor and FE + clean
      
      * add new model code
      
      * fix convert script and set default config parameters
      
      * correct model id paths
      
      * make style
      
      * make fix-copies and cleaning files
      
      * fix copied from statements
      
      * complete .md and fix copies
      
      * clean convert script argument defaults
      
      * fix config parameters docstrings
      
      * fix config docstring
      
      * add copied from and enrich FE tests
      
      * fix copied from and repo-consistency
      
      * add autotokenizer
      
      * make test input length shorter and change docstring code
      
      * fix docstrings and copied from
      
      * add add_adapter to ASR training example
      
      * make testing of adapters more robust
      
      * adapt to multi adapter layers
      
      * refactor input_values->input_features and remove w2v2-bert feature extractor
      
      * remove pretraining model
      
      * remove deprecated features and useless lines
      
      * add copied from and ignore statements to modeling tests
      
      * remove pretraining model #2
      
      * change import in convert script
      
      * change default in convert script
      
      * update readme and remove useless line
      
      * Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * refactor BERT to Bert for consistency
      
      * remove useless ignore copy statement
      
      * add persistent to buffer in rotary
      
      * add eps in LayerNorm init and remove copied from
      
      * add adapter activation parameters and add copied from statements
      
      * Fix copied statements and add unitest.skip reasons
      
      * add copied statement in test_processor
      
      * refactor processor
      
      * make style
      
      * replace numpy random by torch rand
      
      * remove expected output CTC
      
      * improve converting script with processor class
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * remove gumbel class
      
      * remove tests related to previously deleted class
      
      * Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * correct typos
      
      * remove unused parameters
      
      * update processor to take both text and audio
      
      * update checkpoints
      
      * update expected output and add ctc expected output
      
      * add label_attention_mask
      
      * replace pt with np in processor tests
      
      * fix typo
      
      * revert to behaviour with labels_attention_mask
      
      ---------
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
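
      A usage sketch reflecting the `input_values` → `input_features` refactor noted above ("facebook/w2v-bert-2.0" is the checkpoint name associated with this release; verify on the Hub):

      ```python
      import numpy as np
      from transformers import AutoProcessor, Wav2Vec2BertModel

      processor = AutoProcessor.from_pretrained("facebook/w2v-bert-2.0")
      model = Wav2Vec2BertModel.from_pretrained("facebook/w2v-bert-2.0")

      dummy_audio = np.random.randn(16_000)  # one second of fake 16 kHz audio
      # The processor now emits filter-bank `input_features`, not the raw
      # `input_values` of the original wav2vec2 models.
      inputs = processor(dummy_audio, sampling_rate=16_000, return_tensors="pt")
      outputs = model(input_features=inputs.input_features)
      ```
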
    • [`Core Tokenization`] Support a fix for spm fast models (#26678) · 81899778
      Arthur authored
      * fix
      
      * last attempt
      
      * current work
      
      * fix forward compatibility
      
      * save all special tokens
      
      * current state
      
      * revert additional changes
      
      * updates
      
      * remove tokenizer.model
      
      * add a test and the fix
      
      * nit
      
      * revert one more break
      
      * fix typefield issue
      
      * quality
      
      * more tests
      
      * fix fields for FC
      
      * more nits?
      
      * new additional changes
      
      * how
      
      * some updates
      
      * the fix
      
      * where do we stand
      
      * nits
      
      * nits
      
      * revert unrelated changes
      
      * nits nits nits
      
      * styling
      
      * don't break llama just yet
      
      * revert llama changes
      
      * safe arg check
      
      * fixup
      
      * Add a test for T5
      
      * Necessary changes
      
      * Tests passing; added tokens need to not be normalized. If the added tokens are normalized, it will trigger the stripping, which seems to be unwanted for normal functioning
      
      * Add even more tests for when normalization is set to True (which does not work)
      
      * Update to main
      
      * nits
      
      * fmt
      
      * more and more test
      
      * comments
      
      * revert change as tests are failing
      
      * make the test more readable
      
      * nits
      
      * refactor the test
      
      * nit
      
      * updates
      
      * simplify
      
      * style
      
      * style
      
      * style convert slow
      
      * Update src/transformers/convert_slow_tokenizer.py
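
      The "added tokens need to not be normalized" point above, as a small sketch:

      ```python
      from transformers import AddedToken, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("t5-small")
      # normalized=False keeps the normalizer's hands off the added token, so
      # the stripping described above cannot mangle it when matching raw text.
      tokenizer.add_tokens(AddedToken("<custom>", normalized=False))
      print(tokenizer.tokenize("hello <custom> world"))
      ```
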
    • Save `Processor` (#27761) · 3005f965
      Yih-Dar authored
      * save processor
      
      * Update tests/models/auto/test_processor_auto.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update tests/test_processing_common.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
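
      The round-trip this PR is about, sketched (any processor-backed checkpoint works; whisper-tiny is just an example):

      ```python
      from transformers import AutoProcessor

      processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
      # Saving now also serializes the processor's own attributes, not only
      # its tokenizer/feature extractor, so the reload reproduces them.
      processor.save_pretrained("./my-whisper-processor")
      reloaded = AutoProcessor.from_pretrained("./my-whisper-processor")
      ```
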
  12. 17 Jan, 2024 3 commits
    • Fix SDPA tests (#28552) · 2c1eebc1
      fxmarty authored
      * skip bf16 test if not supported by device
      
      * fix
      
      * fix bis
      
      * use is_torch_bf16_available_on_device
      
      * use is_torch_fp16_available_on_device
      
      * fix & use public llama
      
      * use 1b model
      
      * fix flaky test
      
      ---------
      Co-authored-by: Your Name <you@example.com>
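
      The skip pattern from the bullets, sketched (`is_torch_bf16_available_on_device` is the helper named above; the import path is an assumption):

      ```python
      import unittest

      import torch
      from transformers.utils import is_torch_bf16_available_on_device

      class SdpaBf16Test(unittest.TestCase):
          def test_bf16_matmul(self):
              device = "cuda" if torch.cuda.is_available() else "cpu"
              # Skip cleanly on devices without bf16 support instead of failing.
              if not is_torch_bf16_available_on_device(device):
                  self.skipTest(f"bfloat16 is not supported on {device}")
              x = torch.randn(2, 4, dtype=torch.bfloat16, device=device)
              self.assertEqual((x @ x.T).dtype, torch.bfloat16)
      ```
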
    • Add qwen2 (#28436) · d6ffe74d
      Junyang Lin authored
      * add config, modeling, and tokenization
      
      * add auto and init
      
      * update readme
      
      * update readme
      
      * update team name
      
      * fixup
      
      * fixup
      
      * update config
      
      * update code style
      
      * update for fixup
      
      * update for fixup
      
      * update for fixup
      
      * update for testing
      
      * update for testing
      
      * fix bug for config and tokenization
      
      * fix bug for bos token
      
      * not doctest
      
      * debug tokenizer
      
      * not doctest
      
      * debug tokenization
      
      * debug init for tokenizer
      
      * fix style
      
      * update init
      
      * delete if in token auto
      
      * add tokenizer doc
      
      * add tokenizer in init
      
      * Update dummy_tokenizers_objects.py
      
      * update
      
      * update
      
      * debug
      
      * Update tokenization_qwen2.py
      
      * debug
      
      * Update convert_slow_tokenizer.py
      
      * add copies
      
      * add copied from and make style
      
      * update files map
      
      * update test
      
      * fix style
      
      * fix merge reading and update tests
      
      * fix tests
      
      * fix tests
      
      * fix style
      
      * debug a variable in readme
      
      * Update src/transformers/models/qwen2/configuration_qwen2.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * update test and copied from
      
      * fix style
      
      * update qwen2 tokenization and tests
      
      * Update tokenization_qwen2.py
      
      * delete the copied from after property
      
      * fix style
      
      * update tests
      
      * update tests
      
      * add copied from
      
      * fix bugs
      
      * update doc
      
      * add warning for sliding window attention
      
      * update qwen2 tokenization
      
      * fix style
      
      * Update src/transformers/models/qwen2/modeling_qwen2.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix tokenizer fast
      
      ---------
      Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com>
      Co-authored-by: renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    • symbolic_trace: add past_key_values, llama, sdpa support (#28447) · a6adc05e
      fxmarty authored
      * torch.fx: add pkv, llama, sdpa support
      
      * Update src/transformers/models/opt/modeling_opt.py
      
      * remove spaces
      
      * trigger ci
      
      * use explicit variable names
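
      A sketch of what the PR enables (the tiny test checkpoint is assumed for illustration):

      ```python
      from transformers import AutoModelForCausalLM
      from transformers.utils.fx import symbolic_trace

      model = AutoModelForCausalLM.from_pretrained(
          "hf-internal-testing/tiny-random-LlamaForCausalLM"
      )
      # Llama-style models can now be traced with past_key_values among the
      # symbolic inputs, so KV-cached decoding is visible to torch.fx passes.
      traced = symbolic_trace(
          model, input_names=["input_ids", "attention_mask", "past_key_values"]
      )
      ```
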
  13. 16 Jan, 2024 6 commits
    • [`SpeechT5Tokenization`] Add copied from and fix the `convert_tokens_to_string` to match the fast decoding scheme (#28522) · fe23256b
      Arthur authored
      
      * Add copied from and fix the `convert_tokens_to_string` to match the fast decoding scheme
      
      * fixup
      
      * add a small test
      
      * style test file
      
      * nits
    • [`TokenizationRoformerFast`] Fix the save and loading (#28527) · 96d08831
      Arthur authored
      * cleanup
      
      * add a test
      
      * update the test
      
      * style
      
      * revert part that allows to pickle the tokenizer
    • [`TokenizationUtils`] Fix `add_special_tokens` when the token is already there (#28520) · 716df5fb
      Arthur authored
      * fix adding special tokens when the token is already there.
      
      * add a test
      
      * add a test
      
      * nit
      
      * fix the test: make sure the order is preserved
      
      * Update tests/test_tokenization_common.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
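
      The scenario being fixed, sketched: registering a string that already exists in the vocabulary as a special token (the word is arbitrary):

      ```python
      from transformers import AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("gpt2")
      # "hello" is already in the vocab; the fix ensures it still gets
      # registered as special, with the order of added tokens preserved.
      tokenizer.add_special_tokens({"additional_special_tokens": ["hello"]})
      print(tokenizer.additional_special_tokens)
      ```
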
    • Fix/speecht5 bug (#28481) · 07ae53e6
      Nima Yaqmuri authored
      * Fix bug in SpeechT5 speech decoder prenet's forward method
      
      - Removed redundant `repeat` operation on speaker_embeddings in the forward method. This line was erroneously duplicating the embeddings, leading to incorrect input size for concatenation and performance issues.
      - Maintained original functionality of the method, ensuring the integrity of the speech decoder prenet's forward pass remains intact.
      - This change resolves a critical bug affecting the model's performance in handling speaker embeddings.
      
      * Refactor SpeechT5 text to speech integration tests
      
      - Updated SpeechT5ForTextToSpeechIntegrationTests to accommodate the variability in sequence lengths due to dropout in the speech decoder pre-net. This change ensures that our tests are robust against random variations in generated speech, enhancing the reliability of our test suite.
      - Removed hardcoded dimensions in test assertions. Replaced with dynamic checks based on model configuration and seed settings, ensuring tests remain valid across different runs and configurations.
      - Added new test cases to thoroughly validate the shapes of generated spectrograms and waveforms. These tests leverage seed settings to ensure consistent and predictable behavior in testing, addressing potential issues in speech generation and vocoder processing.
      - Fixed existing test cases where incorrect assumptions about output shapes led to potential errors.
      
      * Enhance handling of speaker embeddings in SpeechT5
      
      - Refined the generate and generate_speech functions in the SpeechT5 class to robustly handle two scenarios for speaker embeddings: matching the batch size (one embedding per sample) and one-to-many (a single embedding for all samples in the batch).
      - The update includes logic to repeat the speaker embedding when a single embedding is provided for multiple samples, and a ValueError is raised for any mismatched dimensions.
      - Also added corresponding test cases to validate both scenarios, ensuring complete coverage and functionality for diverse speaker embedding situations.
      
      * Improve Test Robustness with Randomized Speaker Embeddings
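
      The two supported speaker-embedding shapes described above, as a standalone sketch (not the model's actual method):

      ```python
      import torch

      def match_speaker_embeddings(speaker_embeddings: torch.Tensor, batch_size: int) -> torch.Tensor:
          if speaker_embeddings.size(0) == batch_size:
              return speaker_embeddings  # one embedding per sample
          if speaker_embeddings.size(0) == 1:
              # One-to-many: broadcast the single embedding to the whole batch.
              return speaker_embeddings.repeat(batch_size, 1)
          raise ValueError(
              f"Expected speaker_embeddings with batch dim 1 or {batch_size}, "
              f"got {speaker_embeddings.size(0)}"
          )
      ```
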
    • Fix mismatching loading in from_pretrained with/without accelerate (#28414) · 66db33dd
      fxmarty authored
      * fix mismatching behavior in from_pretrained with/without accelerate
      
      * meaningful refactor
      
      * remove added space
      
      * add test
      
      * fix model on the hub
      
      * comment
      
      * use tiny model
      
      * style
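
      The invariant the fix restores, sketched with a tiny test checkpoint (name assumed): the plain torch path and the accelerate-backed low-memory path should yield identical parameters.

      ```python
      import torch
      from transformers import AutoModelForCausalLM

      name = "hf-internal-testing/tiny-random-gpt2"
      model_default = AutoModelForCausalLM.from_pretrained(name)
      model_low_mem = AutoModelForCausalLM.from_pretrained(name, low_cpu_mem_usage=True)

      # Both loading paths must produce the same weights, parameter for parameter.
      for (n1, p1), (n2, p2) in zip(
          model_default.named_parameters(), model_low_mem.named_parameters()
      ):
          assert n1 == n2 and torch.equal(p1, p2)
      ```
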
  14. 15 Jan, 2024 1 commit