1. 16 Jan, 2024 3 commits
    • [`TokenizationUtils`] Fix `add_special_tokens` when the token is already there (#28520) · 716df5fb
      Arthur authored
      
      
      * fix adding special tokens when the token is already there.
      
      * add a test
      
      * add a test
      
      * nit
      
      * fix the test: make sure the order is preserved
      
      * Update tests/test_tokenization_common.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    • Fix/speecht5 bug (#28481) · 07ae53e6
      Nima Yaqmuri authored
      * Fix bug in SpeechT5 speech decoder prenet's forward method
      
      - Removed redundant `repeat` operation on speaker_embeddings in the forward method. This line was erroneously duplicating the embeddings, leading to incorrect input size for concatenation and performance issues.
      - Maintained original functionality of the method, ensuring the integrity of the speech decoder prenet's forward pass remains intact.
      - This change resolves a critical bug affecting the model's performance in handling speaker embeddings.
      
      * Refactor SpeechT5 text to speech integration tests
      
      - Updated SpeechT5ForTextToSpeechIntegrationTests to accommodate the variability in sequence lengths due to dropout in the speech decoder pre-net. This change ensures that our tests are robust against random variations in generated speech, enhancing the reliability of our test suite.
      - Removed hardcoded dimensions in test assertions. Replaced with dynamic checks based on model configuration and seed settings, ensuring tests remain valid across different runs and configurations.
      - Added new test cases to thoroughly validate the shapes of generated spectrograms and waveforms. These tests leverage seed settings to ensure consistent and predictable behavior in testing, addressing potential issues in speech generation and vocoder processing.
      - Fixed existing test cases where incorrect assumptions about output shapes led to potential errors.
      
      * Enhance handling of speaker embeddings in SpeechT5
      
      - Refined the generate and generate_speech functions in the SpeechT5 class to robustly handle two scenarios for speaker embeddings: matching the batch size (one embedding per sample) and one-to-many (a single embedding for all samples in the batch).
      - The update includes logic to repeat the speaker embedding when a single embedding is provided for multiple samples, and a ValueError is raised for any mismatched dimensions.
      - Also added corresponding test cases to validate both scenarios, ensuring complete coverage and functionality for diverse speaker embedding situations.
      
      * Improve Test Robustness with Randomized Speaker Embeddings
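      The two speaker-embedding scenarios described in this commit (one embedding per sample, or a single embedding shared by the whole batch) can be sketched as follows. This is a minimal illustration using plain Python lists, not the code from the commit; the helper name `expand_speaker_embeddings` and its signature are assumptions made for the example.

```python
from typing import List


def expand_speaker_embeddings(
    speaker_embeddings: List[List[float]], batch_size: int
) -> List[List[float]]:
    """Return one speaker embedding per sample in the batch.

    Hypothetical helper: accepts either `batch_size` embeddings
    (one per sample) or a single embedding that is repeated for
    every sample. Any other count is rejected.
    """
    num_embeddings = len(speaker_embeddings)
    if num_embeddings == batch_size:
        # One embedding per sample: nothing to do.
        return speaker_embeddings
    if num_embeddings == 1:
        # One-to-many: repeat the single embedding for each sample.
        return [list(speaker_embeddings[0]) for _ in range(batch_size)]
    raise ValueError(
        f"Expected 1 or {batch_size} speaker embeddings, got {num_embeddings}."
    )


if __name__ == "__main__":
    shared = expand_speaker_embeddings([[0.1, 0.2]], batch_size=3)
    print(len(shared))  # one row per sample
```

      Rejecting every count other than 1 or the batch size keeps a silent shape mismatch from reaching the concatenation step that the first bullet of this commit mentions.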
    • Fix mismatching loading in from_pretrained with/without accelerate (#28414) · 66db33dd
      fxmarty authored
      * fix mismatching behavior in from_pretrained with/without accelerate
      
      * meaningful refactor
      
      * remove added space
      
      * add test
      
      * fix model on the hub
      
      * comment
      
      * use tiny model
      
      * style
  2. 15 Jan, 2024 5 commits
  3. 13 Jan, 2024 1 commit
  4. 12 Jan, 2024 5 commits
  5. 11 Jan, 2024 5 commits
    • Byebye torch 1.10 (#28207) · 59cd9de3
      Yih-Dar authored
      
      
      * fix
      
      * fix
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • Fix load balancing loss func for mixtral (#28256) · e768616a
      liangxuZhang authored
      
      
      * Correct the implementation of auxiliary loss of Mixtral
      
      * correct the implementation of auxiliary loss of Mixtral
      
      * Implement a simpler calculation method
      
      ---------
      Co-authored-by: zhangliangxu3 <zhangliangxu3@jd.com>
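      For readers unfamiliar with the auxiliary loss this commit refers to, here is a rough sketch of a Switch-Transformer-style load-balancing loss generalized to top-k routing. It is an illustration in pure Python, not the calculation method introduced by the commit; the function names and the list-of-lists input format are assumptions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits: List[List[float]], top_k: int) -> float:
    """Hypothetical helper: num_experts * sum_e(f_e * p_e).

    f_e is the fraction of tokens that select expert e among their
    top-k choices, and p_e is the mean router probability of expert e.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # Count how many tokens pick each expert within their top-k.
    counts = [0] * num_experts
    for row in probs:
        ranked = sorted(range(num_experts), key=lambda e: row[e], reverse=True)
        for expert in ranked[:top_k]:
            counts[expert] += 1

    fraction_routed = [c / num_tokens for c in counts]
    mean_prob = [
        sum(row[e] for row in probs) / num_tokens for e in range(num_experts)
    ]
    return num_experts * sum(f * p for f, p in zip(fraction_routed, mean_prob))


if __name__ == "__main__":
    # Every token prefers the same expert, so the loss is high.
    collapsed = [[5.0, 0.0, 0.0, 0.0] for _ in range(4)]
    print(load_balancing_loss(collapsed, top_k=1))
```

      The loss is smallest when tokens are spread evenly across experts and grows as routing collapses onto a few of them, which is why it is added to the training objective as a regularizer.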
    • [Phi] Extend implementation to use GQA/MQA. (#28163) · 55090585
      Gustavo de Rosa authored
      * chore(phi): Updates configuration_phi with missing keys.
      
      * chore(phi): Adds first draft of combined modeling_phi.
      
      * fix(phi): Fixes according to latest review.
      
      * fix(phi): Removes pad_vocab_size_multiple to prevent inconsistencies.
      
      * fix(phi): Fixes unit and integration tests.
      
      * fix(phi): Ensures that everything works with microsoft/phi-1 for first integration.
      
      * fix(phi): Fixes output of docstring generation.
      
      * fix(phi): Fixes according to latest review.
      
      * fix(phi): Fixes according to latest review.
      
      * fix(tests): Re-enables Phi-1.5 test.
      
      * fix(phi): Fixes attention overflow on PhiAttention (for Phi-2).
      
      * fix(phi): Improves how queries and keys are upcast.
      
      * fix(phi): Small updates on latest changes.
    • Optionally preprocess segmentation maps for MobileViT (#28420) · d5606378
      Harisankar Babu authored
      * optionally preprocess segmentation maps for mobilevit
      
      * changed pretrained model name to that of segmentation model
      
      * removed voc-deeplabv3 from model archive list
      
      * added preprocess_image and preprocess_mask methods for processing images and segmentation masks respectively
      
      * added tests for segmentation masks based on segformer feature extractor
      
      * use crop_size instead of size
      
      * reverting to initial model
    • Enable multi-label image classification in pipeline (#28433) · 66964c00
      amyeroberts authored
      Enable multi-label image classification
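      As background, multi-label classification usually differs from single-label classification in how logits become scores: each label gets an independent sigmoid instead of all labels sharing one softmax. The sketch below illustrates that difference in pure Python. It is not the pipeline's code, and `score_labels` with its `multi_label` flag is a hypothetical helper.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Scores compete with each other and sum to 1 (single-label)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Independent score in (0, 1) for one label (multi-label)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_labels(
    logits: List[float], labels: List[str], multi_label: bool
) -> Dict[str, float]:
    """Hypothetical helper mapping each label to its score."""
    if multi_label:
        scores = [sigmoid(x) for x in logits]
    else:
        scores = softmax(logits)
    return dict(zip(labels, scores))


if __name__ == "__main__":
    logits = [3.0, 2.5, -4.0]
    labels = ["cat", "sofa", "car"]
    print(score_labels(logits, labels, multi_label=False))
    print(score_labels(logits, labels, multi_label=True))
```

      With softmax, a second strongly supported label is pushed down because the scores must sum to 1. With sigmoid, both labels can score highly at once.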
  6. 10 Jan, 2024 6 commits
  7. 09 Jan, 2024 2 commits
  8. 08 Jan, 2024 3 commits
    • Add SigLIP (#26522) · 3b742ea8
      NielsRogge authored
      
      
      * Add first draft
      
      * Use appropriate gelu function
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * Convert checkpoint
      
      * More improvements
      
      * Improve docs, remove print statements
      
      * More improvements
      
      * Add link
      
      * remove unused masking function
      
      * begin tokenizer
      
      * do_lower_case
      
      * debug
      
      * set split_special_tokens=True
      
      * Remove script
      
      * Fix style
      
      * Fix rebase
      
      * Use same design as CLIP
      
      * Add fast tokenizer
      
      * Add SiglipTokenizer to init, remove extra_ids
      
      * Improve conversion script
      
      * Use smaller inputs in conversion script
      
      * Update conversion script
      
      * More improvements
      
      * Add processor to conversion script
      
      * Add tests
      
      * Remove print statements
      
      * Add tokenizer tests
      
      * Fix more tests
      
      * More improvements related to weight initialization
      
      * More improvements
      
      * Make more tests pass
      
      * More improvements
      
      * More improvements
      
      * Add copied from
      
      * Add canonicalize_text
      
      * Enable fast tokenizer tests
      
      * More improvements
      
      * Fix most slow tokenizer tests
      
      * Address comments
      
      * Fix style
      
      * Remove script
      
      * Address some comments
      
      * Add copied from to tests
      
      * Add more copied from
      
      * Add more copied from
      
      * Add more copied from
      
      * Remove is_flax_available
      
      * More updates
      
      * Address comment
      
      * Remove SiglipTokenizerFast for now
      
      * Add caching
      
      * Remove umt5 test
      
      * Add canonicalize_text inside _tokenize, thanks Arthur
      
      * Fix image processor tests
      
      * Skip tests which are not applicable
      
      * Skip test_initialization
      
      * More improvements
      
      * Compare pixel values
      
      * Fix doc tests, add integration test
      
      * Add do_normalize
      
      * Remove causal mask and leverage ignore copy
      
      * Fix attention_mask
      
      * Fix remaining tests
      
      * Fix dummies
      
      * Rename temperature and bias
      
      * Address comments
      
      * Add copied from to tokenizer tests
      
      * Add SiglipVisionModel to auto mapping
      
      * Add copied from to image processor tests
      
      * Improve doc
      
      * Remove SiglipVisionModel from index
      
      * Address comments
      
      * Improve docs
      
      * Simplify config
      
      * Add first draft
      
      * Make it like mistral
      
      * More improvements
      
      * Fix attention_mask
      
      * Fix output_attentions
      
      * Add note in docs
      
      * Convert multilingual model
      
      * Convert large checkpoint
      
      * Convert more checkpoints
      
      * Add pipeline support, correct image_mean and image_std
      
      * Use padding=max_length by default
      
      * Make processor like llava
      
      * Add code snippet
      
      * Convert more checkpoints
      
      * Set keep_punctuation_string=None as in OpenCLIP
      
      * Set normalized=False for special tokens
      
      * Fix doc test
      
      * Update integration test
      
      * Add figure
      
      * Update organization
      
      * Happy new year
      
      * Use AutoModel everywhere
      
      ---------
      Co-authored-by: patil-suraj <surajp815@gmail.com>
    • Add segmentation map processing to SAM Image Processor (#27463) · 73c88012
      Rosie Wood authored
      
      
      * add segmentation map processing to sam image processor
      
      * fixup
      
      * add tests
      
      * reshaped_input_size is shape before padding
      
      * update tests for size/shape outputs
      
      * fixup
      
      * add code snippet to docs
      
      * Update docs/source/en/model_doc/sam.md
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Add missing backticks
      
      * add `segmentation_maps` as arg for SamProcessor.__call__()
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    • Fix building alibi tensor when num_heads is not a power of 2 (#28380) · 0c2121f9
      Mohamed Abu El-Nasr authored
      * Fix building alibi tensor when num_heads is not a power of 2
      
      * Remove print function
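      To show why head counts that are not a power of 2 need special handling, here is a sketch of the slope scheme from the original ALiBi reference code: slopes form a geometric sequence for the nearest lower power of 2, and the remaining heads take every other slope from the sequence for twice that many heads. It is a generic illustration, not the code changed by this commit, and the function name `alibi_slopes` is an assumption.

```python
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Per-head ALiBi slopes for any positive number of heads."""
    if num_heads < 1:
        raise ValueError("num_heads must be a positive integer")

    # Largest power of 2 that is <= num_heads.
    closest_power = 1 << (num_heads.bit_length() - 1)

    # Geometric sequence 2^(-8/n), 2^(-16/n), ... for n = closest_power.
    base = 2.0 ** (-8.0 / closest_power)
    slopes = [base ** i for i in range(1, closest_power + 1)]

    if closest_power != num_heads:
        # Remaining heads: odd powers of the base for 2 * closest_power
        # heads, i.e. every other slope of that longer sequence.
        extra_base = 2.0 ** (-4.0 / closest_power)
        num_extra = num_heads - closest_power
        slopes += [extra_base ** i for i in range(1, 2 * num_extra, 2)]

    return slopes


if __name__ == "__main__":
    print(len(alibi_slopes(8)), len(alibi_slopes(12)))
```

      An implementation that handles only the power-of-2 case produces too few or too many slopes for models with, say, 12 heads, so the resulting bias tensor no longer matches the attention shape.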
  9. 07 Jan, 2024 1 commit
  10. 05 Jan, 2024 3 commits
  11. 04 Jan, 2024 2 commits
  12. 03 Jan, 2024 2 commits
    • Remove token_type_ids from model_input_names (like #24788) (#28325) · 45b1dfa3
      Apsod authored
      * remove token_type_ids from model_input_names (like #24788)
      
      * removed test that assumed token_type_ids should be present and updated a model reference so that it points to an available model
    • Add FastSpeech2Conformer (#23439) · d83ff5ee
      Connor Henderson authored
      * start - docs, SpeechT5 copy and rename
      
      * add relevant code from FastSpeech2 draft, have tests pass
      
      * make it an actual conformer, demo ex.
      
      * matching inference with original repo, includes debug code
      
      * refactor nn.Sequentials, start more desc. var names
      
      * more renaming
      
      * more renaming
      
      * vocoder scratchwork
      
      * matching vocoder outputs
      
      * hifigan vocoder conversion script
      
      * convert model script, rename some config vars
      
      * replace postnet with speecht5's implementation
      
      * passing common tests, file cleanup
      
      * expand testing, add output hidden states and attention
      
      * tokenizer + passing tokenizer tests
      
      * variety of updates and tests
      
      * g2p_en pckg setup
      
      * import structure edits
      
      * docstrings and cleanup
      
      * repo consistency
      
      * deps
      
      * small cleanup
      
      * forward signature param order
      
      * address comments except for masks and labels
      
      * address comments on attention_mask and labels
      
      * address second round of comments
      
      * remove old unneeded line
      
      * address comments part 1
      
      * address comments pt 2
      
      * rename auto mapping
      
      * fixes for failing tests
      
      * address comments part 3 (bart-like, train loss)
      
      * make style
      
      * pass config where possible
      
      * add forward method + tests to WithHifiGan model
      
      * make style
      
      * address arg passing and generate_speech comments
      
      * address Arthur comments
      
      * address Arthur comments pt2
      
      * lint changes
      
      * Sanchit comment
      
      * add g2p-en to doctest deps
      
      * move up self.encoder
      
      * onnx compatible tensor method
      
      * fix is symbolic
      
      * fix paper url
      
      * move models to espnet org
      
      * make style
      
      * make fix-copies
      
      * update docstring
      
      * Arthur comments
      
      * update docstring w/ new updates
      
      * add model architecture images
      
      * header size
      
      * md wording update
      
      * make style
  13. 25 Dec, 2023 1 commit
  14. 22 Dec, 2023 1 commit