1. 05 Feb, 2024 6 commits
  2. 02 Feb, 2024 9 commits
  3. 01 Feb, 2024 8 commits
  4. 31 Jan, 2024 13 commits
    • [docs] Correct the statement in the docstring of compute_transition_scores in generation/utils.py (#28786) · 7b2bd1fb
      Shichao Song authored
      
    • Split daily CI using 2 level matrix (#28773) · 47358661
      Yih-Dar authored
      
      
      * update / add new workflow files
      
      * Add comment
      
      * Use env.NUM_SLICES
      
      * use scripts
      
      * use scripts
      
      * use scripts
      
      * Fix
      
      * using one script
      
      * Fix
      
      * remove unused file
      
      * update
      
      * fail-fast: false
      
      * remove unused file
      
      * fix
      
      * fix
      
      * use matrix
      
      * inputs
      
      * style
      
      * update
      
      * fix
      
      * fix
      
      * no model name
      
      * add doc
      
      * allow args
      
      * style
      
      * pass argument
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
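To make the "2 level matrix" above concrete: one script splits the long list of per-model test folders into NUM_SLICES chunks, so the first matrix level fans out over slice indices and the second over the models inside each slice. A minimal sketch under assumed paths and defaults (the folder listing and the NUM_SLICES default below are illustrative, not the actual CI script):

```python
import os

# Mirror the env.NUM_SLICES variable mentioned in the commit; the default is an assumption.
num_slices = int(os.environ.get("NUM_SLICES", "2"))

# Hypothetical source of truth: one folder per model under tests/models.
model_dirs = sorted(entry.name for entry in os.scandir("tests/models") if entry.is_dir())

# Level 1 of the matrix: slice indices 0..num_slices-1.
# Level 2: the model folders assigned to that slice.
slices = [model_dirs[i::num_slices] for i in range(num_slices)]

for index, chunk in enumerate(slices):
    print(f"slice {index}: {len(chunk)} model folders")
```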
    • Add artifact name in job step to maintain job / artifact correspondence (#28682) · 95346e9d
      Yih-Dar authored
      
      
      * avoid using job name
      
      * apply to other files
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760) · beb2a096
      Joao Gante authored
      
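A hedged sketch of the pattern the commit title describes (the exact call sites and dtype choices in #28760 may differ): an arange whose dtype is left implicit follows whatever default floating-point dtype is in effect, and DeepSpeed ZeRO-3 can switch that default to fp16/bf16 during initialization, so large indices lose precision. Pinning an integer dtype and casting explicitly keeps the values exact.

```python
import torch

max_positions = 4096

# Fragile: a float-valued arange inherits the ambient default dtype, which DeepSpeed
# ZeRO-3 initialization may have set to float16/bfloat16.
positions_implicit = torch.arange(0.0, max_positions)

# Safer: build the range in int64 and cast explicitly, independent of the default dtype.
positions_explicit = torch.arange(max_positions, dtype=torch.int64).float()
```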
    • Flax mistral (#26943) · f7076cd3
      Kian Sierra McGettigan authored
      * direct copy from llama work
      
      * mistral modules forward pass working
      
      * flax mistral forward pass with sliding window
      
      * added tests
      
      * added layer collection approach
      
      * Revert "added layer collection approach"
      
      This reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f.
      
      * Revert "Revert "added layer collection approach""
      
      This reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43.
      
      * fixed attention outputs
      
      * added mistral to init and auto
      
      * fixed import name
      
      * fixed layernorm weight dtype
      
      * freeze initialized weights
      
      * make sure conversion considers bfloat16
      
      * added backend
      
      * added docstrings
      
      * added cache
      
      * fixed sliding window causal mask
      
      * passes cache tests
      
      * passed all tests
      
      * applied make style
      
      * removed commented out code
      
      * applied fix-copies, ignored other model changes
      
      * applied make fix-copies
      
      * removed unused functions
      
      * passed generation integration test
      
      * slow tests pass
      
      * fixed slow tests
      
      * changed default dtype from jax.numpy.float32 to float32 for docstring check
      
      * skip cache test for FlaxMistralForSequenceClassification since it doesn't score previous input_ids if pad_token_id is in input_ids
      
      * updated checkpoint since from_pt not included
      
      * applied black style
      
      * removed unused args
      
      * Applied styling and fixup
      
      * changed checkpoint for doc back
      
      * fixed rf after adding it to hf hub
      
      * Add dummy ckpt
      
      * applied styling
      
      * added tokenizer to new ckpt
      
      * fixed slice format
      
      * fix init and slice
      
      * changed ref for placeholder TODO
      
      * added copies from Llama
      
      * applied styling
      
      * applied fix-copies
      
      * fixed docs
      
      * update weight dtype reconversion for sharded weights
      
      * removed Nullable input ids
      
      * Removed unnecessary output attentions in Module
      
      * added embedding weight initialization
      
      * removed unused past_key_values
      
      * fixed deterministic
      
      * Fixed RMS Norm and added copied from
      
      * removed input_embeds
      
      * applied make style
      
      * removed nullable input ids from sequence classification model
      
      * added copied from GPTJ
      
      * added copied from Llama on FlaxMistralDecoderLayer
      
      * added copied from to FlaxMistralPreTrainedModel methods
      
      * fix test deprecation warning
      
      * freeze gpt neox random_params and fix copies
      
      * applied make style
      
      * fixed doc issue
      
      * skipped docstring test to align # copied from
      
      * applied make style
      
      * removed FlaxMistralForSequenceClassification
      
      * removed unused padding_idx
      
      * removed more sequence classification
      
      * removed sequence classification
      
      * applied styling and consistency
      
      * added copied from in tests
      
      * removed sequence classification test logic
      
      * applied styling
      
      * applied make style
      
      * removed freeze and fixed copies
      
      * undo test change
      
      * changed repeat_kv to tile
      
      * fixed to key value groups
      
      * updated copyright year
      
      * split causal_mask
      
      * empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
      
      * went back to 2023 for tests_pr_documentation_tests
      
      * went back to 2024
      
      * changed tile to repeat
      
      * applied make style
      
      * empty for retry on Wav2Vec2
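Several bullets above ("changed repeat_kv to tile", "changed tile to repeat", "fixed to key value groups") concern how grouped key/value heads are expanded to match the query heads. A minimal jax.numpy sketch of that expansion, assuming a (batch, seq_len, num_kv_heads, head_dim) layout that may not match the model's actual axis order:

```python
import jax.numpy as jnp


def repeat_kv(hidden_states: jnp.ndarray, n_rep: int) -> jnp.ndarray:
    # Expand (batch, seq_len, num_kv_heads, head_dim) to
    # (batch, seq_len, num_kv_heads * n_rep, head_dim) so each query head
    # lines up with its shared key/value head.
    if n_rep == 1:
        return hidden_states
    return jnp.repeat(hidden_states, n_rep, axis=2)


# Example: 2 KV heads shared across 8 query heads -> repeat each KV head 4 times.
kv = jnp.zeros((1, 16, 2, 64))
print(repeat_kv(kv, 4).shape)  # (1, 16, 8, 64)
```

jnp.repeat keeps the copies of each key/value head adjacent (h0, h0, ..., h1, h1, ...), while jnp.tile would interleave them (h0, h1, h0, h1, ...); the repeat/tile back-and-forth in the commits above is about matching the grouping convention of the query heads.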
    • Wrap Keras methods to support BatchEncoding (#28734) · 7a496100
      Matt authored
      * Shim the Keras methods to support BatchEncoding
      
      * Extract everything to a convert_batch_encoding function
      
      * Convert BatchFeature too (thanks Amy)
      
      * tf.keras -> keras
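The commits above mention extracting the shim into a convert_batch_encoding function. A hedged sketch of what such a shim can look like: BatchEncoding and BatchFeature are dict-like wrappers that Keras input validation does not accept as-is, so they are unwrapped into plain dicts before fit/predict sees them. The Mapping check below is a simplification of the real type checks, which target those two classes specifically.

```python
from collections.abc import Mapping


def convert_batch_encoding(*args, **kwargs):
    # If the first positional argument (usually `x`) is a dict-like tokenizer or
    # feature-extractor output, unwrap it into a plain dict; leave everything else alone.
    if args and isinstance(args[0], Mapping):
        args = (dict(args[0]),) + args[1:]
    if isinstance(kwargs.get("x"), Mapping):
        kwargs["x"] = dict(kwargs["x"])
    return args, kwargs
```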
    • canonical repos moves (#28795) · 721e2d94
      Julien Chaumond authored
      
      
      * canonical repos moves
      
      * Style
      
      ---------
      Co-authored-by: Lysandre <lysandre@huggingface.co>
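"Canonical repos moves" refers to the legacy flat checkpoint names on the Hub moving under organization namespaces, with the old names kept as redirects. A hedged illustration using the familiar BERT example (the specific mapping is assumed here, not taken from the PR diff):

```python
from transformers import AutoModel

# Namespaced, canonical repo id.
model = AutoModel.from_pretrained("google-bert/bert-base-uncased")

# Legacy flat id; still resolves via a redirect to the canonical repo.
legacy = AutoModel.from_pretrained("bert-base-uncased")
```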
    • Resolve DeepSpeed cannot resume training with PeftModel (#28746) · bebeeee0
      Hieu Lam authored
      * fix: resolve deepspeed resume peft model issues
      
      * chore: update something
      
      * chore: pass the model instance into the is-PEFT-model checks
      
      * chore: remove hard-coded value from tests
      
      * fix: format code
    • [Whisper] Refactor forced_decoder_ids & prompt ids (#28687) · 65a926e8
      Patrick von Platen authored
      
      
      * up
      
      * Fix more
      
      * Correct more
      
      * Fix more tests
      
      * fix fast tests
      
      * Fix more
      
      * fix more
      
      * push all files
      
      * finish all
      
      * make style
      
      * Fix timestamp wrap
      
      * make style
      
      * make style
      
      * up
      
      * up
      
      * up
      
      * Fix lang detection behavior
      
      * Fix lang detection behavior
      
      * Add lang detection test
      
      * Fix lang detection behavior
      
      * make style
      
      * Update src/transformers/models/whisper/generation_whisper.py
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * better error message
      
      * make style tests
      
      * add warning
      
      ---------
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    • [`HFQuantizer`] Remove `check_packages_compatibility` logic (#28789) · f9f1f2ac
      Younes Belkada authored
      remove `check_packages_compatibility` logic
    • don't initialize the output embeddings if we're going to tie them to input embeddings (#28192) · ae0c27ad
      tom-p-reichel authored
      * test that tied output embeddings aren't initialized on load
      
      * don't initialize the output embeddings if we're going to tie them to the input embeddings
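A toy PyTorch illustration of the change described above (simplified, not the actual transformers loading code): when the output embeddings will be tied to the input embeddings, their storage is replaced by the input embedding matrix anyway, so running the usual random initialization on them first is wasted work at load time.

```python
import torch.nn as nn


class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int, tie_word_embeddings: bool = True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        if tie_word_embeddings:
            # Reuse the input embedding matrix; any prior init of lm_head.weight is discarded.
            self.lm_head.weight = self.embed.weight

    def forward(self, input_ids):
        return self.lm_head(self.embed(input_ids))
```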
    • Prevent MLflow exception from disrupting training (#28779) · a937425e
      Alessio Serra authored
      
      
      Modified MLflow logging metrics from synchronous to asynchronous
      Co-authored-by: codiceSpaghetti <alessio.ser@hotmail.it>
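A minimal sketch of the synchronous-to-asynchronous switch described above, assuming an MLflow version (2.8 or newer) whose log_metrics accepts a synchronous flag; the Trainer callback code from the PR itself is not reproduced here.

```python
import mlflow

metrics = {"loss": 0.42, "learning_rate": 5e-5}

with mlflow.start_run():
    # Metrics are queued and flushed in the background, so a transient tracking-server
    # failure no longer raises in the middle of a training step.
    mlflow.log_metrics(metrics, step=100, synchronous=False)
```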
    • [`bnb`] Fix bnb slow tests (#28788) · d703eaae
      Younes Belkada authored
      fix bnb slow tests
  5. 30 Jan, 2024 4 commits
    • Pin Torch to <2.2.0 (#28785) · 74c9cfea
      Matt authored
      
      
      * Pin torch to <2.2.0
      
      * Pin torchvision and torchaudio as well
      
      * Playing around with versions to see if this helps
      
      * twiddle something to restart the CI
      
      * twiddle it back
      
      * Try changing the natten version
      
      * make fixup
      
      * Revert "Try changing the natten version"
      
      This reverts commit de0d6592c35dc39ae8b5a616c27285db28262d06.
      
      * make fixup
      
      * fix fix fix
      
      * fix fix fix
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
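The pin itself lives in packaging metadata (pip-style constraints such as torch<2.2.0, with matching torchvision and torchaudio pins); an equivalent runtime guard, shown only for illustration:

```python
import torch
from packaging import version

# Fail fast if the environment violates the torch < 2.2.0 constraint.
assert version.parse(torch.__version__) < version.parse("2.2.0"), (
    f"Expected torch < 2.2.0, found {torch.__version__}"
)
```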
    • Add tf_keras imports to prepare for Keras 3 (#28588) · 415e9a09
      Matt authored
      * Port core files + ESM (because ESM code is odd)
      
      * Search-replace in modelling code
      
      * Fix up transfo_xl as well
      
      * Fix other core files + tests (still need to add correct import to tests)
      
      * Fix cookiecutter
      
      * make fixup, fix imports in some more core files
      
      * Auto-add imports to tests
      
      * Cleanup, add imports to sagemaker tests
      
      * Use correct exception for importing tf_keras
      
      * Fixes in modeling_tf_utils
      
      * make fixup
      
      * Correct version parsing code
      
      * Ensure the pipeline tests correctly revert to float32 after each test
      
      * Ensure the pipeline tests correctly revert to float32 after each test
      
      * More tf.keras -> keras
      
      * Add dtype cast
      
      * Better imports of tf_keras
      
      * Add a cast for tf.assign, just in case
      
      * Fix callback imports
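A hedged sketch of the import pattern these commits describe: prefer the standalone tf_keras package (which keeps the Keras 2 API) and fall back to tf.keras only while it is still Keras 2. The exact helper, exception type, and error message in transformers differ; this is an illustrative assumption.

```python
try:
    import tf_keras as keras  # Keras 2 compatibility package
except ImportError:
    from tensorflow import keras  # on older TensorFlow, tf.keras is still Keras 2

    if int(keras.__version__.split(".")[0]) >= 3:
        raise ImportError(
            "TensorFlow is bundling Keras 3; install the backwards-compatible "
            "tf-keras package to keep using the Keras 2 APIs."
        )
```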
    • Task-specific pipeline init args (#28439) · 1d489b3e
      amyeroberts authored
      * Abstract out pipeline init args
      
      * Address PR comments
      
      * Reword
      
      * BC PIPELINE_INIT_ARGS
      
      * Remove old arguments
      
      * Small fix
    • [`Backbone`] Use `load_backbone` instead of `AutoBackbone.from_config` (#28661) · 2fa1c808
      amyeroberts authored
      * Enable instantiating model with pretrained backbone weights
      
      * Remove doc updates until changes made in modeling code
      
      * Use load_backbone instead
      
      * Add use_timm_backbone to the model configs
      
      * Add missing imports and arguments
      
      * Update docstrings
      
      * Make sure test is properly configured
      
      * Include recent DPT updates