1. 26 Apr, 2024 3 commits
    • [`BERT`] Add support for sdpa (#28802) · dfa7b580
      JB (Don) authored
      * Adding SDPA support for BERT
      
      * Using the proper input name for testing model input in inference()
      
      * Adding documentation for SDPA in BERT model page
      
      * Use the stable link for the documentation
      
      * Adding a gate to only call .contiguous() for torch < 2.2.0
      
      * Additions and fixes to the documentation
      
      * Minor updates to documentation
      
      * Adding extra requirements needed for the contiguous() bug
      
      * Adding "Adapted from" in place of the "Copied from"
      
      * Add benchmark speedup tables to the documentation
      
      * Minor fixes to the documentation
      
      * Use ClapText as a replacement for Bert in the Copied-From
      
      * Some more fixes for the fix-copies references
      
      * Overriding the test_eager_matches_sdpa_generate in bert tests to not load with low_cpu_mem_usage
      
      [test all]
      
      * Undo changes to separate test
      
      * Refactored SDPA self attention code for KV projections
      
      * Change use_sdpa to attn_implementation
      
      * Fix test_sdpa_can_dispatch_on_flash by preparing input (required for MultipleChoice models)
      dfa7b580
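The torch-version gate described above ("only call .contiguous() for torch < 2.2.0") can be sketched as a small helper. This is an illustrative check, not the PR's actual code, and `requires_contiguous_qkv` is a hypothetical name:

```python
def requires_contiguous_qkv(torch_version: str) -> bool:
    """Return True when the SDPA path needs explicitly contiguous
    query/key/value tensors, i.e. on torch < 2.2.0 (illustrative check;
    real version strings may carry suffixes like "+cu118")."""
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    return (major, minor) < (2, 2)
```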
    • Use the Keras set_random_seed in tests (#30504) · 2de5cb12
      Matt authored
      Use the Keras set_random_seed to ensure reproducible weight initialization
      2de5cb12
    • Update `dtype_byte_size` to handle torch.float8_e4m3fn/float8_e5m2 types (#30488) · 20081c74
      Michael Goin authored
      * Update modeling_utils/dtype_byte_size to handle float8 types
      
      * Add a test for dtype_byte_size
      
      * Format
      
      * Fix bool
      20081c74
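The fix is essentially a regex question: `float8_e4m3fn` puts the bit width before a suffix, so a byte-size lookup cannot anchor the digits to the end of the dtype name. A minimal sketch over dtype-name strings (the real `dtype_byte_size` operates on `torch.dtype` objects and is expressed differently):

```python
import re

def dtype_byte_size(dtype_name: str) -> float:
    """Bytes per element for a dtype name such as "float32" or "float8_e4m3fn".
    The digits may be followed by either the end of the string or an underscore
    (as in the float8 variants), so the regex accepts both."""
    if dtype_name == "bool":
        return 1 / 8  # bools are accounted as single bits
    match = re.search(r"[^\d](\d+)(_|$)", dtype_name)
    if match is None:
        raise ValueError(f"`{dtype_name}` is not a valid dtype name.")
    return int(match.group(1)) / 8
```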
  2. 25 Apr, 2024 5 commits
    • Fix Llava for 0-embeddings (#30473) · e60491ad
      Raushan Turganbay authored
      e60491ad
    • Introduce Stateful Callbacks (#29666) · ad697f18
      Zach Mueller authored
      
      
      * Introduce saveable callbacks
      
      * Add note
      
      * Test for non-present and flag
      
      * Support early stopping and refusing to train further
      
      * Update docstring
      
      * More saving
      
      * Import oopsie
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Make it go through TrainerArguments
      
      * Document
      
      * Fix test
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Rework to allow for duplicates
      
      * Clean
      
      * Fix failing tests
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      ad697f18
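The idea behind stateful callbacks is that a callback can export its internal counters into a checkpoint and be rebuilt from them on resume, so behaviour like early stopping survives a restart. A minimal sketch with illustrative method names, not the real Trainer callback API:

```python
class EarlyStoppingCallback:
    """Sketch of a 'stateful' callback: its counters round-trip through a
    plain dict, the shape a checkpoint could serialize (names illustrative)."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.bad_epochs = 0  # evaluations without improvement so far

    def state(self) -> dict:
        # export everything needed to reconstruct this callback
        return {"patience": self.patience, "bad_epochs": self.bad_epochs}

    @classmethod
    def from_state(cls, state: dict) -> "EarlyStoppingCallback":
        cb = cls(patience=state["patience"])
        cb.bad_epochs = state["bad_epochs"]
        return cb
```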
    • Add WSD scheduler (#30231) · 7b1170b0
      Alexander Visheratin authored
      * Added WSD scheduler.
      
      * Added tests.
      
      * Fixed errors.
      
      * Fix formatting.
      
      * CI fixes.
      7b1170b0
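A warmup-stable-decay (WSD) schedule warms the learning rate up linearly, holds the peak constant for a stable phase, then decays. A sketch of the per-step multiplier, assuming a linear decay phase (the merged scheduler's exact arguments and decay shape may differ):

```python
def wsd_lr_lambda(step: int, num_warmup: int, num_stable: int, num_decay: int) -> float:
    """Multiplier applied to the base learning rate at `step`."""
    if step < num_warmup:
        return step / max(1, num_warmup)          # linear warmup to 1.0
    if step < num_warmup + num_stable:
        return 1.0                                 # stable plateau
    remaining = num_warmup + num_stable + num_decay - step
    return max(0.0, remaining / max(1, num_decay)) # linear decay to 0.0
```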
    • 🚨 Add training compatibility for Musicgen-like models (#29802) · 90cb55bf
      Yoach Lacombe authored
      
      
      * first modeling code
      
      * make repository
      
      * still WIP
      
      * update model
      
      * add tests
      
      * add latest change
      
      * clean docstrings and copied from
      
      * update docstrings md and readme
      
      * correct chroma function
      
      * correct copied from and remove unrelated test
      
      * add doc to toctree
      
      * correct imports
      
      * add convert script to notdoctested
      
      * Add suggestion from Sanchit
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * correct get_unconditional_inputs docstrings
      
      * modify README according to Sanchit feedback
      
      * add chroma to audio utils
      
      * clean librosa and torchaudio hard dependencies
      
      * fix FE
      
      * refactor audio decoder -> audio encoder for consistency with previous musicgen
      
      * refactor conditional -> encoder
      
      * modify sampling rate logic
      
      * modify license at the beginning
      
      * refactor all_self_attns->all_attentions
      
      * remove ignore copy from causallm generate
      
      * add copied from for from_sub_models
      
      * fix make copies
      
      * add warning if audio is truncated
      
      * add copied from where relevant
      
      * remove artefact
      
      * fix convert script
      
      * fix torchaudio and FE
      
      * modify chroma method according to feedback-> better naming
      
      * refactor input_values->input_features
      
      * refactor input_values->input_features and fix import fe
      
      * add input_features to docstrings
      
      * correct inputs_embeds logic
      
      * remove dtype conversion
      
      * refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation
      
      * change warning for chroma length
      
      * Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * change way to save wav, using soundfile
      
      * correct docs and change to soundfile
      
      * fix import
      
      * fix init proj layers
      
      * add draft training
      
      * fix cross entropy
      
      * clean loss computation
      
      * fix labels
      
      * remove line breaks from md
      
      * fix issue with docstrings
      
      * add FE suggestions
      
      * improve "is in" logic and remove useless imports
      
      * remove custom from_pretrained
      
      * simplify docstring code
      
      * add suggestions for modeling tests
      
      * make style
      
      * update converting script with sanity check
      
      * remove encoder attention mask from conditional generation
      
      * replace musicgen melody checkpoints with official orga
      
      * rename ylacombe->facebook in checkpoints
      
      * fix copies
      
      * remove unnecessary warning
      
      * add shape in code docstrings
      
      * add files to slow doc tests
      
      * fix md bug and add md to not_tested
      
      * make fix-copies
      
      * fix hidden states test and batching
      
      * update training code
      
      * add training tests for melody
      
      * add training for o.g musicgen
      
      * fix copied from
      
      * remove final todos
      
      * make style
      
      * fix style
      
      * add suggestions from review
      
      * add ref to the original loss computation code
      
      * rename method + fix labels in tests
      
      * make style
      
      ---------
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      90cb55bf
    • amyeroberts · aca4a103
  3. 24 Apr, 2024 5 commits
  4. 23 Apr, 2024 4 commits
  5. 22 Apr, 2024 6 commits
  6. 19 Apr, 2024 8 commits
    • Add TF swiftformer (#23342) · d2cec09b
      João David authored
      
      
      * Duplicate swiftformer
      
      * Convert SwiftFormerPatchEmbedding
      
      * Convert SwiftFormerEmbeddings
      
      * Convert TFSwiftFormerMlp
      
      * Convert TFSwiftFormerConvEncoder
      
      * Convert TFSwiftFormerLocalRepresentation
      
      * convert TFSwiftFormerEncoderBlock
      
      * Convert SwiftFormerStage
      
      * Convert SwiftFormerEncoder
      
      * Add TFSwiftFormerPreTrainedModel
      
      * Convert SwiftFormerForImageClassification
      
      * Add kwargs and start drop path
      
      * Fix syntax
      
      * Change Model class name
      
      * Add TFSwiftFormer to __init__
      
      * Duplicate test_modeling_swiftformer
      
      * First test conversions
      
      * Change require_torch to require_tf
      
      * Add exports to swiftformer __init__
      
      * Add TFSwiftFormerModel wrapper
      
      * Fix __init__ and run black
      
      * Remove docstring from MainLayer, fix padding
      
      * Use keras.layers.Activation on keras.Sequential
      
      * Fix swiftformer exports
      
      * Fix activation layer from config
      
      * Remove post_inits
      
      * Use tf.keras.layers.ZeroPadding2D
      
      * Convert torch normalize
      
      * Change tf test input shape
      
      * Fix softmax and reduce_sum
      
      * Convert expand_dims and repeat
      
      * Add missing reshape and transpose
      
      * Simplify TFSwiftFormerEncoderBlock.call
      
      * Fix mismatch in patch embeddings
      
      * Fix expected output shape to match channels last
      
      * Fix swiftformer typo
      
      * Disable test_onnx
      
      * Fix TFSwiftFormerForImageClassification call
      
      * Add unpack inputs
      
      * Convert flatten(2).mean(-1)
      
      * Change vision dummy inputs (to be reviewed)
      
      * Change test_forward_signature to use .call
      
      * Fix @unpack_inputs
      
      * Set return_tensors="tf" and rename class
      
      * Rename wrongly named patch_embeddings layer
      
      * Add serving_output and change dummy_input shape
      
      * Make dimensions BCHW and transpose inside embedding layer
      
      * Change SwiftFormerEncoderBlock
      
      * Fix ruff problems
      
      * Add image size to swiftformer config
      
      * Change transpose to MainLayer and use -1 for reshape
      
      * Remove serving_outputs and dummy_inputs
      
      * Remove test_initialization test from tf model
      
      * Make Sequential component a separate layer
      
      * Fix layers' names
      
      * Transpose encoder outputs
      
      * Fix tests and check if hidden states is not None
      
      * Fix TFSwiftFormerForImageClassification
      
      * Run make fixup
      
      * Run make fix-copies
      
      * Update modeling_tf_auto
      
      * Update docs
      
      * Fix modeling auto mapping
      
      * Update modeling_tf_swiftformer docs
      
      * Fill image_size doc and type
      
      * Add reduction=None to loss computation
      
      * Update docs
      
      * make style
      
      * Debug: Delete the tip to see if that changes anything
      
      * Re-add tip
      
      * Remove add_code_sample_docstrings
      
      * Remove unused import
      
      * Get the debug to actually tell us the problem it has with the docs
      
      * Try a substitution to match the PyTorch file?
      
      * Add swiftformer to ignore list
      
      * Add build() methods
      
      * Update copyright year
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove FIXME comment
      
      * Remove from_pt
      
      * Update copyright year
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Rename one-letter variables
      
      * Remove FIXMEs related to momentum
      
      * Remove old TODO comment
      
      * Remove outstanding FIXME comments
      
      * Get dropout rate from config
      
      * Add specific dropout config for MLP
      
      * Add convencoder dropout to config
      
      * Pass config to SwiftFormerDropPath layer
      
      * Fix drop_path variable name and add Adapted from comment
      
      * Run ruff
      
      * Removed copied from comment
      
      * Run fix copies
      
      * Change drop_path to identity to match pt
      
      * Cleanup build() methods and move to new keras imports
      
      * Update docs/source/en/model_doc/swiftformer.md
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Raise error if drop_path_rate > 0.0
      
      * Apply suggestions from code review
      
      Replace (self.dim), with self.dim,
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Remove drop_path function
      
      * Add training to TFSwiftFormerEncoder
      
      * Set self.built = True last
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Should have been added to previous commit
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Change default_feature_extractor to default_image_processor
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Import Keras from modeling_tf_utils
      
      * Remove relative import
      
      * Run ruff --fix
      
      * Move import keras to tf_available
      
      * Add copied from comment to test_forward_signature
      
      * Reduce batch size and num_labels
      
      * Extract loss logic to hf_compute_loss
      
      * Run ruff format
      
      ---------
      Co-authored-by: Matt <rocketknight1@gmail.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      d2cec09b
    • Fix config + attn_implementation in AutoModelForCausalLM.from_pretrained (#30299) · 21c912e7
      hoshi-hiyouga authored
      * Update modeling_utils.py
      
      * Update test_modeling_utils.py
      
      * Update test_modeling_utils.py
      
      * Update test_modeling_utils.py
      21c912e7
    • Do not remove half seq length in generation tests (#30016) · b1cd4874
      Raushan Turganbay authored
      
      
      * remove seq length from generation tests
      
      * style and quality
      
      * [test_all] & PR suggestion
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update tests/generation/test_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * [test all] remove unused variables
      
      ---------
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      b1cd4874
    • Update unwrap from accelerate (#29933) · b4fd49b6
      Marc Sun authored
      
      
      * Use unwrap with the one in accelerate
      
      * oups
      
      * update unwrap
      
      * fix
      
      * wording
      
      * raise error instead
      
      * comment
      
      * doc
      
      * Update src/transformers/modeling_utils.py
      Co-authored-by: Zach Mueller <muellerzr@gmail.com>
      
      * style
      
      * put else
      
      ---------
      Co-authored-by: Zach Mueller <muellerzr@gmail.com>
      b4fd49b6
    • [Whisper] Fix slow tests (#30152) · 4ed0e51c
      Sanchit Gandhi authored
      
      
      * fix tests
      
      * style
      
      * more fixes
      
      * move model to device
      
      * move logits to cpu
      
      * update expected values
      
      * use ungated dataset
      
      * fix
      
      * fix
      
      * update
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      4ed0e51c
    • Sanchit Gandhi · cd09a8df
    • Enable multi-device for some models (#30207) · 30b45320
      Jacky Lee authored
      
      
      * feat: multidevice for resnet
      
      * feat: yes! resnet
      
      * fix: compare all elements in tuple
      
      * feat: support for regnet
      
      * feat: support for convnextv2
      
      * feat: support for bit
      
      * feat: support for cvt
      
      * feat: add support for focalnet
      
      * feat: support for yolos
      
      * feat: support for glpn
      
      * feat: support for imagegpt
      
      * feat: support for levit
      
      * feat: support for mgp_str
      
      * feat: support for mobilenet_v1
      
      * feat: support for mobilenet_v2
      
      * feat: support for mobilevit
      
      * feat: support for mobilevitv2
      
      * feat: support for poolformer
      
      * fix: copies
      
      * fix: code quality check
      
      * update: upstream changes from main
      
      * fix: consistency check
      
      * feat: support for sam
      
      * feat: support for switchformer
      
      * feat: support for swin
      
      * feat: support for swinv2
      
      * feat: support for timesformer
      
      * feat: support for trocr
      
      * feat: support for upernet
      
      * fix: check copies
      
      * update: rerun CI
      
      * update: rerun again, maybe
      
      * update: one more rerun
      
      ---------
      Co-authored-by: Jacky Lee <jackylee328@gmail.com>
      30b45320
    • [UDOP] Add special tokens to tokenizer (#29594) · ecfe9be7
      NielsRogge authored
      * Add special tokens
      
      * Add special tokens
      
      * Use fmt
      
      * Uncomment code
      
      * Add test
      
      * Remove scripts
      
      * Address comments
      
      * Improve tests
      
      * Address comment
      
      * Remove flag
      ecfe9be7
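Registering special tokens matters because the tokenizer must treat them as atomic units instead of splitting them into subwords. A toy illustration with a hypothetical `tokenize_with_specials` helper (not the UDOP tokenizer's code):

```python
import re

def tokenize_with_specials(text: str, specials: list) -> list:
    """Split `text` so that any special token (e.g. "<loc_1>") survives as a
    single unit; everything else falls back to whitespace splitting."""
    pattern = "(" + "|".join(re.escape(s) for s in specials) + ")"
    pieces = []
    for chunk in re.split(pattern, text):
        if chunk in specials:
            pieces.append(chunk)       # keep the special token atomic
        else:
            pieces.extend(chunk.split())
    return pieces
```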
  7. 18 Apr, 2024 9 commits
    • 🚨🚨🚨Deprecate `evaluation_strategy` to `eval_strategy`🚨🚨🚨 (#30190) · 60d5f8f9
      Zach Mueller authored
      * Alias
      
      * Note alias
      
      * Tests and src
      
      * Rest
      
      * Clean
      
      * Change typing?
      
      * Fix tests
      
      * Deprecation versions
      60d5f8f9
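The alias pattern behind such a deprecation can be sketched with a dataclass: the old `evaluation_strategy` field still works but warns and forwards to `eval_strategy`. This is an illustrative subset, not the real `TrainingArguments`:

```python
import warnings
from dataclasses import dataclass
from typing import Optional

@dataclass
class ArgsSketch:
    """Illustrative subset of the deprecation-alias pattern."""
    eval_strategy: str = "no"
    evaluation_strategy: Optional[str] = None  # deprecated alias

    def __post_init__(self):
        if self.evaluation_strategy is not None:
            warnings.warn(
                "`evaluation_strategy` is deprecated; use `eval_strategy`",
                FutureWarning,
            )
            # forward the legacy value to the new field
            self.eval_strategy = self.evaluation_strategy
```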
    • Fix test transposing image with EXIF Orientation tag (#30319) · c86d020e
      Albert Villanova del Moral authored
      * Fix test with exif_transpose image
      
      * Replace datasets with PIL to load image in tests
      c86d020e
    • FIX: Fixes unexpected behaviour for Llava / Llama & AWQ Fused modules + revert... · 5728b5ad
      Younes Belkada authored
      FIX: Fixes unexpected behaviour for Llava / Llama & AWQ Fused modules + revert #30070 at the same time (#30317)
      
      * Update awq.py
      
      * style
      
      * revert felix PR
      
      * fix
      
      * add felix comments
      5728b5ad
    • Add DBRX Model (#29921) · 005b957f
      Abhi Venigalla authored
      
      
      * wip
      
      * fix __init__.py
      
      * add docs
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * address comments 1
      
      * work on make fixup
      
      * pass configs down
      
      * add sdpa attention
      
      * remove DbrxBlock
      
      * add to configuration_auto
      
      * docstring now passes formatting test
      
      * fix style
      
      * update READMEs
      
      * add dbrx to modeling_auto
      
      * make fix-copies generated this
      
      * add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
      
      * config docstring passes formatting test
      
      * rename moe_loss_weight to router_aux_loss_coef
      
      * add to flash-attn documentation
      
      * fix model-path in tests
      
      * Explicitly make `"silu"` the default `ffn_act_fn`
      Co-authored-by: Wing Lian <wing.lian@gmail.com>
      
      * default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
      
      * fix _flash_attn_uses_top_left_mask and is_causal
      
      * fix tests path
      
      * don't use token type IDs
      
      * follow Llama and remove token_type_ids from test
      
      * init ConfigTester differently so tests pass
      
      * remove multiple choice test
      
      * remove question + answer test
      
      * remove sequence classification test
      
      * remove token classification test
      
      * copy Llama tests and remove token_type_ids from test inputs
      
      * do not test pruning or headmasking; style code
      
      * add _tied_weights_keys parameter to pass test
      
      * add type hints
      
      * fix type check
      
      * update config tester
      
      * remove masked_lm test
      
      * remove encoder tests
      
      * initialize DbrxModelTester with correct params
      
      * style
      
      * torch_dtype does not rely on torch
      
      * run make fixup, fix-copies
      
      * use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
      
      
      
      * add copyright info
      
      * fix imports and DbrxRotaryEmbedding
      
      * update DbrxModel docstring
      
      * use copies
      
      * change model path in docstring
      
      * use config in DbrxFFN
      
      * fix flashattention2, sdpaattention
      
      * input config to DbrXAttention, DbrxNormAttentionNorm
      
      * more fixes
      
      * fix
      
      * fix again!
      
      * add informative comment
      
      * fix ruff?
      
      * remove print statement + style
      
      * change doc-test
      
      * fix doc-test
      
      * fix docstring
      
      * delete commented out text
      
      * make defaults match dbrx-instruct
      
      * replace `router_aux_loss_coef` with `moe_loss_weight`
      
      * is_decoder=True
      
      * remove is_decoder from configtester
      
      * implement sdpa properly
      
      * make is_decoder pass tests
      
      * start on the GenerationTesterMixin tests
      
      * add dbrx to sdpa documentation
      
      * skip weight typing test
      
      * style
      
      * initialize smaller model
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Add DBRX to toctree
      
      * skip test_new_cache_format
      
      * make config defaults smaller again
      
      * add pad_token_id
      
      * remove pad_token_id from config
      
      * Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
      
      * Update src/transformers/models/dbrx/__init__.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/dbrx.md
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Update src/transformers/models/dbrx/configuration_dbrx.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/dbrx.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix typo
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * update docs, fix configuration_auto.py
      
      * address pr comments
      
      * remove is_decoder flag
      
      * slice
      
      * fix requires grad
      
      * remove grad
      
      * disconnect differently
      
      * remove grad
      
      * enable grads
      
      * patch
      
      * detach expert
      
      * nissan al ghaib
      
      * Update modeling_dbrx.py
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * replace "Gemma" with "Dbrx"
      
      * remove # type: ignore
      
      * don't hardcode vocab_size
      
      * remove ToDo
      
      * Re-add removed idefics2 line
      
      * Update test to use tiny-random!
      
      * Remove TODO
      
      * Remove one more case of loading the entire dbrx-instruct in the tests
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * address some comments
      
      * small model
      
      * add dbrx to tokenization_auto
      
      * More docstrings with add_start_docstrings
      
      * Dbrx for now
      
      * add PipelineTesterMixin
      
      * Update src/transformers/models/dbrx/configuration_dbrx.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * remove flash-attn2 import error
      
      * fix docstring
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add usage example
      
      * put on one line
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * fix ffn_act_fn
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * change "dbrx" to "DBRX" for display purposes.
      
      * fix __init__.py?
      
      * fix __init__.py
      
      * fix README
      
      * return the aux_loss
      
      * remove extra spaces
      
      * fix configuration_auto.py
      
      * fix format in tokenization_auto
      
      * remove new line
      
      * add more usage examples
      
      ---------
      Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
      Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
      Co-authored-by: Wing Lian <wing.lian@gmail.com>
      Co-authored-by: Eitan Turok <eitanturok@gmail.com>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      Co-authored-by: Matt <rocketknight1@gmail.com>
      Co-authored-by: Your Name <you@example.com>
      Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      005b957f
    • Revert "Re-enable SDPA's FA2 path (#30070)" (#30314) · acab997b
      Arthur authored
      * Revert "Re-enable SDPA's FA2 path (#30070)"
      
      This reverts commit 05bdef16.
      
      * Revert "Fix quality Olmo + SDPA (#30302)"
      
      This reverts commit ec92f983.
      acab997b
    • Add atol for sliding window test (#30303) · 9459efb8
      fxmarty authored
      atol for sliding window test
      9459efb8
    • Add jamba (#29943) · 3f20877d
      tomeras91 authored
      * Add jamba arch
      
      * apply "make fix-copies" changes
      
      * fix link to model in JambaConfig docstring
      
      * Add n_ctx in modeling file because repo-consistency wants that
      
      * Add jamba to flash attention and sdpa documentation
      
      * mamba dt_proj quant fix now works for LoRA as well
      
      * override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical differences are accentuated by mamba layers
      
      * add jamba to tokenization auto
      
      * fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
      
      * simple PR fixes
      
      * remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
      
      * remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
      
      * Add copied comment on JambaMLP (it's the same as MixtralMLP)
      
      * remove padding_mask warnings. It's not supported anymore
      
      * fix docstring. Float instead of int
      
      * A few more minor PR fixes
      
      * (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
      
      * Return None attention weights from mamba layers. Append to all attentions only if not None.
      
      * remove some leftover jamba archive lists
      
      * Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
      
      * no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
      
      * Add Jamba paper on READMEs
      
      * (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
      
      * Add copied from comment
      
      * remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
      
      * clearer docstring for _convert_to_standard_cache
      
      * style fixes
      
      * Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code to use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs
      
      * rename test so it still overrides what its meant to override
      
      * draft
      
      * oups
      
      * nit
      
      * remove more complex logic
      
      * fix names used in config
      
      * fix fix fix
      
      * style
      
      * fix some more failing tests
      
      * generate did not init the cache 🙃
      
      
      
      * more small nits
      
      * typo
      
      * config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
      
      * fix init of pkv with torch.tensor()
      
      * empty tensor
      
      * fix some init issues
      
      * stupid changes required by generate because it does not even support its own DynamicCache class
      
      * more fixes
      
      * fix general assisted gen cache_position bug
      
      * tests passing
      
      * Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
      
      * fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
      
      * no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
      
      * fix docstrings and typehints for past_key_values
      
      * style fixes
      
      * fix docs
      
      * change typehint due to copy from Mixtral
      
      * forgot import
      
      * import order
      
      * Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
      
      * Add integration test with tiny random Jamba model on hub
      
      * fix flash attention cache shapes
      
      * bring back forgotten hidden states
      
      * rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
      
      * align integration test after modeling fixes
      
      * bugfix - mamba can use precomputed states only if forward pass is on a single token
      
      * bugfix - mamba can use precomputed states only if they match the batch size
      
      * typo
      
      * remove making _prepare_4d_causal_attention_mask a leaf function
      
      * stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly
      
      ---------
      Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
      Co-authored-by: Joao Gante <joao@huggingface.co>
      3f20877d
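The `num_logits_to_keep` change mentioned above trades a boolean for an integer: instead of computing logits for the entire prompt, only the last N positions are projected through the LM head, which saves memory during generation. A sketch of the slicing idea, with a hypothetical helper name:

```python
def slice_for_logits(hidden_states: list, num_logits_to_keep: int) -> list:
    """Keep only the last `num_logits_to_keep` positions before the LM-head
    projection; 0 keeps everything (sketch over a plain list of positions)."""
    if num_logits_to_keep == 0:
        return hidden_states
    return hidden_states[-num_logits_to_keep:]
```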
    • Fix all torch pipeline failures except one (#30290) · 28a22834
      Yih-Dar authored
      
      
      * fix
      
      * fix
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      28a22834
    • Fix donut token2json multiline (#30300) · 7915a259
      Pavel Iakubovskii authored
      * Fix multiline processing
      
      * Update test for token2json
      7915a259
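The multiline fix comes down to regex flags: Donut-style `<s_key>...</s_key>` token markup can contain newlines, so the value pattern needs `re.DOTALL` so that `.` also matches newline characters. A simplified sketch of the parsing (the real `token2json` also handles nesting, lists, and separators):

```python
import re

def token2json(tokens: str) -> dict:
    """Parse flat <s_key>value</s_key> pairs into a dict. DOTALL lets values
    span multiple lines; IGNORECASE tolerates tag-case differences."""
    output = {}
    for match in re.finditer(
        r"<s_(?P<key>\w+)>(?P<value>.*?)</s_(?P=key)>",
        tokens,
        flags=re.IGNORECASE | re.DOTALL,
    ):
        output[match.group("key")] = match.group("value").strip()
    return output
```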