1. 10 May, 2024 1 commit
  2. 09 May, 2024 2 commits
  3. 08 May, 2024 2 commits
  4. 07 May, 2024 1 commit
  5. 06 May, 2024 1 commit
  6. 02 May, 2024 4 commits
    • Add HQQ quantization support (#29637) · 59952994
      mobicham authored
      
      
      * update HQQ transformers integration
      
      * push import_utils.py
      
      * add force_hooks check in modeling_utils.py
      
      * fix | with Optional
      
      * force bias as param
      
      * check bias is Tensor
      
      * force forward for multi-gpu
      
      * review fixes pass
      
      * remove torch grad()
      
      * if any key in linear_tags fix
      
      * add cpu/disk check
      
      * isinstance return
      
      * add multigpu test + refactor tests
      
      * clean hqq_utils imports in hqq.py
      
      * clean hqq_utils imports in quantizer_hqq.py
      
      * delete hqq_utils.py
      
      * Delete src/transformers/utils/hqq_utils.py
      
      * ruff init
      
      * remove torch.float16 from __init__ in test
      
      * refactor test
      
      * isinstance -> type in quantizer_hqq.py
      
      * cpu/disk device_map check in quantizer_hqq.py
      
      * remove type(module) nn.linear check in quantizer_hqq.py
      
      * add BaseQuantizeConfig import inside HqqConfig init
      
      * remove hqq import in hqq.py
      
      * remove accelerate import from test_hqq.py
      
      * quant config.py doc update
      
      * add hqqconfig to main_classes doc
      
      * make style
      
      * __init__ fix
      
      * ruff __init__
      
      * skip_modules list
      
      * hqqconfig format fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * hqqconfig doc fix
      
      * test_hqq.py remove mistral comment
      
      * remove self.using_multi_gpu is False
      
      * torch_dtype default val set and logger.info
      
      * hqq.py isinstance fix
      
      * remove torch=None
      
      * torch_device test_hqq
      
      * rename test_hqq
      
      * MODEL_ID in test_hqq
      
      * quantizer_hqq setattr fix
      
      * quantizer_hqq typo fix
      
      * imports quantizer_hqq.py
      
      * isinstance quantizer_hqq
      
      * hqq_layer.bias reformat quantizer_hqq
      
      * Step 2 as comment in quantizer_hqq
      
      * prepare_for_hqq_linear() comment
      
      * keep_in_fp32_modules fix
      
      * HqqHfQuantizer reformat
      
      * quantization.md hqqconfig
      
      * quantization.md model example reformat
      
      * quantization.md # space
      
      * quantization.md space   })
      
      * quantization.md space   })
      
      * quantization_config fix doc
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * axis value check in quantization_config
      
      * format
      
      * dynamic config explanation
      
      * quant config method in quantization.md
      
      * remove shard-level progress
      
      * .cuda fix modeling_utils
      
      * test_hqq fixes
      
      * make fix-copies
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      59952994
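
For reference, a minimal sketch of how the new HQQ integration is driven from transformers. It assumes the external `hqq` package is installed; the checkpoint and the `nbits`/`group_size` values are illustrative, not requirements:

```python
# Hedged sketch: quantize a causal LM with HQQ on load via HqqConfig.
# Requires `pip install hqq`; model id and settings are examples only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

quant_config = HqqConfig(nbits=4, group_size=64)  # 4-bit weights, groups of 64

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",      # any causal LM checkpoint
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,  # routes loading through the HQQ quantizer
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
```
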
    • Docs: add missing `StoppingCriteria` autodocs (#30617) · 66abe139
      Joao Gante authored
      
      
      * add missing docstrings to docs
      
      * Update src/transformers/generation/stopping_criteria.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      66abe139
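
Since the autodocs now cover the `StoppingCriteria` interface, here is a hedged sketch of a custom criterion. The class name and token id are hypothetical; recent versions expect a per-sequence boolean result from `__call__`:

```python
# Hypothetical custom stopping rule: stop once every sequence in the batch
# has produced a given token at least `max_count` times.
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class TokenCountCriteria(StoppingCriteria):
    def __init__(self, token_id: int, max_count: int):
        self.token_id = token_id
        self.max_count = max_count

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs):
        counts = (input_ids == self.token_id).sum(dim=-1)
        return counts >= self.max_count  # one bool per sequence in the batch

# Usage (model, tokenizer and inputs assumed):
# criteria = StoppingCriteriaList([TokenCountCriteria(token_id=198, max_count=3)])
# model.generate(**inputs, stopping_criteria=criteria)
```
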
    • Docs: fix `generate`-related rendering issues (#30600) · aa55ff44
      Joao Gante authored
      * does this work?
      
      * like this?
      
      * fix the other generate links
      
      * missing these
      aa55ff44
    • phi3 chat_template does not support system role (#30606) · 801894e0
      amitportnoy authored
      * phi3 chat_template does not support system role
      
      * fix doc test error
      801894e0
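
Because the Phi-3 chat template has no system role, system instructions have to be folded into another turn. A hedged sketch, assuming the public Phi-3 mini checkpoint and one possible merge strategy:

```python
# Fold the system prompt into the first user message before templating.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
system = "You are a terse assistant."
messages = [{"role": "user", "content": f"{system}\n\nWhat does this PR change?"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```
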
  7. 01 May, 2024 2 commits
  8. 30 Apr, 2024 2 commits
  9. 29 Apr, 2024 1 commit
  10. 26 Apr, 2024 4 commits
    • Fix link in dbrx.md (#30509) · 73014b56
      Eitan Turok authored
      73014b56
    • [SegGPT] Fix seggpt image processor (#29550) · 6d4cabda
      Eduardo Pacheco authored
      * Fixed SegGptImageProcessor to handle 2D and 3D prompt mask inputs
      
      * Added new test to check prompt mask equivalence
      
      * New proposal
      
      * Better proposal
      
      * Removed unnecessary method
      
      * Updated seggpt docs
      
      * Introduced do_convert_rgb
      
      * nits
      6d4cabda
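
After this fix, `SegGptImageProcessor` should accept both 2D `(H, W)` and 3D `(C, H, W)` prompt masks. A hedged sketch with synthetic arrays; the checkpoint name is the public SegGPT release and the shapes are illustrative:

```python
# Feed a 2D prompt mask; before the fix this shape was mishandled.
import numpy as np
from transformers import SegGptImageProcessor

processor = SegGptImageProcessor.from_pretrained("BAAI/seggpt-vit-large")
image = np.random.randint(0, 256, (448, 448, 3), dtype=np.uint8)
prompt_image = np.random.randint(0, 256, (448, 448, 3), dtype=np.uint8)
prompt_mask = np.random.randint(0, 2, (448, 448), dtype=np.uint8)  # 2D mask

inputs = processor(
    images=image,
    prompt_images=prompt_image,
    prompt_masks=prompt_mask,
    return_tensors="pt",
)
```
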
    • Fix GroundingDINO, DPR after BERT SDPA update (#30506) · e7d52a10
      amyeroberts authored
      Fix GroundingDINO, DPR after BERT SDPA update
      e7d52a10
    • [`BERT`] Add support for sdpa (#28802) · dfa7b580
      JB (Don) authored
      * Adding SDPA support for BERT
      
      * Using the proper input name for testing model input in inference()
      
      * Adding documentation for SDPA in BERT model page
      
      * Use the stable link for the documentation
      
      * Adding a gate to only call .contiguous() for torch < 2.2.0
      
      * Additions and fixes to the documentation
      
      * Minor updates to documentation
      
      * Adding extra requirements needed for the contiguous() bug
      
      * Adding "Adapted from" in plcae of the "Copied from"
      
      * Add benchmark speedup tables to the documentation
      
      * Minor fixes to the documentation
      
      * Use ClapText as a replacement for Bert in the Copied-From
      
      * Some more fixes for the fix-copies references
      
      * Overriding the test_eager_matches_sdpa_generate in bert tests to not load with low_cpu_mem_usage
      
      [test all]
      
      * Undo changes to separate test
      
      * Refactored SDPA self attention code for KV projections
      
      * Change use_sdpa to attn_implementation
      
      * Fix test_sdpa_can_dispatch_on_flash by preparing input (required for MultipleChoice models)
      dfa7b580
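
With SDPA support merged, BERT can be loaded with the scaled-dot-product-attention backend selected explicitly. A minimal sketch, assuming a recent torch build (the PR gates a `.contiguous()` workaround on torch < 2.2.0):

```python
# Opt in to the SDPA attention backend for BERT.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "google-bert/bert-base-uncased",
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
)
```
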
  11. 25 Apr, 2024 3 commits
  12. 24 Apr, 2024 4 commits
    • Phi-3 (#30423) · c9693db2
      Gustavo de Rosa authored
      * chore(root): Initial commit of Phi-3 files.
      
      * fix(root): Fixes Phi-3 missing on readme.
      
      * fix(root): Ensures files are consistent.
      
      * fix(phi3): Fixes unit tests.
      
      * fix(tests): Fixes style of phi-3 test file.
      
      * chore(tests): Adds integration tests for Phi-3.
      
      * fix(phi3): Removes additional flash-attention usage, e.g., swiglu and rmsnorm.
      
      * fix(phi3): Fixes incorrect docstrings.
      
      * fix(phi3): Fixes docstring typos.
      
      * fix(phi3): Adds support for Su and Yarn embeddings.
      
      * fix(phi3): Improves according to first batch of reviews.
      
      * fix(phi3): Uses up_states instead of y in Phi3MLP.
      
      * fix(phi3): Uses gemma rotary embedding to support torch.compile.
      
      * fix(phi3): Improves how rotary embedding classes are defined.
      
      * fix(phi3): Fixes inv_freq not being re-computed for extended RoPE.
      
      * fix(phi3): Adds last suggestions to modeling file.
      
      * fix(phi3): Splits inv_freq calculation in two lines.
      c9693db2
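
A minimal sketch of running the new architecture; the mini-instruct checkpoint id is an assumption here, since the PR itself only adds modeling code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tok("Phi-3 is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```
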
    • Add llama3 (#30334) · 89c510d8
      Arthur authored
      
      
      * nuke
      
      * add co-author
      
      * add co-author
      
      * update card
      
      * fixup and fix copies to please our ci
      
      * nit fixup
      
      * super small nits
      
      * remove tokenizer_path from call to `write_model`
      
      * always safe serialize by default
      
      ---------
      Co-authored-by: pcuenca <pcuenca@users.noreply.github.com>
      Co-authored-by: xenova <xenova@users.noreply.github.com>
      89c510d8
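
A short sketch of loading the converted weights; the gated meta-llama checkpoint id is assumed, and Hub access must be granted first:

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
)
print(pipe("The key change in Llama 3 is", max_new_tokens=24)[0]["generated_text"])
```
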
    • Remove add-new-model in favor of add-new-model-like (#30424) · d4e92f1a
      Lysandre Debut authored
      * Remove add-new-model in favor of add-new-model-like
      
      * nits
      d4e92f1a
  13. 23 Apr, 2024 2 commits
  14. 22 Apr, 2024 4 commits
  15. 19 Apr, 2024 3 commits
    • [Grounding DINO] Add resources (#30232) · 8c12690c
      NielsRogge authored
      
      
      * Add resources
      
      * Address comments
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      8c12690c
    • Add TF swiftformer (#23342) · d2cec09b
      João David authored
      
      
      * Duplicate swiftformer
      
      * Convert SwiftFormerPatchEmbedding
      
      * Convert SwiftFormerEmbeddings
      
      * Convert TFSwiftFormerMlp
      
      * Convert TFSwiftFormerConvEncoder
      
      * Convert TFSwiftFormerLocalRepresentation
      
      * convert TFSwiftFormerEncoderBlock
      
      * Convert SwiftFormerStage
      
      * Convert SwiftFormerEncoder
      
      * Add TFSwiftFormerPreTrainedModel
      
      * Convert SwiftFormerForImageClassification
      
      * Add kwargs and start drop path
      
      * Fix syntax
      
      * Change Model class name
      
      * Add TFSwiftFormer to __init__
      
      * Duplicate test_modeling_swiftformer
      
      * First test conversions
      
      * Change require_torch to require_tf
      
      * Add exports to swiftformer __init__
      
      * Add TFSwiftFormerModel wrapper
      
      * Fix __init__ and run black
      
      * Remove docstring from MainLayer, fix padding
      
      * Use keras.layers.Activation on keras.Sequential
      
      * Fix swiftformer exports
      
      * Fix activation layer from config
      
      * Remove post_inits
      
      * Use tf.keras.layers.ZeroPadding2D
      
      * Convert torch normalize
      
      * Change tf test input shape
      
      * Fix softmax and reduce_sum
      
      * Convert expand_dims and repeat
      
      * Add missing reshape and transpose
      
      * Simplify TFSwiftFormerEncoderBlock.call
      
      * Fix mismatch in patch embeddings
      
      * Fix expected output shape to match channels last
      
      * Fix swiftformer typo
      
      * Disable test_onnx
      
      * Fix TFSwiftFormerForImageClassification call
      
      * Add unpack inputs
      
      * Convert flatten(2).mean(-1)
      
      * Change vision dummy inputs (to be reviewed)
      
      * Change test_forward_signature to use .call
      
      * Fix @unpack_inputs
      
      * Set return_tensors="tf" and rename class
      
      * Rename wrongly named patch_embeddings layer
      
      * Add serving_output and change dummy_input shape
      
      * Make dimensions BCHW and transpose inside embedding layer
      
      * Change SwiftFormerEncoderBlock
      
      * Fix ruff problems
      
      * Add image size to swiftformer config
      
      * Change transpose to MainLayer and use -1 for reshape
      
      * Remove serving_outputs and dummy_inputs
      
      * Remove test_initialization test from tf model
      
      * Make Sequential component a separate layer
      
      * Fix layers' names
      
      * Transpose encoder outputs
      
      * Fix tests and check if hidden states is not None
      
      * Fix TFSwiftFormerForImageClassification
      
      * Run make fixup
      
      * Run make fix-copies
      
      * Update modeling_tf_auto
      
      * Update docs
      
      * Fix modeling auto mapping
      
      * Update modeling_tf_swiftformer docs
      
      * Fill image_size doc and type
      
      * Add reduction=None to loss computation
      
      * Update docs
      
      * make style
      
      * Debug: Delete the tip to see if that changes anything
      
      * Re-add tip
      
      * Remove add_code_sample_docstrings
      
      * Remove unused import
      
      * Get the debug to actually tell us the problem it has with the docs
      
      * Try a substitution to match the PyTorch file?
      
      * Add swiftformer to ignore list
      
      * Add build() methods
      
      * Update copyright year
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove FIXME comment
      
      * Remove from_pt
      
      * Update copyright year
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Rename one-letter variables
      
      * Remove FIXMEs related to momentum
      
      * Remove old TODO comment
      
      * Remove outstanding FIXME comments
      
      * Get dropout rate from config
      
      * Add specific dropout config for MLP
      
      * Add convencoder dropout to config
      
      * Pass config to SwiftFormerDropPath layer
      
      * Fix drop_path variable name and add Adapted from comment
      
      * Run ruff
      
      * Removed copied from comment
      
      * Run fix copies
      
      * Change drop_path to identity to match pt
      
      * Cleanup build() methods and move to new keras imports
      
      * Update docs/source/en/model_doc/swiftformer.md
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Raise error if drop_path_rate > 0.0
      
      * Apply suggestions from code review
      
      Replace (self.dim), with self.dim,
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Remove drop_path function
      
      * Add training to TFSwiftFormerEncoder
      
      * Set self.built = True last
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Should have been added to previous commit
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Change default_feature_extractor to default_image_processor
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Import Keras from modeling_tf_utils
      
      * Remove relative import
      
      * Run ruff --fix
      
      * Move import keras to tf_available
      
      * Add copied from comment to test_forward_signature
      
      * Reduce batch size and num_labels
      
      * Extract loss logic to hf_compute_loss
      
      * Run ruff format
      
      ---------
      Co-authored-by: Matt <rocketknight1@gmail.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      d2cec09b
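
A hedged sketch of the TF port next to its image processor. The checkpoint name is the existing SwiftFormer release; if only PyTorch weights are published for it, `from_pt=True` would be needed in `from_pretrained`:

```python
import numpy as np
from transformers import AutoImageProcessor, TFSwiftFormerForImageClassification

processor = AutoImageProcessor.from_pretrained("MBZUAI/swiftformer-xs")
model = TFSwiftFormerForImageClassification.from_pretrained("MBZUAI/swiftformer-xs")

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in image
inputs = processor(images=image, return_tensors="tf")
logits = model(**inputs).logits
print(int(np.argmax(logits, axis=-1)[0]))
```
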
    • Deprecate default chat templates (#30346) · 0927bfd0
      Matt authored
      * initial commit, remove warnings on default chat templates
      
      * stash commit
      
      * Raise a much sterner warning for default chat templates, and prepare for deprecation
      
      * Update the docs
      0927bfd0
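
The migration this deprecation asks for, sketched below: set `chat_template` explicitly instead of relying on a class-level default. The Jinja template string is illustrative:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
# Explicit template instead of the deprecated class-level default:
tok.chat_template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
)
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello"}], tokenize=False
)
```
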
  16. 18 Apr, 2024 4 commits
    • 🚨🚨🚨Deprecate `evaluation_strategy` to `eval_strategy`🚨🚨🚨 (#30190) · 60d5f8f9
      Zach Mueller authored
      * Alias
      
      * Note alias
      
      * Tests and src
      
      * Rest
      
      * Clean
      
      * Change typing?
      
      * Fix tests
      
      * Deprecation versions
      60d5f8f9
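
The rename in practice: `eval_strategy` replaces `evaluation_strategy`, which survives only as a deprecated alias for now:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    eval_strategy="steps",  # previously: evaluation_strategy="steps"
    eval_steps=500,
)
```
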
    • Add DBRX Model (#29921) · 005b957f
      Abhi Venigalla authored
      
      
      * wip
      
      * fix __init__.py
      
      * add docs
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * address comments 1
      
      * work on make fixup
      
      * pass configs down
      
      * add sdpa attention
      
      * remove DbrxBlock
      
      * add to configuration_auto
      
      * docstring now passes formatting test
      
      * fix style
      
      * update READMEs
      
      * add dbrx to modeling_auto
      
      * make fix-copies generated this
      
      * add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
      
      * config docstring passes formatting test
      
      * rename moe_loss_weight to router_aux_loss_coef
      
      * add to flash-attn documentation
      
      * fix model-path in tests
      
      * Explicitly make `"silu"` the default `ffn_act_fn`
      Co-authored-by: Wing Lian <wing.lian@gmail.com>
      
      * default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
      
      * fix _flash_attn_uses_top_left_mask and is_causal
      
      * fix tests path
      
      * don't use token type IDs
      
      * follow Llama and remove token_type_ids from test
      
      * init ConfigTester differently so tests pass
      
      * remove multiple choice test
      
      * remove question + answer test
      
      * remove sequence classification test
      
      * remove token classification test
      
      * copy Llama tests and remove token_type_ids from test inputs
      
      * do not test pruning or headmasking; style code
      
      * add _tied_weights_keys parameter to pass test
      
      * add type hints
      
      * fix type check
      
      * update config tester
      
      * remove masked_lm test
      
      * remove encoder tests
      
      * initialize DbrxModelTester with correct params
      
      * style
      
      * torch_dtype does not rely on torch
      
      * run make fixup, fix-copies
      
      * use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
      
      
      
      * add copyright info
      
      * fix imports and DbrxRotaryEmbedding
      
      * update DbrxModel docstring
      
      * use copies
      
      * change model path in docstring
      
      * use config in DbrxFFN
      
      * fix flashattention2, sdpaattention
      
      * input config to DbrXAttention, DbrxNormAttentionNorm
      
      * more fixes
      
      * fix
      
      * fix again!
      
      * add informative comment
      
      * fix ruff?
      
      * remove print statement + style
      
      * change doc-test
      
      * fix doc-test
      
      * fix docstring
      
      * delete commented out text
      
      * make defaults match dbrx-instruct
      
      * replace `router_aux_loss_coef` with `moe_loss_weight`
      
      * is_decoder=True
      
      * remove is_decoder from configtester
      
      * implement sdpa properly
      
      * make is_decoder pass tests
      
      * start on the GenerationTesterMixin tests
      
      * add dbrx to sdpa documentation
      
      * skip weight typing test
      
      * style
      
      * initialize smaller model
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Add DBRX to toctree
      
      * skip test_new_cache_format
      
      * make config defaults smaller again
      
      * add pad_token_id
      
      * remove pad_token_id from config
      
      * Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
      
      * Update src/transformers/models/dbrx/__init__.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/dbrx.md
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Update src/transformers/models/dbrx/configuration_dbrx.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/dbrx.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix typo
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * update docs, fix configuration_auto.py
      
      * address pr comments
      
      * remove is_decoder flag
      
      * slice
      
      * fix requires grad
      
      * remove grad
      
      * disconnect differently
      
      * remove grad
      
      * enable grads
      
      * patch
      
      * detach expert
      
      * nissan al ghaib
      
      * Update modeling_dbrx.py
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: default avatarMatt <Rocketknight1@users.noreply.github.com>
      
      * replace "Gemma" with "Dbrx"
      
      * remove # type: ignore
      
      * don't hardcode vocab_size
      
      * remove ToDo
      
      * Re-add removed idefics2 line
      
      * Update test to use tiny-random!
      
      * Remove TODO
      
      * Remove one more case of loading the entire dbrx-instruct in the tests
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * address some comments
      
      * small model
      
      * add dbrx to tokenization_auto
      
      * More docstrings with add_start_docstrings
      
      * Dbrx for now
      
      * add PipelineTesterMixin
      
      * Update src/transformers/models/dbrx/configuration_dbrx.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * remove flash-attn2 import error
      
      * fix docstring
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add usage example
      
      * put on one line
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * fix ffn_act_fn
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * change "dbrx" to "DBRX" for display purposes.
      
      * fix __init__.py?
      
      * fix __init__.py
      
      * fix README
      
      * return the aux_loss
      
      * remove extra spaces
      
      * fix configuration_auto.py
      
      * fix format in tokenization_auto
      
      * remove new line
      
      * add more usage examples
      
      ---------
      Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
      Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
      Co-authored-by: Wing Lian <wing.lian@gmail.com>
      Co-authored-by: Eitan Turok <eitanturok@gmail.com>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      Co-authored-by: Matt <rocketknight1@gmail.com>
      Co-authored-by: Your Name <you@example.com>
      Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      005b957f
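
A minimal load sketch; the instruct checkpoint is very large, so the dtype and device_map settings are shown purely to make the call practical:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("databricks/dbrx-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
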
    • Add jamba (#29943) · 3f20877d
      tomeras91 authored
      * Add jamba arch
      
      * apply "make fix-copies" changes
      
      * fix link to model in JambaConfig docstring
      
      * Add n_ctx in modeling file because repo-consistency wants that
      
      * Add jamba to flash attention and sdpa documentation
      
      * mamba dt_proj quant fix now works for LoRA as well
      
      * override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical differences are accentuated by mamba layers
      
      * add jamba to tokenization auto
      
      * fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
      
      * simple PR fixes
      
      * remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
      
      * remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
      
      * Add copied comment on JambaMLP (it's the same as MixtralMLP)
      
      * remove padding_mask warnings. It's not supported anymore
      
      * fix docstring. Float instead of int
      
      * A few more minor PR fixes
      
      * (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
      
      * Return None attention weights from mamba layers. Append to all attentions only if not None.
      
      * remove some leftover jamba archive lists
      
      * Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
      
      * no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
      
      * Add Jamba paper on READMEs
      
      * (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
      
      * Add copied from comment
      
      * remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
      
      * clearer docstring for _convert_to_standard_cache
      
      * style fixes
      
      * Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code to use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs
      
      * rename test so it still overrides what its meant to override
      
      * draft
      
      * oups
      
      * nit
      
      * remove more complexe logic
      
      * fix names used in config
      
      * fix fix fix
      
      * style
      
      * fix some more failing tests
      
      * generate did not init the cache 🙃
      
      
      
      * more small nits
      
      * typo
      
      * config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
      
      * fix init of pkv with torch.tensor()
      
      * empty tensor
      
      * fix some init issues
      
      * stupid changes required by generate because it does not even support its own DynamicCache class
      
      * more fixes
      
      * fix general assisted gen cache_position bug
      
      * tests passing
      
      * Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
      
      * fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
      
      * no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
      
      * fix docstrings and typehints for past_key_values
      
      * style fixes
      
      * fix docs
      
      * change typehint due to copy from Mixtral
      
      * forgot import
      
      * import order
      
      * Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
      
      * Add integration test with tiny random Jamba model on hub
      
      * fix flash attention cache shapes
      
      * bring back forgotten hidden states
      
      * rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
      
      * align integration test after modeling fixes
      
      * bugfix - mamba can use precomputed states only if the forward pass is on a single token
      
      * bugfix - mamba can use precomputed states only if they match the batch size
      
      * typo
      
      * remove making _prepare_4d_causal_attention_mask a leaf function
      
      * stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly
      
      ---------
      Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
      Co-authored-by: Joao Gante <joao@huggingface.co>
      3f20877d
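
A hedged sketch of loading the hybrid Mamba/attention model; the checkpoint is the public ai21labs release, and the optional fast Mamba kernels are separate dependencies not shown here:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tok("Jamba interleaves", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=16)
print(tok.decode(out[0], skip_special_tokens=True))
```
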
    • Add Flash Attention 2 to M2M100 model (#30256) · b65df514
      Alexander Visheratin authored
      
      
      * Added flash attention 2.
      
      * Fixes.
      
      * Fix inheritance.
      
      * Fixed init.
      
      * Remove stuff.
      
      * Added documentation.
      
      * Add FA2 to M2M100 documentation.
      
      * Add test.
      
      * Fixed documentation.
      
      * Update src/transformers/models/m2m_100/modeling_m2m_100.py
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/nllb.md
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Fixed variable name.
      
      ---------
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      b65df514
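
A sketch of switching the new path on; it requires the `flash-attn` package and a supported GPU, and the 418M checkpoint is the public one:

```python
import torch
from transformers import AutoTokenizer, M2M100ForConditionalGeneration

model = M2M100ForConditionalGeneration.from_pretrained(
    "facebook/m2m100_418M",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
tok = AutoTokenizer.from_pretrained("facebook/m2m100_418M")

tok.src_lang = "en"
inputs = tok("Hello world", return_tensors="pt").to("cuda")
out = model.generate(**inputs, forced_bos_token_id=tok.get_lang_id("fr"))
print(tok.batch_decode(out, skip_special_tokens=True))
```
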