1. 24 Apr, 2024 4 commits
  2. 23 Apr, 2024 3 commits
  3. 22 Apr, 2024 3 commits
  4. 19 Apr, 2024 5 commits
    • Add TF swiftformer (#23342) · d2cec09b
      João David authored
      
      
      * Duplicate swiftformer
      
      * Convert SwiftFormerPatchEmbedding
      
      * Convert SwiftFormerEmbeddings
      
      * Convert TFSwiftFormerMlp
      
      * Convert TFSwiftFormerConvEncoder
      
      * Convert TFSwiftFormerLocalRepresentation
      
      * convert TFSwiftFormerEncoderBlock
      
      * Convert SwiftFormerStage
      
      * Convert SwiftFormerEncoder
      
      * Add TFSwiftFormerPreTrainedModel
      
      * Convert SwiftFormerForImageClassification
      
      * Add kwargs and start drop path
      
      * Fix syntax
      
      * Change Model class name
      
      * Add TFSwiftFormer to __init__
      
      * Duplicate test_modeling_swiftformer
      
      * First test conversions
      
      * Change require_torch to require_tf
      
      * Add exports to swiftformer __init__
      
      * Add TFSwiftFormerModel wrapper
      
      * Fix __init__ and run black
      
      * Remove docstring from MainLayer, fix padding
      
      * Use keras.layers.Activation on keras.Sequential
      
      * Fix swiftformer exports
      
      * Fix activation layer from config
      
      * Remove post_inits
      
      * Use tf.keras.layers.ZeroPadding2D
      
      * Convert torch normalize
      
      * Change tf test input shape
      
      * Fix softmax and reduce_sum
      
      * Convert expand_dims and repeat
      
      * Add missing reshape and transpose
      
      * Simplify TFSwiftFormerEncoderBlock.call
      
      * Fix mismatch in patch embeddings
      
      * Fix expected output shape to match channels last
      
      * Fix swiftformer typo
      
      * Disable test_onnx
      
      * Fix TFSwiftFormerForImageClassification call
      
      * Add unpack inputs
      
      * Convert flatten(2).mean(-1)
      
      * Change vision dummy inputs (to be reviewed)
      
      * Change test_forward_signature to use .call
      
      * Fix @unpack_inputs
      
      * Set return_tensors="tf" and rename class
      
      * Rename wrongly named patch_embeddings layer
      
      * Add serving_output and change dummy_input shape
      
      * Make dimensions BCHW and transpose inside embedding layer
      
      * Change SwiftFormerEncoderBlock
      
      * Fix ruff problems
      
      * Add image size to swiftformer config
      
      * Change transpose to MainLayer and use -1 for reshape
      
      * Remove serving_outputs and dummy_inputs
      
      * Remove test_initialization test from tf model
      
      * Make Sequential component a separate layer
      
      * Fix layers' names
      
      * Transpose encoder outputs
      
      * Fix tests and check if hidden states is not None
      
      * Fix TFSwiftFormerForImageClassification
      
      * Run make fixup
      
      * Run make fix-copies
      
      * Update modeling_tf_auto
      
      * Update docs
      
      * Fix modeling auto mapping
      
      * Update modeling_tf_swiftformer docs
      
      * Fill image_size doc and type
      
      * Add reduction=None to loss computation
      
      * Update docs
      
      * make style
      
      * Debug: Delete the tip to see if that changes anything
      
      * Re-add tip
      
      * Remove add_code_sample_docstrings
      
      * Remove unused import
      
      * Get the debug to actually tell us the problem it has with the docs
      
      * Try a substitution to match the PyTorch file?
      
      * Add swiftformer to ignore list
      
      * Add build() methods
      
      * Update copyright year
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove FIXME comment
      
      * Remove from_pt
      
      * Update copyright year
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Rename one-letter variables
      
      * Remove FIXMEs related to momentum
      
      * Remove old TODO comment
      
      * Remove outstanding FIXME comments
      
      * Get dropout rate from config
      
      * Add specific dropout config for MLP
      
      * Add convencoder dropout to config
      
      * Pass config to SwiftFormerDropPath layer
      
      * Fix drop_path variable name and add Adapted from comment
      
      * Run ruff
      
      * Removed copied from comment
      
      * Run fix copies
      
      * Change drop_path to identity to match pt
      
      * Cleanup build() methods and move to new keras imports
      
      * Update docs/source/en/model_doc/swiftformer.md
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Raise error if drop_path_rate > 0.0
      
      * Apply suggestions from code review
      
      Replace (self.dim), with self.dim,
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Remove drop_path function
      
      * Add training to TFSwiftFormerEncoder
      
      * Set self.built = True last
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Should have been added to previous commit
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Change default_feature_extractor to default_image_processor
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Import Keras from modeling_tf_utils
      
      * Remove relative import
      
      * Run ruff --fix
      
      * Move import keras to tf_available
      
      * Add copied from comment to test_forward_signature
      
      * Reduce batch size and num_labels
      
      * Extract loss logic to hf_compute_loss
      
      * Run ruff format
      
      ---------
      Co-authored-by: Matt <rocketknight1@gmail.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      d2cec09b
    • Do not remove half seq length in generation tests (#30016) · b1cd4874
      Raushan Turganbay authored
      
      
      * remove seq length from generation tests
      
      * style and quality
      
      * [test_all] & PR suggestion
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update tests/generation/test_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * [test all] remove unused variables
      
      ---------
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      b1cd4874
    • [Whisper] Fix slow tests (#30152) · 4ed0e51c
      Sanchit Gandhi authored
      
      
      * fix tests
      
      * style
      
      * more fixes
      
      * move model to device
      
      * move logits to cpu
      
      * update expected values
      
      * use ungated dataset
      
      * fix
      
      * fix
      
      * update
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      4ed0e51c
    • cd09a8df
      Sanchit Gandhi authored
    • [UDOP] Add special tokens to tokenizer (#29594) · ecfe9be7
      NielsRogge authored
      * Add special tokens
      
      * Add special tokens
      
      * Use fmt
      
      * Uncomment code
      
      * Add test
      
      * Remove scripts
      
      * Address comments
      
      * Improve tests
      
      * Address comment
      
      * Remove flag
      ecfe9be7
  5. 18 Apr, 2024 4 commits
    • Add DBRX Model (#29921) · 005b957f
      Abhi Venigalla authored
      
      
      * wip
      
      * fix __init__.py
      
      * add docs
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * address comments 1
      
      * work on make fixup
      
      * pass configs down
      
      * add sdpa attention
      
      * remove DbrxBlock
      
      * add to configuration_auto
      
      * docstring now passes formatting test
      
      * fix style
      
      * update READMEs
      
      * add dbrx to modeling_auto
      
      * make fix-copies generated this
      
      * add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
      
      * config docstring passes formatting test
      
      * rename moe_loss_weight to router_aux_loss_coef
      
      * add to flash-attn documentation
      
      * fix model-path in tests
      
      * Explicitly make `"silu"` the default `ffn_act_fn`
      Co-authored-by: Wing Lian <wing.lian@gmail.com>
      
      * default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
      
      * fix _flash_attn_uses_top_left_mask and is_causal
      
      * fix tests path
      
      * don't use token type IDs
      
      * follow Llama and remove token_type_ids from test
      
      * init ConfigTester differently so tests pass
      
      * remove multiple choice test
      
      * remove question + answer test
      
      * remove sequence classification test
      
      * remove token classification test
      
      * copy Llama tests and remove token_type_ids from test inputs
      
      * do not test pruning or headmasking; style code
      
      * add _tied_weights_keys parameter to pass test
      
      * add type hints
      
      * fix type check
      
      * update config tester
      
      * remove masked_lm test
      
      * remove encoder tests
      
      * initialize DbrxModelTester with correct params
      
      * style
      
      * torch_dtype does not rely on torch
      
      * run make fixup, fix-copies
      
      * use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
      
      
      
      * add copyright info
      
      * fix imports and DbrxRotaryEmbedding
      
      * update DbrxModel docstring
      
      * use copies
      
      * change model path in docstring
      
      * use config in DbrxFFN
      
      * fix flashattention2, sdpaattention
      
      * input config to DbrXAttention, DbrxNormAttentionNorm
      
      * more fixes
      
      * fix
      
      * fix again!
      
      * add informative comment
      
      * fix ruff?
      
      * remove print statement + style
      
      * change doc-test
      
      * fix doc-test
      
      * fix docstring
      
      * delete commented out text
      
      * make defaults match dbrx-instruct
      
      * replace `router_aux_loss_coef` with `moe_loss_weight`
      
      * is_decoder=True
      
      * remove is_decoder from configtester
      
      * implement sdpa properly
      
      * make is_decoder pass tests
      
      * start on the GenerationTesterMixin tests
      
      * add dbrx to sdpa documentation
      
      * skip weight typing test
      
      * style
      
      * initialize smaller model
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Add DBRX to toctree
      
      * skip test_new_cache_format
      
      * make config defaults smaller again
      
      * add pad_token_id
      
      * remove pad_token_id from config
      
      * Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
      
      * Update src/transformers/models/dbrx/__init__.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/dbrx.md
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Update src/transformers/models/dbrx/configuration_dbrx.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/dbrx.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix typo
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * update docs, fix configuration_auto.py
      
      * address pr comments
      
      * remove is_decoder flag
      
      * slice
      
      * fix requires grad
      
      * remove grad
      
      * disconnect differently
      
      * remove grad
      
      * enable grads
      
      * patch
      
      * detach expert
      
      * nissan al ghaib
      
      * Update modeling_dbrx.py
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * replace "Gemma" with "Dbrx"
      
      * remove # type: ignore
      
      * don't hardcode vocab_size
      
      * remove ToDo
      
      * Re-add removed idefics2 line
      
      * Update test to use tiny-random!
      
      * Remove TODO
      
      * Remove one more case of loading the entire dbrx-instruct in the tests
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * address some comments
      
      * small model
      
      * add dbrx to tokenization_auto
      
      * More docstrings with add_start_docstrings
      
      * Dbrx for now
      
      * add PipelineTesterMixin
      
      * Update src/transformers/models/dbrx/configuration_dbrx.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * remove flash-attn2 import error
      
      * fix docstring
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add usage example
      
      * put on one line
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * fix ffn_act_fn
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * change "dbrx" to "DBRX" for display purposes.
      
      * fix __init__.py?
      
      * fix __init__.py
      
      * fix README
      
      * return the aux_loss
      
      * remove extra spaces
      
      * fix configuration_auto.py
      
      * fix format in tokenization_auto
      
      * remove new line
      
      * add more usage examples
      
      ---------
      Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
      Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
      Co-authored-by: Wing Lian <wing.lian@gmail.com>
      Co-authored-by: Eitan Turok <eitanturok@gmail.com>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      Co-authored-by: Matt <rocketknight1@gmail.com>
      Co-authored-by: Your Name <you@example.com>
      Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      005b957f
    • Add jamba (#29943) · 3f20877d
      tomeras91 authored
      * Add jamba arch
      
      * apply "make fix-copies" changes
      
      * fix link to model in JambaConfig docstring
      
      * Add n_ctx in modeling file because repo-consistency wants that
      
      * Add jamba to flash attention and sdpa documentation
      
      * mamba dt_proj quant fix now works for LoRA as well
      
      * override test_left_padding_compatibility and use a more permissive tolerance. Left padding numerical differences are accentuated by mamba layers
      
      * add jamba to tokenization auto
      
      * fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
      
      * simple PR fixes
      
      * remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
      
      * remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
      
      * Add copied comment on JambaMLP (it's the same as MixtralMLP)
      
      * remove padding_mask warnings. It's not supported anymore
      
      * fix docstring. Float instead of int
      
      * A few more minor PR fixes
      
      * (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
      
      * Return None attention weights from mamba layers. Append to all attentions only if not None.
      
      * remove some leftover jamba archive lists
      
      * Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
      
      * no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
      
      * Add Jamba paper on READMEs
      
      * (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
      
      * Add copied from comment
      
      * remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
      
      * clearer docstring for _convert_to_standard_cache
      
      * style fixes
      
      * Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code to use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs
      
      * rename test so it still overrides what its meant to override
      
      * draft
      
      * oups
      
      * nit
      
      * remove more complex logic
      
      * fix names used in config
      
      * fix fix fix
      
      * style
      
      * fix some more failing tests
      
      * generate did not init the cache 🙃
      
      
      
      * more small nits
      
      * typo
      
      * config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
      
      * fix init of pkv with torch.tensor()
      
      * empty tensor
      
      * fix some init issues
      
      * stupid changes required by generate because it does not even support its own DynamicCache class
      
      * more fixes
      
      * fix general assisted gen cache_position bug
      
      * tests passing
      
      * Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
      
      * fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
      
      * no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
      
      * fix docstrings and typehints for past_key_values
      
      * style fixes
      
      * fix docs
      
      * change typehint due to copy from Mixtral
      
      * forgot import
      
      * import order
      
      * Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
      
      * Add integration test with tiny random Jamba model on hub
      
      * fix flash attention cache shapes
      
      * bring back forgotten hidden states
      
      * rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
      
      * align integration test after modeling fixes
      
      * bugfix - mamba can use precomputed states only if forward pass is on a single token
      
      * bugfix - mamba can use precomputed states only if they match the batch size
      
      * typo
      
      * remove making _prepare_4d_causal_attention_mask a leaf function
      
      * stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly
      
      ---------
      Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
      Co-authored-by: Joao Gante <joao@huggingface.co>
      3f20877d
    • Fix donut token2json multiline (#30300) · 7915a259
      Pavel Iakubovskii authored
      * Fix multiline processing
      
      * Update test for token2json
      7915a259
    • Add Flash Attention 2 to M2M100 model (#30256) · b65df514
      Alexander Visheratin authored
      
      
      * Added flash attention 2.
      
      * Fixes.
      
      * Fix inheritance.
      
      * Fixed init.
      
      * Remove stuff.
      
      * Added documentation.
      
      * Add FA2 to M2M100 documentation.
      
      * Add test.
      
      * Fixed documentation.
      
      * Update src/transformers/models/m2m_100/modeling_m2m_100.py
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/nllb.md
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Fixed variable name.
      
      ---------
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      b65df514
  6. 17 Apr, 2024 3 commits
    • Add OLMo model family (#29890) · e4ea19b9
      Shane A authored
      * Add OLMo using add-new-model-like with Llama
      
      * Fix incorrect tokenizer for OLMo
      
      * Copy-paste relevant OLMo methods and their imports
      
      * Add OLMo config
      
      * Modify OLMo config to follow HF conventions
      
      * Remove unneeded Llama code from OLMo model
      
      * Add ability for OLMo model to output attentions
      
      * Add OLMoPreTrainedModel and OLMoModel
      
      * Add OLMoForCausalLM
      
      * Minor fixes to OLMo model for style and missing functions
      
      * Implement OLMo tokenizer
      
      * Implement OLMo to HF conversion script
      
      * Add tests for OLMo model
      
      * Add tests for OLMo fast tokenizer
      
      * Add auto-generated dummy objects
      
      * Remove unimplemented OLMo classes from auto and init classes and re-format
      
      * Add README and associated auto-generated files
      
      * Use OLMo names for common properties
      
      * Run make fixup
      
      * Remove `|` from OLMo typing
      
      * Remove unneeded tokenization_olmo.py
      
      * Revert model, config and converter to add-new-model-like Llama
      
      * Move logic for adding bos/eos token into GPTNeoxTokenizerFast
      
      * Change OLMoConfig defaults to match OLMo-7B
      
      * Use GPTNeoXTokenizerFast in OLMo tokenizer tests
      
      * Modify auto-generated OLMoModelTests to work for OLMo
      
      * Add non-parametric layer norm OLMoLayerNorm
      
      * Update weight conversion script for OLMo
      
      * Fix __init__ and auto structure for OLMo
      
      * Fix errors from make fixup
      
      * Remove OLMoTokenizerFast from documentation
      
      * Add missing 'Copied from' for OLMoModel._update_causal_mask
      
      * Run make fix-copies
      
      * Rearrange string replacements in OLMoForCausalLM Copied from
      
      * Move OLMo and Llama CausalLM.forward example into global constants
      
      * Fix OLMO_GENERATION_EXAMPLE doc string typo
      
      * Add option for qkv clipping to OLMo
      
      * Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf
      
      * Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf
      
      * Fix OLMo tokenization bug using conversion script
      
      * Keep model in full precision after conversion
      
      * Do not add eos token automatically
      
      * Update references to OLMo model in HF Hub
      
      * Do not add eos token during encoding by default
      
      * Fix Llama generation example
      
      * Run make fixup
      
      * OLMo 7B integration test fix
      
      * Remove unneeded special case for OLMoConfig
      
      * OLMo 7B Twin 2T integration test fix
      
      * Fix test_model_7b_greedy_generation
      
      * Remove test_compile_static_cache
      
      * Fix OLMo and Llama generation example
      
      * Run make fixup
      
      * Revert "OLMo 7B integration test fix"
      
      This reverts commit 4df56a4b150681bfa559846f40e9b7b7f97d7908.
      
      * Revert "OLMo 7B Twin 2T integration test fix"
      
      This reverts commit 9ff65a4a294ace89ab047b793ca55e623a9ceefc.
      
      * Ungate 7B integration tests and fix greedy generation test
      
      * Add retries for flaky test_eager_matches_sdpa_generate
      
      * Fix output of doc example for OLMoForCausalLM.forward
      
      * Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model
      
      * Try fix incorrect characters in OLMoForCausalLM.forward doc test
      
      * Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes
      
      * Remove pretraining_tp from OLMo config and model
      
      * Add missing 'Copied from' instances
      
      * Remove unneeded causal_mask from OLMoModel
      
      * Revert Llama changes
      
      * Ignore copy for OLMoForCausalLM.forward
      
      * Change 'OLMo' to 'Olmo' in classes
      
      * Move minimal OLMo tokenization tests to model tests
      
      * Add missed 'Copied from' for repeat_kv
      e4ea19b9
    • Add token type ids to CodeGenTokenizer (#29265) · 8d6b5096
      st81 authored
      * Add create token type ids to CodeGenTokenizer
      
      * Fix inconsistent length of token type ids
      
      * Format source codes
      
      * Fix inconsistent order of methods
      
      * Update docstring
      
      * add test_tokenizer_integration test
      
      * Format source codes
      
      * Add `copied from` comment to CodeGenTokenizerFast
      
      * Add doc of create_token_type_ids_from_sequences
      
      * Make return_token_type_ids False by default
      
      * Make test_tokenizer_integration as slow test
      
      * Add return_token_type_ids to tokenizer init arg
      
      * Add test for tokenizer's init return_token_type_ids
      
      * Format source codes
      8d6b5096
    • Enable fx tracing for Mistral (#30209) · 304c6a1e
      Raushan Turganbay authored
      * tracing for mistral
      
      * typo
      
      * fix copies
      304c6a1e
  7. 16 Apr, 2024 1 commit
  8. 15 Apr, 2024 3 commits
    • Add Idefics2 (#30253) · 6b78360e
      amyeroberts authored
      
      
      * Initial add model additions
      
      * Test
      
      * All weights loading
      
      * Can perform full forward pass
      
      * Local and remote the same
      
      * Matching local and remote
      
      * Fixup
      
      * Idefics2Model importable; fixup docstrings
      
      * Don't skip by default
      
      * Remove deprecated use_resampler arg
      
      * Remove self.config
      
      * DecoupledLinear takes config
      
      * Tidy up
      
      * Enable eager attention and tidy up
      
      * Most tests passing
      
      * Update for batch of processed images
      
      * Add image processor
      
      * Update doc pages
      
      * Update conversion script
      
      * Remove erroneous breakpoint
      
      * Remove accidental spelling change
      
      * Update to reflect changes on hub - make generate work
      
      * Fix up
      
      * Image processor tests
      
      * Update tests
      
      * Add a processor
      
      * Add a processor
      
      * Update convert script
      
      * Update modeling file - remove fixmes
      
      * Bug fix
      
      * Add processing test
      
      * Use processor
      
      * Fix up
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Fix test
      
      * Update config - PR comments and defaults align with checkpoint
      
      * Reviewer comments
      
      * Add copied froms for flash attention
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Remove qk_layer_norm and freeze_layers functionality
      
      * Fix
      
      * Remove freeze_layer options from config
      
      * Sync with upstream main
      
      * Fix attention shapes siglip
      
      * Remove Llava-next refs - TO REBASE
      
      * Use AutoModel for text model
      
      * Add comment to explain vision embeddings
      
      * Fix issue with tie_word_embeddings
      
      * Address review comments
      
      * Fix and fix up
      
      * Chat templates for idefics
      
      * Fix copies
      
      * Fix
      
      * Add layer norms to FA2
      
      * Fix tests
      
      * Apply suggestions from code review
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Fix
      
      * Review comments
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update inputs merger
      
      * Merge weights in correct order
      
      * Update convert script
      
      * Update src/transformers/models/idefics2/processing_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update template
      
      * Model code examples (fix idefics too)
      
      * More review comments
      
      * Tidy up
      
      * Update processing
      
      * Fix attention mask preparation
      
      * Update inputs_merger inputs
      
      * Vectorize inputs_merger
      
      * Update src/transformers/models/idefics2/__init__.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      
      * Review comments
      
      * saying bye to the `qk_layer_norms`
      
      * Simplify
      
      * Update latents
      
      * Remove erroneous readme changes
      
      * Return images when applying chat template
      
      * Fix bug - prompt images are for a single sample
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      
      * image splitting
      
      * fix test
      
      * some more comment
      
      * some comment
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/idefics2/image_processing_idefics2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update processor
      
      * Update model tests
      
      * Update src/transformers/models/idefics2/processing_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update src/transformers/models/idefics2/processing_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Don't add BOS in template
      
      * Update src/transformers/models/idefics2/processing_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Remove index in examples
      
      * Update tests to reflect #13
      
      * Update src/transformers/models/idefics2/processing_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * PR comment - consistent typing
      
      * Update readme and model doc
      
      * Update docs
      
      * Update checkpoint references
      
      * Update examples
      
      * Fix and update tests
      
      * Small addition
      
      * Update tests - remove copied from as no ignore placement copy could be found
      
      * Update example
      
      * small fixes
      
      * Update docs/source/en/model_doc/idefics2.md
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update docs/source/en/model_doc/idefics2.md
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update README.md
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Connector model as bridge
      
      * Fix up
      
      * Fix up
      
      * Don't pass model inputs for generation kwargs update
      
      * IDEFICS-2 -> Idefics2
      
      * Remove config archive name
      
      * IDEFICS-2 -> Idefics2
      
      * Add back llava-next
      
      * Update readmes
      
      * Add requirements for processor tester
      
      * Use custom convert_to_rgb to avoid possible BC
      
      * Fix doc example
      
      * Fix doc example
      
      * Skip model doc tests - as model to large
      
      * More doc example - account for image splitting
      
      * Update src/transformers/image_transforms.py
      
      * Fix config doctest
      
      ---------
      Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
      Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      6b78360e
    • [tests] add the missing `require_torch_multi_gpu` flag (#30250) · 667939a2
      Fanli Lin authored
      add gpu flag
      667939a2
    • fix: Replace deprecated `assertEquals` with `assertEqual` (#30241) · 06b11927
      Sai-Suraj-27 authored
      Replace deprecated assertEquals with assertEqual.
      06b11927
  9. 12 Apr, 2024 1 commit
  10. 11 Apr, 2024 3 commits
    • Update output of SuperPointForKeypointDetection (#29809) · 5569552c
      NielsRogge authored
      * Remove auto class
      
      * Update ImagePointDescriptionOutput
      
      * Update model outputs
      
      * Rename output class
      
      * Revert "Remove auto class"
      
      This reverts commit ed4a8f549d79cdb0cdf7aa74205a185c41471519.
      
      * Address comments
      5569552c
    • Fix Llava chat template examples (#30130) · fbdb978e
      lewtun authored
      fbdb978e
    • Adding grounding dino (#26087) · b752ad30
      Eduardo Pacheco authored
      
      
      * Fixed typo when converting weights to GroundingDINO vision backbone
      
      * Final modifications on modeling
      
      * Removed unnecessary class
      
      * Fixed convert structure
      
      * Added image processing
      
      * make fixup partially completed
      
      * Now text_backbone_config has its own class
      
      * Modified convert script
      
      * Removed unnecessary config attribute
      
      * Added new function to generate sub sentence mask
      
      * Renamed parameters with gamma in the name as it's currently not allowed
      
      * Removed tokenization and image_processing scripts since we'll map from existing models
      
      * Fixed some issues with configuration
      
      * Just some modifications on conversion script
      
      * Other modifications
      
      * Copied deformable detr
      
      * First commit
      
      * Added bert to model
      
      * Bert validated
      
      * Created Text and Fusion layers for Encoder
      
      * Adapted Encoder layer
      
      * Fixed typos
      
      * Adjusted Encoder
      
      * Converted encoder to hf
      
      * Modified Decoder Layer
      
      * Modified main decoder class
      
      * Removed copy comments
      
      * Fixed forward from GroundingDINOModel and GroundingDINODecoder
      
      * Added all necessary layers, configurations and forward logic up to GroundingDINOModel
      
      * Added all layers to conversion
      
      * Fixed outputs for GroundingDINOModel and GroundingDINOForObjectDetection
      
      * Fixed mask input to encoders and fixed nn.MultiheadAttention batch first and attn output
      
      * Fixed forward from GroundingDINOTextEnhancerLayer
      
      * Fixed output bug with GroundingDINODeformableLayer
      
      * Fixed bugs that prevent GroundingDINOForObjectDetection to run forward method
      
      * Fixed attentions to be passed correctly
      
      * Passing temperature arg when creating Sine position embedding
      
      * Removed copy comments
      
      * Added temperature argument for position embedding
      
      * Fixed typo when converting weights to GroundingDINO vision backbone
      
      * Final modifications on modeling
      
      * Removed unnecessary class
      
      * Fixed convert structure
      
      * Added image processing
      
      * make fixup partially completed
      
      * Now text_backbone_config has its own class
      
      * Modified convert script
      
      * Removed unnecessary config attribute
      
      * Added new function to generate sub sentence mask
      
      * Renamed parameters with gamma in the name as it's currently not allowed
      
      * Removed tokenization and image_processing scripts since we'll map from existing models
      
      * Fixed some issues with configuration
      
      * Just some modifications on conversion script
      
      * Other modifications
      
      * Fix style
      
      * Improve fixup
      
      * Improve conversion script
      
      * Improve conversion script
      
      * Add GroundingDINOProcessor
      
      * More improvements
      
      * Return token type ids
      
      * something
      
      * Fix more tests
      
      * More improvements
      
      * More cleanup
      
      * More improvements
      
      * Fixed tests, improved modeling and config
      
      * More improvements and fixing tests
      
      * Improved tests and modeling
      
      * Improved tests and added image processor
      
      * Improved tests inference
      
      * More improvements
      
      * More test improvements
      
      * Fixed last test
      
      * Improved docstrings and comments
      
      * Fix style
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
      
      * Better naming
      
      * Better naming
      
      * Added Copied statement
      
      * Added Copied statement
      
      * Moved param init from GroundingDINOBiMultiHeadAttention
      
      * Better naming
      
      * Fixing clamp style
      
      * Better naming
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/configuration_grounding_dino.py
      Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py
      Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
      
      * Improving conversion script
      
      * Improved config
      
      * Improved naming
      
      * Improved naming again
      
      * Improved grounding-dino.md
      
      * Moved grounding dino to multimodal
      
      * Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py
      Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
      
      * Fixed docstrings and style
      
      * Fix docstrings
      
      * Remove timm attributes
      
      * Reorder imports
      
      * More improvements
      
      * Add Grounding DINO to pipeline
      
      * Remove model from check_repo
      
      * Added grounded post_process to GroundingDINOProcessor
      
      * Fixed style
      
      * Fixed GroundingDINOTextPrenetConfig docstrings
      
      * Aligned inputs.keys() when both image and text are passed with model_input_names
      
      * Added tests for GroundingDINOImageProcessor and GroundingDINOProcessor
      
      * Testing post_process_grounded_object_detection from GroundingDINOProcessor at test_inference_object_detection_head
      
      * Fixed order
      
      * Marked test with require_torch
      
      * Temporarily changed repo_id
      
      * More improvements
      
      * Fix style
      
      * Final improvements
      
      * Improve annotators
      
      * Fix style
      
      * Add is_torch_available
      
      * Remove type hints
      
      * vocab_tokens as one liner
      
      * Removed print statements
      
      * Renamed GroundingDINOTextPrenetConfig to GroundingDINOTextConfig
      
      * remove unnecessary comments
      
      * Removed unnecessary tests on conversion script
      
      * Renamed GroundingDINO to camel case GroundingDino
      
      * Fixed GroundingDinoProcessor docstrings
      
      * loading MSDA kernels in the modeling file
      
      * Fix copies
      
      * Replace nn.multiheadattention
      
      * Replace nn.multiheadattention
      
      * Fixed inputs for GroundingDinoMultiheadAttention & order of modules
      
      * Fixed processing to avoid messing with inputs
      
      * Added more tips for GroundingDino
      
      * Make style
      
      * Changing name to align with SAM
      
      * Replace final nn.multiheadattention
      
      * Fix model tests
      
      * Update year, remove GenerationTesterMixin
      
      * Address comments
      
      * Address more comments
      
      * Rename TextPrenet to TextModel
      
      * Rename hidden_states
      
      * Address more comments
      
      * Address more comments
      
      * Address comment
      
      * Address more comments
      
      * Address merge
      
      * Address comment
      
      * Address comment
      
      * Address comment
      
      * Make style
      
      * Added layer norm eps to layer norms
      
      * Address more comments
      
      * More fixes
      
      * Fixed equivalence
      
      * Make fixup
      
      * Remove print statements
      
      * Address comments
      
      * Address comments
      
      * Address comments
      
      * Address comments
      
      * Address comments
      
      * Address comments
      
      * Add comment
      
      * Address comment
      
      * Remove overwriting of test
      
      * Fix bbox_embed
      
      * Improve decoder_bbox_embed_share
      
      * Simplify outputs
      
      * Updated post_process_grounded_object_detection
      
      * Renamed sources to feature_maps
      
      * Improved tests for Grounding Dino ImageProcessor and Processor
      
      * Fixed test requirements and imports
      
      * Fixed image_processing
      
      * Fixed processor tests
      
      * Fixed imports for image processing tests
      
      * Fix copies
      
      * Updated modeling
      
      * Fix style
      
      * Moved functions to correct position
      
      * Fixed copy issues
      
      * Update src/transformers/models/deformable_detr/modeling_deformable_detr.py
      Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
      
      * Keeping consistency custom cuda kernels for MSDA
      
      * Make GroundingDinoProcessor logic clearer
      
      * Updated Grounding DINO checkpoints
      
      * Changed tests to correct structure
      
      * Updated gpu-cpu equivalence test
      
      * fix copies
      
      * Update src/transformers/models/grounding_dino/processing_grounding_dino.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/processing_grounding_dino.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/grounding_dino/configuration_grounding_dino.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Fixed errors and style
      
      * Fix copies
      
      * Removed inheritance from PreTrainedModel from GroundingDinoTextModel
      
      * Fixed GroundingDinoTextModel
      
      * Fixed type of default backbone config
      
      * Fixed missing methods for GroundingDinoTextModel and Added timm support for GroundingDinoConvEncoder
      
      * Addressed comments
      
      * Addressed batched image processing tests
      
      * Addressed zero shot test comment
      
      * Addressed tip comment
      
      * Removed GroundingDinoTextModel from check_repo
      
      * Removed inplace masking
      
      * Addressed comments
      
      * Addressed comments
      
      * Addressed comments
      
      * Fix copies
      
      * Fixing timm test
      
      * Fixed batching equivalence test
      
      * Update docs/source/en/model_doc/grounding-dino.md
      Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/grounding-dino.md
      Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/grounding-dino.md
      Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
      
      * Addressed more comments
      
      * Added a new comment
      
      * Reduced image size
      
      * Addressed more comments
      
      * Nits
      
      * Nits
      
      * Changed the way text_config is initialized
      
      * Update src/transformers/models/grounding_dino/processing_grounding_dino.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: Niels <niels.rogge1@gmail.com>
      Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com>
      Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
      b752ad30
  11. 10 Apr, 2024 3 commits
    • Add recurrent gemma (#30143) · 0fe44059
      Arthur authored
      
      
      * Fork.
      
      * RecurrentGemma initial commit.
      
      * Updating __init__.py.
      
      * Minor modification to how we initialize the cache.
      Changing how the config specifies the architecture.
      
      * Reformat code to 4 spaces.
      Fixed a few typos.
      
      * Fixed the forward pass.
      Still unclear on the cache?
      
      * Fixed the RecurrentGemmaForCausalLM
      
      * Minor comment that we might not need attention_mask and output_attention arguments.
      
      * Now cache should work as well.
      
      * Adding a temporary example to check whether the model generation works.
      
      * Adding the tests and updating imports.
      
      * Adding the example file missing in the previous commit.
      
      * First working example.
      
      * Removing .gitignore and reverting parts of __init__.
      
      * Re-add .gitignore.
      
      * Addressing comments for configuration.
      
      * Move mask creation to `_prepare_inputs_for_generation`.
      
      * First try at integration tests:
      1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
      2. `cache_position` not passed
      
      * Transferring between machines.
      
      * Running normal tests.
      
      * Minor fix.
      
      * More fixes.
      
      * Addressing more comments.
      
      * Minor fixes.
      
      * first stab at cleanup
      
      * more refactoring
      
      * fix copies and else
      
      * renaming and get init to work
      
      * fix causal mask creation
      
      * update
      
      * nit
      
      * fix a hell lot of things
      
      * updates
      
      * update conversion script
      
      * make all keys importable
      
      * nits
      
      * add auto mappings
      
      * properly convert ffw_up and down
      
      * add scaling
      
      * fix generations
      
      * for recurrent dtype
      
      * update
      
      * fix going beyond window
      
      * fixup
      
      * add missing files
      
      * current updates to remove last einops
      
      * finish modeling refactor
      
      * TADA
      
      * fix compile
      
      * fix most failing tests ? ?
      
      * update tests
      
      * refactor and update
      
      * update
      
      * nits, fixup and update tests
      
      * more fixup
      
      * nits
      
      * fix imports
      
      * test format
      
      * fixups
      
      * nits
      
      * tuple typing
      
      * fix code quality
      
      * add model card
      
      * fix doc
      
      * skip most generation tests
      
      * nits
      
      * style
      
      * doc fixes
      
      * fix pr and check_copies?
      
      * last nit
      
      * oupsy
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * update
      
      * Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * update based on review
      
      * doc nit
      
      * fix quality
      
      * quality
      
      * fix slow test model path
      
      * update default dtype
      
      * ignore attributes that can be safely ignored in check config attributes
      
      * 0lallalala come on
      
      * save nit
      
      * style
      
      * remove to dict update
      
      * make sure we can also run in float16
      
      * style
      
      ---------
      Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
      Co-authored-by: Aleksandar Botev <botev@google.com>
      Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
      Co-authored-by: anushanf <anushanf@google.com>
      Co-authored-by: botev <botevmg@gmail.com>
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      0fe44059
    • [UDOP] Fix tests (#29573) · 50c1c19f
      NielsRogge authored
      * Fix tests
      
      * Fix tests
      
      * Remove no_split_modules
      50c1c19f
    • [tests] make 2 tests device-agnostic (#30008) · 18546378
      Fanli Lin authored
      add torch device
      18546378
  12. 09 Apr, 2024 1 commit
  13. 08 Apr, 2024 4 commits
  14. 05 Apr, 2024 2 commits