1. 05 Jun, 2024 1 commit
    • Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 (#30536) · bd5091df
      Cyril Vallez authored
      * Fix contrastive_search for new cache structure, and improve performance by removing inefficient torch.stack(torch.split(x, top_k, dim=0)) (illustrated in the sketch after this commit entry)
      
      * Fix _contrastive_search for non-standard cache using ellipsis slicing
      
      * Fix all outputs.logits memory leaks for all decoding strategies!
      
      * Fix small error in _contrastive_search()
      
      * Make all necessary changes and revert for the new class
      
      * Apply coding style
      
      * Remove pipes in type hints for compatibility
      
      * correct type hint
      
      * apply style
      
      * Use DynamicCache by default and solve conflicts
      
      * Fix rebase issues
      
      * Add `_supports_dynamic_cache_class` in models for models that support DynamicCache but not other caches to make DynamicCache the default for more models
      
      * Create generation config to return legacy format by default, or to choose not to
      
      * style
      
      * Fix case when use_cache is False
      
      * Remove default DynamicCache in assisted_decoding if assistant_model does not support it + fix _seen_tokens when cropping cache
      
      * Update prepare_inputs_for_generation() for case with empty DynamicCache
      
      * Correct return of args in _assisted_decoding
      
      * Remove EfficientDynamicCache as it is no longer needed
      
      * Correct mistake in generation config
      
      * Move cache logic of assisted decoding to AssistedCandidateGenerator.__init__
      
      * change DynamicCache function names from "split" to "batch_split" for readability + apply coding style
      
      * Remove `_supports_dynamic_cache_class` attribute after rebase
      
      * Correct missing line lost in conflict resolution during rebasing
      
      * Add special case for Jamba
      
      * Fix jamba test
      
      * Coding style
      
      * coding style
      
      * Correct missing import in rebasing
      
      * Simplify _validate_model_kwargs based on removal of _supports_dynamic_cache attribute
      
      * Simplify code paths in _contrastive_search
      
      * coding style
      
      * Update docstrings of cache methods
      
      * Update prepare_inputs_for_generation() -> past_key_values are always Cache objects
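
A side note on the first bullet above: the following is a minimal, self-contained sketch (not the PR's actual code) of why `torch.stack(torch.split(x, top_k, dim=0))` allocates a fresh copy while a `view` over the same tensor does not. Tensor names and shapes are illustrative assumptions.

```python
import torch

# Illustrative shapes: `batch * top_k` candidate rows, as in contrastive search,
# where every original sample has `top_k` candidate continuations.
batch, top_k, hidden = 2, 4, 8
x = torch.randn(batch * top_k, hidden)

# Pattern the commit removes: split into `batch` chunks of `top_k` rows, then
# stack them back. This materialises a brand-new (batch, top_k, hidden) tensor
# even though the rows are already contiguous in memory.
stacked = torch.stack(torch.split(x, top_k, dim=0))

# Equivalent result without copying: reinterpret the existing storage.
viewed = x.view(batch, top_k, hidden)

assert torch.equal(stacked, viewed)
assert viewed.data_ptr() == x.data_ptr()   # shares memory with x
assert stacked.data_ptr() != x.data_ptr()  # extra allocation
```
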
  2. 03 Jun, 2024 1 commit
  3. 31 May, 2024 1 commit
    • Diff converter v2 (#30868) · 96eb0628
      Arthur authored
      * current working example!
      
      * commit regex and result file
      
      * update
      
      * nit
      
      * push the conversion file
      
      * oops
      
      * roadmap and nits
      
      * attempt diffs for 3 files
      
      * persimmon
      
      * nit
      
      * add diff file that is the same as the modeling_llama.py
      
      * fix rope nits
      
      * updates
      
      * updates with converted versions
      
      * give some breathing space to the code
      
      * delete
      
      * update
      
      * update
      
      * push the actual result
      
      * update regex patterns
      
      * update regex patterns
      
      * fix some issues
      
      * fix some issues
      
      * fix some issues
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * revert changes done to llama
      
      * updates
      
      * update gemma
      
      * updates
      
      * oops
      
      * current state
      
      * current state
      
      * update
      
      * yesss
      
      * nit
      
      * clear diffs
      
      * nit
      
      * fixup
      
      * update
      
      * doc 🚀
      
      * 🔥
      
      * for now use gemma
      
      * deal with comments
      
      * style
      
      * handle functions
      
      * deal with assigns
      
      * todos
      
      * process inheritance
      
      * keep decorators?
      
      * 🤗
      
      * deal with duplicates
      
      * fixup
      
      * correctly remove duplicate code
      
      * run ruff post script
      
      * ruff deals pretty well with imports, let's leave it to him
      
      * ah maybe not lol
      
      * for now remove all imports from child.
      
      * nit
      
      * conversion of llama
      
      * okay
      
      * convert starcoder2
      
      * synch with main
      
      * update llama diff
      
      * updates
      
      * https://docs.astral.sh/ruff/rules/redefined-while-unused/ fixes the imports, but needs a later version of ruff
      
      * updates
      
      * okay actual state
      
      * non zero exit
      
      * update!
      
      * revert unrelated
      
      * remove other diff files
      
      * updates
      
      * cleanup
      
      * update
      
      * less diff!
      
      * stash
      
      * current updates
      
      * updates
      
      * No need for call
      
      * finished finding deps
      
      * update
      
      * current changes
      
      * current state
      
      * current state
      
      * new status
      
      * nit
      
      * finally
      
      * fixes
      
      * nits
      
      * order is now expected
      
      * use logger info instead of prints
      
      * fixup
      
      * up
      
      * nit
      
      * update
      
      * nits
      
      * update
      
      * correct merge
      
      * update
      
      * update
      
      * update
      
      * add warning
      
      * update caution message
      
      * update
      
      * better merging strategy
      
      * copy class statements :wink:
      
      * fixups
      
      * nits
      
      * update
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * nits
      
      * smaller header
      
      * do cleanup some stuff
      
      * even simpler header?
      
      * fixup
      
      * updates
      
      * ruff
      
      * update examples
      
      * nit
      
      * TODO
      
      * state
      
      * OUUUUUUF
      
      * current state
      
      * nits
      
      * final state
      
      * add a readme
      
      * fixup
      
      * remove diff llama
      
      * fix
      
      * nit
      
      * dummy not funny
      
      * ruff format tests src utils --check
      
      * everless diffs
      
      * less diffs and fix test
      
      * fixes
      
      * naming nit?
      
      * update converter and add supper example
      
      * nits
      
      * updated for function signatures
      
      * update
      
      * update
      
      * add converted dummies
      
      * autoformat
      
      * single target assign fix
      
      * fixup
      
      * fix some imports
      
      * fixes
      
      * don't push them
      
      * `# noqa: F841`
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
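
For readers unfamiliar with the feature, the sketch below is a hypothetical illustration of the "diff file" idea this commit introduces: a new model is written only as what differs from an existing one, and the converter expands it into a full standalone modeling file (running ruff afterwards to drop redefined or unused imports). The file layout, class names, and `model_type` are our own assumptions, not the converter's actual conventions.

```python
# Hypothetical diff-style definition file (names are illustrative assumptions).
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaForCausalLM, LlamaModel


class MyTinyConfig(LlamaConfig):
    model_type = "my_tiny"  # only the differences from Llama are spelled out


class MyTinyModel(LlamaModel):
    pass  # the converter copies the parent's body into the generated file


class MyTinyForCausalLM(LlamaForCausalLM):
    pass
```
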
  4. 23 May, 2024 1 commit
  5. 22 May, 2024 1 commit
  6. 20 May, 2024 3 commits
    • Add torch.compile for Mistral (#30642) · 616bb11d
      Longjie Zheng authored
      * first version
      
      * fix sliding window
      
      * fix style
      
      * add sliding window cache
      
      * fix style
      
      * address comments
      
      * fix test
      
      * fix style
      
      * move sliding window check inside cache init
      
      * revert changes on irrelevant files & add comment on SlidingWindowCache
      
      * address comments & fix style
      
      fix style
      
      * update causal mask
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] llama
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * [run-slow] mistral
      
      * revert CI from a10 to t4
      
      * wrap up
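
A minimal sketch of what this commit enables: compiled decoding for Mistral with the new sliding-window cache. The checkpoint id and the assumption that the cache is selected through `generation_config.cache_implementation = "sliding_window"` are ours; treat this as an illustration rather than the commit's own test code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumption: the SlidingWindowCache added here is reachable via this setting.
model.generation_config.cache_implementation = "sliding_window"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tok("Sliding-window attention lets Mistral", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```
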
    • Add support for torch.compile dynamic shapes (#30560) · cd6bd0af
      Benjamin Warner authored
      * add torch.compile dynamic support
      
      * Add SDPA dynamic shapes compile test & improve SDPA comment
      
      * comment consistency
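
The dynamic-shapes support above boils down to compiling the forward pass with `dynamic=True`, so varying sequence lengths reuse one compiled graph instead of recompiling per shape. A small sketch, assuming a GPT-2 checkpoint whose SDPA attention path is available in your transformers version:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed checkpoint; any SDPA-capable causal LM works the same way.
model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="sdpa")
model.forward = torch.compile(model.forward, dynamic=True)

for seq_len in (8, 16, 33):  # different lengths, ideally no recompilation
    ids = torch.randint(0, model.config.vocab_size, (1, seq_len))
    with torch.no_grad():
        logits = model(input_ids=ids).logits
    print(seq_len, tuple(logits.shape))
```
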
    • Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878) · 07bf2dff
      Joseph Enguehard authored
      
      
      * Add MistralForTokenClassification
      
      * Add tests and docs
      
      * Add token classification for Mixtral and Qwen2
      
      * Save llama for token classification draft
      
      * Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2
      
      * Formatting
      
      * Add token classification support for Qwen2Moe model
      
      * Add dropout layer to each ForTokenClassification model
      
      * Add copied from in tests
      
      * Update src/transformers/models/llama/modeling_llama.py
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      
      * Propagate suggested changes
      
      * Style
      
      ---------
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
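
After this commit, decoder-only families such as Mistral can be loaded through the token-classification auto class. A minimal sketch; the checkpoint id and label set are illustrative assumptions, and the classification head starts randomly initialised, so it needs fine-tuning before its predictions mean anything.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(
    model_id, num_labels=3, id2label={0: "O", 1: "B-ENT", 2: "I-ENT"}
)

inputs = tok("Hugging Face is based in New York", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch, seq_len, num_labels)
print(logits.argmax(dim=-1))
```
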
  7. 17 May, 2024 1 commit
    • Remove deprecated logic and warnings (#30743) · 57c965a8
      amyeroberts authored
      * Remove deprecated logic and warnings
      
      * Add back some code that seems to be important...
      
      * Let's just add all the nllb stuff back; removing it is a bit more involved
      
      * Remove kwargs
      
      * Remove more kwargs
  8. 16 May, 2024 2 commits
  9. 15 May, 2024 1 commit
  10. 14 May, 2024 1 commit
    • Add PaliGemma (#30814) · 1360801a
      Pablo Montalvo authored
      
      
      * add new model like
      
      * add state dict slicing + new model config
      
      * update palma config and weights, passes vision activations
      
      * fix
      
      * update
      
      * reorder loading/unpacking
      
      * clean up
      
      * add debug statements
      
      * change device
      
      * fix
      
      * debugging
      
      * fix noncausal mask
      
      * fixup sdpa + causal mask
      
      * fix activation function
      
      * remove debug before changing modeling file
      
      * add variants
      
      * debug attention mask in generate
      
      * revert to non-debug sdpa
      
      * revert gemma modifications
      
      * add custom language modeling
      
      * use Processor
      
      * add language modeling file to init
      
      * try thin wrapper around generate
      
      * Update
      
      * update mask
      
      * breakpoints galore
      
      * remove conflict
      
      * switch to left-padding
      
      * add incomplete model doc
      
      * add paligemma global files
      
      * batch rename paligemma
      
      * make generation match outputs and captioning
      
      * style
      
      * style
      
      * remove copied from + doc
      
      * remove more copied from
      
      * remove copy from projector
      
      * minor fix
      
      * update config and style
      
      * add readme - dummy
      
      * CORRECT image captioning
      
      * moving to args
      
      * add siglip proper + fix merging image + text features
      
      * take update_causal_mask from upstream
      
      * remove breakpoint
      
      * leverage AutoModel
      
      * fix input_ids slicing
      
      * make siglip head conditional
      
      * remove encoder_decoder value
      
      * remove unneeded modeling file
      
      * add commented 4d attention mask
      
      * FIXED generation with 4D mask
      
      * Update src/transformers/models/siglip/modeling_siglip.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix left padding detection
      
      * shuffle order of verifications
      
      * fix missing labels for training
      
      * fix
      
      * vectorize merging of features, improve slicing
      
      * improve testing before conversion
      
      * handle merging in processor
      
      * image token index depends on checkpoint
      
      * add variants, save processor too
      
      * save processors, base tokenizer off spm file
      
      * expand model embeddings due to additional image token
      
      * pass image processing args
      
      * add convert rgb to siglip processor
      
      * add \n token separately
      
      * fix tokenizer and prompts
      
      * fix docstrings
      
      * change to camel
      
      * fix casing
      
      * debug pos_ids and sdpa
      
      * pass and use cache_position
      
      * add flag for newline tokenization
      
      * Update src/transformers/models/paligemma/processing_paligemma.py
      Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
      
      * simplify conversion script
      
      * add copied from
      
      * add precision to conversion script
      
      * Update src/transformers/models/paligemma/modeling_paligemma.py
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      
      * clean up
      
      * Shift attention mask from `1:`
      
      After discussion with @molbap
      
      * add docs, fix quality
      
      * quality, tied weights inheritance, and logits/label alignment
      
      * fix more tests
      
      * pass attn_implementation to language model correctly
      
      * add SiglipVisionTransformer to no split modules
      
      * skip paligemma test for sdpa dispatch to flash
      
      * skip incompatible tests
      
      * quality
      
      * [broken archive maps]
      
      * Apply suggestions
      
      - remove archive lists
      - style
      - take shape of inputs_embeds for batch
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/utils/dummy_pt_objects.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * simplify conversion script
      
      * add suggestions
      
      * add suggestions
      
      * add copied from
      
      * fix
      
      * move labels out
      
      * revert
      
      * fix
      
      * remove placeholder labels if None
      
      * use cache_position
      
      * fix quality + docstrings
      
      * fix quality
      
      * fix paligemma 4d gemma mask incompatibility
      
      * fix config docstring
      
      * fix query and attn_mask dtype
      
      ---------
      Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
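
A minimal usage sketch for the classes this commit introduces. The checkpoint id, image URL, and prompt are assumptions for illustration only.

```python
import requests
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

model_id = "google/paligemma-3b-pt-224"  # assumed checkpoint
processor = PaliGemmaProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)  # placeholder URL
inputs = processor(text="caption en", images=image, return_tensors="pt")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)

# Decode only the newly generated tokens, after the prompt and image tokens.
print(processor.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
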
  11. 13 May, 2024 1 commit
    • Llama: fix custom 4D masks, v2 (#30348) · a0779b9e
      Poedator authored
      
      
      * 4d mask fixes
      
      * Update custom 4D mask logic
      
      * test moved to mixin
      
      * extra tests 4d mask
      
      * upd 4d mask and StaticCache handling
      
      * added Mask4DTestHard to mistral tests
      
      * post-rebase fixes
      
      * test fixes for StaticCache
      
      * make fix-copies
      
      * upd 1 after #30476
      
      * fix common tests
      
      * rm elif attention_mask.dim() == 4:
      
      * tests combined, fixed, mixtral supported
      
      * bigbird style chg reverted
      
      * rm if attention_mask.dim() == 2
      
      * modeling_llama formatting chg
      
      ---------
      Co-authored-by: Joao Gante <joao@huggingface.co>
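
To make the feature being fixed concrete: a custom 4D mask is passed as a ready-made `(batch, 1, query_len, kv_len)` tensor instead of the usual 2D padding mask, which is how packed sequences can block cross-sequence attention. A sketch under the assumptions that the checkpoint is a Llama model and that the mask must arrive in additive ("inverted") form, 0 where attention is allowed and the dtype minimum where it is blocked.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

input_ids = tok("Two prompts packed into one row", return_tensors="pt").input_ids
seq_len = input_ids.shape[1]

# Plain causal pattern written out explicitly; a real packed-sequence use case
# would additionally zero out attention between the packed segments.
allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
min_val = torch.finfo(model.dtype).min
mask_4d = torch.zeros(seq_len, seq_len, dtype=model.dtype).masked_fill(~allowed, min_val)
mask_4d = mask_4d[None, None, :, :]  # (batch=1, 1, query_len, kv_len)

with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=mask_4d).logits
print(tuple(logits.shape))
```
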
  12. 09 May, 2024 1 commit
  13. 08 May, 2024 1 commit
  14. 02 May, 2024 1 commit
  15. 01 May, 2024 1 commit
    • Gemma: update activation warning (#29995) · f4f18afd
      Pedro Cuenca authored
      * Gemma: only display act. warning when necessary
      
      This is a nit PR, but I was confused. I got the warning even after I
      had changed `hidden_act` to `gelu_pytorch_tanh`, telling me that I
      was using the "legacy" `gelu_pytorch_tanh`.
      
      Another option is to keep the warning but change the message to say
      something like "`hidden_act` is ignored, please use `hidden_activation`
      instead. Setting Gemma's activation function to `gelu_pytorch_tanh`".
      
      * Change message, and set `config.hidden_activation`
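
The takeaway from this commit for users: configure Gemma's activation through `hidden_activation` rather than the older `hidden_act` key. A one-line sketch, using the `gelu_pytorch_tanh` value the commit message itself recommends:

```python
from transformers import GemmaConfig

# Set the activation through the new key; `hidden_act` is the legacy path.
config = GemmaConfig(hidden_activation="gelu_pytorch_tanh")
print(config.hidden_activation)
```
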
  16. 30 Apr, 2024 1 commit
  17. 29 Apr, 2024 1 commit
  18. 22 Apr, 2024 1 commit
  19. 18 Apr, 2024 2 commits
  20. 17 Apr, 2024 1 commit
  21. 05 Apr, 2024 1 commit
  22. 30 Mar, 2024 1 commit
  23. 28 Mar, 2024 1 commit
  24. 22 Mar, 2024 1 commit
  25. 21 Mar, 2024 1 commit
  26. 20 Mar, 2024 1 commit
    • [`BC 4.37 -> 4.38`] for Llama family, memory and speed (#29753) · ff841900
      Arthur authored
      * attempt to fix
      
      * the actual fix that works with compilation!
      
      * this?
      
      * temporary update
      
      * nit?
      
      * dispatch to memory efficient?
      
      * update both models that have static cache support
      
      * fix copies fix compile
      
      * make sure fix
      
      * fix cohere and gemma
      
      * fix beams?
      
      * nit
      
      * slipped through the cracks
      
      * nit
      
      * nits
      
      * update
      
      * fix-copies
      
      * skip failing tests
      
      * nits
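
The "dispatch to memory efficient" bullet refers to which fused kernel PyTorch's `scaled_dot_product_attention` picks. Below is a standalone sketch of forcing the memory-efficient backend; it is unrelated to the actual transformers changes in this commit, the shapes are illustrative, and a CUDA device is assumed.

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) query/key/value tensors, illustrative sizes.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

# Force SDPA onto the memory-efficient kernel rather than flash or math.
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=False, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(tuple(out.shape))
```
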
  27. 19 Mar, 2024 2 commits
  28. 14 Mar, 2024 1 commit
  29. 13 Mar, 2024 1 commit
  30. 08 Mar, 2024 1 commit
  31. 06 Mar, 2024 2 commits
  32. 01 Mar, 2024 1 commit
  33. 28 Feb, 2024 2 commits