1. 26 Jul, 2024 1 commit
  2. 25 Jul, 2024 1 commit
  3. 22 Jul, 2024 1 commit
  4. 15 Jul, 2024 1 commit
  5. 09 Jul, 2024 1 commit
  6. 03 Jul, 2024 1 commit
    • fix assisted decoding (#31401) · 7f91f168
      jiqing-feng authored
      * fix assisted decoding
      
      * check None
      
      * fix typo
      
      * fix _prepare_special_tokens
      
      * fix style
      
      * fix lint
      
      * add tests for assisted decoding
      
      * fix style
      
      * fix tests check
      7f91f168
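      For context, a minimal sketch of the assisted-decoding call path this commit touches, assuming an illustrative main/draft checkpoint pair (any compatible pair sharing a tokenizer works):

        # Sketch of assisted (speculative) decoding: a small draft model proposes
        # tokens that the main model then verifies in a single forward pass.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("gpt2-large")   # illustrative main model
        model = AutoModelForCausalLM.from_pretrained("gpt2-large")
        assistant = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative draft model

        inputs = tokenizer("The quick brown fox", return_tensors="pt")
        # Passing assistant_model routes generate() through the assisted-decoding branch.
        outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=20)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))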
  7. 02 Jul, 2024 2 commits
    • 🚨🚨 TextGenerationPipeline: rely on the tokenizer default kwargs (#31747) · 82486e59
      Joao Gante authored
      * rely on the tokenizer default kwargs
      
      * fix a few tests
      82486e59
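      As a reminder of the surface this change affects, a small sketch of a TextGenerationPipeline call; per the commit title, the pipeline now defers to the tokenizer's own default kwargs rather than overriding them (checkpoint illustrative):

        # Sketch of a text-generation pipeline call; tokenization kwargs now come
        # from the tokenizer's defaults unless passed explicitly to the pipeline.
        from transformers import pipeline

        generator = pipeline("text-generation", model="gpt2")  # illustrative checkpoint
        result = generator("Hello, I'm a language model,", max_new_tokens=20)
        print(result[0]["generated_text"])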
    • [whisper] static kv cache (#31166) · a9701953
      Sanchit Gandhi authored
      
      
      * make work with cache abstraction
      
      * correct for static cache
      
      * hacks for compile
      
      * make fast
      
      * fix
      
      * fix pos ids
      
      * generate
      
      * fix sdpa
      
      * fix sdpa cache pos
      
      * fix fa2
      
      * clean fa2
      
      * integrate cache into generate
      
      * make style
      
      * copies
      
      * more copies
      
      * update eager
      
      * update sdpa
      
      * update fa2
      
      * simplify
      
      * use cache pos
      
      * always compute cross-cache for debug
      
      * avoid recompiles
      Co-authored-by: Arthur Zucker <arthur@huggingface.co>
      
      * fix fix
      
      * fix fix fix
      
      * more fix
      
      * try encoder-decoder cache (too messy)
      
      * revert encoder-decoder cache
      
      * check cross-attn cache
      
      * use enc-dec dataclass
      
      * use richer enc-dec dataclass
      
      * clean-up
      
      * revert static cache changes
      
      * small fixes
      
      * revert to cpu flag
      
      * fix copies
      
      * add static slow test
      
      * past k/v docstring
      
      * more docstrings
      
      * cache_position docstrings
      
      * add to docs
      
      * add enc-dec cache to docs
      
      * make style
      
      * fix after rebase
      
      * fix beam
      
      * style
      
      * fix generation strategies
      
      * fix most decoder-only tests
      
      * style
      
      * skip test
      
      * more clean up
      
      * small docstrings
      
      * Apply suggestions from code review
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * add todo
      
      * only crop self-attn
      
      * check cache in mixin
      
      * style
      
      * fix re-compile after rebase
      
      * move `is_updated` logic to enc-dec wrapper
      
      * revert back
      
      * revert cache back
      
      * finalise design
      
      * fix
      
      * fix fix
      
      * style
      
      * Update src/transformers/cache_utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * deprecate
      
      * updates
      
      * final updates
      
      * style
      
      * style
      
      ---------
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      a9701953
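      A hedged sketch of how the static decoder cache added here is typically enabled for Whisper; the checkpoint, dummy audio, and compile settings below are illustrative rather than prescribed by the commit:

        # Sketch: use the static k/v cache so the decoder forward pass has fixed
        # shapes and torch.compile does not recompile between decoding steps.
        import numpy as np
        import torch
        from transformers import WhisperForConditionalGeneration, WhisperProcessor

        processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
        model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

        model.generation_config.cache_implementation = "static"  # the new static cache path
        model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

        audio = np.zeros(16_000, dtype=np.float32)  # 1 s of silence as a stand-in waveform
        inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
        generated_ids = model.generate(inputs.input_features)
        print(processor.batch_decode(generated_ids, skip_special_tokens=True))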
  8. 26 Jun, 2024 1 commit
  9. 20 Jun, 2024 1 commit
  10. 19 Jun, 2024 1 commit
  11. 18 Jun, 2024 1 commit
  12. 06 Jun, 2024 1 commit
  13. 04 Jun, 2024 1 commit
  14. 03 Jun, 2024 1 commit
    • Token healing (#30081) · 39b2ff69
      Ahmed Moubtahij authored
      
      
      * token healing impl + trie with extensions
      
      * make fixup
      
      * prefix-robust space tokenization
      
      * examples readme and requirements
      
      * make fixup
      
      * allow input prompt and model
      
      * redundant defaults
      
      * Specialized Trie
      
      * make fixup
      
      * updated tests with new inherited Tree
      
      * input ids to auto device_map
      
      * rm unused import
      
      * Update src/transformers/generation/utils.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * naming convention
      
      * Revert "naming convention"
      
      This reverts commit dd39d9c5b7a969e2d8a8d2a8e54f121b82dc44f0.
      
      * naming convention
      
      * last -hopefully- changes
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      39b2ff69
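      A hedged sketch of the feature this PR adds; the flag name token_healing and the tokenizer requirement below are my reading of the PR and should be treated as assumptions (checkpoint illustrative):

        # Sketch: token healing drops the last prompt token and regenerates it so the
        # completion is not skewed by an unlucky tokenization boundary (e.g. a trailing space).
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
        model = AutoModelForCausalLM.from_pretrained("gpt2")

        prompt = 'The link is <a href="http:'  # boundary-sensitive ending
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(
            **inputs,
            token_healing=True,   # assumed flag name introduced by this PR
            tokenizer=tokenizer,  # assumed: needed to rebuild the healed prefix
            max_new_tokens=8,
        )
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))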
  15. 28 May, 2024 1 commit
  16. 23 May, 2024 2 commits
  17. 22 May, 2024 1 commit
  18. 14 May, 2024 1 commit
  19. 09 May, 2024 2 commits
  20. 23 Apr, 2024 1 commit
  21. 22 Apr, 2024 1 commit
    • Terminator strings for generate() (#28932) · 0d84901c
      Matt authored
      
      
      * stash commit (will discard all of this)
      
      * stash commit
      
      * First commit - needs a lot of testing!
      
      * Add a test
      
      * Fix imports and make the tests actually test something
      
      * Tests pass!
      
      * Rearrange test
      
      * Add comments (but it's still a bit confusing)
      
      * Stop storing the tokenizer
      
      * Comment fixup
      
      * Fix for input_ids with a single sequence
      
      * Update tests to test single sequences
      
      * make fixup
      
      * Fix incorrect use of isin()
      
      * Expand tests to catch more cases
      
      * Expand tests to catch more cases
      
      * make fixup
      
      * Fix length calculation and update tests
      
      * Handle Ġ as a space replacement too
      
      * Update src/transformers/generation/stopping_criteria.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Add optimizations from Joao's suggestion
      
      * Remove TODO
      
      * Update src/transformers/generation/stopping_criteria.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update tests/generation/test_stopping_criteria.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * make fixup
      
      * Rename some variables and remove some debugging clauses for clarity
      
      * Add tests for the sub-methods
      
      * Clarify one test slightly
      
      * Add stop_strings to GenerationConfig
      
      * generate() supports stop_string arg, asks for tokenizer if not provided
      
      * make fixup
      
      * Cleanup code and rename variables for clarity
      
      * Update tokenizer error
      
      * Update tokenizer passing, handle generation on GPU
      
      * Slightly more explanation cleanup
      
      * More comment cleanup
      
      * Factor out the token cleanup so it's more obvious what we're doing, and we can change it later
      
      * Careful with that cleanup!
      
      * Cleanup + optimizations to _get_matching_positions
      
      * More minor performance tweaks
      
      * Implement caching and eliminate some expensive ops (startup time: 200ms -> 9ms)
      
      * Remove the pin_memory call
      
      * Parallelize across all stop strings!
      
      * Quick fix for tensor devices
      
      * Update embeddings test for the new format
      
      * Fix test imports
      
      * Manual patching for BERT-like tokenizers
      
      * Return a bool vector instead of a single True/False
      
      * Better comment
      
      * Better comment
      
      * Add tests from @zucchini-nlp
      
      * Amy's list creation nit
      
      * tok_list -> token_list
      
      * Push a big expanded docstring (should we put it somewhere else?)
      
      * Expand docstrings
      
      * Docstring fixups
      
      * Rebase
      
      * make fixup
      
      * Make a properly general method for figuring out token strings
      
      * Fix naming throughout the functions
      
      * Move cache, refactor, fix tests
      
      * Add comment
      
      * Remove finished TODO
      
      * Remove finished TODO
      
      * make fixup
      
      * Update src/transformers/generation/stopping_criteria.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update and shorten docstring
      
      * Update tests to be shorter/clearer and test specific cases
      
      ---------
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      0d84901c
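      A minimal sketch of the stop-string support this commit adds: generate() accepts stop_strings and, as the messages above note, asks for the tokenizer if it is not provided (checkpoint illustrative):

        # Sketch: stop generation as soon as any of the given strings appears in the output.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
        model = AutoModelForCausalLM.from_pretrained("gpt2")

        inputs = tokenizer("The capital of France is", return_tensors="pt")
        # stop_strings needs the tokenizer to map candidate strings back onto token boundaries.
        outputs = model.generate(
            **inputs, stop_strings=["\n", "."], tokenizer=tokenizer, max_new_tokens=30
        )
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))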
  22. 19 Apr, 2024 1 commit
  23. 18 Apr, 2024 1 commit
    • Add jamba (#29943) · 3f20877d
      tomeras91 authored
      * Add jamba arch
      
      * apply "make fix-copies" changes
      
      * fix link to model in JambaConfig docstring
      
      * Add n_ctx in modeling file because repo-consistency wants that
      
      * Add jamba to flash attention and sdpa documentation
      
      * mamba dt_proj quant fix now works for LoRA as well
      
      * override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical differences are accentuated by mamba layers
      
      * add jamba to tokenization auto
      
      * fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
      
      * simple PR fixes
      
      * remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
      
      * remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
      
      * Add copied comment on JambaMLP (it's the same as MixtralMLP)
      
      * remove padding_mask warnings. It's not supported anymore
      
      * fix docstring. Float instead of int
      
      * A few more minor PR fixes
      
      * (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
      
      * Return None attention weights from mamba layers. Append to all attentions only if not None.
      
      * remove some leftover jamba archive lists
      
      * Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
      
      * no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
      
      * Add Jamba paper on READMEs
      
      * (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
      
      * Add copied from comment
      
      * remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
      
      * clearer docstring for _convert_to_standard_cache
      
      * style fixes
      
      * Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code to use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs
      
      * rename test so it still overrides what its meant to override
      
      * draft
      
      * oups
      
      * nit
      
      * remove more complex logic
      
      * fix names used in config
      
      * fix fix fix
      
      * style
      
      * fix some more failing tests
      
      * generate did not init the cache 🙃
      
      
      
      * more small nits
      
      * typo
      
      * config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
      
      * fix init of pkv with torch.tensor()
      
      * empty tensor
      
      * fix some init issues
      
      * stupid changes required by generate because it does not even support it's own DynamicCache class
      
      * more fixes
      
      * fix general assisted gen cache_position bug
      
      * tests passing
      
      * Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
      
      * fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
      
      * no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
      
      * fix docstrings and typehints for past_key_values
      
      * style fixes
      
      * fix docs
      
      * change typehint due to copy from Mixtral
      
      * forgot import
      
      * import order
      
      * Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
      
      * Add integration test with tiny random Jamba model on hub
      
      * fix flash attention cache shapes
      
      * bring back forgotten hidden states
      
      * rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
      
      * align integration test after modeling fixes
      
      * bugfix - mamba can use precomputed states only if forward pass is on a single token
      
      * bugfix - mamba can use precomputed states only if they match the batch size
      
      * typo
      
      * remove making _prepare_4d_causal_attention_mask a leaf function
      
      * stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly
      
      ---------
      Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
      Co-authored-by: Joao Gante <joao@huggingface.co>
      3f20877d
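      For reference, a hedged sketch of loading the Jamba checkpoint referenced in this commit (ai21labs/Jamba-v0.1) through the Auto classes; the dtype/device settings are illustrative and the full checkpoint is large:

        # Sketch: Jamba is a hybrid Mamba/attention/MoE decoder, used like any causal LM.
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")
        model = AutoModelForCausalLM.from_pretrained(
            "ai21labs/Jamba-v0.1",
            torch_dtype=torch.bfloat16,  # illustrative precision/placement choices
            device_map="auto",
        )

        inputs = tokenizer("In the Python standard library,", return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=32)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))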
  24. 10 Apr, 2024 1 commit
  25. 09 Apr, 2024 1 commit
  26. 08 Apr, 2024 1 commit
  27. 02 Apr, 2024 2 commits
    • Adding FlaxNoRepeatNGramLogitsProcessor (#29677) · fed27ffc
      théo gigant authored
      * fix issue with logit processor in beam search in Flax
      
      * adding FlaxNoRepeatNGramLogitsProcessor class + unit test
      
      * style correction and code verification
      
      * add FlaxNoRepeatNGramLogitsProcessor to the test_processor_list and test_processor_list_jitted tests
      
      * fix an issue where ngrams are banned only if they appear ==1 time + update description of get_previous_ngrams
      
      * replace non-jit compatible masking of ngrams that are not yet generated with jittable version
      
      * Revert "fix issue with logit processor in beam search in Flax"
      
      This reverts commit 09b70d7e4dc32d0cc4db61af09a835a9cd238b50.
      
      * add FlaxNoRepeatNGramLogitsProcessor to _get_logits_processor
      
      * change the method of casting to boolean of banned tokens indices
      
      * fix code style
      
      * remove some useless operations + significantly faster computation of update indices using jax.lax.fori_loop
      
      * remove useless loop iterations
      
      * set some variables that were calculated and used multiple times
      
      * fix format
      fed27ffc
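      A hedged sketch of the Flax generation path the new processor plugs into: setting no_repeat_ngram_size on a Flax model's generate() call should now route through FlaxNoRepeatNGramLogitsProcessor (checkpoint illustrative; requires flax to be installed):

        # Sketch: ban repeated n-grams during Flax generation.
        from transformers import AutoTokenizer, FlaxAutoModelForSeq2SeqLM

        tokenizer = AutoTokenizer.from_pretrained("t5-small")  # illustrative checkpoint
        model = FlaxAutoModelForSeq2SeqLM.from_pretrained("t5-small")

        text = "summarize: The cat sat on the mat. The cat sat on the mat."
        inputs = tokenizer(text, return_tensors="np")
        # no_repeat_ngram_size=2 forbids any bigram from appearing twice in the output.
        outputs = model.generate(inputs.input_ids, no_repeat_ngram_size=2, max_new_tokens=20)
        print(tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True))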
    • [`generate`] fix breaking change for patch (#29976) · 83b26dd7
      Arthur authored
      * fix bug and add tests
      
      * nit
      
      * otherway to get the cur len instead of attention mask
      
      * more places where this might have been broken
      
      * nit
      
      * oups
      
      * inputs_embeds vs input_embeds
      
      * test generated outputs
      
      * style
      
      * nit
      
      * fix
      
      * skip failing biogpt
      83b26dd7
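      A short sketch of the call pattern the fix above targets: generating from inputs_embeds instead of input_ids, where the current length has to be derived without relying on the attention mask alone (checkpoint illustrative):

        # Sketch: generation from precomputed input embeddings rather than token ids.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
        model = AutoModelForCausalLM.from_pretrained("gpt2")

        input_ids = tokenizer("Hello world", return_tensors="pt").input_ids
        inputs_embeds = model.get_input_embeddings()(input_ids)
        outputs = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=10)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))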
  28. 01 Apr, 2024 1 commit
  29. 27 Mar, 2024 1 commit
  30. 26 Mar, 2024 1 commit
  31. 21 Mar, 2024 1 commit
  32. 19 Mar, 2024 1 commit
  33. 08 Mar, 2024 2 commits
  34. 07 Mar, 2024 1 commit
  35. 06 Mar, 2024 1 commit