Commits · a30c865f991dfec9452cc64bd9a97bfbb96be036 · chenpangpang / transformers

07 Aug, 2024 1 commit

Cache: new Cache format in decoder-only models (#31421) · a30c865f

Raushan Turganbay authored Aug 07, 2024



* draft bart with new cache

* add cache for decoder-only models

* revert utils

* modify docstring

* revert bart

* minor fixes

* fix copies (not related)

* revert tests

* remove enc-dec related code

* remove bloom

* remove opt (enc-dec)

* update docstring

* git, codegen, gpt_neo, gpt_neox, gptj

* clean up

* copied from statements

* revert

* tmp

* update warning msg

* forgot git

* add more flags

* run-slow git,codegen,gpt_neo,gpt_neox,gptj

* add cache flag to VLMs

* remove files

* style

* video LLMs also need a flag

* style

* llava will go in another PR

* style

* [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics

* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* copy from

* deprecate until v4.45 and warn if not training

* nit

* fix test

* test static cache

* add more tests and fix models

* fix copies

* return sliding window mask

* run slow tests & fix + codestyle

* one more falcon fix for alibi

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

a30c865f

29 Jul, 2024 2 commits

Generate: end-to-end compilation (#30788) · 7ffe25f2

Joao Gante authored Jul 29, 2024

* mvp

* added test (a few models need fixes)

* fix a few test cases

* test nits

* harder test 😈

* revert changes in stablelm

* test with improved condition

* add todo

* tmp commit

* merged with main

* nits

* add todo

* final corrections

* add docs for generation compilation

* docs nits

* add  tip

* PR suggestions

* add more details to the compilation docs

* fix cache positions

* cache is now init in generate; update docs

* tag test as flaky

* docs

* post rebase make fixup and other nits

* remove unintended changes

* whisper (encoder-decoder) not supported

* move token default updates to ; add tests for token defaults

* push changes

* manual rebase

* chameleon doesn't support this

* fix test_static_cache_mha_mqa_gqa (broken in another PR)

* docs: dynamic is better with end-to-end compilation

7ffe25f2

🚨 Bloom support for cache class (#31445) · f7396876

Raushan Turganbay authored Jul 29, 2024



* bloom dynamic cache

* bloom follows standard cache format

* no skips for bloom anymore

* use cache position when possible

* clean up

* codestyle

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* pr comments

* isinstance fix

* address comments

* make musicgen test happy

* [run-slow] bloom

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

f7396876

26 Jul, 2024 1 commit

Generation: stop at `eos` for assisted decoding (#31301) · 4ab33c2d

Raushan Turganbay authored Jul 26, 2024



* fix

* move changes to prompt lookup

* add test

* set eos in assistant model

* style

* fix flakiness

* changes for new `main`

* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add comment to explain

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

4ab33c2d

25 Jul, 2024 1 commit

Follow up for #31973 (#32025) · df6eee92

Yih-Dar authored Jul 25, 2024



* fix

* [test_all] trigger full CI

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

df6eee92

22 Jul, 2024 1 commit

Generate: store special token tensors under a unique variable name (#31980) · c38c55f4

Joao Gante authored Jul 22, 2024

* rename stuff

* english; this one shouldn't be changed

* add a _ to the new var names

* musicgen

* derp

c38c55f4

15 Jul, 2024 1 commit

Avoid race condition (#31973) · a1a34657

Yih-Dar authored Jul 15, 2024



* [test_all] hub

* remove delete

* remove delete

* remove delete

* remove delete

* remove delete

* remove delete

* [test_all]

* [test_all]

* [test_all]

* [test_all]

* [test_all]

* [test_all]

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

a1a34657

09 Jul, 2024 1 commit

Generate: Add new decoding strategy "DoLa" in `.generate()` (#29619) · d094d8d9

Yung-Sung Chuang authored Jul 09, 2024

Co-authored-by: Joao Gante <joao@huggingface.co>

d094d8d9

03 Jul, 2024 1 commit

fix assisted decoding (#31401) · 7f91f168

jiqing-feng authored Jul 03, 2024

* fix assisted decoding

* check None

* fix typo

* fix _prepare_special_tokens

* fix style

* fix lint

* add tests for assisted decoding

* fix style

* fix tests check

7f91f168

02 Jul, 2024 2 commits

🚨🚨 TextGenerationPipeline: rely on the tokenizer default kwargs (#31747) · 82486e59

Joao Gante authored Jul 02, 2024

* rely on the tokenizer default kwargs

* fix a few tests

82486e59

[whisper] static kv cache (#31166) · a9701953

Sanchit Gandhi authored Jul 02, 2024



* make work with cache abstraction

* correct for static cache

* hacks for compile

* make fast

* fix

* fix pos ids

* generate

* fix sdpa

* fix sdpa cache pos

* fix fa2

* clean fa2

* integrate cache into generate

* make style

* copies

* more copies

* update eager

* update sdpa

* update fa2

* simplify

* use cache pos

* always compute cross-cache for debug

* avoid recompiles
Co-authored-by: Arthur Zucker <arthur@huggingface.co>

* fix fix

* fix fix fix

* more fix

* try encoder-decoder cache (too messy)

* revert encoder-decoder cache

* check cross-attn cache

* use enc-dec dataclass

* use richer enc-dec dataclass

* clean-up

* revert static cache changes

* small fixes

* revert to cpu flag

* fix copies

* add static slow test

* past k/v docstring

* more docstrings

* cache_position docstrings

* add to docs

* add enc-dec cache to docs

* make style

* fix after rebase

* fix beam

* style

* fix generation strategies

* fix most decoder-only tests

* style

* skip test

* more clean up

* small docstrings

* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add todo

* only crop self-attn

* check cache in mixin

* style

* fix re-compile after rebase

* move `is_updated` logic to enc-dec wrapper

* revert back

* revert cache back

* finalise design

* fix

* fix fix

* style

* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* deprecate

* updates

* final updates

* style

* style

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

a9701953

26 Jun, 2024 1 commit

Skip tests properly (#31308) · 1de7dc74

amyeroberts authored Jun 26, 2024

* Skip tests properly

* [test_all]

* Add 'reason' as kwarg for skipTest

* [test_all] Fix up

* [test_all]

1de7dc74

20 Jun, 2024 1 commit

RWKV: enable generation tests (#31490) · 1fd60fec

Joao Gante authored Jun 20, 2024

* add rwkv tests

* has_attentions set in individual tests

1fd60fec

19 Jun, 2024 1 commit

Mamba: add generative tests (#31478) · 83259e40

Joao Gante authored Jun 19, 2024

83259e40

18 Jun, 2024 1 commit

Fix single letter stop strings (#31448) · 28316d0e

Matt authored Jun 18, 2024



* Fix single letter stop strings

* Change the 0 to a 1 to avoid potential empty vector headaches later

* Restructure for clarity

* Update tests/generation/test_stopping_criteria.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add the unsqueeze

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

28316d0e

06 Jun, 2024 1 commit

Generation: fix handling of special tokens (#31254) · 5fabd1e8

Raushan Turganbay authored Jun 06, 2024

* fix special tokens in generation

* fix test

* add warning

* fix the check

* warn once

* fix

5fabd1e8

04 Jun, 2024 1 commit

Pass device in Logits Processor's init (#29804) · 83238eee

Raushan Turganbay authored Jun 04, 2024



* add device in logits processor

* remove device when not needed

* codestyle

* tests

* forgot `melody` version

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* codestyle

* updates

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

83238eee

03 Jun, 2024 1 commit

Token healing (#30081) · 39b2ff69

Ahmed Moubtahij authored Jun 03, 2024



* token healing impl + trie with extensions

* make fixup

* prefix-robust space tokenization

* examples readme and requirements

* make fixup

* allow input prompt and model

* redundant defaults

* Specialized Trie

* make fixup

* updated tests with new inherited Tree

* input ids to auto device_map

* rm unused import

* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* naming convention

* Revert "naming convention"

This reverts commit dd39d9c5b7a969e2d8a8d2a8e54f121b82dc44f0.

* naming convention

* last -hopefully- changes

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

39b2ff69

28 May, 2024 1 commit

Watermark: fix tests (#30961) · 779bc360

Raushan Turganbay authored May 28, 2024



* fix tests

* style

* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

779bc360

23 May, 2024 2 commits

Quantized KV Cache (#30483) · d583f131

Raushan Turganbay authored May 23, 2024



* clean-up

* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* more suggestions

* mapping if torch available

* run tests & add 'support_quantized' flag

* fix jamba test

* revert, will be fixed by another PR

* codestyle

* HQQ and versatile cache classes

* final update

* typo

* make tests happy

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

d583f131

Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size (#30637) · eb1a77bb

Kamil Akesbi authored May 23, 2024



* fix input to generate in pipeline

* fixup

* pass input_features to generate with assistant

* error if model and assistant with different enc size

* fix

* apply review suggestions

* use self.config.is_encoder_decoder

* pass inputs to generate directly

* add slow tests

* Update src/transformers/generation/utils.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* apply review

* Update src/transformers/generation/utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* apply code review

* update attributes encoder_xyz to check

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add slow test

* solve conflicts

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

eb1a77bb

22 May, 2024 1 commit

Generation: get special tokens from model config (#30899) · b1065aa0

Raushan Turganbay authored May 22, 2024

* fix

* let's do this way?

* codestyle

* update

* add tests

b1065aa0

14 May, 2024 1 commit

Add Watermarking LogitsProcessor and WatermarkDetector (#29676) · 5ad960f1

Raushan Turganbay authored May 14, 2024



* add watermarking processor

* remove the other hashing (context width=1 always)

* make style

* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* update watermarking process

* add detector

* update tests to use detector

* fix failing tests

* rename `input_seq`

* make style

* doc for processor

* minor fixes

* docs

* make quality

* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add PR suggestions

* let's use lru_cache's default max size (128)

* import processor if torch available

* maybe like this

* let's move the config to torch independent file

* add docs

* tiny docs fix to make the test happy

* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/watermarking.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* PR suggestions

* add docs

* fix test

* fix docs

* address pr comments

* style

* Revert "style"

This reverts commit 7f33cc34ff08b414f8e7f90060889877606b43b2.

* correct style

* make doctest green

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

5ad960f1

09 May, 2024 2 commits

Generate: consistently handle special tokens as tensors (#30624) · 7130a22d

Joao Gante authored May 09, 2024



* tmp commit

* [test_all] mvp

* missing not

* [test_all] final test fixes

* fix musicgen_melody and rag

* [test_all] empty commit

* PR comments

* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

7130a22d

Generate: add `min_p` sampling (#30639) · df53c6e5

Joao Gante authored May 09, 2024



* min_p

* more relaxed test to avoid numerical issues

* Update src/transformers/generation/logits_process.py
Co-authored-by: menhguin <minh1228@gmail.com>

* Update src/transformers/generation/configuration_utils.py
Co-authored-by: menhguin <minh1228@gmail.com>

* docstring clarifications

* PR comments

* Update tests/generation/test_logits_process.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

---------
Co-authored-by: menhguin <minh1228@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

df53c6e5

23 Apr, 2024 1 commit

Fix on "cache position" for assisted generation (#30068) · 77b59dce

Raushan Turganbay authored Apr 23, 2024



* clean commit history I hope

* get kv seq length correctly

* PR suggestions

* Update src/transformers/testing_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add comment

* give gpt bigcode its own overridden method

* remove code

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

77b59dce

22 Apr, 2024 1 commit

Terminator strings for generate() (#28932) · 0d84901c

Matt authored Apr 22, 2024



* stash commit (will discard all of this)

* stash commit

* First commit - needs a lot of testing!

* Add a test

* Fix imports and make the tests actually test something

* Tests pass!

* Rearrange test

* Add comments (but it's still a bit confusing)

* Stop storing the tokenizer

* Comment fixup

* Fix for input_ids with a single sequence

* Update tests to test single sequences

* make fixup

* Fix incorrect use of isin()

* Expand tests to catch more cases

* Expand tests to catch more cases

* make fixup

* Fix length calculation and update tests

* Handle Ġ as a space replacement too

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add optimizations from Joao's suggestion

* Remove TODO

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* make fixup

* Rename some variables and remove some debugging clauses for clarity

* Add tests for the sub-methods

* Clarify one test slightly

* Add stop_strings to GenerationConfig

* generate() supports stop_string arg, asks for tokenizer if not provided

* make fixup

* Cleanup code and rename variables for clarity

* Update tokenizer error

* Update tokenizer passing, handle generation on GPU

* Slightly more explanation cleanup

* More comment cleanup

* Factor out the token cleanup so it's more obvious what we're doing, and we can change it later

* Careful with that cleanup!

* Cleanup + optimizations to _get_matching_positions

* More minor performance tweaks

* Implement caching and eliminate some expensive ops (startup time: 200ms -> 9ms)

* Remove the pin_memory call

* Parallelize across all stop strings!

* Quick fix for tensor devices

* Update embeddings test for the new format

* Fix test imports

* Manual patching for BERT-like tokenizers

* Return a bool vector instead of a single True/False

* Better comment

* Better comment

* Add tests from @zucchini-nlp

* Amy's list creation nit

* tok_list -> token_list

* Push a big expanded docstring (should we put it somewhere else?)

* Expand docstrings

* Docstring fixups

* Rebase

* make fixup

* Make a properly general method for figuring out token strings

* Fix naming throughout the functions

* Move cache, refactor, fix tests

* Add comment

* Remove finished TODO

* Remove finished TODO

* make fixup

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update and shorten docstring

* Update tests to be shorter/clearer and test specific cases

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

0d84901c

19 Apr, 2024 1 commit

Do not remove half seq length in generation tests (#30016) · b1cd4874

Raushan Turganbay authored Apr 19, 2024



* remove seq length from generation tests

* style and quality

* [test_all] & PR suggestion
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [test all] remove unused variables

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

b1cd4874

18 Apr, 2024 1 commit

Add jamba (#29943) · 3f20877d

tomeras91 authored Apr 18, 2024

* Add jamba arch

* apply "make fix-copies" changes

* fix link to model in JambaConfig docstring

* Add n_ctx in modeling file because repo-consistency wants that

* Add jamba to flash attention and sdpa documentation

* mamba dt_proj quant fix now works for LoRA as well

* override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers

* add jamba to tokenization auto

* fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)

* simple PR fixes

* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer

* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)

* Add copied comment on JambaMLP (it's the same as MixtralMLP)

* remove padding_mask warnings. It's not supported anymore

* fix docstring. Float instead of int

* A few more minor PR fixes

* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass

* Return None attention weights from mamba layers. Append to all attentions only if not None.

* remove some leftover jamba archive lists

* Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel

* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers

* Add Jamba paper on READMEs

* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)

* Add copied from comment

* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms

* clearer docstring for _convert_to_standard_cache

* style fixes

* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code to use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs

* rename test so it still overrides what its meant to override

* draft

* oups

* nit

* remove more complex logic

* fix names used in config

* fix fix fix

* style

* fix some more failing tests

* generate did not init the cache 🙃



* more small nits

* typo

* config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes

* fix init of pkv with torch.tensor()

* empty tensor

* fix some init issues

* stupid changes required by generate because it does not even support its own DynamicCache class

* more fixes

* fix general assisted gen cache_position bug

* tests passing

* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py

* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache

* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore

* fix docstrings and typehints for past_key_values

* style fixes

* fix docs

* change typehint due to copy from Mixtral

* forgot import

* import order

* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)

* Add integration test with tiny random Jamba model on hub

* fix flash attention cache shapes

* bring back forgotten hidden states

* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model

* align integration test after modeling fixes

* bugfix - mamba can use precomputed states only if forward pass is on a single token

* bugfix - mamba can use precomputed states only if they match the batch size

* typo

* remove making _prepare_4d_causal_attention_mask a leaf function

* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly

---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>

3f20877d

10 Apr, 2024 1 commit

Raushan Turganbay authored Apr 10, 2024



* avoid generation length warning

* add tests

* Update src/transformers/generation/candidate_generator.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add tests and minor fixes

* refine `min_new_tokens`

* Update src/transformers/generation/candidate_generator.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add method to prepare length arguments

* add test for min length

* Update src/transformers/generation/candidate_generator.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* fix variable naming

* empty commit for tests

* trigger tests (empty)

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

41579763

09 Apr, 2024 1 commit

Revert workaround for TF safetensors loading (#30128) · ec59a421

Matt authored Apr 09, 2024

* See if we can get tests to pass with the fixed weights

* See if we can get tests to pass with the fixed weights

* Replace the revisions now that we don't need them anymore

ec59a421

08 Apr, 2024 1 commit

Patch fix - don't use safetensors for TF models (#30118) · 7f9aff91

amyeroberts authored Apr 08, 2024

* Patch fix - don't use safetensors for TF models

* Skip test for TF for now

* Update for another test

7f9aff91

02 Apr, 2024 2 commits

Adding FlaxNoRepeatNGramLogitsProcessor (#29677) · fed27ffc

théo gigant authored Apr 02, 2024

* fix issue with logit processor in beam search in Flax

* adding FlaxNoRepeatNGramLogitsProcessor class + unit test

* style correction and code verification

* add FlaxNoRepeatNGramLogitsProcessor to the test_processor_list and test_processor_list_jitted tests

* fix an issue where ngrams are banned only if they appear ==1 time + update description of get_previous_ngrams

* replace non-jit compatible masking of ngrams that are not yet generated with jittable version

* Revert "fix issue with logit processor in beam search in Flax"

This reverts commit 09b70d7e4dc32d0cc4db61af09a835a9cd238b50.

* add FlaxNoRepeatNGramLogitsProcessor to _get_logits_processor

* change the method of casting to boolean of banned tokens indices

* fix code style

* remove some useless operations + significantly faster computation of update indices using jax.lax.fori_loop

* remove useless loop iterations

* set some variables that were calculated and used multiple times

* fix format

fed27ffc

[`generate`] fix breaking change for patch (#29976) · 83b26dd7

Arthur authored Apr 02, 2024

* fix bug and add tests

* nit

* otherway to get the cur len instead of attention mask

* more places where this might have been broken

* nit

* oups

* inputs_embeds vs input_embeds

* test generated outputs

* style

* nit

* fix

* skip failing biogpt

83b26dd7

01 Apr, 2024 1 commit

Generate: move misplaced test (#29902) · c9f6e5e3

Joao Gante authored Apr 01, 2024

c9f6e5e3

27 Mar, 2024 1 commit

Move `eos_token_id` to stopping criteria (#29459) · 0efcf323

Raushan Turganbay authored Mar 27, 2024



* add eos stopping criteria

* minor fix

* Update tests/generation/test_stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* check eos is not None and fix tests

* make style and fixup

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* camel case everywhere

* call stopping criteria list for candidate ids

* make style  and fixup

* Empty commit

* Empty commit to pass flaky test

* set max length in PromptLookupCandidateGenerator

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* lets fix this typo in docs

* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update PR

* empty commit

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

0efcf323

26 Mar, 2024 1 commit

Allow `bos_token_id is None` during the generation with `inputs_embeds` (#29772) · 998b5bb5

Zhihao Lin authored Mar 26, 2024

* update

* add ut

* update

998b5bb5

21 Mar, 2024 1 commit

Change in-place operations to out-of-place in LogitsProcessors (#29680) · fadb0533

Raushan Turganbay authored Mar 21, 2024



* change in-place -> out-of-place

* add tests

* add more tests

* naming consistency

* fix doctest

* forgot min-length processors

* empty

* Revert "fix doctest"

This reverts commit 4772768457f9bc057f1d4d9d67ea94eb7224eb8d.

* revert change in docstring

* Update tests/generation/test_logits_process.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/generation/test_logits_process.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

fadb0533

19 Mar, 2024 1 commit

Clean-up generation tests after moving methods to private (#29582) · 425ba56c

Raushan Turganbay authored Mar 19, 2024

* clean-up tests

* refine comments

* fix musicgen tests

* make style

* remove slow decorator from a test

* more clean-up

* fix other failing tests

425ba56c

08 Mar, 2024 1 commit

[tests] use `torch_device` instead of `auto` for model testing (#29531) · 1ea3ad1a

Fanli Lin authored Mar 08, 2024



* use torch_device

* skip for XPU

* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

1ea3ad1a