1. 01 Apr, 2024 1 commit
  2. 27 Mar, 2024 1 commit
  3. 26 Mar, 2024 1 commit
  4. 21 Mar, 2024 1 commit
  5. 19 Mar, 2024 1 commit
  6. 08 Mar, 2024 2 commits
  7. 07 Mar, 2024 1 commit
  8. 06 Mar, 2024 1 commit
  9. 27 Feb, 2024 1 commit
  10. 26 Feb, 2024 1 commit
  11. 20 Feb, 2024 1 commit
  12. 19 Feb, 2024 1 commit
    • ENH: added new output_logits option to generate function (#28667) · 08cd694e
      Max Baak authored
      The output_logits option behaves like output_scores, but returns the raw, unprocessed prediction logits,
      i.e. the values before they undergo logit processing and/or warping. The latter happens by default for the
      regular output scores.
      
      It's useful to have the unprocessed logits in certain circumstances. For example, unprocessed logits
      are very useful with causal LM models when one wants to determine the probability of a certain answer, e.g.
      when asking a question with a yes/no answer. In that case, getting the next-token probabilities of both "yes" and
      "no" (and/or their relative ratio) is of interest for classification. The reason for getting these _before_ logit
      processing and/or warping is that such processing (a) can change the probabilities, or (b) can reject the tokens
      of interest or reduce the number of candidate tokens to just one.
      
      For an example use-case see paper TabLLM: Few-shot Classification of Tabular Data with Large Language Models
      by Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David Sontag.
      https://arxiv.org/abs/2210.10723
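      A minimal usage sketch of this option (not from the PR itself; model, prompt, and token choices are illustrative placeholders, and it assumes a transformers version that includes this change):

```python
# Hedged sketch: scoring a yes/no answer from the unprocessed logits returned
# by generate(). Model, prompt, and token choices are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Is the sky blue? Answer:", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=1,
    do_sample=False,
    return_dict_in_generate=True,
    output_logits=True,   # raw logits, before logit processors/warpers
    output_scores=True,   # processed scores, for comparison
)

# out.logits is a tuple with one (batch, vocab_size) tensor per generated token.
probs = torch.softmax(out.logits[0], dim=-1)
yes_id = tokenizer(" yes", add_special_tokens=False).input_ids[0]
no_id = tokenizer(" no", add_special_tokens=False).input_ids[0]
print("P(yes) =", probs[0, yes_id].item(), "P(no) =", probs[0, no_id].item())
```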
      
      
      
      In addition:
      - added a dedicated unit test, tests/generation/test_utils/test_return_unprocessed_logit_scores,
        which tests the return of logits with output_logits=True in generation.
      - set output_logits=True in all other generation unit tests that also have output_scores=True.
      
      Implemented @gante's and @amyeroberts's review feedback
      Co-authored-by: kx79wq <max.baak@ing.com>
  13. 16 Feb, 2024 3 commits
  14. 08 Feb, 2024 1 commit
  15. 30 Jan, 2024 1 commit
    • Add tf_keras imports to prepare for Keras 3 (#28588) · 415e9a09
      Matt authored
      * Port core files + ESM (because ESM code is odd)
      
      * Search-replace in modelling code
      
      * Fix up transfo_xl as well
      
      * Fix other core files + tests (still need to add correct import to tests)
      
      * Fix cookiecutter
      
      * make fixup, fix imports in some more core files
      
      * Auto-add imports to tests
      
      * Cleanup, add imports to sagemaker tests
      
      * Use correct exception for importing tf_keras
      
      * Fixes in modeling_tf_utils
      
      * make fixup
      
      * Correct version parsing code
      
      * Ensure the pipeline tests correctly revert to float32 after each test
      
      * Ensure the pipeline tests correctly revert to float32 after each test
      
      * More tf.keras -> keras
      
      * Add dtype cast
      
      * Better imports of tf_keras (see the sketch after this list)
      
      * Add a cast for tf.assign, just in case
      
      * Fix callback imports
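      A minimal sketch of the kind of tf_keras import guard this PR is about (assumed shape, not necessarily the exact code that landed):

```python
# Hedged sketch: prefer the backwards-compatible tf_keras package, and raise an
# informative error if only Keras 3 is available. Message wording is illustrative.
from packaging.version import parse

try:
    import tf_keras as keras  # Keras 2 API, maintained for TF compatibility
except (ModuleNotFoundError, ImportError):
    import keras

    if parse(keras.__version__).major > 2:
        raise ValueError(
            "Keras 3 is installed, but the TF models here need the Keras 2 API. "
            "Install the backwards-compatible package with `pip install tf-keras`."
        )
```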
  16. 29 Jan, 2024 1 commit
  17. 19 Jan, 2024 2 commits
  18. 16 Jan, 2024 1 commit
  19. 15 Jan, 2024 1 commit
  20. 13 Jan, 2024 1 commit
  21. 12 Jan, 2024 1 commit
  22. 10 Jan, 2024 1 commit
  23. 14 Dec, 2023 1 commit
  24. 08 Dec, 2023 3 commits
    • Fix remaining issues in beam score calculation (#27808) · b31905d1
      Xin Qiu authored
      * Fix issues in add and is_done for BeamHypotheses
      
      * make newly added arguments optional for better compatibility
      
      * Directly use cur_len as generated_len, add note for retrocompatibility
      
      * update test expectation
      
      * make cur_len represent the length of the entire sequence including the decoder prompt (see the sketch after this list)
      
      * remove redundant if/else in testing
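      For context, a hedged sketch of the length-penalised beam score this fix concerns (names are illustrative, not the exact internals):

```python
def beam_score(sum_logprobs: float, generated_len: int, length_penalty: float = 1.0) -> float:
    # Beam search ranks hypotheses by their summed log-probability normalised by a
    # power of the length. Whether that length counts the decoder prompt or only the
    # generated tokens changes the ranking, which is what generated_len disambiguates.
    return sum_logprobs / (generated_len ** length_penalty)
```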
    • Fix: Raise informative exception when `prefix_allowed_tokens_fn` returns empty set of tokens (#27797) · 56be5e80
      Saibo-creator authored
      
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
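      An illustrative sketch of the generate() argument this fix concerns (model, prompt, and token choices are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
allowed_ids = tokenizer(" yes no", add_special_tokens=False).input_ids

def prefix_allowed_tokens_fn(batch_id, input_ids):
    # Constrain decoding to a fixed set of tokens. If this callable ever returns an
    # empty list, generation now fails with an informative exception instead of a
    # cryptic downstream error.
    return allowed_ids

inputs = tokenizer("Is the sky blue? Answer:", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=1, prefix_allowed_tokens_fn=prefix_allowed_tokens_fn)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```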
    • Generate: New `Cache` abstraction and Attention Sinks support (#26681) · 633215ba
      Tom Aarsen authored
      * Draft version of new KV Caching
      
      This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
      / StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
      in a third-party or in transformers directly
      
      * Address numerous PR suggestions
      
      1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
      2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
      3. Remove __bool__ and __getitem__ magic as they're confusing.
      4. past_key_values.update(key, value, idx) now returns key, value.
      5. Add use_legacy_cache flag, defaults to None, i.e. falsy. This breaks generate for now, until 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
      6. Separate key_cache and value_cache.
      
      Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
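      A hedged sketch of the cache contract described in points 4 and 6 above (class and argument names follow the PR description; the real implementation may differ in detail):

```python
import torch
from transformers import DynamicCache

cache = DynamicCache()           # keeps separate key_cache and value_cache per layer
layer_idx = 0
key = torch.randn(1, 8, 4, 64)   # (batch, num_heads, seq_len, head_dim)
value = torch.randn(1, 8, 4, 64)

# update() appends the new states for this layer and returns the full key/value
# tensors to attend over, as point 4 describes.
key_states, value_states = cache.update(key, value, layer_idx)
print(key_states.shape)          # torch.Size([1, 8, 4, 64]) on the first call
```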
      
      * Implement the SinkCache through backward+forward rotations
      
      * Integrate (Sink)Cache with Llama FA2
      
      * Set use_legacy_cache=True as default, allows for test passes
      
      * Move from/to_legacy_cache to ...Model class
      
      * Undo unnecessary newline change
      
      * Remove copy utility from deprecated OpenLlama
      
      * Match import style
      
      * manual rebase with main
      
      * Cache class working with generate (#1)
      
      * Draft version of new KV Caching
      
      This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
      / StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
      in a third-party or in transformers directly
      
      * Address numerous PR suggestions
      
      1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
      2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
      3. Remove __bool__ and __getitem__ magic as they're confusing.
      4. past_key_values.update(key, value, idx) now returns key, value.
      5. Add use_legacy_cache flag, defaults to None, i.e. falsy. This breaks generate for now, until 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
      6. Separate key_cache and value_cache.
      
      Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
      
      * Integrate (Sink)Cache with Llama FA2
      
      * Move from/to_legacy_cache to ...Model class
      
      * Undo unnecessary newline change
      
      * Match import style
      
      * working generate
      
      * Add tests; Simplify code; Apply changes to Mistral and Persimmon
      
      * fix rebase mess
      
      * a few more manual fixes
      
      * last manual fix
      
      * propagate changes to phi
      
      * upgrade test
      
      * add use_legacy_cache docstring; beef up tests
      
      * reintroduce unwanted deletes
      
      ---------
      Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
      
      * move import
      
      * add default to model_kwargs.get('use_legacy_cache')
      
      * correct failing test
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * apply PR suggestions
      
      * fix failing test
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
      
      * PR comments
      
      * tmp commit
      
      * add docstrings
      
      * more tests, more docstrings, add to docs
      
      * derp
      
      * tmp commit
      
      * tmp dbg
      
      * more dbg
      
      * fix beam search bug
      
      * cache can be a list of tuples in some models
      
      * fix group beam search
      
      * all but sinkcache integration tests
      
      * fix sink cache and add hard integration test (usage sketch at the end of this message)
      
      * now also compatible with input_embeds input
      
      * PR comments
      
      * add Cache support to Phi+FA2
      
      * make fixup
      
      ---------
      Co-authored-by: Joao Gante <joao@huggingface.co>
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
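      For context, a hedged usage sketch of the SinkCache / Attention Sinks integration described in this message (the checkpoint is a placeholder and the exact API may have shifted since):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, SinkCache

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any supported causal LM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Tell me a long story.", return_tensors="pt")
# Keep a sliding window of recent KV states plus a few "sink" tokens at the start,
# as in StreamingLLM, so generation can run past the usual context length.
cache = SinkCache(window_length=256, num_sink_tokens=4)
out = model.generate(**inputs, max_new_tokens=64, past_key_values=cache)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```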
  25. 30 Nov, 2023 1 commit
  26. 24 Nov, 2023 1 commit
  27. 17 Nov, 2023 1 commit
  28. 16 Nov, 2023 1 commit
  29. 15 Nov, 2023 2 commits
  30. 07 Nov, 2023 1 commit
  31. 02 Nov, 2023 1 commit
  32. 01 Nov, 2023 1 commit
  33. 31 Oct, 2023 1 commit
    • device agnostic models testing (#27146) · 50378cbf
      Hz, Ji authored
      * device agnostic models testing
      
      * add decorator `require_torch_fp16` (see the sketch after this list)
      
      * make style
      
      * apply review suggestion
      
      * Oops, the fp16 decorator was misused
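      An illustrative sketch of a device-agnostic test using the decorator added here (not a test from the PR):

```python
import torch
from transformers.testing_utils import require_torch_fp16, torch_device

@require_torch_fp16
def test_fp16_matmul():
    # torch_device resolves to the available accelerator (cuda, xpu, ...) or cpu,
    # so the test does not hard-code a device.
    a = torch.randn(4, 4, dtype=torch.float16, device=torch_device)
    b = torch.randn(4, 4, dtype=torch.float16, device=torch_device)
    assert (a @ b).dtype == torch.float16
```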