Commits · aee11fe427b2f2fd66c3ef3cd91757ec00420ac9 · chenpangpang / transformers

16 Feb, 2024 2 commits

Fix max_length criteria when using inputs_embeds (#28994) · aee11fe4

Raushan Turganbay authored Feb 16, 2024



* fix max_length for inputs_embeds

* make style

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Static Cache: load models with MQA or GQA (#28975)

* fix

* fix tests

* fix tests

* Update src/transformers/generation/utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more fixes

* make style

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Update all references to canonical models (#29001) · f497f564
Lysandre Debut authored Feb 16, 2024
```
* Script & Manual edition

* Update
```

08 Feb, 2024 1 commit

Support batched input for decoder start ids (#28887) · d6286646

Raushan Turganbay authored Feb 08, 2024



* support batched input for decoder start ids

* Fix typos
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* minor changes

* fix: decoder_start_id as list

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

29 Jan, 2024 1 commit
- Mark test_constrained_beam_search_generate as flaky (#28757) · 9e8f35fa
  amyeroberts authored Jan 29, 2024
```
* Mark test_constrained_beam_search_generate as flaky

* Update tests/generation/test_utils.py
```
19 Jan, 2024 2 commits
- Fix `_speculative_sampling` implementation (#28508) · 9efec114
  Ofir Zafrir authored Jan 19, 2024
- feat: Sequential beam search (#26304) · d4fc1eb4
  Saibo-creator authored Jan 19, 2024
15 Jan, 2024 1 commit
- Generate: consolidate output classes (#28494) · 7e0ddf89
  Joao Gante authored Jan 15, 2024
13 Jan, 2024 1 commit

Adding Prompt lookup decoding (#27775) · e304f976

Apoorv Saxena authored Jan 13, 2024



* MVP

* fix ci

* more ci

* remove redundant kwarg

* added and wired up PromptLookupCandidateGenerator

* rebased with main, working

* removed print

* style fixes

* fix test

* fixed tests

* added test for prompt lookup decoding

* fixed circleci

* fixed test issue

* Update src/transformers/generation/candidate_generator.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

* Update src/transformers/generation/candidate_generator.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

14 Dec, 2023 1 commit
- Generate: assisted decoding now uses `generate` for the assistant (#28030) · 9e5c28c5
  Joao Gante authored Dec 14, 2023
```
generate refactor
```
08 Dec, 2023 1 commit

Generate: New `Cache` abstraction and Attention Sinks support (#26681) · 633215ba

Tom Aarsen authored Dec 08, 2023

* Draft version of new KV Caching

This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly

* Address numerous PR suggestions

1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.

Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.

* Implement the SinkCache through backward+forward rotations

* Integrate (Sink)Cache with Llama FA2

* Set use_legacy_cache=True as default, allows for test passes

* Move from/to_legacy_cache to ...Model class

* Undo unnecessary newline change

* Remove copy utility from deprecated OpenLlama

* Match import style

* manual rebase with main

* Cache class working with generate (#1)

* Draft version of new KV Caching

This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly

* Address numerous PR suggestions

Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.

* Integrate (Sink)Cache with Llama FA2

* Move from/to_legacy_cache to ...Model class

* Undo unnecessary newline change

* Match import style

* working generate

* Add tests; Simplify code; Apply changes to Mistral and Persimmon

* fix rebase mess

* a few more manual fixes

* last manual fix

* propagate changes to phi

* upgrade test

* add use_legacy_cache docstring; beef up tests

* reintroduce unwanted deletes

---------
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>

* move import

* add default to model_kwargs.get('use_legacy_cache')

* correct failing test

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* apply PR suggestions

* fix failing test

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>

* PR comments

* tmp commit

* add docstrings

* more tests, more docstrings, add to docs

* derp

* tmp commit

* tmp dbg

* more dbg

* fix beam search bug

* cache can be a list of tuples in some models

* fix group beam search

* all but sinkcache integration tests

* fix sink cache and add hard integration test

* now also compatible with input_embeds input

* PR comments

* add Cache support to Phi+FA2

* make fixup

---------
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

24 Nov, 2023 1 commit

Deprecate `TransfoXL` (#27607) · 7293fdc5

Yih-Dar authored Nov 24, 2023



* fix

* fix

* trigger

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>

* tic

* revert

* revert

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>

17 Nov, 2023 1 commit
- Generate: fix flaky tests (#27543) · 913d03dc
  Joao Gante authored Nov 17, 2023
16 Nov, 2023 1 commit
- Generate: improve assisted generation tests (#27540) · 12b50c61
  Joao Gante authored Nov 16, 2023
15 Nov, 2023 1 commit

[`CircleCI`] skip test_assisted_decoding_sample for everyone (#27511) · 1e0e2dd3

Arthur authored Nov 15, 2023

* skip 4 tests

* nits

* style

* wow it's not my day

* skip new failing tests

* style

* skip for NLLB MoE as well

* skip `test_assisted_decoding_sample` for everyone

07 Nov, 2023 1 commit
- Generate: skip tests on unsupported models instead of passing (#27265) · 90b4adc1
  Joao Gante authored Nov 07, 2023
02 Nov, 2023 1 commit
- Generate: return `past_key_values` (#25086) · a6c82d45
  Joao Gante authored Nov 02, 2023
01 Nov, 2023 1 commit

[WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding (#27195) · 391d14e8

Patrick von Platen authored Nov 01, 2023



* finish

* add tests

* fix all tests

* [Assistant Decoding] Add test

* fix more

* better

* finish

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* finish

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

31 Oct, 2023 1 commit

device agnostic models testing (#27146) · 50378cbf

Hz, Ji authored Nov 01, 2023

* device agnostic models testing

* add decorator `require_torch_fp16`

* make style

* apply review suggestion

* Oops, the fp16 decorator was misused

23 Oct, 2023 1 commit

Add Seamless M4T model (#25693) · cb45f71c

Yoach Lacombe authored Oct 23, 2023



* first raw commit

* still POC

* tentative convert script

* almost working speech encoder conversion scripts

* intermediate code for encoder/decoders

* add modeling code

* first version of speech encoder

* make style

* add new adapter layer architecture

* add adapter block

* add first tentative config

* add working speech encoder conversion

* base model convert works now

* make style

* remove unnecessary classes

* remove unnecessary functions

* add modeling code speech encoder

* rework logics

* forward pass of sub components work

* add modeling codes

* some config modifs and modeling code modifs

* save WIP

* new edits

* same output speech encoder

* correct attention mask

* correct attention mask

* fix generation

* new generation logics

* erase comments

* make style

* fix typo

* add some descriptions

* new state

* clean imports

* add tests

* make style

* make beam search and num_return_sequences>1 work

* correct edge case issue

* correct SeamlessM4TConformerSamePadLayer copied from

* replace ACT2FN relu by nn.relu

* remove unnecessary return variable

* move back a class

* change name conformer_attention_mask ->conv_attention_mask

* better nit code

* add some Copied from statements

* small nits

* small nit in dict.get

* rename t2u model -> conditionalgeneration

* ongoing refactoring of structure

* update models architecture

* remove SeamlessM4TMultiModal classes

* add tests

* adapt tests

* some non-working code for vocoder

* add seamlessM4T vocoder

* remove buggy line

* fix some hifigan related bugs

* remove hifigan specific config

* change

* add WIP tokenization

* add seamlessM4T working tokenizer

* update tokenization

* add tentative feature extractor

* Update converting script

* update working FE

* refactor input_values -> input_features

* update FE

* changes in generation, tokenizer and modeling

* make style and add t2u_decoder_input_ids

* add intermediate outputs for ToSpeech models

* add vocoder to speech models

* update valueerror

* update FE with languages

* add vocoder convert

* update config docstrings and names

* update generation code and configuration

* remove todos and update config.pad_token_id to generation_config.pad_token_id

* move block vocoder

* remove unnecessary code and uniformize tospeech code

* add feature extractor import

* make style and fix some copies from

* correct consistency + make fix-copies

* add processor code

* remove comments

* add fast tokenizer support

* correct pad_token_id in M4TModel

* correct config

* update tests and code + make style

* make some suggested corrections - correct comments and change naming

* rename some attributes

* rename some attributes

* remove unnecessary sequential

* remove option to use dur predictor

* nit

* refactor hifigan

* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config

* add tests

* change tgt_lang logic

* update generation ToSpeech

* add support import SeamlessM4TProcessor

* fix generate

* make tests

* update integration tests, add option to only return text and update tokenizer fast

* fix wrong function call

* update import and convert script

* update integration tests + update repo id

* correct paths and add first test

* update how new attention masks are computed

* update tests

* take first care of batching in vocoder code

* add batching with the vocoder

* add waveform lengths to model outputs

* make style

* add generate kwargs + forward kwargs of M4TModel

* add docstrings forward methods

* reformat docstrings

* add docstrings t2u model

* add another round of modeling docstrings + reformat speaker_id -> spkr_id

* make style

* fix check_repo

* make style

* add seamlessm4t to toctree

* correct check_config_attributes

* write config docstrings + some modifs

* make style

* add docstrings tokenizer

* add docstrings to processor, fe and tokenizers

* make style

* write first version of model docs

* fix FE + correct FE test

* fix tokenizer + add correct integration tests

* fix most tokenization tests

* make style

* correct most processor test

* add generation tests and fix num_return_sequences > 1

* correct integration tests (still one left)

* make style

* correct position embedding

* change numbeams to 1

* refactor some modeling code and correct one test

* make style

* correct typo

* refactor intermediate fnn

* refactor feedforward conformer

* make style

* remove comments

* make style

* fix tokenizer tests

* make style

* correct processor tests

* make style

* correct S2TT integration

* Apply suggestions from Sanchit code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct typo

* replace torch.nn->nn + make style

* change Output naming (waveforms -> waveform) and ordering

* nit renaming and formatting

* remove return None when not necessary

* refactor SeamlessM4TConformerFeedForward

* nit typo

* remove almost copied from comments

* add a copied from comment and remove an unnecessary dropout

* remove inputs_embeds from speechencoder

* remove backward compatibility function

* reformat class docstrings for a few components

* remove unnecessary methods

* split over 2 lines smthg hard to read

* make style

* replace two steps offset by one step as suggested

* nice typo

* move warnings

* remove useless lines from processor

* make generation non-standard test more robust

* remove torch.inference_mode from tests

* split integration tests

* enrich md

* rename control_symbol_vocoder_offset->vocoder_offset

* clean convert file

* remove tgt_lang and src_lang from FE

* change generate docstring of ToText models

* update generate docstring of tospeech models

* unify how to deal with text_decoder_input_ids

* add default spkr_id

* unify tgt_lang for t2u_model

* simplify tgt_lang verification

* remove a todo

* change config docstring

* make style

* simplify t2u_tgt_lang_id

* make style

* enrich/correct comments

* enrich .md

* correct typo in docstrings

* add torchaudio dependency

* update tokenizer

* make style and fix copies

* modify SeamlessM4TConverter with new tokenizer behaviour

* make style

* correct small typo docs

* fix import

* update docs and add requirement to tests

* add convert_fairseq2_to_hf in utils/not_doctested.txt

* update FE

* fix imports and make style

* remove torchaudio in FE test

* add seamless_m4t.md to utils/not_doctested.txt

* nits and change the way docstring dataset is loaded

* move checkpoints from ylacombe/ to facebook/ orga

* refactor warning/error to be in the 119 line width limit

* round overly precise floats

* add stereo audio behaviour

* refactor .md and make style

* enrich docs with a more precise architecture description

* readd undocumented models

* make fix-copies

* apply some suggestions

* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* correct bug from previous commit

* refactor a parameter allowing to clean the code + some small nits

* clean tokenizer

* make style and fix

* make style

* clean tokenizers arguments

* add precisions for some tests

* move docs from not_tested to slow

* modify tokenizer according to last comments

* add copied from statements in tests

* correct convert script

* correct parameter docstring style

* correct tokenization

* correct multi gpus

* make style

* clean modeling code

* make style

* add copied from statements

* add copied statements

* add support with ASR pipeline

* remove file added inadvertently

* fix docstrings seamlessM4TModel

* add seamlessM4TConfig to OBJECTS_TO_IGNORE due to unconventional markdown

* add seamlessm4t to assisted generation ignored models

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

11 Oct, 2023 2 commits

[Assistant Generation] Improve Encoder Decoder (#26701) · da69de17

Patrick von Platen authored Oct 11, 2023

* [Assistant Generation] Improve enc dec

* save more

* Fix logit processor checks

* Clean

* make style

* fix deprecation

* fix generation test

* Apply suggestions from code review

* fix biogpt

* make style

In assisted decoding, pass model_kwargs to model's forward call (fix... · dcc49d8a

Billy Bradley authored Oct 11, 2023

In assisted decoding, pass model_kwargs to model's forward call (fix prepare_input_for_generation in all models) (#25242)

* In assisted decoding, pass model_kwargs to model's forward call

Previously, assisted decoding would ignore any additional kwargs
that it doesn't explicitly handle. This was inconsistent with other
generation methods, which pass the model_kwargs through
prepare_inputs_for_generation and forward the returned dict to the
model's forward call.

The prepare_inputs_for_generation method needs to be amended in all
models, as previously it only kept the last input ID when a past_key_values
was passed.

* Improve variable names in _extend_attention_mask

* Refactor extending token_type_ids into a function

* Replace deepcopy with copy to optimize performance

* Update new persimmon model with llama changes for assisted generation

* Update new mistral model for assisted generation with prepare_inputs_for_generation

* Update position_ids creation in falcon prepare_inputs_for_generation to support assisted generation

14 Sep, 2023 1 commit

Fix beam search when using model parallel (#24969) · 8881f38a

Dong-Yong Lee authored Sep 15, 2023



* Fix GPTNeoX beam search when using parallelize

* Fix beam search idx device when using model parallel

* remove onnx related stuff
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix: move test_beam_search_on_multi_gpu to GenerationTesterMixin

* fix: add right item to _no_split_modules of MegaPreTrainedModel

* fix: add num_beams within parallelized beam_search test
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

12 Sep, 2023 1 commit
- Generate: legacy mode is only triggered when `generation_config` is untouched (#25962) · 3319eb54
  Joao Gante authored Sep 12, 2023
23 Aug, 2023 1 commit
- Generate: general test for decoder-only generation from `inputs_embeds` (#25687) · 3c2383b1
  Joao Gante authored Aug 23, 2023
```
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
```
16 Aug, 2023 1 commit
- Generate: fix default max length warning (#25539) · 3f9cb335
  Joao Gante authored Aug 16, 2023
09 Aug, 2023 1 commit

aligned sample_beam output selection with beam_search (#25375) · cb3c821c

hukuda222 authored Aug 10, 2023



* aligned sample_beam specs with beam_search

* pull origin main

* Revert "pull origin main"

This reverts commit 06d356f1137bb52272e120a03636598c44449cf3.

* update test_utils.py

* fix format

* remove comment

---------
Co-authored-by: Shogo Fujita <shogo.fujita@legalontech.jp>

06 Aug, 2023 1 commit
- add CFG for .generate() (#24654) · d5334651
  Guillaume "Vermeille" Sanchez authored Aug 06, 2023
20 Jul, 2023 1 commit
- Contrastive Search peak memory reduction (#24120) · caf5e369
  Benjamin Badger authored Jul 20, 2023
```
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
```
27 Jun, 2023 1 commit
- Generate: `group_beam_search` requires `diversity_penalty>0.0` (#24456) · 5f3efdf7
  Joao Gante authored Jun 27, 2023
```
* add exception

* update docs
```
23 Jun, 2023 1 commit

Replace python random with torch.rand to enable dynamo.export (#24434) · a28325e2

Bowen Bao authored Jun 23, 2023

* Replace python random with torch.rand to enable dynamo.export

* revert changes to flax model code

* Remove unused random import

* Fix torch template

* Move torch.manual_seed(0) to right location

07 Jun, 2023 1 commit
- Generate: increase left-padding test atol (#23448) · 612b2a1a
  Joao Gante authored Jun 07, 2023
```
increase atol
```
18 May, 2023 2 commits
- Less flaky `test_assisted_decoding_matches_greedy_search` (#23451) · 2406dbdc
  Yih-Dar authored May 18, 2023
```
* fix

* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
- Generate: skip left-padding tests on old models (#23437) · aea7b23b
  Joao Gante authored May 18, 2023
16 May, 2023 1 commit
- Generate: add test to check KV format (#23403) · 918a06e2
  Joao Gante authored May 16, 2023
```
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
```
08 May, 2023 1 commit
- Generate: starcoder 🤜 🤛 assisted generation (#23182) · bbfb9fc2
  Joao Gante authored May 08, 2023
```
* starcoder has joined the chat

* indexing that works for all
```
03 May, 2023 1 commit
- Generate: slow assisted generation test (#23125) · ce31e3c8
  Joao Gante authored May 03, 2023
29 Apr, 2023 1 commit
- Generate: prepare assisted generation for release (#23052) · 849367cc
  Joao Gante authored Apr 29, 2023
24 Apr, 2023 1 commit
- Generate: assisted generation with sample (take 2) (#22949) · e4a97f82
  Joao Gante authored Apr 24, 2023
```
* temperature controls speed
```
18 Apr, 2023 2 commits

Generate: Add assisted generation (#22211) · 78cda46f

Joao Gante authored Apr 18, 2023

* working mvp

* remove breakpoint

* fix commit

* standardize outputs

* tmp commit

* tests almost ready

* tmp commit

* skip a few models

* Add streaming; Docs and examples

* document limitations

* PR commits

* Amy PR comments

Fix `test_eos_token_id_int_and_list_top_k_top_sampling` (#22826) · 90247d3e
Yih-Dar authored Apr 18, 2023
```
* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```