Commits · e85d86398ac92261d3341a846990fe61103a3a9b · chenpangpang / transformers

06 Aug, 2024 1 commit

add the missing flash attention test marker (#32419) · e85d8639

Fanli Lin authored Aug 06, 2024

* add flash attention check

* fix

* fix

* add the missing marker

* bug fix

* add one more

* remove order

* add one more

05 Aug, 2024 4 commits
- fix: Updated `test_embeded_special_tokens` for luke and mluke models (#32413) · 458b0cd2
  Sai-Suraj-27 authored Aug 05, 2024
```
Fixed tokenizer tests for luke and mluke models.
```
- Persist embedding type of BART and mBART models after resize (#32242) · baf7e5c9
  Abdi authored Aug 05, 2024
```
* fix: persist embedding type of MBartForConditionalGeneration after resize

* fix: persist embedding type of BartForConditionalGeneration after resize
```
- Phi3 tests: fix typing for Python 3.8 (#32388) · 3bb646a5
  Raushan Turganbay authored Aug 05, 2024
```
fix phi
```
- fix: SeamlessM4TFeatureExtractor stride remainder (#32088) · 05ae3a30
  TechInterMezzo authored Aug 05, 2024
```
* fix: SeamlessM4TFeatureExtractor stride remainder

* Added attention mask size test

* Reran ruff for style correction
```
01 Aug, 2024 2 commits

Remove size check between attn_weights and kv_seq_len for phi3 (#32339) · 48ed24c5
Lunwen He authored Aug 01, 2024
```
* Remove size check between attn_weights and kv_seq_len

* add unit tests
```

[whisper] compile compatibility with long-form decoding (#31772) · e234061c

Sanchit Gandhi authored Aug 01, 2024

* [whisper] compile compatibility with long-form decoding

* clarify comment

* fix after rebase

* finalise

* fix bsz

* fix cache split

* remove contiguous

* style

* finish

* update doc

* prevent cuda graph trace

31 Jul, 2024 4 commits

>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227) · 92abe603

fxmarty authored Jul 31, 2024

* draft

* apply changes to all relevant archs

* rerun ci - check_docstrings.py failing?

* fix docstring

* move 2D->4D mask creation to modeling file

* repo consistency

* fix the batch size = 1 case - calling contiguous is not enough

* nit

* style

* propagate to gemma/gemma-2

* prepare inputs for gemma generation

* implement test and tiny fix in gemma2

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix copies

* ci pass

* fix gemma's test_compile_static_cache tests

* flaky

* retrigger ci

---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

[Idefics2] - Fix FA2 call for Perceiver layer (#32275) · 5f1fcc29

amyeroberts authored Jul 31, 2024

* Fix FA2 call for Perceiver layer

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

* Fix up

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

Llama 3.1: Fix incorrect `inv_freq` assignment (#32330) · b75ad566
Joao Gante authored Jul 31, 2024
```
fix 💩
```

Gemma2 and flash-attention (#32188) · 7f552e28

Raushan Turganbay authored Jul 31, 2024

* enable flash-attn & static cache

* this works, not the prev

* fix for sliding window layers

* not needed anymore

30 Jul, 2024 1 commit

Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191) · 6e2d04e4

Joshua Lochner authored Jul 30, 2024

* Remove user-defined tokens which can be obtained through merges

* Remove debug line

* formatting

* Refactor spm slow -> fast converter

* revert unnecessary refactor

* set comprehension

* remove test files

* Use `vocab_scores`

* Always replace spiece underline with space in decode

* we no longer need token filtering

* Add save fast load slow unit test

* Remove tokenizers version check

* Remove duplicate code

* Make `<start_of_turn>` and `<end_of_turn>` special tokens

* Bias merge priority with length if score is the same

* Add unit test for merge priority

* CI
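
One bullet of #32191 above, "Always replace spiece underline with space in decode", refers to SentencePiece's word-boundary marker `▁` (U+2581, LOWER ONE EIGHTH BLOCK), which stands in for a leading space. The sketch below shows only that decoding rule; `spm_decode` is a hypothetical helper, not the converter or tokenizer code changed in this commit.

```python
# SentencePiece marks the start of a word with U+2581 ("▁"), not an ASCII space.
SPIECE_UNDERLINE = "\u2581"


def spm_decode(pieces: list[str]) -> str:
    """Hypothetical helper: join SentencePiece pieces and map every '▁' to a space."""
    return "".join(pieces).replace(SPIECE_UNDERLINE, " ")


spm_decode(["\u2581Hello", "\u2581wor", "ld", "!"])  # -> " Hello world!"
```

The sketch keeps the leading space produced by the first piece; whether a real decoder strips it is a separate policy choice not covered here.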

29 Jul, 2024 2 commits

Whisper tokenizer word level timestamps (#32197) · 3fbaaaa6

Kamil Akesbi authored Jul 29, 2024

* fix _fix_key in PreTrainedModel

* fix _find_longest_common_sequence

* add test

* remove result.json

* nit

* update test

Generate: end-to-end compilation (#30788) · 7ffe25f2

Joao Gante authored Jul 29, 2024

* mvp

* added test (a few models need fixes)

* fix a few test cases

* test nits

* harder test 😈

* revert changes in stablelm

* test with improved condition

* add todo

* tmp commit

* merged with main

* nits

* add todo

* final corrections

* add docs for generation compilation

* docs nits

* add  tip

* PR suggestions

* add more details to the compilation docs

* fix cache positions

* cache is now init in generate; update docs

* tag test as flaky

* docs

* post rebase make fixup and other nits

* remove unintended changes

* whisper (encoder-decoder) not supported

* move token default updates to ; add tests for token defaults

* push changes

* manual rebase

* chameleon doesn't support this

* fix test_static_cache_mha_mqa_gqa (broken in another PR)

* docs: dynamic is better with end-to-end compilation

26 Jul, 2024 2 commits
- Refactor: Removed unnecessary `object` base class (#32230) · b8e5cd53
  Sai-Suraj-27 authored Jul 26, 2024
```
* Refactored to remove unnecessary object base class.

* small fix.
```
- Llava: generate without images (#32183) · fad15fba
  Raushan Turganbay authored Jul 26, 2024
```
* llava w/o images

* tests
```
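
The `object` cleanup in #32230 above is safe because in Python 3 every class implicitly inherits from `object`, so the explicit base is redundant. A small illustration with made-up class names:

```python
class Legacy(object):  # Python 2 style: explicit object base
    pass


class Modern:  # Python 3 style: object is the implicit base
    pass


# Both spellings produce the same bases and the same MRO shape.
assert Legacy.__bases__ == Modern.__bases__ == (object,)
assert Legacy.__mro__ == (Legacy, object)
assert Modern.__mro__ == (Modern, object)
```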
25 Jul, 2024 3 commits
- Follow up for #31973 (#32025) · df6eee92
  Yih-Dar authored Jul 25, 2024
```
* fix

* [test_all] trigger full CI

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
- [warnings] fix E721 warnings (#32223) · de231889
  Kashif Rasul authored Jul 25, 2024
```
fix E721 warnings
```
- [whisper] fix short-form output type (#32178) · 5658e749
  Sanchit Gandhi authored Jul 25, 2024
```
* [whisper] fix short-form output type

* add test

* make style

* update long-form tests

* fixes

* last fix

* finalise test
```
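
E721, the pycodestyle check (also implemented by ruff) addressed in #32223 above, flags type comparisons written with `==`, such as `type(x) == int`. The two usual replacements are sketched below; which form the commit used at each call site is not shown in this log, and the function names are illustrative only.

```python
def is_int_like(value) -> bool:
    # isinstance() also accepts subclasses (note: bool is a subclass of int)
    return isinstance(value, int)


def is_exact_int(value) -> bool:
    # Exact type check: compare type objects by identity, not with ==
    return type(value) is int
```

Prefer `isinstance()` unless excluding subclasses is the point; the `is` form is for genuinely exact checks.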
24 Jul, 2024 3 commits

fix: Replaced deprecated `unittest method` with the correct one (#32198) · 85a1269e
Sai-Suraj-27 authored Jul 24, 2024
```
Replaced deprecated unittest method with the correct one.
```

🚨 No more default chat templates (#31733) · edd68f4e

Matt authored Jul 24, 2024

* No more default chat templates

* Add the template to the GPT-SW3 tests since it's not available by default now

* Fix GPT2 test

* Fix Bloom test

* Fix Bloom test

* Remove default templates again

RoPE: relaxed rope validation (#32182) · e0182f3b

Joao Gante authored Jul 24, 2024

* relaxed rope check

* let's also accept rope_type=None, defaulting to the original implementation

* type and rope_type can coexist
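
The three bullets of #32182 above describe a normalization rule: read the RoPE variant from the newer `rope_type` key or the legacy `type` key (both may be present), and treat a missing or `None` value as the original, unscaled RoPE. A minimal sketch of that rule; the helper name `resolve_rope_type` and the `"default"` label are assumptions for illustration, not the library's API.

```python
from typing import Optional


def resolve_rope_type(rope_scaling: Optional[dict]) -> str:
    """Hypothetical helper: pick the RoPE variant from a rope_scaling dict."""
    if rope_scaling is None:
        return "default"  # no scaling config at all: original RoPE
    # Prefer the newer key and fall back to the legacy one; both may coexist.
    rope_type = rope_scaling.get("rope_type") or rope_scaling.get("type")
    # rope_type=None (or absent) means the original implementation
    return rope_type or "default"
```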

23 Jul, 2024 9 commits

Updated `ruff` to the latest version (#31926) · d2c687b3

Sai-Suraj-27 authored Jul 23, 2024

* Updated ruff version and fixed the required code according to the latest version.

* Updated ruff version and fixed the required code according to the latest version.

* Added noqa directive to ignore 1 error shown by ruff

Revert "Incorrect Whisper long-form decoding timestamps" (#32148) · 3263b343
Sanchit Gandhi authored Jul 23, 2024
```
Revert "Incorrect Whisper long-form decoding timestamps (#32003)"

This reverts commit cd48553f.
```

Rename Phi-3 rope scaling type (#31436) · 034b4778

Amit Garg authored Jul 23, 2024

* renamed phi3 rope_scaling type

* fixed trailing whitespaces

* fixed test

* added warning

* fixed format

Fix video batching to videollava (#32139) · 9ced33ca
Merve Noyan authored Jul 23, 2024
```
---------
Co-authored-by: Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>
```

Llama: RoPE refactor (#32135) · 2e113422

Joao Gante authored Jul 23, 2024


Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910) · 34b43211

mig-mfreitas authored Jul 23, 2024

* Add YaRN and Dynamic-YaRN RoPE Scaling Methods

YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.

Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.

We implement YaRN and Dynamic-YaRN for the following list of models:

 - LLaMA
 - Falcon
 - GPT-NeoX
 - Olmo
 - Persimmon
 - Phi
 - StableLM
 - OpenLLaMA

New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.

For more details, please refer to https://arxiv.org/abs/2309.00071

Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>

* Refactor YaRN implementation for LLaMA

Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.

This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
  from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies
Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>

* Refactor Tensor Building Logic for YaRN

- Comply with the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>

* remove unwanted file

---------
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
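
The YaRN description above (NTK-by-parts interpolation combined with attention scaling) can be made concrete with a small, dependency-free sketch. It follows the formulas in the YaRN paper: high-frequency RoPE pairs keep their original frequency, low-frequency pairs are divided by the scale factor, a linear ramp blends the pairs in between, and the attention temperature satisfies sqrt(1/t) = 0.1 ln(s) + 1. The function name, the defaults `beta_fast=32` and `beta_slow=1`, and the use of plain lists instead of tensors are assumptions; this is not the code merged in #30910.

```python
import math


def yarn_inv_freq(dim, base=10000.0, factor=4.0, original_max_pos=2048,
                  beta_fast=32.0, beta_slow=1.0):
    """Sketch of YaRN RoPE scaling: returns (inv_freq, attention_factor)."""
    half = dim // 2
    # Original RoPE inverse frequencies: base^(-2i/dim) for pair index i
    extrap = [base ** (-2 * i / dim) for i in range(half)]
    # Plain position interpolation divides every frequency by the scale factor
    interp = [f / factor for f in extrap]

    def correction_dim(num_rotations):
        # Pair index whose wavelength completes `num_rotations` turns over the original context
        return (dim * math.log(original_max_pos / (num_rotations * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)
    if low == high:
        high += 0.001  # avoid a zero-width ramp

    inv_freq = []
    for i in range(half):
        # Ramp is 0 for high-frequency pairs (keep) and 1 for low-frequency pairs (interpolate)
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        inv_freq.append(interp[i] * ramp + extrap[i] * (1.0 - ramp))

    # Attention temperature from the paper: sqrt(1/t) = 0.1 * ln(s) + 1
    attention_factor = 0.1 * math.log(factor) + 1.0 if factor > 1 else 1.0
    return inv_freq, attention_factor


inv_freq, attn = yarn_inv_freq(dim=128)
```

Multiplying the cos/sin tables by `attention_factor` scales both rotated queries and keys, which scales the attention logits by 1/t; a given codebase may apply the factor elsewhere.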

Fix mask creations of `GPTNeoX` and `GPT2` (#31944) · 605f3245

Anton Vlasjuk authored Jul 23, 2024

* fix mask creation of gpt2 and gpt_neox caused by me

* forgot the reshape of masks when shape > 2

* add tests for gpt neox and gpt2

* nit on a comment

Remove `trust_remote_code` when loading Libri Dummy (#31748) · f83c6f1d
Sanchit Gandhi authored Jul 23, 2024
```
* [whisper integration] use parquet dataset for testing

* propagate to others

* more propagation

* last one
```

LLaVaNeXT: pad on right if training (#32134) · 3aefb4ec
Raushan Turganbay authored Jul 23, 2024
```
* pad on right if training

* docs

* add tests
```
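
#32134 above makes LLaVA-NeXT pad on the right during training. Left padding is the usual layout for batched generation, because each row's last real token then sits in the final position where new tokens are appended; right padding is the conventional layout for training. The commit does not spell out its exact motivation. The `pad_batch` helper below is hypothetical and only illustrates the two layouts with a matching attention mask.

```python
def pad_batch(sequences, pad_id, padding_side="right"):
    """Hypothetical helper: pad token-id lists to equal length.

    Returns (input_ids, attention_mask), where mask value 1 marks real tokens.
    """
    if padding_side not in ("left", "right"):
        raise ValueError(f"padding_side must be 'left' or 'right', got {padding_side!r}")
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = [pad_id] * (max_len - len(seq))
        mask_pad = [0] * len(pad)
        if padding_side == "right":
            # Training layout: real tokens first, padding at the end
            input_ids.append(list(seq) + pad)
            attention_mask.append([1] * len(seq) + mask_pad)
        else:
            # Generation layout: padding first, so every row ends on a real token
            input_ids.append(pad + list(seq))
            attention_mask.append(mask_pad + [1] * len(seq))
    return input_ids, attention_mask
```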

22 Jul, 2024 4 commits
- Return assistant generated tokens mask in apply_chat_template (#30650) · 74d0eb3f
  Yoni Gottesman authored Jul 22, 2024
```
return assistant generated tokens mask in apply_chat_template
```
- fix: Fixed raising `TypeError` instead of `ValueError` for invalid type (#32111) · 12b6880c
  Sai-Suraj-27 authored Jul 22, 2024
```
* Raised TypeError instead of ValueError for invalid types.

* Updated formatting using ruff.

* Retrieved few changes.

* Retrieved few changes.

* Updated tests accordingly.
```
- Fix failing test with race condition (#32140) · 7ba028fc
  Matt authored Jul 22, 2024
```
* Fix failing test with race condition

* make fixup

* monotonic_ns instead of randint

* uuid4 instead of monotonic_ns

* Add a finally cleanup step
```
- Mention model_info.id instead of model_info.modelId (#32106) · f2a1e3ca
  Lucain authored Jul 22, 2024
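
The bullets of #32140 above outline a common fix for tests that collide when run in parallel: give each run a unique resource name (a `uuid4()` hex string, with 122 random bits, is far less collision-prone than `random.randint` or a timestamp) and release the resource in a `finally` block even when the test fails. The sketch applies that pattern to a temporary directory; the helper name and the directory-based resource are assumptions, since the commit does not say what the failing test created.

```python
import os
import shutil
import tempfile
import uuid


def run_with_unique_dir(test_body):
    """Hypothetical helper: run test_body(path) in a uniquely named dir, always cleaning up."""
    # uuid4 is random, so concurrent runs do not pick the same directory name
    path = os.path.join(tempfile.gettempdir(), f"test-run-{uuid.uuid4().hex}")
    os.makedirs(path)
    try:
        return test_body(path)
    finally:
        # Runs on success and on failure, so no stale directory is left behind
        shutil.rmtree(path, ignore_errors=True)
```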
19 Jul, 2024 3 commits

Support generating with fallback for short form audio in Whisper (#30984) · 89575b56

Kamil Akesbi authored Jul 19, 2024

* remove is_shortform

* adapt _retrieve_max_frames_and_seek for short_form

* return bos token in short and long form

* add decoder_input_ids to short form audios

* add eos token for short form

* handle short form token_timestamps

* no need to return scores

* add is_shortform conditions

* handle when max_new_tokens is None - short form

* handle assistant decoding

* fix

* handle return_dict_in_generate

* handle split_by_batch for encoder_attentions attribute

* handle num_beams>1

* handle num_return_sequences>1 in generate_with_fallback

* handle num_return_sequences>1 with return_dict_in_generate=True

* raise error if max_new_tokens + decoder_inputs_ids > max_target_pos

* fix

* apply review suggestions

* fix

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fix

* logits for both short form and long form

* handle if logits_processor is None

* test

* apply review changes to num_return_sequences

* add _expand_variables_for_generation

* remove short form commented section

* update comments

* uncomment num_beams line in generate_with_fallback

* update assistant decoding

* handle return_segment with short form generation

* up

* fix output format is_shortform

* overwrite beam_sample test

* update _set_return_timestamps

* apply review suggestions

* apply review suggestions

* remove seek_outputs_short_form

* fix _stack_split_outputs

* fix stack dim in _stack_split_outputs

* update tests

* fix past_key_values + beam tests

* fix

* clean _expand_variables_for_generation

* make style

* fix slow tests

* make style

* max_length condition

* make style

* add slow tests for shortform fallback

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* apply review changes

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* up

* fix slow tests

* apply review suggestions

* update test

* make style

* small fix

* fix

* fix test_new_cache_format

* fix past_key_values

* fix

* make style

* fix slow tests

* fix

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

Incorrect Whisper long-form decoding timestamps (#32003) · cd48553f

Kamil Akesbi authored Jul 19, 2024

* fix long-form timestamps in decode_batch

* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* add test

* make style

* fix copies

* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/whisper/processing_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* apply review suggestions

* fix

* fix copies

* fix

* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix-copies

---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Llava: add default chat templates (#31691) · b873234c

Raushan Turganbay authored Jul 19, 2024

* add default chat templates

* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/llava_next/processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more clear docstring and docs

* Update docs/source/en/model_doc/llava.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/vipllava.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* add tests

* remove default templates (see #31733)

* load chat template from another file

* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* revert some changes in docs

* forgot vipllava

* chat template file is not a temporary hack

* warn if loading from processor

* not that file

* similarly modify `save_pretrained`

* Update tests/models/llava_next/test_processor_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vipllava/test_processor_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/vipllava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/vipllava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

18 Jul, 2024 2 commits

Add torch.compile Support For Mamba (#31247) · c75969ee

Longjie Zheng authored Jul 18, 2024

* modify mamba cache

* set up cache

* add test

* [run-slow] mamba

* [run-slow] mamba

* address comments

* [run-slow] mamba

* use_cache_position

* [run-slow] mamba

* [run-slow] mamba

* [run-slow] mamba

* [run-slow] mamba

* fix

* cache in generate

* [run-slow] mamba

* address comments

* [run-slow] mamba

* [run-slow] mamba

* address comments

* [run-slow] mamba

* fix

* [run-slow] mamba

* fix

* [run-slow] mamba

* fix cache name

* [run-slow] mamba

Chameleon: minor fixes after shipping (#32037) · 673d30b8
Raushan Turganbay authored Jul 18, 2024
```
* fix merging

* make chameleon conditional
```