Commits · e85d86398ac92261d3341a846990fe61103a3a9b · chenpangpang / transformers

06 Aug, 2024 1 commit

add the missing flash attention test marker (#32419) · e85d8639

Fanli Lin authored Aug 06, 2024

* add flash attention check

* fix

* fix

* add the missing marker

* bug fix

* add one more

* remove order

* add one more

e85d8639

05 Aug, 2024 6 commits
- Respect the config's attn_implementation if set (#32383) · 7e5d46de
  amyeroberts authored Aug 05, 2024
```
* Respect the config's attn if set

* Update test - can override in from_config

* Fix
```
  7e5d46de
- fix: Updated `test_embeded_special_tokens` for luke and mluke models (#32413) · 458b0cd2
  Sai-Suraj-27 authored Aug 05, 2024
```
Fixed tokenizer tests for luke, mluke models.
```
  458b0cd2
- Persist embedding type of BART and mBART models after resize (#32242) · baf7e5c9
  Abdi authored Aug 05, 2024
```
* fix: persist embedding type of MBartForConditionalGeneration after resize

* fix: persist embedding type of BartForConditionalGeneration after resize
```
  baf7e5c9
- #32184 save total_vocab_size (#32240) · 3d7c2f9d
  Ita Zaporozhets authored Aug 05, 2024
```
* save total_vocab_size = vocab_size + user added tokens to speed up operation

* updating length when added_tokens_decoder is set

* add test len(tokenizer)
```
  3d7c2f9d
- Phi3 tests: fix typing for Python 3.8 (#32388) · 3bb646a5
  Raushan Turganbay authored Aug 05, 2024
```
fix phi
```
  3bb646a5
- fix: SeamlessM4TFeatureExtractor stride remainder (#32088) · 05ae3a30
  TechInterMezzo authored Aug 05, 2024
```
* fix: SeamlessM4TFeatureExtractor stride remainder

* Added attention mask size test

* Reran ruff for style correction
```
  05ae3a30
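
The `total_vocab_size` commit (#32240) in this group describes caching "vocab_size + user added tokens" so that the tokenizer length does not have to be recomputed on every call. A minimal sketch of that idea follows; `ToyTokenizer` and its methods are hypothetical names used only for illustration and are not the library's implementation.

```python
class ToyTokenizer:
    """Hypothetical tokenizer that caches its total vocabulary size."""

    def __init__(self, vocab):
        self.vocab = dict(vocab)      # base vocabulary: token -> id
        self._added_tokens = {}       # user-added tokens: token -> id
        self._update_total_vocab_size()

    def _update_total_vocab_size(self):
        # Recompute once per mutation instead of on every len() call.
        self.total_vocab_size = len(set(self.vocab) | set(self._added_tokens))

    def add_tokens(self, tokens):
        next_id = self.total_vocab_size
        for token in tokens:
            if token not in self.vocab and token not in self._added_tokens:
                self._added_tokens[token] = next_id
                next_id += 1
        # Keep the cached length in sync with the added tokens.
        self._update_total_vocab_size()

    def __len__(self):
        # Constant-time lookup of the cached value.
        return self.total_vocab_size


tok = ToyTokenizer({"a": 0, "b": 1})
tok.add_tokens(["c", "a"])  # "a" already exists, so only "c" is added
```

The point of the design is that `len(tok)` becomes a plain attribute read, while the more expensive set union runs only when the vocabulary changes.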
02 Aug, 2024 1 commit
- RoPE: Add numerical tests ✨ (#32380) · 083e13b7
  Joao Gante authored Aug 02, 2024
```
tests! :D
```
  083e13b7
01 Aug, 2024 7 commits

Yell at the user if zero-3 init wasn't performed, but expected to have been done (#32299) · 82efc535

Zach Mueller authored Aug 01, 2024



* Test this zach

* Test for improper init w/o zero3

* Move back

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Get rid of stars in warning

* Make private

* Make clear

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

82efc535

Fixed Hybrid Cache Shape Initialization. (#32163) · 51ab25e2

OsamaS99 authored Aug 01, 2024



* fixed hybrid cache init, added test

* Fix Test Typo

---------
Co-authored-by: Aaron Haag <aaron.haag@siemens.com>

51ab25e2

Offloaded KV Cache (#31325) · ca59d6f7

Nikos Karampatziakis authored Aug 01, 2024

* Initial implementation of OffloadedCache

* enable usage via cache_implementation

* Address feedback, add tests, remove legacy methods.

* Remove flash-attn, discover synchronization bugs, fix bugs

* Prevent usage in CPU only mode

* Add a section about offloaded KV cache to the docs

* Fix typos in docs

* Clarifications and better explanation of streams

ca59d6f7

Fix conflicting key in init kwargs in PreTrainedTokenizerBase (#31233) · b4727a12

Omar Salman authored Aug 01, 2024

* Fix conflicting key in init kwargs in PreTrainedTokenizerBase

* Update code to check for callable key in save_pretrained

* Apply PR suggestions

* Invoke CI

* Updates based on PR suggestion

b4727a12

update clean_up_tokenization_spaces warning (#32371) · 2229ebe7
Ita Zaporozhets authored Aug 01, 2024

2229ebe7
Remove size check between attn_weights and kv_seq_len for phi3 (#32339) · 48ed24c5
Lunwen He authored Aug 01, 2024
```
* Remove size check between attn_weights and kv_seq_len

* add unit tests
```
48ed24c5

[whisper] compile compatibility with long-form decoding (#31772) · e234061c

Sanchit Gandhi authored Aug 01, 2024

* [whisper] compile compatibility with long-form decoding

* clarify comment

* fix after rebase

* finalise

* fix bsz

* fix cache split

* remove contiguous

* style

* finish

* update doc

* prevent cuda graph trace

e234061c

31 Jul, 2024 4 commits

>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227) · 92abe603

fxmarty authored Jul 31, 2024



* draft

* apply changes to all relevant archs

* rerun ci - check_docstrings.py failing?

* fix docstring

* move 2D->4D mask creation to modeling file

* repo consistency

* fix the batch size = 1 case - calling contiguous is not enough

* nit

* style

* propagate to gemma/gemma-2

* prepare inputs for gemma generation

* implement test and tiny fix in gemma2

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix copies

* ci pass

* fix gemma's test_compile_static_cache tests

* flaky

* retrigger ci

---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

92abe603
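
One bullet in the commit above mentions moving the "2D->4D mask creation" into the modeling file. As a rough illustration of what such a conversion does, the sketch below expands a 2D padding mask of shape `[batch][kv_len]` into an additive causal mask of shape `[batch][1][query_len][kv_len]`. The function name, the use of plain lists, and `-inf` as the masked value are assumptions made for this example; the library's own code operates on tensors.

```python
import math


def make_4d_causal_mask(attention_mask_2d, query_len):
    """Expand a [batch][kv_len] mask of 1 (keep) / 0 (padding) to 4D.

    Assumes the queries are the last `query_len` positions of the
    key/value sequence. Visible entries are 0.0, masked ones are -inf.
    """
    masked = -math.inf
    out = []
    for row in attention_mask_2d:
        kv_len = len(row)
        offset = kv_len - query_len  # index of the first query in the kv sequence
        rows = [
            [
                0.0 if (k <= q + offset and row[k] == 1) else masked
                for k in range(kv_len)
            ]
            for q in range(query_len)
        ]
        out.append([rows])  # singleton head dimension
    return out


mask = make_4d_causal_mask([[0, 1, 1]], query_len=3)
```

Here the first key position is left-padding, so it stays masked for every query, and each query can otherwise see only itself and earlier positions.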

[Idefics2] - Fix FA2 call for Perceiver layer (#32275) · 5f1fcc29

amyeroberts authored Jul 31, 2024

* Fix FA2 call for Perceiver layer

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

* Fix up

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

5f1fcc29

Llama 3.1: Fix incorrect `inv_freq` assignment (#32330) · b75ad566
Joao Gante authored Jul 31, 2024
```
fix 💩
```
b75ad566

Gemma2 and flash-attention (#32188) · 7f552e28

Raushan Turganbay authored Jul 31, 2024

* enable flash-attn & static cache

* this works, not the prev

* fix for sliding window layers

* not needed anymore

7f552e28

30 Jul, 2024 1 commit

Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191) · 6e2d04e4

Joshua Lochner authored Jul 30, 2024

* Remove user-defined tokens which can be obtained through merges

* Remove debug line

* formatting

* Refactor spm slow -> fast converter

* revert unnecessary refactor

* set comprehension

* remove test files

* Use `vocab_scores`

* Always replace spiece underline with space in decode

* we no longer need token filtering

* Add save fast load slow unit test

* Remove tokenizers version check

* Remove duplicate code

* Make `<start_of_turn>` and `<end_of_turn>` special tokens

* Bias merge priority with length if score is the same

* Add unit test for merge priority

* CI

6e2d04e4
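
The bullet "Bias merge priority with length if score is the same" describes a tie-break rule for ordering merges. A minimal sketch of such an ordering is shown below. It assumes that each merge candidate is a `(left, right, score)` tuple, that a higher score means higher priority, and that longer pieces win among equal scores; `order_merges` is a hypothetical helper, not the converter's actual function.

```python
def order_merges(merges):
    """Sort (left, right, score) candidates by score, then by piece lengths.

    Assumption: higher score first; on equal score, longer left piece first,
    then longer right piece first.
    """
    return sorted(
        merges,
        key=lambda m: (m[2], len(m[0]), len(m[1])),
        reverse=True,
    )


candidates = [
    ("a", "b", -1.0),
    ("ab", "c", -1.0),
    ("a", "bc", -1.0),
    ("x", "y", -0.5),
]
ordered = order_merges(candidates)
```

Without the length terms in the key, the three candidates that share a score of -1.0 would keep their input order, which makes the resulting merge table depend on how the vocabulary happened to be enumerated.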

29 Jul, 2024 5 commits

Make static cache compatible with torch.export (#32168) · 811a9caa
Guang Yang authored Jul 29, 2024

811a9caa

[pipeline] fix padding for 1-d tensors (#31776) · 7f5d644e

Sanchit Gandhi authored Jul 29, 2024



* [pipeline] fix padding for 1-d tensors

* add test

* make style

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

---------
Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>

7f5d644e

Whisper tokenizer word level timestamps (#32197) · 3fbaaaa6

Kamil Akesbi authored Jul 29, 2024

* fix _fix_key in PreTrainedModel

* fix _find_longest_common_sequence

* add test

* remove result.json

* nit

* update test

3fbaaaa6

Generate: end-to-end compilation (#30788) · 7ffe25f2

Joao Gante authored Jul 29, 2024

* mvp

* added test (a few models need fixes)

* fix a few test cases

* test nits

* harder test 😈

* revert changes in stablelm

* test with improved condition

* add todo

* tmp commit

* merged with main

* nits

* add todo

* final corrections

* add docs for generation compilation

* docs nits

* add tip

* PR suggestions

* add more details to the compilation docs

* fix cache positions

* cache is now init in generate; update docs

* tag test as flaky

* docs

* post rebase make fixup and other nits

* remove unintended changes

* whisper (encoder-decoder) not supported

* move token default updates to ; add tests for token defaults

* push changes

* manual rebase

* chameleon doesn't support this

* fix test_static_cache_mha_mqa_gqa (broken in another PR)

* docs: dynamic is better with end-to-end compilation

7ffe25f2

🚨 Bloom support for cache class (#31445) · f7396876

Raushan Turganbay authored Jul 29, 2024



* bloom dynamic cache

* bloom follows standard cache format

* no skips for bloom anymore

* use cache position when possible

* clean up

* codestyle

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* pr comments

* isinstance fix

* address comments

* make musicgen test happy

* [run-slow] bloom

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

f7396876

26 Jul, 2024 5 commits

Flash-Attn: fix generation when no attention mask or no padding (#32241) · 81233c06
Raushan Turganbay authored Jul 26, 2024
```
* fix

* fix prev test (half of failures)

* [run-slow] llama, gemma2

* [run-slow] llama, gemma2
```
81233c06

[tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` (#32039) · 27c7f971

Fanli Lin authored Jul 26, 2024

* add flash attention check

* fix

* fix

27c7f971

Refactor: Removed unnecessary `object` base class (#32230) · b8e5cd53
Sai-Suraj-27 authored Jul 26, 2024
```
* Refactored to remove unnecessary object base class.

* small fix.
```
b8e5cd53
Llava: generate without images (#32183) · fad15fba
Raushan Turganbay authored Jul 26, 2024
```
* llava w/o images

* tests
```
fad15fba

Generation: stop at `eos` for assisted decoding (#31301) · 4ab33c2d

Raushan Turganbay authored Jul 26, 2024



* fix

* move changes to prompt lookup

* add test

* set eos in assistant model

* style

* fix flakiness

* changes for new `main`

* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add comment to explain

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

4ab33c2d

25 Jul, 2024 3 commits
- Follow up for #31973 (#32025) · df6eee92
  Yih-Dar authored Jul 25, 2024
```
* fix

* [test_all] trigger full CI

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
  df6eee92
- [warnings] fix E721 warnings (#32223) · de231889
  Kashif Rasul authored Jul 25, 2024
```
fix E721 warnings
```
  de231889
- [whisper] fix short-form output type (#32178) · 5658e749
  Sanchit Gandhi authored Jul 25, 2024
```
* [whisper] fix short-form output type

* add test

* make style

* update long-form tests

* fixes

* last fix

* finalise test
```
  5658e749
24 Jul, 2024 5 commits

fix: Replaced deprecated `unittest method` with the correct one (#32198) · 85a1269e
Sai-Suraj-27 authored Jul 24, 2024
```
Replaced deprecated unittest method with the correct one.
```
85a1269e

🚨 No more default chat templates (#31733) · edd68f4e

Matt authored Jul 24, 2024

* No more default chat templates

* Add the template to the GPT-SW3 tests since it's not available by default now

* Fix GPT2 test

* Fix Bloom test

* Fix Bloom test

* Remove default templates again

edd68f4e

Support dequantizing GGUF FP16 format (#31783) · 1c122a46
Penut Chen authored Jul 24, 2024
```
* support gguf fp16

* support gguf bf16 with pytorch

* add gguf f16 test

* remove bf16
```
1c122a46

RoPE: relaxed rope validation (#32182) · e0182f3b

Joao Gante authored Jul 24, 2024

* relaxed rope check

* lets also accept rope_type=None, defaulting to the original implementation

* type and rope_type can coexist

e0182f3b
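
The bullets above say that `rope_type=None` should fall back to the original implementation and that the keys `type` and `rope_type` may coexist. A small sketch of a lookup that behaves this way is given below. `resolve_rope_type` is a hypothetical helper and the `"default"` label for the original implementation is an assumption made for this example.

```python
def resolve_rope_type(rope_scaling):
    """Return the RoPE variant name from a `rope_scaling` dict (hypothetical helper)."""
    if rope_scaling is None:
        return "default"
    # Both the legacy key `type` and the newer `rope_type` may be present;
    # `rope_type` takes precedence when it exists.
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    # A missing or None value falls back to the original implementation.
    return rope_type if rope_type is not None else "default"


print(resolve_rope_type({"type": "linear", "rope_type": "linear", "factor": 2.0}))
print(resolve_rope_type({"rope_type": None}))
```

A stricter check would reject configurations that carry both keys; the relaxed form accepts them so that older and newer config files load the same way.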

Remove conversational pipeline tests (#32099) · 165116bc
amyeroberts authored Jul 24, 2024
```
Remove conversation pipeline tests
```
165116bc

23 Jul, 2024 2 commits

Updated `ruff` to the latest version (#31926) · d2c687b3

Sai-Suraj-27 authored Jul 23, 2024

* Updated ruff version and fixed the required code according to the latest version.

* Updated ruff version and fixed the required code according to the latest version.

* Added noqa directive to ignore 1 error shown by ruff

d2c687b3

Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629) · 9cf4f2aa

RhuiDih authored Jul 23, 2024

* add DataCollatorBatchFlattening

* Update data_collator.py

* change name

* new FA2 flow if position_ids is provided

* add comments

* minor fix

* minor fix data collator

* add test cases for models

* add test case for data collator

* remove extra code

* formatting for ruff check and check_repo.py

* ruff format

ruff format tests src utils

* custom_init_isort.py

9cf4f2aa
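
The packing commit above (#31629) adds `DataCollatorBatchFlattening` and a FlashAttention-2 path that is taken when `position_ids` is provided. The sketch below illustrates the general idea only: variable-length examples are concatenated into one row without padding, and `position_ids` restart at 0 for each example so that sequence boundaries can be recovered later. The functions `flatten_batch` and `sequence_starts` are hypothetical and written in plain Python; they do not reproduce the collator's actual output format.

```python
def flatten_batch(sequences):
    """Concatenate token sequences into a single unpadded row.

    position_ids restart at 0 for every sequence, which keeps the
    boundaries between packed examples recoverable.
    """
    input_ids, position_ids = [], []
    for seq in sequences:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))
    return {"input_ids": [input_ids], "position_ids": [position_ids]}


def sequence_starts(position_ids):
    """Recover start offsets: every position id equal to 0 marks a new sequence."""
    return [i for i, p in enumerate(position_ids) if p == 0]


batch = flatten_batch([[5, 6, 7], [8, 9]])
starts = sequence_starts(batch["position_ids"][0])
```

Because no padding tokens are introduced, every position in the packed row carries real data, which is where the training-efficiency gain in the commit title would come from.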