Commits · 51ab25e2932da15511ced35bcbdfa92d25c4794c · chenpangpang / transformers


01 Aug, 2024 6 commits

Fixed Hybrid Cache Shape Initialization. (#32163) · 51ab25e2

OsamaS99 authored Aug 01, 2024



* fixed hybrid cache init, added test

* Fix Test Typo

---------
Co-authored-by: Aaron Haag <aaron.haag@siemens.com>


Offloaded KV Cache (#31325) · ca59d6f7

Nikos Karampatziakis authored Aug 01, 2024

* Initial implementation of OffloadedCache

* enable usage via cache_implementation

* Address feedback, add tests, remove legacy methods.

* Remove flash-attn, discover synchronization bugs, fix bugs

* Prevent usage in CPU only mode

* Add a section about offloaded KV cache to the docs

* Fix typos in docs

* Clarifications and better explanation of streams


Fix conflicting key in init kwargs in PreTrainedTokenizerBase (#31233) · b4727a12

Omar Salman authored Aug 01, 2024

* Fix conflicting key in init kwargs in PreTrainedTokenizerBase

* Update code to check for callable key in save_pretrained

* Apply PR suggestions

* Invoke CI

* Updates based on PR suggestion


update clean_up_tokenization_spaces warning (#32371) · 2229ebe7
Ita Zaporozhets authored Aug 01, 2024

Remove size check between attn_weights and kv_seq_len for phi3 (#32339) · 48ed24c5
Lunwen He authored Aug 01, 2024
```
* Remove size check between attn_weights and kv_seq_len

* add unit tests
```

[whisper] compile compatibility with long-form decoding (#31772) · e234061c

Sanchit Gandhi authored Aug 01, 2024

* [whisper] compile compatibility with long-form decoding

* clarify comment

* fix after rebase

* finalise

* fix bsz

* fix cache split

* remove contiguous

* style

* finish

* update doc

* prevent cuda graph trace


31 Jul, 2024 4 commits

>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227) · 92abe603

fxmarty authored Jul 31, 2024



* draft

* apply changes to all relevant archs

* rerun ci - check_docstrings.py failing?

* fix docstring

* move 2D->4D mask creation to modeling file

* repo consistency

* fix the batch size = 1 case - calling contiguous is not enough

* nit

* style

* propagate to gemma/gemma-2

* prepare inputs for gemma generation

* implement test and tiny fix in gemma2

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix copies

* ci pass

* fix gemma's test_compile_static_cache tests

* flaky

* retrigger ci

---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


[Idefics2] - Fix FA2 call for Perceiver layer (#32275) · 5f1fcc29

amyeroberts authored Jul 31, 2024

* Fix FA2 call for Perceiver layer

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

* Fix up

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2


Llama 3.1: Fix incorrect `inv_freq` assignment (#32330) · b75ad566
Joao Gante authored Jul 31, 2024
```
fix 💩
```

Gemma2 and flash-attention (#32188) · 7f552e28

Raushan Turganbay authored Jul 31, 2024

* enable flash-attn & static cache

* this works, not the prev

* fix for sliding window layers

* not needed anymore


30 Jul, 2024 1 commit

Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191) · 6e2d04e4

Joshua Lochner authored Jul 30, 2024

* Remove user-defined tokens which can be obtained through merges

* Remove debug line

* formatting

* Refactor spm slow -> fast converter

* revert unnecessary refactor

* set comprehension

* remove test files

* Use `vocab_scores`

* Always replace spiece underline with space in decode

* we no longer need token filtering

* Add save fast load slow unit test

* Remove tokenizers version check

* Remove duplicate code

* Make `<start_of_turn>` and `<end_of_turn>` special tokens

* Bias merge priority with length if score is the same

* Add unit test for merge priority

* CI


29 Jul, 2024 5 commits

Make static cache compatible with torch.export (#32168) · 811a9caa
Guang Yang authored Jul 29, 2024


[pipeline] fix padding for 1-d tensors (#31776) · 7f5d644e

Sanchit Gandhi authored Jul 29, 2024



* [pipeline] fix padding for 1-d tensors

* add test

* make style

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

---------
Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>


Whisper tokenizer word level timestamps (#32197) · 3fbaaaa6

Kamil Akesbi authored Jul 29, 2024

* fix _fix_key in PreTrainedModel

* fix _find_longest_common_sequence

* add test

* remove result.json

* nit

* update test


Generate: end-to-end compilation (#30788) · 7ffe25f2

Joao Gante authored Jul 29, 2024

* mvp

* added test (a few models need fixes)

* fix a few test cases

* test nits

* harder test 😈

* revert changes in stablelm

* test with improved condition

* add todo

* tmp commit

* merged with main

* nits

* add todo

* final corrections

* add docs for generation compilation

* docs nits

* add tip

* PR suggestions

* add more details to the compilation docs

* fix cache positions

* cache is now init in generate; update docs

* tag test as flaky

* docs

* post rebase make fixup and other nits

* remove unintended changes

* whisper (encoder-decoder) not supported

* move token default updates to ; add tests for token defaults

* push changes

* manual rebase

* chameleon doesn't support this

* fix test_static_cache_mha_mqa_gqa (broken in another PR)

* docs: dynamic is better with end-to-end compilation


🚨 Bloom support for cache class (#31445) · f7396876

Raushan Turganbay authored Jul 29, 2024



* bloom dynamic cache

* bloom follows standard cache format

* no skips for bloom anymore

* use cache position when possible

* clean up

* codestyle

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* pr comments

* isinstance fix

* address comments

* make musicgen test happy

* [run-slow] bloom

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>


26 Jul, 2024 5 commits

Flash-Attn: fix generation when no attention mask or no padding (#32241) · 81233c06
Raushan Turganbay authored Jul 26, 2024
```
* fix

* fix prev test (half of failures)

* [run-slow] llama, gemma2

* [run-slow] llama, gemma2
```

[tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` (#32039) · 27c7f971

Fanli Lin authored Jul 26, 2024

* add flash attention check

* fix

* fix


Refactor: Removed un-necessary `object` base class (#32230) · b8e5cd53
Sai-Suraj-27 authored Jul 26, 2024
```
* Refactored to remove un-necessary object base class.

* small fix.
```

Llava: generate without images (#32183) · fad15fba
Raushan Turganbay authored Jul 26, 2024
```
* llava w/o images

* tests
```

Generation: stop at `eos` for assisted decoding (#31301) · 4ab33c2d

Raushan Turganbay authored Jul 26, 2024



* fix

* move changes to prompt lookup

* add test

* set eos in assistant model

* style

* fix flakiness

* changes for new `main`

* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add comment to explain

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>


25 Jul, 2024 3 commits

Follow up for #31973 (#32025) · df6eee92
Yih-Dar authored Jul 25, 2024
```
* fix

* [test_all] trigger full CI

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```

[warnings] fix E721 warnings (#32223) · de231889
Kashif Rasul authored Jul 25, 2024
```
fix E721 warnings
```

[whisper] fix short-form output type (#32178) · 5658e749
Sanchit Gandhi authored Jul 25, 2024
```
* [whisper] fix short-form output type

* add test

* make style

* update long-form tests

* fixes

* last fix

* finalise test
```

24 Jul, 2024 5 commits

fix: Replaced deprecated `unittest method` with the correct one (#32198) · 85a1269e
Sai-Suraj-27 authored Jul 24, 2024
```
Replaced deprecated unittest method with the correct one.
```

🚨 No more default chat templates (#31733) · edd68f4e

Matt authored Jul 24, 2024

* No more default chat templates

* Add the template to the GPT-SW3 tests since it's not available by default now

* Fix GPT2 test

* Fix Bloom test

* Fix Bloom test

* Remove default templates again


Support dequantizing GGUF FP16 format (#31783) · 1c122a46
Penut Chen authored Jul 24, 2024
```
* support gguf fp16

* support gguf bf16 with pytorch

* add gguf f16 test

* remove bf16
```

RoPE: relaxed rope validation (#32182) · e0182f3b

Joao Gante authored Jul 24, 2024

* relaxed rope check

* lets also accept rope_type=None, defaulting to the original implementation

* type and rope_type can coexist


Remove conversational pipeline tests (#32099) · 165116bc
amyeroberts authored Jul 24, 2024
```
Remove conversation pipeline tests
```

23 Jul, 2024 11 commits

Updated `ruff` to the latest version (#31926) · d2c687b3

Sai-Suraj-27 authored Jul 23, 2024

* Updated ruff version and fixed the required code according to the latest version.

* Updated ruff version and fixed the required code according to the latest version.

* Added noqa directive to ignore 1 error shown by ruff


Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629) · 9cf4f2aa

RhuiDih authored Jul 23, 2024

* add DataCollatorBatchFlattening

* Update data_collator.py

* change name

* new FA2 flow if position_ids is provided

* add comments

* minor fix

* minor fix data collator

* add test cases for models

* add test case for data collator

* remove extra code

* formatting for ruff check and check_repo.py

* ruff format

ruff format tests src utils

* custom_init_isort.py


Revert "Incorrect Whisper long-form decoding timestamps" (#32148) · 3263b343
Sanchit Gandhi authored Jul 23, 2024
```
Revert "Incorrect Whisper long-form decoding timestamps  (#32003)"

This reverts commit cd48553f.
```

Rename Phi-3 rope scaling type (#31436) · 034b4778

Amit Garg authored Jul 23, 2024

* renamed phi3 rope_scaling type

* fixed trailing whitespaces

* fixed test

* added warning

* fixed format


Fix video batching to videollava (#32139) · 9ced33ca
Merve Noyan authored Jul 23, 2024
```
---------
Co-authored-by: Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>
```

gguf conversion add_prefix_space=None for llama3 (#31937) · a1844a32

Ita Zaporozhets authored Jul 23, 2024

* GGUF conversion forces add_prefix_space=False for llama3; this is not required and forces from_slow, which fails. Change it to None and add a test

* typo

* clean test


Llama: RoPE refactor (#32135) · 2e113422

Joao Gante authored Jul 23, 2024


Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


Modify resize_token_embeddings to ensure output type is same as input (#31979) · 5a4a76ed

bayllama authored Jul 23, 2024



* Change resize_token_embeddings to make it return same Class that is passed to it

* Add explanatory comment as requested in review

* Add explanatory comments for add resizing function in lxmert

* Add comment for padding_idx and moving _resize_bias in lxmert to LxmertForPreTraining

---------
Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MBP.attlocal.net>
Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MacBook-Pro.local>


Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910) · 34b43211

mig-mfreitas authored Jul 23, 2024

* Add YaRN and Dynamic-YaRN RoPE Scaling Methods

YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.

Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.

We implement YaRN and Dynamic-YaRN for the following list of models:

 - LLaMA
 - Falcon
 - GPT-NeoX
 - Olmo
 - Persimmon
 - Phi
 - StableLM
 - OpenLLaMA

New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.

For more details, please refer to https://arxiv.org/abs/2309.00071.
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>

* Refactor YaRN implementation for LLaMA

Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.

This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
  from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies
Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>

* Refactor Tensor Building Logic for YaRN

- Comply with the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>

* remove unwanted file

---------
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>


Fix mask creations of `GPTNeoX` and `GPT2` (#31944) · 605f3245

Anton Vlasjuk authored Jul 23, 2024

* fix mask creation of gpt2 and gpt_neox caused by me

* forgot the reshape of masks when shape > 2

* add tests for gpt neox and gpt2

* nit on a comment


Remove `trust_remote_code` when loading Libri Dummy (#31748) · f83c6f1d
Sanchit Gandhi authored Jul 23, 2024
```
* [whisper integration] use parquet dataset for testing

* propagate to others

* more propagation

* last one
```