Commits · 6a03942db7c577f9db340270024ab86ba54f7f21 · chenpangpang / transformers

06 Aug, 2024 4 commits

Add Nemotron HF Support (#31699) · 6a03942d

Ao Tang authored Aug 06, 2024

* Add nemotron support

* fix inference

* add unit test

* add layernorm1p as a class to avoid meta device mismatch

* test fixed

* Add copied_from statements

* remove pretraining_tp args

* remove nemotronlayernorm

* force LN computation done in FP32

* remove nemotrontokenizer and use llamatokenizer

* license update

* add option for kv_channels for minitron8b

* remove assert

* o_proj fixed

* o_proj reshape

* add gated_proj option

* typo

* remove todos

* fix broken test after merging latest main

* remove nezha/nat after merging main

* change default config to 15b model

* add nemo conversion script

* rename conversion script

* remove gate_proj option

* pr comment resolved

* fix unit test

* rename kv_channels to head_dim

* resolve PR issue

* add nemotron md

* fix broken tests

* refactor rope for nemotron

* test fix

* remove linearscaling

* whitespace and import

* fix some copied-from

* code style fix

* reformatted

* add position_embedding to nemotronattention

* rope refactor to only use config, copied-from fix

* format

* Run make fix-copies

* nemotron md with autodoc

* doc  fix

* fix order

* pass check_config_docstrings.py

* fix config_attributes

* remove all llama BC related code

* Use PreTrainedTokenizerFast

* ruff check examples

* conversion script update

* add nemotron to toctree

6a03942d

Fix get large model config for Switch Transformer encoder only tester (#32438) · 438d06c9
Francisco Kurucz authored Aug 06, 2024

438d06c9

Update kwargs validation for `preprocess` with decorator (#32024) · fb66ef81

Pavel Iakubovskii authored Aug 06, 2024

* BLIP preprocess

* BIT preprocess

* BRIDGETOWER preprocess

* CHAMELEON preprocess

* CHINESE_CLIP preprocess

* CONVNEXT preprocess

* DEIT preprocess

* DONUT preprocess

* DPT preprocess

* FLAVA preprocess

* EFFICIENTNET preprocess

* FUYU preprocess

* GLPN preprocess

* IMAGEGPT preprocess

* INSTRUCTBLIPVIDEO preprocess

* VIVIT preprocess

* ZOEDEPTH preprocess

* VITMATTE preprocess

* VIT preprocess

* VILT preprocess

* VIDEOMAE preprocess

* VIDEOLLAVA

* TVP processing

* TVP fixup

* SWIN2SR preprocess

* SIGLIP preprocess

* SAM preprocess

* RT-DETR preprocess

* PVT preprocess

* POOLFORMER preprocess

* PERCEIVER preprocess

* OWLVIT preprocess

* OWLV2 preprocess

* NOUGAT preprocess

* MOBILEVIT preprocess

* MOBILENETV2 preprocess

* MOBILENETV1 preprocess

* LEVIT preprocess

* LAYOUTLMV2 preprocess

* LAYOUTLMV3 preprocess

* Add test

* Update tests

fb66ef81

add the missing flash attention test marker (#32419) · e85d8639

Fanli Lin authored Aug 06, 2024

* add flash attention check

* fix

* fix

* add the missing marker

* bug fix

* add one more

* remove order

* add one more

e85d8639

05 Aug, 2024 4 commits
- fix: Updated `test_embeded_special_tokens` for luke and mluke models (#32413) · 458b0cd2
  Sai-Suraj-27 authored Aug 05, 2024
```
Fixed tokenizer tests for luke, mluke models.
```
  458b0cd2
- Persist embedding type of BART and mBART models after resize (#32242) · baf7e5c9
  Abdi authored Aug 05, 2024
```
* fix: persist embedding type of MBartConditionalGeneration after resize

* fix: persist embedding type of BartConditionalGeneration after resize
```
  baf7e5c9
- Phi3 tests: fix typing for Python 3.8 (#32388) · 3bb646a5
  Raushan Turganbay authored Aug 05, 2024
```
fix phi
```
  3bb646a5
- fix: SeamlessM4TFeatureExtractor stride remainder (#32088) · 05ae3a30
  TechInterMezzo authored Aug 05, 2024
```
* fix: SeamlessM4TFeatureExtractor stride remainder

* Added attention mask size test

* Reran ruff for style correction
```
  05ae3a30
01 Aug, 2024 2 commits

Remove size check between attn_weights and kv_seq_len for phi3 (#32339) · 48ed24c5
Lunwen He authored Aug 01, 2024
```
* Remove size check between attn_weights and kv_seq_len

* add unit tests
```
48ed24c5

[whisper] compile compatibility with long-form decoding (#31772) · e234061c

Sanchit Gandhi authored Aug 01, 2024

* [whisper] compile compatibility with long-form decoding

* clarify comment

* fix after rebase

* finalise

* fix bsz

* fix cache split

* remove contiguous

* style

* finish

* update doc

* prevent cuda graph trace

e234061c

31 Jul, 2024 4 commits

>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227) · 92abe603

fxmarty authored Jul 31, 2024



* draft

* apply changes to all relevant archs

* rerun ci - check_docstrings.py failing?

* fix docstring

* move 2D->4D mask creation to modeling file

* repo consistency

* fix the batch size = 1 case - calling contiguous is not enough

* nit

* style

* propagate to gemma/gemma-2

* prepare inputs for gemma generation

* implement test and tiny fix in gemma2

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix copies

* ci pass

* fix gemma's test_compile_static_cache tests

* flaky

* retrigger ci

---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

92abe603

[Idefics2] - Fix FA2 call for Perceiver layer (#32275) · 5f1fcc29

amyeroberts authored Jul 31, 2024

* Fix FA2 call for Perceiver layer

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

* Fix up

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

5f1fcc29

Llama 3.1: Fix incorrect `inv_freq` assignment (#32330) · b75ad566
Joao Gante authored Jul 31, 2024
```
fix 💩
```
b75ad566

Gemma2 and flash-attention (#32188) · 7f552e28

Raushan Turganbay authored Jul 31, 2024

* enable flash-attn & static cache

* this works, not the prev

* fix for sliding window layers

* not needed anymore

7f552e28

30 Jul, 2024 1 commit

Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191) · 6e2d04e4

Joshua Lochner authored Jul 30, 2024

* Remove user-defined tokens which can be obtained through merges

* Remove debug line

* formatting

* Refactor spm slow -> fast converter

* revert unnecessary refactor

* set comprehension

* remove test files

* Use `vocab_scores`

* Always replace spiece underline with space in decode

* we no longer need token filtering

* Add save fast load slow unit test

* Remove tokenizers version check

* Remove duplicate code

* Make `<start_of_turn>` and `<end_of_turn>` special tokens

* Bias merge priority with length if score is the same

* Add unit test for merge priority

* CI

6e2d04e4
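
The "Bias merge priority with length if score is the same" step in #32191 can be pictured as a multi-key sort. The following is a minimal sketch of that idea only, not the converter's actual code: the `Merge` tuple layout, the `order_merges` helper, and the direction of the tie-break (longer pieces first) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: (left piece, right piece, score).
Merge = Tuple[str, str, float]


def order_merges(merges: List[Merge]) -> List[Merge]:
    """Sort merges by score, breaking ties by piece length.

    Higher score comes first. When two merges share a score, the one with
    the longer left piece wins, then the one with the longer right piece.
    """
    return sorted(
        merges,
        key=lambda m: (m[2], len(m[0]), len(m[1])),
        reverse=True,
    )


candidates = [("a", "bc", -1.0), ("ab", "c", -1.0), ("x", "y", -0.5)]
ordered = order_merges(candidates)
```

With a length-based tie-break the ordering is deterministic even when many candidate merges carry the same score, which is presumably why the commit adds a dedicated unit test for merge priority.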

29 Jul, 2024 2 commits

Whisper tokenizer word level timestamps (#32197) · 3fbaaaa6

Kamil Akesbi authored Jul 29, 2024

* fix _fix_key in PreTrainedModel

* fix _find_longest_common_sequence

* add test

* remove result.json

* nit

* update test

3fbaaaa6

Generate: end-to-end compilation (#30788) · 7ffe25f2

Joao Gante authored Jul 29, 2024

* mvp

* added test (a few models need fixes)

* fix a few test cases

* test nits

* harder test 😈

* revert changes in stablelm

* test with improved condition

* add todo

* tmp commit

* merged with main

* nits

* add todo

* final corrections

* add docs for generation compilation

* docs nits

* add  tip

* PR suggestions

* add more details to the compilation docs

* fix cache positions

* cache is now init in generate; update docs

* tag test as flaky

* docs

* post rebase make fixup and other nits

* remove unintended changes

* whisper (encoder-decoder) not supported

* move token default updates to ; add tests for token defaults

* push changes

* manual rebase

* chameleon doesn't support this

* fix test_static_cache_mha_mqa_gqa (broken in another PR)

* docs: dynamic is better with end-to-end compilation

7ffe25f2

26 Jul, 2024 2 commits
- Refactor: Removed un-necessary `object` base class (#32230) · b8e5cd53
  Sai-Suraj-27 authored Jul 26, 2024
```
* Refactored to remove un-necessary object base class.

* small fix.
```
  b8e5cd53
- Llava: generate without images (#32183) · fad15fba
  Raushan Turganbay authored Jul 26, 2024
```
* llava w/o images

* tests
```
  fad15fba
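
The `object` base class removal in #32230 relies on the fact that every Python 3 class is new-style, so the explicit base is redundant. A minimal sketch with two hypothetical classes (`LegacyStyle` and `ModernStyle` are illustrative names, not classes from the repository):

```python
class LegacyStyle(object):  # explicit base, only needed on Python 2
    pass


class ModernStyle:  # implicit base, equivalent on Python 3
    pass


# Both classes inherit directly from object, so their bases are identical.
legacy_bases = LegacyStyle.__bases__
modern_bases = ModernStyle.__bases__
```

Because the two spellings produce the same class hierarchy, dropping `(object)` is a purely cosmetic refactor.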
25 Jul, 2024 3 commits
- Follow up for #31973 (#32025) · df6eee92
  Yih-Dar authored Jul 25, 2024
```
* fix

* [test_all] trigger full CI

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
  df6eee92
- [warnings] fix E721 warnings (#32223) · de231889
  Kashif Rasul authored Jul 25, 2024
```
fix E721 warnings
```
  de231889
- [whisper] fix short-form output type (#32178) · 5658e749
  Sanchit Gandhi authored Jul 25, 2024
```
* [whisper] fix short-form output type

* add test

* make style

* update long-form tests

* fixes

* last fix

* finalise test
```
  5658e749
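
For context on the E721 fix in #32223: the E721 lint rule flags direct type comparisons written with `==`. The sketch below shows the flagged pattern and two common replacements; the helper names are hypothetical and the snippet is not taken from the PR.

```python
def is_exact_int_flagged(value) -> bool:
    # Pattern that E721 reports: comparing types with ==.
    return type(value) == int  # noqa: E721


def is_exact_int(value) -> bool:
    # Identity comparison: True only for int itself, not for subclasses.
    return type(value) is int


def is_int_like(value) -> bool:
    # isinstance also accepts subclasses such as bool.
    return isinstance(value, int)
```

The two replacements are not interchangeable: `bool` is a subclass of `int`, so `is_int_like(True)` is true while `is_exact_int(True)` is false. Which one is correct depends on whether subclasses should pass the check.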
24 Jul, 2024 3 commits

fix: Replaced deprecated `unittest method` with the correct one (#32198) · 85a1269e
Sai-Suraj-27 authored Jul 24, 2024
```
Replaced deprecated unittest method with the correct one.
```
85a1269e

🚨 No more default chat templates (#31733) · edd68f4e

Matt authored Jul 24, 2024

* No more default chat templates

* Add the template to the GPT-SW3 tests since it's not available by default now

* Fix GPT2 test

* Fix Bloom test

* Fix Bloom test

* Remove default templates again

edd68f4e

RoPE: relaxed rope validation (#32182) · e0182f3b

Joao Gante authored Jul 24, 2024

* relaxed rope check

* lets also accept rope_type=None, defaulting to the original implementation

* type and rope_type can coexist

e0182f3b
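
The three bullets of #32182 describe a lookup order for the RoPE type. A minimal sketch of that behaviour, assuming the configuration is a plain dict and that the fallback value is the string `"default"` (the `resolve_rope_type` helper is hypothetical, not the library's function):

```python
from typing import Optional


def resolve_rope_type(rope_scaling: Optional[dict]) -> str:
    """Pick the RoPE type from a rope_scaling-style dict.

    `rope_type` takes precedence, the legacy `type` key is accepted as a
    fallback, and a missing or None value means the original implementation.
    """
    if rope_scaling is None:
        return "default"
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    return "default" if rope_type is None else rope_type
```

Under this sketch, `{"type": "linear", "rope_type": "yarn"}` resolves to `"yarn"`, so both keys can coexist without raising, and `{"rope_type": None}` resolves to `"default"`.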

23 Jul, 2024 9 commits

Updated `ruff` to the latest version (#31926) · d2c687b3

Sai-Suraj-27 authored Jul 23, 2024

* Updated ruff version and fixed the required code according to the latest version.

* Updated ruff version and fixed the required code according to the latest version.

* Added noqa directive to ignore 1 error shown by ruff

d2c687b3

Revert "Incorrect Whisper long-form decoding timestamps " (#32148) · 3263b343
Sanchit Gandhi authored Jul 23, 2024
```
Revert "Incorrect Whisper long-form decoding timestamps  (#32003)"

This reverts commit cd48553f.
```
3263b343

Rename Phi-3 rope scaling type (#31436) · 034b4778

Amit Garg authored Jul 23, 2024

* renamed phi3 rope_scaling type

* fixed trailing whitespaces

* fixed test

* added warning

* fixed format

034b4778

Fix video batching to videollava (#32139) · 9ced33ca
Merve Noyan authored Jul 23, 2024
```
---------
Co-authored-by: Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>
```
9ced33ca

Llama: RoPE refactor (#32135) · 2e113422

Joao Gante authored Jul 23, 2024


Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

2e113422

Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910) · 34b43211

mig-mfreitas authored Jul 23, 2024

* Add YaRN and Dynamic-YaRN RoPE Scaling Methods

YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.

Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.

We implement YaRN and Dynamic-YaRN for the following list of models:

 - LLaMA
 - Falcon
 - GPT-NeoX
 - Olmo
 - Persimmon
 - Phi
 - StableLM
 - OpenLLaMA

New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.

For more details, please refer to https://arxiv.org/abs/2309.00071.
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>

* Refactor YaRN implementation for LLaMA

Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.

This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
  from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies
Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>

* Refactor Tensor Building Logic for YaRN

- Comply with the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>

* remove unwanted file

---------
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>

34b43211
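
The two ingredients named in the YaRN commit message, NTK-by-parts interpolation and attention scaling, can be sketched in a few lines of plain Python. This follows the formulation in the referenced paper, not the code merged in #30910. The function name `yarn_inv_freq` and its parameters are hypothetical, and the defaults `alpha=1.0` and `beta=32.0` are the values the paper suggests for LLaMA-family models.

```python
import math
from typing import List, Tuple


def yarn_inv_freq(
    dim: int,
    base: float,
    factor: float,
    original_max_pos: int,
    alpha: float = 1.0,
    beta: float = 32.0,
) -> Tuple[List[float], float]:
    """Return YaRN-style inverse frequencies and the attention factor."""
    inv_freq = []
    for i in range(0, dim, 2):
        theta = base ** (-i / dim)  # standard RoPE inverse frequency
        # Full rotations this dimension completes over the original context.
        rotations = original_max_pos * theta / (2 * math.pi)
        # Ramp: 0 means fully interpolate, 1 means leave untouched.
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        inv_freq.append((1 - gamma) * theta / factor + gamma * theta)
    # Attention scaling: sqrt(1/t) = 0.1 * ln(s) + 1 for scale factor s.
    attention_factor = 0.1 * math.log(factor) + 1.0
    return inv_freq, attention_factor


inv_freq, attention_factor = yarn_inv_freq(
    dim=8, base=10000.0, factor=4.0, original_max_pos=4096
)
```

High-frequency dimensions (many rotations within the original context) keep their frequency, low-frequency dimensions are divided by the scale factor, and the ramp blends the two regimes in between. With `factor=1.0` the function reduces to ordinary RoPE frequencies and an attention factor of 1.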

Fix mask creations of `GPTNeoX` and `GPT2` (#31944) · 605f3245

Anton Vlasjuk authored Jul 23, 2024

* fix mask creation of gpt2 and gpt_neox caused by me

* forgot the reshape of masks when shape > 2

* add tests for gpt neox and gpt2

* nit on a comment

605f3245

Remove `trust_remote_code` when loading Libri Dummy (#31748) · f83c6f1d
Sanchit Gandhi authored Jul 23, 2024
```
* [whisper integration] use parquet dataset for testing

* propagate to others

* more propagation

* last one
```
f83c6f1d

LLaVaNeXT: pad on right if training (#32134) · 3aefb4ec
Raushan Turganbay authored Jul 23, 2024
```
* pad on right if training

* docs

* add tests
```
3aefb4ec

22 Jul, 2024 4 commits
- Return assistant generated tokens mask in apply_chat_template (#30650) · 74d0eb3f
  Yoni Gottesman authored Jul 22, 2024
```
return assistant generated tokens mask in apply_chat_template
```
  74d0eb3f
- fix: Fixed raising `TypeError` instead of `ValueError` for invalid type (#32111) · 12b6880c
  Sai-Suraj-27 authored Jul 22, 2024
```
* Raised TypeError instead of ValueError for invalid types.

* Updated formatting using ruff.

* Retrieved few changes.

* Retrieved few changes.

* Updated tests accordingly.
```
  12b6880c
- Fix failing test with race condition (#32140) · 7ba028fc
  Matt authored Jul 22, 2024
```
* Fix failing test with race condition

* make fixup

* monotonic_ns instead of randint

* uuid4 instead of monotonic_ns

* Add a finally cleanup step
```
  7ba028fc
- Mention model_info.id instead of model_info.modelId (#32106) · f2a1e3ca
  Lucain authored Jul 22, 2024
  
  f2a1e3ca
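
The race-condition fix in #32140 moves from `randint` to `monotonic_ns` to `uuid4` for naming, then adds a `finally` cleanup step. A minimal sketch of that pattern, using a temporary directory as a stand-in for whatever resource the real test creates (the `run_with_unique_dir` helper and the `test-` prefix are hypothetical):

```python
import shutil
import tempfile
import uuid
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_unique_dir(work: Callable[[Path], T]) -> T:
    """Run `work` against a uniquely named directory and always clean up."""
    # uuid4 makes name collisions between parallel test runs very unlikely;
    # small random integers or clock readings can repeat across workers.
    target = Path(tempfile.gettempdir()) / f"test-{uuid.uuid4().hex}"
    target.mkdir()
    try:
        return work(target)
    finally:
        # Runs whether `work` returned or raised, so nothing is left behind.
        shutil.rmtree(target, ignore_errors=True)
```

The `finally` block matters as much as the unique name: a test that fails midway would otherwise leave its resource behind for later runs to trip over.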
19 Jul, 2024 2 commits

Support generating with fallback for short form audio in Whisper (#30984) · 89575b56

Kamil Akesbi authored Jul 19, 2024



* remove is_shortform

* adapt _retrieve_max_frames_and_seek for short_form

* return bos token in short and long form

* add decoder_input_ids to short form audios

* add eos token for  short form

* handle short form token_timestamps

* no need to return scores

* add is_shortform conditions

* handle when max_new_tokens is None - short form

* handle assistant decoding

* fix

* handle return_dict_in_generate

* handle split_by_batch for encoder_attentions attribute

* handle num_beams>1

* handle num_return_sequences>1 in generate_with_fallback

* handle num_return_sequences>1 with return_dict_in_generate=True

* raise error if max_new_tokens + decoder_inputs_ids > max_target_pos

* fix

* apply review suggestions

* fix

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fix

* logits for both short form and long form

* handle if logits_processor is None

* test

* apply review changes to num_return_sequences

* add _expand_variables_for_generation

* remove short form commented section

* update comments

* uncomment num_beams line in generate_with_fallback

* update assistant decoding

* handle return_segment with short form generation

* up

* fix output format is_shortform

* overwrite beam_sample test

* update _set_return_timestamps

* apply review suggestions

* apply review suggestions

* remove seek_outputs_short_form

* fix _stack_split_outputs

* fix stack dim in _stack_split_outputs

* update tests

* fix past_key_values + beam tests

* fix

* clean _expand_variables_for_generation

* make style

* fix slow tests

* make style

* max_length condition

* make style

* add slow tests for shortform fallback

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* apply review changes

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* up

* fix slow tests

* apply review suggestions

* update test

* make style

* small fix

* fix

* fix test_new_cache_format

* fix past_key_values

* fix

* make style

* fix slow tests

* fix

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

89575b56

Incorrect Whisper long-form decoding timestamps (#32003) · cd48553f

Kamil Akesbi authored Jul 19, 2024



* fix long form timestamps in decode_batch

* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* add test

* make style

* fix copies

* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/whisper/processing_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* apply review suggestions

* fix

* fix copies

* fix

* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix-copies

---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

cd48553f