1. 05 Jun, 2024 12 commits
  2. 04 Jun, 2024 14 commits
  3. 03 Jun, 2024 14 commits
    • [docs] Spanish translation of tokenizer_summary.md (#31154) · c73ee133
      Aaron Jimenez authored
      * add tokenizer_summary to es/_toctree.yml
      
      * add tokenizer_summary to es/
      
      * fix link to Transformer XL in en/
      
      * translate until Subword tokenization section
      
      * fix GPT link in en/
      
      * fix other GPT link in en/
      
      * fix typo in en/
      
      * translate the doc
      
      * run make fixup
      
      * Remove .md in Transformer XL link
      
      * fix some link issues in es/
      
      * fix typo
    • Fix GPU OOM for `mistral.py::Mask4DTestHard` (#31212) · 8a1a23ae
      Yih-Dar authored
      
      
      * build
      
      * build
      
      * build
      
      * build
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • Set greater_is_better to False if metric_for_best_model ends with "loss" (#31142) · df5abae8
      miivanov90 authored
      * update to not(endswith(loss))
      
      * ruff formatting
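The behavior this commit introduces can be sketched in a few lines. This is an illustrative, hedged reconstruction of the new default; the function name is hypothetical, not the actual Trainer source:

```python
# Illustrative sketch of the default this commit introduces; the function
# name is hypothetical and this is not the actual transformers source.
def default_greater_is_better(metric_for_best_model: str) -> bool:
    # Loss-like metrics (e.g. "loss", "eval_loss") should be minimized,
    # so greater_is_better defaults to False for them.
    return not metric_for_best_model.endswith("loss")
```

For example, default_greater_is_better("eval_loss") returns False, while default_greater_is_better("eval_accuracy") returns True.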
    • Cohere: Fix copied from (#31213) · 924c46d4
      Younes Belkada authored
      Update modeling_cohere.py
    • Wrong translation FR : Contents = Contenu (#31186) · 98dd8423
      Jade Choghari authored
      Update index.md - fix French typo: Contents = Contenu
    • Rename sanity_evaluation to eval_on_start (#31192) · c6c78733
      Qubitium authored
      * Rename sanity_evaluation to eval_on_start
      
      * move arg back to last
    • Fix typo in utils (#31169) · c230504b
      Bojun Feng authored
      fix typo
    • fix the get_size_with_aspect_ratio in max_size situation (#30902) · 874ac129
      Sangbum Daniel Choi authored
      
      
      * fix the get_size_with_aspect_ratio in max_size situation
      
      * make fix-up
      
      * add more general solution
      
      * consider when max_size is not defined
      
      * fix typo
      
      * fix typo
      
      * simple fix
      
      * fix error
      
      * fix if else error
      
      * fix error of size overwrite
      
      * fix yolos image processing
      
      * fix detr image processing
      
      * make
      
      * add longest related test script
      
      * Update src/transformers/models/yolos/image_processing_yolos.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add more test
      
      * add test script about longest size
      
      * remove deprecated
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    • Add Qwen2 GGUF loading support (#31175) · e4628434
      Isotr0py authored
      * add qwen2 gguf support
      
      * Update docs
      
      * fix qwen2 tokenizer
      
      * add qwen2 gguf test
      
      * fix typo in qwen2 gguf test
      
      * format code
      
      * Remove mistral, clarify the error message
      
      * format code
      
      * add typing and update docstring
    • Fix `test_compile_static_cache` (#30991) · df848acc
      Yih-Dar authored
      
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • 🚨 [Mistral and friends] Update MLP (#31057) · 70c87138
      NielsRogge authored
      Update MLP
    • SlidingWindowCache: reduce differences to other Cache classes (#30970) · d475f767
      Joao Gante authored
      * tmp commit
      
      * sliding window with fewer differences
      
      * make fixup + rebase
      
      * missing overwrite
    • Ignore non-causal mask in more cases with SDPA (#30138) · 221aaec6
      fxmarty authored
      * update non-causal mask for sdpa
      
      * add test
      
      * update docstrings
      
      * add one more test
      
      * fix cross attention bug
      
      * gentler atol/rtol
    • Fix Cannot convert [array()] to EagerTensor of dtype int64 (#31109) · f4f69625
      Pavithra Devi M authored
      Running the model.prepare_tf_dataset() method raises the error below:
      ```
      TypeError: Cannot convert [array([322.,   1.])] to EagerTensor of dtype int64
      ```
      
      This happens in the "DataCollatorForSeq2Seq" function when the labels are
      converted to tensors. At that point the labels can be either a list of lists
      or a list of ndarrays. Converting a list of lists works fine; the problem
      arises when the ndarrays hold float values, like below.
      
      ```
      [array([322.,   1.])]
      ```
      
      The exception is raised when this label is converted to a tensor using the
      code below.
      
      ```
      batch["labels"] = tf.constant(batch["labels"], dtype=tf.int64)
      ```
      
      The labels are always integer values, so they must have been converted to
      floats earlier, in the label padding operation below.
      ```
      batch["labels"] = [
                          call(label)
                          if padding_side == "right"
                          else np.concatenate([[self.label_pad_token_id] * (max_label_length - len(label)), label])
                          for label in labels
                          ]
      ```
      There are two cases here:
      1 - Concatenating an array holding the integer padding token value with the labels.
      2 - Concatenating an empty array with the labels.
      
      ----------------------------------------------------------------------------------------
      Case 1: Concatenating an array holding the integer padding token value with the labels.
      Works as expected:
      ----------------------------------------------------------------------------------------
      ```
      label = np.array([233, 1])
      max_label_length = 4
      label_pad_token_id = -100
      np.concatenate([[label_pad_token_id] * (max_label_length - len(label)), label])
      Output:
      array([-100, -100,  233,    1])
      ```
      
      ----------------------------------------------------------------------------------------
      Case 2: Concatenating an empty array with the labels.
      Causes the issue:
      This scenario occurs when the label already has the maximum label length, so no padding is needed.
      ----------------------------------------------------------------------------------------
      ```
      label = np.array([233, 1])
      max_label_length = 2
      label_pad_token_id = -100
      np.concatenate([[label_pad_token_id] * (max_label_length - len(label)), label])
      Output:
      array([233.,   1.])
      ```
      
      ----------------------------------------------------------------------------------------
      Solution:
      ----------------------------------------------------------------------------------------
      An empty Python list passed to np.concatenate becomes a float64 array,
      which promotes the whole result to float64. The fix is to concatenate an
      ndarray of dtype int64 with the labels instead.
      
      AFTER FIX:
      ----------
      Case 1:
      ```
      
      label = np.array([233, 1])
      max_label_length = 4
      label_pad_token_id = -100
      np.concatenate([np.array([label_pad_token_id] * (max_label_length - len(label)), dtype=np.int64),label])
      
      Output:
      array([-100, -100,  233,    1])
      ```
      
      Case 2:
      ```
      
      label = np.array([233, 1])
      max_label_length = 2
      label_pad_token_id = -100
      np.concatenate([np.array([label_pad_token_id] * (max_label_length - len(label)), dtype=np.int64),label])
      
      Output:
      array([233,   1])
      ```
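The root cause and the fix described above can be reproduced end to end in a few lines. This is an illustrative sketch of the dtype promotion, not the collator code itself:

```python
import numpy as np

# Reproduces the dtype promotion behind the bug: when no padding is
# needed, the padding operand is an empty Python list, which NumPy
# treats as a float64 array, so concatenation promotes the int labels.
label = np.array([233, 1])
empty_pad = []  # max_label_length == len(label): no padding needed

promoted = np.concatenate([empty_pad, label])
# promoted.dtype is float64: the labels silently became floats.

# The fix: build the padding operand as an int64 ndarray explicitly.
fixed = np.concatenate([np.array(empty_pad, dtype=np.int64), label])
# fixed.dtype stays int64, so tf.constant(..., dtype=tf.int64) succeeds.
```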