- 29 Mar, 2024 1 commit
-
Yih-Dar authored
* fix
* revert for qwen2
* revert for qwen2
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 28 Mar, 2024 18 commits
-
MariaHei authored
Trainer with PyTorch now requires accelerate to be installed. Partly resolves huggingface/transformers#29174
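For context, a minimal sketch of what this requirement looks like from user code (the availability checker is the real `transformers.utils` helper; the install hints are assumptions about the recommended route):

```python
# transformers.Trainer with the PyTorch backend now depends on accelerate;
# install it with `pip install accelerate` (or `pip install "transformers[torch]"`).
from transformers.utils import is_accelerate_available

if not is_accelerate_available():
    raise ImportError("Trainer requires accelerate: pip install accelerate")
```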
-
Arthur authored
* fix
* fix test
* style
* nit
* rather rely on convert token to id
* fix quality
* Update src/transformers/convert_slow_tokenizer.py
-
VINAYAKK GARG authored
Fix doc issue in DebertaV2Config class
Co-authored-by: Vinayakk Garg <vigar@akamai.com>
-
Arthur authored
* fix bc?
* nit
-
Yu Chin Fabian Lim authored
* add gradient_accumulation_kwargs to AcceleratorConfig
* add suggestions from @muellerzr to docstrings, new behavior and tests
* Documentation suggestions from @muellerzr
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* addressed @muellerzr comments regarding tests and test utils
* moved accelerate version to top of file
* @muellerzr's variable fix
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* address @amyeroberts: fix tests and docstrings
* address @amyeroberts additional suggestions
---------
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
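A hedged sketch of the new knob: `accelerator_config` on `TrainingArguments` takes a `gradient_accumulation_kwargs` dict that is forwarded to accelerate's `GradientAccumulationPlugin`; the specific option below is illustrative.

```python
from transformers import TrainingArguments

# Sketch: forward extra gradient-accumulation options to accelerate.
# The step count itself still comes from `gradient_accumulation_steps`;
# `sync_with_dataloader` is one GradientAccumulationPlugin option.
args = TrainingArguments(
    output_dir="out",
    gradient_accumulation_steps=4,
    accelerator_config={
        "gradient_accumulation_kwargs": {"sync_with_dataloader": False},
    },
)
```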
-
Arthur authored
[`TokenizationLlama`] fix the way we convert tokens to strings to keep leading spaces 🚨 breaking fix (#29453)
* nit
* update test and fix test
* fixup
-
Arthur authored
* nit
* update
* oups
* Update src/transformers/models/mamba/modeling_mamba.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
-
Joao Gante authored
* add hard rope scaling test
* make fixup
* quick rope scaling tests
* add copy statements
-
Christopher Keibel authored
* add functions to get number of params which require grad, get optimizer group for parameters and get learning rates of param groups to trainer.py
* add tests and raise ValueError when optimizer is None
* add second layer to test and freeze its weights
* check if torch is available before running tests
* use decorator to check if torch is available
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix test indentation
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
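A sketch of the new `Trainer` helpers named in the summary above (method names taken from the commit summary; exact signatures may differ, and the optimizer-backed ones raise `ValueError` while the optimizer is still None):

```python
from transformers import Trainer

def inspect_optimization(trainer: Trainer) -> None:
    # Count of parameters with requires_grad=True:
    print(trainer.get_num_trainable_parameters())
    # Current learning rate of each optimizer param group:
    print(trainer.get_learning_rates())
    # Optimizer group that holds a given parameter:
    first_param = next(trainer.model.parameters())
    print(trainer.get_optimizer_group(first_param))
```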
-
amyeroberts authored
* Safe import of LRScheduler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
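The "safe import" here is a version guard; a minimal sketch of the pattern (the actual guard in `trainer_pt_utils.py` may be written differently):

```python
# LRScheduler became a public name in torch 2.0; older releases only expose
# the private _LRScheduler, so fall back to keep both versions working.
try:
    from torch.optim.lr_scheduler import LRScheduler
except ImportError:  # torch < 2.0
    from torch.optim.lr_scheduler import _LRScheduler as LRScheduler
```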
-
Aymeric Roucher authored
-
Joao Gante authored
* replace torch.testing.assert_allclose by torch.testing.assert_close
* missing atol rtol
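The migration in a nutshell: `torch.testing.assert_allclose` is deprecated upstream in favor of `torch.testing.assert_close`, which expects `rtol` and `atol` to be passed together when given explicitly.

```python
import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([1.0, 2.0 + 1e-6])

# Before (deprecated): torch.testing.assert_allclose(a, b)
torch.testing.assert_close(a, b, rtol=1e-5, atol=1e-5)
```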
-
Fanli Lin authored
fix typo
-
Eduardo Pacheco authored
* First commit to add flash attention 2 for GPT-2
* more improvements
* Make GPT2 pass tests and fixed Decision Transformer copies
* Fixed missing arg
* fix copies
* Added expected speedup
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Added test
* Fixed attn attribute
* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update Decision transformer attentions
* More updates
* Passing tests
* Fix copies
* Fix copies part 2
* Decision transformer updates
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix copies
* Decision transformer not supporting flash attn
* Addressed comments
* Addressed comments
* Addressed comments
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
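Opting into the new backend follows the standard pattern; a sketch (assumes the `flash-attn` package is installed and a supported CUDA GPU is available):

```python
import torch
from transformers import AutoModelForCausalLM

# GPT-2 with FlashAttention-2; half precision is required by the kernel.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
```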
-
Arthur authored
* add doc warning
* fix build pr
-
Arthur authored
don't gather on pkv when using the trainer
-
Arthur authored
* add some help
* style
-
Minseo Kang authored
-
- 27 Mar, 2024 9 commits
-
Lorenzo Verardo authored
When enabled, this commit applies gate jitter to MixtralSparseMoeBlock's input data before it is passed through the MoE layer.
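Presumably this is toggled through the model config; a sketch assuming the new parameter is called `router_jitter_noise` (0.0 disables it) and is applied only in training mode:

```python
from transformers import MixtralConfig

# Assumption: jitter magnitude on the gate input is controlled here; a
# non-zero value scales inputs by uniform noise around 1.0 during training.
config = MixtralConfig(router_jitter_noise=0.01)
```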
-
huismiling authored
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
-
Raushan Turganbay authored
* add eos stopping criteria
* minor fix
* Update tests/generation/test_stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* check eos is not None and fix tests
* make style and fixup
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* camel case everywhere
* call stopping criteria list for candidate ids
* make style and fixup
* Empty commit
* Empty commit to pass flaky test
* set max length in PromptLookupCandidateGenerator
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* lets fix this typo in docs
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update PR
* empty commit
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
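The new criterion can also be used standalone through a `StoppingCriteriaList`; a sketch (assuming the class is exported as `EosTokenCriteria` from the generation module, per the `__init__.py` update above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteriaList
from transformers.generation import EosTokenCriteria

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Hello", return_tensors="pt")
criteria = StoppingCriteriaList([EosTokenCriteria(eos_token_id=tok.eos_token_id)])
out = model.generate(**inputs, stopping_criteria=criteria, max_new_tokens=20)
print(tok.decode(out[0]))
```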
-
Marc Sun authored
fix forward
-
Lysandre Debut authored
* Automatic safetensors conversion when lacking these files (#29390)
* Automatic safetensors conversion when lacking these files
* Remove debug
* Thread name
* Typo
* Ensure that raises do not affect the main thread
* Catch all errors
-
Hovnatan Karapetyan authored
* Check for requires_grad when initing weights
* Add unit test
* Move sinusoidal positional encoding generation after post_init()
* Add modules to skip init list
* Move create_sinusoidal_embeddings to _init_weights
-
Anton Vlasjuk authored
* FIX: Cached slow forward in mamba
  - additionally added mamba cached test
  - added unused test (mamba causal lm forward and backward)
  - fixed typo: "causl" --> "causal"
* formatting
* fix: use real `slow_forward` call instead of torch module's
* add shape assertion for mixer block test
* adjust shape assertion
-
Bo Zheng authored
* add support for qwen2 MoE models
* update docs
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* fixup
* add archive back
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fixup
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* add archive back
* fix integration test
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
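Loading goes through the standard auto classes; a sketch (the hub id is an assumption, and note that the plain `Qwen2Tokenizer` is reused rather than a dedicated Qwen2MoeTokenizer):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"  # assumed Qwen2MoE checkpoint id
tok = AutoTokenizer.from_pretrained(model_id)           # resolves to Qwen2Tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)  # Qwen2MoeForCausalLM
```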
-
Benjamin Minixhofer authored
* fix tinyllama flax modelling
* rename vars to minimize changes
* move
* formatting
* remove unused var
-
- 26 Mar, 2024 8 commits
-
Lucain authored
-
Ilyas Moutawwakil authored
* remove py3nvml to skip amd memory benchmarks
* uninstall pynvml from docker images
-
Yanyi Liu authored
* Add cosine_with_min_lr scheduler
* Update error message for missing min_lr or min_lr_rate
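The scheduler is selected through `TrainingArguments`; per the error message mentioned above, one of `min_lr` or `min_lr_rate` must be supplied via `lr_scheduler_kwargs`:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-5,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr": 1e-6},  # or {"min_lr_rate": 0.1}
)
```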
-
Zhihao Lin authored
* update
* add ut
* update
-
Michael authored
-
Merve Noyan authored
Update image_feature_extraction.md
-
yunxiangtang authored
* replace 'decord' with 'av' in VideoClassificationPipeline
* fix the check of backend in VideoClassificationPipeline
* adjust the order of imports
* format 'video_classification.py'
* format 'video_classification.py' with ruff
---------
Co-authored-by: wanqiancheng <13541261013@163.com>
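Usage is unchanged; only the decoding backend moved from `decord` to PyAV (`pip install av`). A sketch with an assumed checkpoint and local file:

```python
from transformers import pipeline

# Checkpoint and video path are illustrative.
clf = pipeline("video-classification", model="MCG-NJU/videomae-base-finetuned-kinetics")
print(clf("clip.mp4"))  # frames are now decoded with PyAV instead of decord
```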
-
Jonathan Flynn authored
* add warnings if training args differ from checkpoint args stored in trainer_state.json
* run formatting and styling
* add a test
* format and styling
---------
Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
-
- 25 Mar, 2024 4 commits
-
Johannes Kolbe authored
Co-authored-by: Johannes <johannes.kolbe@tech.better.team>
-
Arthur Zucker authored
-
Arthur Zucker authored
-
Yuki Watanabe authored
* Populate torch_dtype from model to pipeline
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* use property
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* lint
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* Remove default handling
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
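After this change the pipeline reads its dtype back from the underlying model; a sketch:

```python
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="gpt2", torch_dtype=torch.float16)
# `torch_dtype` is now a property populated from the model:
assert pipe.torch_dtype == torch.float16
```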
-