Commits · 22d159ddf93990a3340a93aa49fa96ca60bd8cc3 · chenpangpang / transformers

28 Mar, 2024 5 commits

Adding Flash Attention 2 Support for GPT2 (#29226) · 22d159dd

Eduardo Pacheco authored Mar 28, 2024



* First commit to add flash attention 2 for GPT-2

* more improvements

* Make GPT2 pass tests and fixed Decision Transformers copies

* Fixed missing arg

* fix copies

* Added expected speedup

* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Added test

* Fixed attn attribute

* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update Decision transformer attentions

* More updates

* Passing tests

* Fix copies

* Fix copies part 2

* Decision transformer updates

* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix copies

* Decision transformer not supporting flash attn

* Addressed comments

* Addressed comments

* Addressed comments

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>


[`pipeline`]. Zero shot add doc warning (#29845) · 3a7e6836
Arthur authored Mar 28, 2024
```
* add doc warning

* fix build pr
```
[`GptNeox`] don't gather on pkv when using the trainer (#29892) · 543889f3
Arthur authored Mar 28, 2024
```
don't gather on pkv when using the trainer
```
[`make fix-copies`] update and help (#29924) · b256516a
Arthur authored Mar 28, 2024
```
* add some help

* style
```
Fix typo in T5Block error message (#29881) · d9dc993f
Minseo Kang authored Mar 28, 2024


27 Mar, 2024 9 commits

MixtralSparseMoeBlock: add gate jitter (#29865) · a25037be

Lorenzo Verardo authored Mar 27, 2024

This commit adds gate jitter to MixtralSparseMoeBlock's input data
before passing it through the MoE layer, if turned on.
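
As a rough illustration of the gate jitter described above, here is a minimal sketch. It assumes the jitter is multiplicative uniform noise drawn from [1 - jitter_noise, 1 + jitter_noise] and applied only during training. The function name `apply_gate_jitter` and its parameters are hypothetical, and it works on plain Python lists rather than tensors; it is not the code from the commit.

```python
import random


def apply_gate_jitter(hidden_states, jitter_noise, training=True, rng=None):
    """Scale each activation by a random factor in [1 - jitter_noise, 1 + jitter_noise]."""
    # Assumption: jitter is a training-time perturbation, so it is skipped
    # at inference time or when the noise level is zero ("turned off").
    if not training or jitter_noise <= 0:
        return list(hidden_states)
    if rng is None:
        rng = random.Random()
    low, high = 1.0 - jitter_noise, 1.0 + jitter_noise
    return [x * rng.uniform(low, high) for x in hidden_states]


states = [0.5, -1.0, 2.0]
jittered = apply_gate_jitter(states, jitter_noise=0.1, rng=random.Random(0))
unchanged = apply_gate_jitter(states, jitter_noise=0.1, training=False)
```

With `training=False` the input comes back unchanged, while in training mode each value moves by at most 10% of its magnitude.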


add Cambricon MLUs support (#29627) · 75769744

huismiling authored Mar 27, 2024

* add Cambricon MLUs support

* fix mlu device rng state

* up for quality check

* up mlu to support fp16

* fix mlu device dependency error

* fix mlu device dependency error

* enable mlu device for bf16

* fix mlu device memory tracker


Move `eos_token_id` to stopping criteria (#29459) · 0efcf323

Raushan Turganbay authored Mar 27, 2024



* add eos stopping criteria

* minor fix

* Update tests/generation/test_stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* check eos is not None and fix tests

* make style and fixup

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* camel case everywhere

* call stopping criteria list for candidate ids

* make style  and fixup

* Empty commit

* Empty commit to pass flaky test

* set max length in PromptLookupCandidateGenerator

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* let's fix this typo in docs

* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update PR

* empty commit

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
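
The idea behind treating `eos_token_id` as a stopping criterion can be sketched as follows. This is a toy version under stated assumptions: a criterion is taken to be a callable that receives the token ids generated so far for each sequence and returns one done flag per sequence. The names `EosStop` and `generate_until_stop` are hypothetical and are not the library's API.

```python
class EosStop:
    """Marks a sequence as finished when its last token is an EOS token."""

    def __init__(self, eos_token_id):
        # Accept a single id or a collection of ids.
        if isinstance(eos_token_id, int):
            eos_token_id = [eos_token_id]
        self.eos_token_ids = set(eos_token_id)

    def __call__(self, sequences):
        # One boolean per sequence in the batch; empty sequences are never done.
        return [bool(seq) and seq[-1] in self.eos_token_ids for seq in sequences]


def generate_until_stop(next_token_fn, prompt, criteria, max_new_tokens=10):
    """Toy greedy loop for one sequence: append tokens until any criterion fires."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token_fn(tokens))
        if any(criterion([tokens])[0] for criterion in criteria):
            break
    return tokens


# Toy "model": emits the last token + 1, so the EOS id 5 is eventually reached.
output = generate_until_stop(lambda toks: toks[-1] + 1, [1, 2], [EosStop(5)])
```

Handling EOS through the same interface as other criteria means the generation loop needs only one check per step instead of a special case for EOS.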


fix fuyu device_map compatibility (#29880) · 31c575bc
Marc Sun authored Mar 27, 2024
```
fix forward
```

Reimplement "Automatic safetensors conversion when lacking these files" (#29846) · 4d8427f7

Lysandre Debut authored Mar 27, 2024

* Automatic safetensors conversion when lacking these files (#29390)

* Automatic safetensors conversion when lacking these files

* Remove debug

* Thread name

* Typo

* Ensure that raises do not affect the main thread

* Catch all errors


Fix 29807, sinusoidal positional encodings overwritten by post_init() (#29813) · a81cf9ee

Hovnatan Karapetyan authored Mar 27, 2024

* Check for requires_grad when initing weights

* Add unit test

* Move sinusoidal positional encoding generation after post_init()

* Add modules to skip init list

* Move create_sinusoidal_embeddings to _init_weights


Mamba `slow_forward` gradient fix (#29563) · cefb819f

Anton Vlasjuk authored Mar 27, 2024

* FIX: Cached slow forward in mamba
- additionally added mamba cached test
- added unused test (mamba causal lm forward and backward)
- fixed typo: "causl" --> "causal"

* formatting

* fix: use real `slow_forward` call instead of torch module's

* add shape assertion for mixer block test

* adjust shape assertion


Add Qwen2MoE (#29377) · 1c39974a

Bo Zheng authored Mar 27, 2024



* add support for qwen2 MoE models

* update docs

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* fixup

* add archive back

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fixup

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* add archive back

* fix integration test

* fixup

---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


Support `num_attention_heads` != `num_key_value_heads` in Flax Llama Implementation (#29557) · 8e08acad
Benjamin Minixhofer authored Mar 27, 2024
```
* fix tinyllama flax modelling

* rename vars to minimize changes

* move

* formatting

* remove unused var
```
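
When `num_attention_heads` differs from `num_key_value_heads`, each key/value head is typically shared by a group of query heads. A minimal sketch of that sharing, assuming consecutive query heads form a group, is shown below. The helper `repeat_kv_heads` is hypothetical and uses plain lists in place of arrays; it is not the Flax code from this commit.

```python
def repeat_kv_heads(kv_heads, num_attention_heads):
    """Expand key/value heads so that there is one entry per query head."""
    num_key_value_heads = len(kv_heads)
    if num_key_value_heads == 0 or num_attention_heads % num_key_value_heads != 0:
        raise ValueError(
            "num_attention_heads must be a positive multiple of num_key_value_heads"
        )
    group_size = num_attention_heads // num_key_value_heads
    # Query heads i * group_size .. (i + 1) * group_size - 1 share key/value head i.
    return [head for head in kv_heads for _ in range(group_size)]


expanded = repeat_kv_heads(["kv0", "kv1"], num_attention_heads=4)
```

Here two key/value heads are expanded to serve four query heads, so the first two query heads attend with `kv0` and the last two with `kv1`.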

26 Mar, 2024 8 commits
- Set custom_container in build docs workflows (#29855) · f01e1609
  Lucain authored Mar 26, 2024
  
- Disable AMD memory benchmarks (#29871) · 07d79520
  Ilyas Moutawwakil authored Mar 26, 2024
```
* remove py3nvml to skip amd memory benchmarks

* uninstall pynvml from docker images
```
- Add `cosine_with_min_lr` scheduler in Trainer (#29341) · ef609958
  Yanyi Liu authored Mar 26, 2024
```
* Add cosine_with_min_lr scheduler

* Update error message for missing min_lr or min_lr_rate
```
- Allow `bos_token_id is None` during the generation with `inputs_embeds` (#29772) · 998b5bb5
  Zhihao Lin authored Mar 26, 2024
```
* update

* add ut

* update
```
- [docs] Indent ordered list in add_new_model.md (#29796) · b9ceb03d
  Michael authored Mar 26, 2024
  
- Fix header in IFE task guide (#29859) · de81a677
  Merve Noyan authored Mar 26, 2024
```
Update image_feature_extraction.md
```
- Replace 'decord' with 'av' in VideoClassificationPipeline (#29747) · b32bf85b
  yunxiangtang authored Mar 26, 2024
```
* replace the 'decord' with 'av' in VideoClassificationPipeline

* fix the check of backend in VideoClassificationPipeline

* adjust the order of imports

* format 'video_classification.py'

* format 'video_classification.py' with ruff

---------
Co-authored-by: wanqiancheng <13541261013@163.com>
```
- Add warnings if training args differ from checkpoint trainer state (#29255) · b5a6d6ee
  Jonathan Flynn authored Mar 26, 2024
```
* add warnings if training args differ from checkpoint args stored in trainer_state.json

* run formatting and styling

* add a test

* format and styling

---------
Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
```
25 Mar, 2024 6 commits

remove quotes in code example (#29812) · 7eb3ba82
Johannes Kolbe authored Mar 25, 2024
```
Co-authored-by: Johannes <johannes.kolbe@tech.better.team>
```
[`revert commit`] revert 00a09ed4 · e3e16ddc
Arthur Zucker authored Mar 25, 2024

fix 😭 · 00a09ed4
Arthur Zucker authored Mar 25, 2024


Populate torch_dtype from model to pipeline (#28940) · 8e9a2207

Yuki Watanabe authored Mar 25, 2024



* Populate torch_dtype from model to pipeline
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* use property
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* lint
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* Remove default handling
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>


Fix the behavior of collecting 'num_input_tokens_seen' (#29099) · afe73aed

yhuang authored Mar 25, 2024

fix the behavior of collecting 'num_input_tokens_seen'

See https://github.com/huggingface/transformers/issues/28791 for more details.


Remove static pretrained maps from the library's internals (#29112) · 39114c03

Lysandre Debut authored Mar 25, 2024



* [test_all] Remove static pretrained maps from the library's internals

* Deprecate archive maps instead of removing them

* Revert init changes

* [test_all] Deprecate instead of removing

* [test_all] PVT v2 support

* [test_all] Tests should all pass

* [test_all] Style

* Address review comments

* Update src/transformers/models/deprecated/_archive_maps.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/deprecated/_archive_maps.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [test_all] trigger tests

* [test_all] LLAVA

* [test_all] Bad rebase

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


24 Mar, 2024 1 commit

model_summary.md - Restore link to Harvard's Annotated Transformer. (#29702) · 76a33a10

gamepad_coder authored Mar 23, 2024

* model_summary.md - Add link to Harvard's Annotated Transformer.

* model_summary.md - slight wording change + capitalize name of the paper

* model_summary.md - moves the Annotated Transformer link in a parenthesis next to the link to the original paper (great idea, stevhliu!)

* model_summary.md - moves the Annotated Transformer link in a parenthesis next to the link to the original paper (commit pt. 2, accidentally removed "has" in pt. 1)


23 Mar, 2024 1 commit
- [DOCS] Fix typo for llava next docs (#29829) · dafe3702
  Billy Cao authored Mar 24, 2024
```
Fix typo for llava next docs
```
22 Mar, 2024 10 commits

[`SuperPoint`] Fix doc example (#29816) · c5f0288b
amyeroberts authored Mar 22, 2024
```
[SuperPoint] Fix doc example
```

Complete security policy with mentions of remote code (#29707) · 7e1413d1

Lysandre Debut authored Mar 22, 2024



* Security policy

* Apply suggestions from code review
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: Michelle Habonneau <83347449+Michellehbn@users.noreply.github.com>

* Update SECURITY.md
Co-authored-by: Diogo Teles Sant'Anna <diogoteles@google.com>

---------
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: Michelle Habonneau <83347449+Michellehbn@users.noreply.github.com>
Co-authored-by: Diogo Teles Sant'Anna <diogoteles@google.com>


[`cleanup`] vestiges of causal mask (#29806) · 2e7cb46f
Arthur authored Mar 22, 2024
```
nit
```
replaced concatenation with f-strings to improve readability and unify … (#29785) · 884b2215
igeni authored Mar 22, 2024
```
replaced concatenation with f-strings to improve readability and unify with the rest of the code
```
Generate: remove unused attributes in `AssistedCandidateGenerator` (#29787) · 34e07f4b
Joao Gante authored Mar 22, 2024
```
remove unused attrs
```

rm input dtype change in CPU (#28631) · e85654f5

jiqing-feng authored Mar 22, 2024

* rm input dtype change in CPU

* add warning when using CPU low-precision

* rm useless logging


Correct llava mask & fix missing setter for `vocab_size` (#29389) · 13b23704

fxmarty authored Mar 22, 2024

* correct llava mask

* fix vipllava as well

* mask out embedding for padding tokens

* add test

* fix style

* add setter

* fix test on suggestion


Enable AMD docker build CI (#29803) · aa17cf98
Ilyas Moutawwakil authored Mar 22, 2024
```
* enable amd ci

* remove unnecessary clean up
```

Fix type hint for train_dataset param of Trainer.__init__() to allow... · 34791613

Steven Madere authored Mar 22, 2024

Fix type hint for train_dataset param of Trainer.__init__() to allow IterableDataset.  Issue 29678 (#29738)

* Fixed typehint for train_dataset param in Trainer.__init__().  Added IterableDataset option.

* make fixup


[`quality`] update quality check to make sure we check imports 😈 (#29771) · e68ff304

Arthur authored Mar 22, 2024

* update quality check

* make it nice

* update

* let's make sure it runs and we have the logs actually

* update workflow

* nits
