Commits · 80377eb018c077dba434bc8e7912bcaed3a64d09 · chenpangpang / transformers

08 Dec, 2023 1 commit

F.scaled_dot_product_attention support (#26572) · 80377eb0

fxmarty authored Dec 08, 2023



* add sdpa

* wip

* cleaning

* add ref

* yet more cleaning

* and more :)

* wip llama

* working llama

* add output_attentions=True support

* bigcode sdpa support

* fixes

* gpt-bigcode support, require torch>=2.1.1

* add falcon support

* fix conflicts falcon

* style

* fix attention_mask definition

* remove output_attentions from attnmaskconverter

* support whisper without removing any Copied from statement

* fix mbart default to eager renaming

* fix typo in falcon

* fix is_causal in SDPA

* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained

* add warnings when falling back on the manual implementation

* precise doc

* wip replace _flash_attn_enabled by config.attn_implementation

* fix typo

* add tests

* style

* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace

* obey config.attn_implementation if a config is passed in from_pretrained

* fix is_torch_sdpa_available when torch is not installed

* remove dead code

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove duplicate pretraining_tp code

* add dropout in llama

* precise comment on attn_mask

* add fmt: off for _unmask_unattended docstring

* precise num_masks comment

* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion

* cleanup modeling_utils

* backward compatibility

* fix style as requested

* style

* improve documentation

* test pass

* style

* add _unmask_unattended tests

* skip meaningless tests for idefics

* hard_check SDPA requirements when specifically requested

* standardize the use of XXX_ATTENTION_CLASSES

* fix SDPA bug with mem-efficient backend on CUDA when using fp32

* fix test

* rely on SDPA is_causal parameter to handle the causal mask in some cases

* fix FALCON_ATTENTION_CLASSES

* remove _flash_attn_2_enabled occurrences

* fix test

* add OPT to the list of supported flash models

* improve test

* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test

* remove remaining _flash_attn_2_enabled occurrence

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove use_attn_implementation

* fix docstring & slight bug

* make attn_implementation internal (_attn_implementation)

* typos

* fix tests

* deprecate use_flash_attention_2=True

* fix test

* add back llama that was removed by mistake

* fix tests

* remove _flash_attn_2_enabled occurrences bis

* add check & test that passed attn_implementation is valid

* fix falcon torchscript export

* fix device of mask in tests

* add tip about torch.jit.trace and move bt doc below sdpa

* fix parameterized.expand order

* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there

* update sdpaattention class with the new cache

* Update src/transformers/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bark/modeling_bark.py

* address review comments

* WIP torch.jit.trace fix. left: test both eager & sdpa

* add test for torch.jit.trace for both eager/sdpa

* fix falcon with torch==2.0 that needs to use sdpa

* fix doc

* hopefully last fix

* fix key_value_length that has no default now in mask converter

* is it flaky?

* fix speculative decoding bug

* tests do pass

* fix following #27907

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

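The commit above adds attention paths built on PyTorch's `F.scaled_dot_product_attention`, and one of its bullets mentions relying on the `is_causal` parameter rather than an explicit mask in some cases. As a rough, dependency-free illustration of why the two can be interchangeable (this is not the code from the PR), the sketch below computes single-head scaled dot-product attention on nested lists. The helper names `causal_mask` and `sdpa_reference` are hypothetical and exist only for this example.

```python
import math


def causal_mask(q_len, kv_len):
    """Boolean lower-triangular mask; True means the key position may be attended to."""
    return [[j <= i for j in range(kv_len)] for i in range(q_len)]


def sdpa_reference(query, key, value, attn_mask=None, is_causal=False):
    """Reference softmax(Q K^T / sqrt(d)) V for one head, using plain Python lists.

    Assumes every query row can attend to at least one key (true for causal masks).
    """
    if is_causal:
        if attn_mask is not None:
            raise ValueError("pass either attn_mask or is_causal, not both")
        attn_mask = causal_mask(len(query), len(key))

    scale = 1.0 / math.sqrt(len(query[0]))
    output = []
    for i, q in enumerate(query):
        # Masked positions get -inf so they receive zero weight after the softmax.
        scores = [
            sum(a * b for a, b in zip(q, k)) * scale
            if attn_mask is None or attn_mask[i][j]
            else float("-inf")
            for j, k in enumerate(key)
        ]
        top = max(scores)  # subtract the row maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        output.append(
            [
                sum(w * v[c] for w, v in zip(weights, value)) / total
                for c in range(len(value[0]))
            ]
        )
    return output
```

Calling `sdpa_reference(q, k, v, is_causal=True)` and `sdpa_reference(q, k, v, attn_mask=causal_mask(len(q), len(k)))` yields the same result, which is the property that lets a fused kernel skip materializing the mask. In PyTorch 2.0 and later, the corresponding library call is `torch.nn.functional.scaled_dot_product_attention`, which accepts `attn_mask` and `is_causal` arguments.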

04 Dec, 2023 1 commit
- translate internal folder files to chinese (#27638) · a502b0d4
  jiaqiw09 authored Dec 05, 2023
```
* translate

* update

* update

---------
Co-authored-by: jiaqiw <wangjiaqi50@huawei.com>
```
27 Nov, 2023 2 commits

translation main-class files to chinese (#27588) · cad1b119

jiaqiw09 authored Nov 28, 2023



* translate work

* update

* update

* update [[autodoc]]

* Update callback.md

---------
Co-authored-by: jiaqiw <wangjiaqi50@huawei.com>


docs: replace torch.distributed.run by torchrun (#27528) · ce315081

Peter Pan authored Nov 28, 2023



* docs: replace torch.distributed.run by torchrun

 `transformers` now officially supports PyTorch >= 1.10.
 The entrypoint `torchrun` is present from 1.10 onwards.
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>

* Update src/transformers/trainer.py

with @ArthurZucker's suggestion
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


17 Nov, 2023 2 commits
- translate deepspeed.md to chinese (#27495) · d1a00f9d
  jiaqiw09 authored Nov 18, 2023
```
* translate deepspeed.md

* update
```
- Broken links fixed related to datasets docs (#27569) · ffbcfc01
  V.Prasanna kumar authored Nov 18, 2023
```
fixed the broken links to the datasets library docs in transformers
```
16 Nov, 2023 2 commits

translate Trainer.md to chinese (#27527) · b074461e
jiaqiw09 authored Nov 16, 2023
```
* translate

* update

* update
```

translate model.md to chinese (#27518) · 06343b06

Hz, Ji authored Nov 16, 2023



* translate model.md to chinese

* apply review suggestion
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>


14 Nov, 2023 1 commit
- translate hpo_train.md and perf_hardware.md to chinese (#27431) · 73bc0c9e
  jiaqiw09 authored Nov 14, 2023
```
* translate

* translate

* update
```
13 Nov, 2023 1 commit
- Perf torch compile (#27422) · eb79b55b
  jiaqiw09 authored Nov 13, 2023
```
* translate perf_torch_compile.md

* translate tf_xla.md

* update
```
08 Nov, 2023 2 commits
- translate debugging.md to chinese (#27374) · ced9fd86
  jiaqiw09 authored Nov 08, 2023
```
* update

* update
```
- translate big_models.md and performance.md to chinese (#27334) · ef716736
  jiaqiw09 authored Nov 08, 2023
```
* translate performance.md

* translate performance.md and big_models.md

* update translation

* update review
```
07 Nov, 2023 2 commits

translate model_sharing.md and llm_tutorial.md to chinese (#27283) · e2647450

jiaqiw09 authored Nov 07, 2023

* translate model_sharing.md

* translate llm_tutorial.md to Chinese

* update wrong translation

* update _toctree.yml

* update typos

* update


translate the en tokenizer_summary.md to Chinese (#27291) · f213d5dd
九是否随意的称呼 authored Nov 08, 2023
```
* translate the en tokenizer_summary.md to Chinese

* revise WordPiece

* add to source/zh/_toctree.yml
```

06 Nov, 2023 1 commit
- [docs] fixed links with 404 (#27327) · 9beb2737
  Maria Khalusova authored Nov 06, 2023
```
* fixed links with 404

* make style
```
03 Nov, 2023 2 commits
- translate run_scripts.md to chinese (#27246) · cc3e4781
  jiaqiw09 authored Nov 03, 2023
```
* translate run_scripts.md to chinese

* translate run_scripts.md to chinese

* translate run_scripts.md to chinese
```
- translate autoclass_tutorial to chinese (#27269) · bf7cfac2
  jiaqiw09 authored Nov 03, 2023
```
* translate autoclass_tutorial.md  to chinese

* translate update
```
02 Nov, 2023 1 commit
- translate peft.md to chinese (#27215) · 00d8502b
  jiaqiw09 authored Nov 02, 2023
```
* translate peft.md to chinese

* translate peft.md to chinese

* fix missing link
```
01 Nov, 2023 1 commit

Translate task summary to chinese (#27180) · 239cd0ea

jiaqiw09 authored Nov 01, 2023

* translate task_summary.md to chinese

* update translation

* update translation

* fix _toctree.yml


31 Oct, 2023 2 commits

🌐 [i18n-ZH] Translate tflite.md into Chinese (#27134) · 7d8ff362

Yeyang authored Nov 01, 2023



* docs(zh): translate tflite.md

* docs(zh): add space around links

* Update docs/source/zh/tflite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>


translate traning.md to chinese (#27122) · 6b7f8ff1

jiaqiw09 authored Oct 31, 2023

* translate training.md

* update _toctree.yml

* update _toctree.yml

* update _toctree.yml


30 Oct, 2023 1 commit
- 🌐 [i18n-ZH] Translate serialization.md into Chinese (#27076) · 9093b19b
  Yeyang authored Oct 30, 2023
```
* docs(zh): translate serialization.md

* docs(zh): add space around links
```
27 Oct, 2023 1 commit
- translate transformers_agents.md to Chinese (#27046) · ef23b68e
  jiaqiw09 authored Oct 27, 2023
```
* update translation

* fix problems mentioned in reviews
```
25 Oct, 2023 1 commit

🌐 [i18n-ZH] Translate custom_models.md into Chinese (#27065) · ba5144f7

Yeyang authored Oct 26, 2023



* docs(zh): translate custom_models.md

* minor fix in custom_models
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>


23 Oct, 2023 4 commits

🌐 [i18n-ZH] Translate create_a_model.md into Chinese (#27026) · 32f799db
Yeyang authored Oct 23, 2023
```
docs(zh): translate create_a_model.md
```

translate `preprocessing.md` to Chinese (#26955) · b0d1d7f7

jiaqiw09 authored Oct 24, 2023



* translate preprocessing.md to Chinese

* update files fixing problems mentioned in review

* update files fixing problems mentioned in review

---------
Co-authored-by: jiaqiw <wangjiaqi50@huawei.com>


🌐 [i18n-ZH] Translate multilingual into Chinese (#26935) · 19ae0505

Yeyang authored Oct 23, 2023



translate multilingual into Chinese
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>


Translate `pipeline_tutorial.md` to chinese (#26954) · f09a081d

jiaqiw09 authored Oct 23, 2023



* update translation of pipeline_tutorial and preprocessing(Version1.0)

* update translation of pipeline_tutorial and preprocessing(Version2.0)

* update translation docs

* update to fix problems mentioned in review

---------
Co-authored-by: jiaqiw <wangjiaqi50@huawei.com>


18 Oct, 2023 1 commit
- [i18n-ZH] Translated fast_tokenizers.md to Chinese (#26910) · 732d2a8a
  Yeyang authored Oct 18, 2023
```
docs: translate fast_tokenizers into Chinese
```
11 Oct, 2023 1 commit

Translated the accelerate.md file of the documentation to Chinese (#26161) · e1cec434

TERRY LEE authored Oct 12, 2023



* translate accelerate page

* Update docs/source/zh/accelerate.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


06 Oct, 2023 1 commit
- docs(zh): review and punctuation & space fix (#26627) · 897a826d
  Jabasukuriputo Wang authored Oct 06, 2023
  
04 Oct, 2023 1 commit
- add zh translation for installation (#26084) · 43bfd093
  Yeyang authored Oct 05, 2023
```
* translate installation to zh

* fix translation typo
```
06 Sep, 2023 1 commit

Fix small typo README.md (#25934) · 3e203f92

zspo authored Sep 06, 2023



* fix some small bugs in readme

* Update docs/README.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>


16 Aug, 2023 1 commit

[TYPO] fix typo/format in quicktour.md (#25519) · c385de24

lishukan authored Aug 16, 2023



* fix_all_language_quicktour

* give up ! before bash command

---------
Co-authored-by: lishukan <lishukan@dxy.cn>


20 Jun, 2023 1 commit

Migrate doc files to Markdown. (#24376) · eb849f66

Sylvain Gugger authored Jun 20, 2023



* Rename index.mdx to index.md

* With saved modifs

* Address review comment

* Treat all files

* .mdx -> .md

* Remove special char

* Update utils/tests_fetcher.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>


04 Apr, 2023 1 commit

Flax Regnet (#21867) · 90067748

Shubhamai authored Apr 04, 2023

* initial commit

* review changes

* post model PR merge

* updating doc


14 Mar, 2023 1 commit

Add ConvNeXT V2 (#21679) · cdddfbff

Alara Dirik authored Mar 14, 2023

* Add ConvNeXt V2 to transformers
* TF model is separated from the PR to fix issues


30 Jan, 2023 1 commit
- translate index to zh(#20095) (#21351) · 95be242a
  BFSS authored Jan 31, 2023
```
translate index to zh
Co-authored-by: bfss <bfss@bfss.com>
```
21 Nov, 2022 1 commit

translate zh quicktour(#20095) (#20181) · 05d80d85

BFSS authored Nov 21, 2022



* zh quicktour(#20095)

* add zh to doc workflow

* remove untranslated entries from toctree
Co-authored-by: BeifangSusu <BeifangSusu@bfss.com>
