Commits · f4db565b695582891e43a5e042e5d318e28f20b8 · chenpangpang / transformers

12 Dec, 2023 7 commits
- fix typo in dvclive callback (#27983) · f4db565b
  Dave Berenbaum authored Dec 12, 2023
  
- [doc] fix typo (#27981) · 99361430
  Stas Bekman authored Dec 12, 2023
  
- Fix SDPA correctness following torch==2.1.2 regression (#27973) · 78172dcd
  fxmarty authored Dec 12, 2023
```
* fix sdpa with non-contiguous inputs for gpt_bigcode

* fix other archs

* add currently comment

* format
```
- Better key error for AutoConfig (#27976) · 5e4ef0a0
  Matt authored Dec 12, 2023
```
* Improve the error printed when loading an unrecognized architecture

* Improve the error printed when loading an unrecognized architecture

* Raise a ValueError instead because KeyError prints weirdly

* make fixup
```
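The remark that a `KeyError` "prints weirdly" most likely refers to standard Python behaviour: `str()` of a `KeyError` is the `repr` of its argument, so a multi-line message is shown quoted with an escaped `\n`. A minimal sketch using only built-in exceptions; the message text is invented for illustration and is not the message used in the commit.

```python
# Illustrative message only; NOT the text used in transformers.
message = "Unrecognized architecture.\nTry upgrading the library."

key_err = KeyError(message)
value_err = ValueError(message)

# KeyError renders the repr of its single argument: quoted, newline escaped.
key_text = str(key_err)

# ValueError renders the message as written, so the newline is a real line break.
value_text = str(value_err)

print(key_text)
print(value_text)
```

Under that reading, switching to `ValueError` keeps a long, multi-line hint readable in the traceback.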
- Fix link in README.md of Image Captioning (#27969) · a49f4aca
  saswatmeher authored Dec 12, 2023
```
Update the link for vision encoder decoder doc used by
FlaxVisionEncoderDecoderModel link.
```
- Hot-fix-mixstral-loss (#27948) · 680c610f
  Arthur authored Dec 12, 2023
```
* fix loss computation

* compute on GPU if possible
```
- Generate: `assisted_decoding` now accepts arbitrary candidate generators (#27750) · 4b759da8
  Joao Gante authored Dec 12, 2023
```
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
```
11 Dec, 2023 26 commits

fixed typos (issue 27919) (#27920) · e6604247

Anthony Susevski authored Dec 11, 2023



* fixed typos (issue 27919)

* Update docs/source/en/tasks/knowledge_distillation_for_image_classification.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>


Support PeftModel signature inspect (#27865) · e5079b0b

dancingpipi authored Dec 12, 2023



* Support PeftModel signature inspect

* Use get_base_model() to get the base model

---------
Co-authored-by: shujunhua1 <shujunhua1@jd.com>


[docs] Fused AWQ modules (#27896) · 35478182
Steven Liu authored Dec 11, 2023
```
streamline
```
Update bounding box format everywhere (#27944) · 67b1335c
NielsRogge authored Dec 11, 2023
```
Update formats
```
[`Mixtral`] Change mistral op order (#27955) · 54d0b1c2
Younes Belkada authored Dec 11, 2023
```
up
```

fix no sequence length models error (#27522) · 4850aaba

Adam Louly authored Dec 11, 2023

* fix no sequence length models error

* block size check

---------

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>


Fix for stochastic depth decay rule in the TimeSformer implementation (#27875) · 4b4b8642
Ashish Tawari authored Dec 11, 2023
```
Update modeling_timesformer.py

Fixing typo to correct the stochastic depth decay rule
```
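The commit message does not spell out the rule. As a hedged illustration, one common "linear" stochastic depth decay rule assigns each layer a drop probability that grows evenly from 0 at the first layer to a maximum at the last. The function name and parameters below are hypothetical and are not taken from `modeling_timesformer.py`.

```python
from typing import List


def linear_drop_path_rates(max_rate: float, depth: int) -> List[float]:
    """Per-layer drop probabilities, rising linearly from 0.0 to max_rate.

    Hypothetical helper: a sketch of the linear decay rule, not the
    TimeSformer implementation.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if depth == 1:
        return [0.0]
    return [max_rate * i / (depth - 1) for i in range(depth)]


rates = linear_drop_path_rates(max_rate=0.1, depth=5)
print(rates)
```

Each layer would then receive its own entry of `rates` rather than a single shared value.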
fix bug in mask2former: cost matrix is infeasible (#27897) · c0a354d8
Chenhao Xu authored Dec 12, 2023
```
fix bug: cost matrix is infeasible
```

Fix a couple of typos and add an illustrative test (#26941) · 7e35f370

rjenc29 authored Dec 11, 2023

* fix a typo and add an illustrative test

* appease black

* reduce code duplication and add Annotion type back with a pending deprecation warning

* remove unused code

* change warning type

* black formatting fix

* change enum deprecation approach to support 3.8 and earlier

* add stacklevel

* fix black issue

* fix ruff issues

* fix ruff issues

* move tests to own mixin

* include yolos

* fix black formatting issue

* fix black formatting issue

* use logger instead of warnings and include target version for deprecation


Add deepspeed test to amd scheduled CI (#27633) · 39acfe84

Ella Charlaix authored Dec 11, 2023



* add deepspeed scheduled test for amd

* fix image

* add dockerfile

* add comment

* enable tests

* trigger

* remove trigger for this branch

* trigger

* change runner env to trigger the docker build image test

* use new docker image

* remove test suffix from docker image tag

* replace test docker image with original image

* push new image

* Trigger

* add back amd tests

* fix typo

* add amd tests back

* fix

* comment until docker image build scheduled test fix

* remove deprecated deepspeed build option

* upgrade torch

* update docker & make tests pass

* Update docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile

* fix

* tmp disable test

* precompile deepspeed to avoid timeout during tests

* fix comment

* trigger deepspeed tests with new image

* comment tests

* trigger

* add sklearn dependency to fix slow tests

* enable back other tests

* final update

---------
Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: Félix Marty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>


Fix AMD scheduled CI not triggered (#27951) · 0f59d2f1
Yih-Dar authored Dec 11, 2023
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
In PreTrainedTokenizerBase add missing word in error message (#27949) · 417bb914
Peter Götz authored Dec 11, 2023
```
"text input must of type" -> "text input must be of type"
```
Fix parameter count in readme for mixtral 45b (#27945) · 5cec306c
Timon Käch authored Dec 11, 2023
```
fix parameter count in readme
```
Update import message (#27946) · 921a6bf2
NielsRogge authored Dec 11, 2023
```
* Update import message

* Update message
```
Fix test for auto_find_batch_size on multi-GPU (#27947) · 44127ec6
Zach Mueller authored Dec 11, 2023
```
* Fix test for multi-GPU

* With CPU handle
```

Docs for AutoBackbone & Backbone (#27456) · b911c1f1

Merve Noyan authored Dec 11, 2023



* Initial commit for AutoBackbone & Backbone

* Added timm and clarified out_indices

* Swapped the example to out_indices

* fix toctree

* Update autoclass_tutorial.md

* Update backbones.md

* Update autoclass_tutorial.md

* Add dummy torch input instead

* Add dummy torch input

* Update autoclass_tutorial.md

* Update backbones.md

* minor fix

* Update docs/source/en/main_classes/backbones.md
Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/autoclass_tutorial.md
Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Added illustrations and explained backbone & neck

* Update docs/source/en/main_classes/backbones.md
Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update backbones.md

---------
Co-authored-by: Maria Khalusova <kafooster@gmail.com>


use logger.warning_once to avoid massive outputs (#27428) · e49c3852

YQ authored Dec 11, 2023

* use logger.warning_once to avoid massive outputs when training/finetuning longformer

* update more
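
The idea behind a warn-once helper can be sketched with the standard library alone: cache calls by message so that a warning raised inside a training loop is logged only the first time. This is an illustrative sketch, not the `transformers` implementation; the logger name, the `_Collector` handler and the message text are made up for the demo.

```python
import functools
import logging

logger = logging.getLogger("warn_once_demo")  # hypothetical logger name
logger.propagate = False  # keep the demo output away from the root logger


class _Collector(logging.Handler):
    """Stores formatted messages so the demo can count what was emitted."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


collector = _Collector()
logger.addHandler(collector)


@functools.lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    """Log `message` on the first call; identical later calls hit the cache."""
    logger.warning(message)


# Simulate a warning triggered on every training step.
for _ in range(1000):
    warning_once("inputs were padded to a multiple of the attention window")

print(len(collector.messages))
```

Because the cache key is the message string, distinct messages are still each logged once.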


Fix PatchTSMixer Docstrings (#27943) · 6ff10922

vijaye12 authored Dec 11, 2023



* docstring corrections

* style make

---------
Co-authored-by: vijaye12 <vijaye12@in.ibm.com>


[`Add Mixtral`] Adds support for the Mixtral MoE (#27942) · accccdd0

Arthur authored Dec 11, 2023



* up

* up

* test

* logits ok

* up

* up

* few fixes

* conversion script

* up

* nits

* nits

* update

* nuke

* more updates

* nits

* fix many issues

* nit

* scatter

* nit

* nuke megablocks

* nits

* fix conversion script

* nit

* remove

* nits

* nit

* update

* oupsssss

* change

* nits device

* nits

* fixup

* update

* merge

* add copied from

* fix the copy mentions

* update tests

* more fixes

* nits

* conversion script

* add parts of the readme

* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* new test + conversion script

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* fix

* fix copies

* fix copies

* ooops

* fix config

* Apply suggestions from code review

* fix nits

* nit

* add copies

* add batched tests

* docs

* fix flash attention

* let's add more verbose

* add correct outputs

* support router outputs

* ignore copies where needed

* fix

* cat list if list is given for now

* nits

* Update docs/source/en/model_doc/mixtral.md

* finish router refactoring

* fix forward

* fix expected values

* nits

* fixup

* fix

* fix bug

* fix

* fix dtype mismatch

* fix

* grrr grrr I support item assignment

* fix CI

* docs

* fixup

* remove some copied from

* fix weird diff

* skip doctest fast on the config and modeling

* mark that it supports flash attention in the doc

* update

* Update src/transformers/models/mixtral/modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update docs/source/en/model_doc/mixtral.md
Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert router logits config issue

* update doc accordingly

* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py

* nits

* use torch testing assert close

* fixup

* doc nits

---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>


[`from_pretrained`] Make from_pretrained fast again (#27709) · 0676d992

Arthur authored Dec 11, 2023



* Skip nn.Module.reset_parameters

* Actually skip

* Check quality

* Maybe change all inits

* Fix init issues: only modify public functions

* Add a small test for now

* Style

* test updates

* style

* nice test

* style

* make it even faster

* one more second

* remove fx incompatible

* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <hi@lysand.re>

* skip

* fix quality

* protect the import

---------
Co-authored-by: Lysandre Debut <hi@lysand.re>


Fix SDPA dispatch & make SDPA CI compatible with torch<2.1.1 (#27940) · 9f18cc6d
fxmarty authored Dec 11, 2023
```
fix sdpa dispatch
```
[LLaVa] Some improvements (#27895) · 7ea21f1f
NielsRogge authored Dec 11, 2023
```
* More improvements

* Improve variable names

* Update READMEs, improve docs
```
Fix `SeamlessM4Tv2ModelIntegrationTest` (#27911) · 5e620a92
Yoach Lacombe authored Dec 11, 2023
```
change dtype of some integration tests
```
Skip `UnivNetModelTest::test_multi_gpu_data_parallel_forward` (#27912) · e96c1de1
Yih-Dar authored Dec 11, 2023
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
[BEiT] Fix test (#27934) · 8d8970ef
NielsRogge authored Dec 11, 2023
```
Fix test
```

[DETA] fix backbone freeze/unfreeze function (#27843) · 235be085

Sangbum Daniel Choi authored Dec 11, 2023



* [DETA] fix freeze/unfreeze function

* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add freeze/unfreeze test case in DETA

* fix type

* fix typo 2

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


09 Dec, 2023 3 commits

Fix typo (#27918) · df5c5c62
Brendan Fahy authored Dec 09, 2023


[integration] Update Ray Tune integration for Ray 2.7 (#26499) · 5fa66df3

Justin Yu authored Dec 09, 2023



* fix tune integration for ray 2.7+
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* add version check for ray tune backend availability
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* missing import
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* pin min version instead
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* address comments
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* some fixes
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* fix unnecessary final checkpoint
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* fix lint
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* dep table fix
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

* fix lint
Signed-off-by: Justin Yu <justinvyu@anyscale.com>

---------
Signed-off-by: Justin Yu <justinvyu@anyscale.com>


[CLAP] Replace hard-coded batch size to enable dynamic ONNX export (#27790) · ffd426ee
Joshua Lochner authored Dec 09, 2023
```
* [CLAP] Replace hard-coded batch size to enable dynamic ONNX export

* Add back docstring
```

08 Dec, 2023 4 commits

F.scaled_dot_product_attention support (#26572) · 80377eb0

fxmarty authored Dec 08, 2023



* add sdpa

* wip

* cleaning

* add ref

* yet more cleaning

* and more :)

* wip llama

* working llama

* add output_attentions=True support

* bigcode sdpa support

* fixes

* gpt-bigcode support, require torch>=2.1.1

* add falcon support

* fix conflicts falcon

* style

* fix attention_mask definition

* remove output_attentions from attnmaskconverter

* support whisper without removing any Copied from statement

* fix mbart default to eager renaming

* fix typo in falcon

* fix is_causal in SDPA

* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained

* add warnings when falling back on the manual implementation

* precise doc

* wip replace _flash_attn_enabled by config.attn_implementation

* fix typo

* add tests

* style

* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace

* obey config.attn_implementation if a config is passed in from_pretrained

* fix is_torch_sdpa_available when torch is not installed

* remove dead code

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove duplicate pretraining_tp code

* add dropout in llama

* precise comment on attn_mask

* add fmt: off for _unmask_unattended docstring

* precise num_masks comment

* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion

* cleanup modeling_utils

* backward compatibility

* fix style as requested

* style

* improve documentation

* test pass

* style

* add _unmask_unattended tests

* skip meaningless tests for idefics

* hard_check SDPA requirements when specifically requested

* standardize the use of XXX_ATTENTION_CLASSES

* fix SDPA bug with mem-efficient backend on CUDA when using fp32

* fix test

* rely on SDPA is_causal parameter to handle the causal mask in some cases

* fix FALCON_ATTENTION_CLASSES

* remove _flash_attn_2_enabled occurrences

* fix test

* add OPT to the list of supported flash models

* improve test

* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test

* remove remaining _flash_attn_2_enabled occurrence

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove use_attn_implementation

* fix docstring & slight bug

* make attn_implementation internal (_attn_implementation)

* typos

* fix tests

* deprecate use_flash_attention_2=True

* fix test

* add back llama that was removed by mistake

* fix tests

* remove _flash_attn_2_enabled occurrences bis

* add check & test that passed attn_implementation is valid

* fix falcon torchscript export

* fix device of mask in tests

* add tip about torch.jit.trace and move bt doc below sdpa

* fix parameterized.expand order

* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there

* update sdpaattention class with the new cache

* Update src/transformers/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bark/modeling_bark.py

* address review comments

* WIP torch.jit.trace fix. left: test both eager & sdpa

* add test for torch.jit.trace for both eager/sdpa

* fix falcon with torch==2.0 that needs to use sdpa

* fix doc

* hopefully last fix

* fix key_value_length that has no default now in mask converter

* is it flaky?

* fix speculative decoding bug

* tests do pass

* fix following #27907

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


Generate: SinkCache can handle iterative prompts (#27907) · ce0bbd51
Joao Gante authored Dec 08, 2023

fix typo in image_processing_blip.py Wwhether -> Whether (#27899) · 94c76538
zhc7 authored Dec 09, 2023


[Doc] Spanish translation of pad_truncation.md (#27890) · d6c3a3f1

Aaron Jimenez authored Dec 08, 2023

* Add pad_truncation to es/_toctree.yml

* Add pad_truncation.md to es/

* Translated first two paragraphs

* Translated padding argument section

* Translated truncation argument section

* Translated final paragraphs

* Translated table

* Fixed typo in the table of en/pad_truncation.md

* Run make style | Fix a word

* Add Padding (relleno) y el Truncation (truncamiento) in the final paragraphs

* Fix relleno and truncamiento words
