- 06 May, 2024 1 commit

jiaqianjing authored

- 02 May, 2024 1 commit

mobicham authored
* update HQQ transformers integration * push import_utils.py * add force_hooks check in modeling_utils.py * fix | with Optional * force bias as param * check bias is Tensor * force forward for multi-gpu * review fixes pass * remove torch grad() * if any key in linear_tags fix * add cpu/disk check * isinstance return * add multigpu test + refactor tests * clean hqq_utils imports in hqq.py * clean hqq_utils imports in quantizer_hqq.py * delete hqq_utils.py * Delete src/transformers/utils/hqq_utils.py * ruff init * remove torch.float16 from __init__ in test * refactor test * isinstance -> type in quantizer_hqq.py * cpu/disk device_map check in quantizer_hqq.py * remove type(module) nn.linear check in quantizer_hqq.py * add BaseQuantizeConfig import inside HqqConfig init * remove hqq import in hqq.py * remove accelerate import from test_hqq.py * quant config.py doc update * add hqqconfig to main_classes doc * make style * __init__ fix * ruff __init__ * skip_modules list * hqqconfig format fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * test_hqq.py remove mistral comment * remove self.using_multi_gpu is False * torch_dtype default val set and logger.info * hqq.py isinstance fix * remove torch=None * torch_device test_hqq * rename test_hqq * MODEL_ID in test_hqq * quantizer_hqq setattr fix * quantizer_hqq typo fix * imports quantizer_hqq.py * isinstance quantizer_hqq * hqq_layer.bias reformat quantizer_hqq * Step 2 as comment in quantizer_hqq * prepare_for_hqq_linear() comment * keep_in_fp32_modules fix * HqqHfQuantizer reformat * quantization.md hqqconfig * quantization.md model example reformat * quantization.md # space * quantization.md space }) * quantization.md space }) * quantization_config fix doc Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * axis value check in quantization_config * format * dynamic config explanation * quant config method in quantization.md * remove shard-level progress * .cuda fix modeling_utils * test_hqq fixes * make fix-copies
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
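For context, the merged API looks roughly like this (a minimal sketch; the checkpoint name is illustrative, and `nbits`/`group_size` are the documented knobs):

```python
import torch
from transformers import AutoModelForCausalLM, HqqConfig

# 4-bit HQQ weight quantization, applied on the fly at load time
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative checkpoint
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
)
```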

- 22 Apr, 2024 1 commit

zhong zhuang authored
* [FEAT]: EETQ quantizer support * Update quantization.md * Update docs/source/en/main_classes/quantization.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/quantization.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/quantization.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/__init__.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/__init__.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/eetq_integration/test_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/auto.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/auto.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/auto.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/eetq_integration/test_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/eetq_integration/test_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/eetq_integration/test_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * [FEAT]: EETQ quantizer support * [FEAT]: EETQ quantizer support * remove whitespaces * update quantization.md * style * Update docs/source/en/quantization.md Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * add copyright * Update quantization.md * Update docs/source/en/quantization.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/quantization.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Address the comments by amyeroberts * style
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
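A hedged sketch of what the new quantizer enables (requires the `eetq` package; the checkpoint is illustrative):

```python
from transformers import AutoModelForCausalLM, EetqConfig

# int8 weight-only quantization via EETQ kernels
quantization_config = EetqConfig("int8")

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative checkpoint
    device_map="auto",
    quantization_config=quantization_config,
)
```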

- 11 Apr, 2024 1 commit

Zach Mueller authored

- 28 Mar, 2024 1 commit

Yu Chin Fabian Lim authored
* add gradient_accumulation_kwargs to AcceleratorConfig * add suggestions from @muellerzr to docstrings, new behavior and tests * Documentation suggestions from @muellerzr Co-authored-by: Zach Mueller <muellerzr@gmail.com> * addressed @muellerzr comments regarding tests and test utils * moved accelerate version to top of file. * @muellerzr's variable fix Co-authored-by: Zach Mueller <muellerzr@gmail.com> * address @amyeroberts. fix tests and docstrings * address @amyeroberts additional suggestions
---------
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
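A sketch of the new option, assuming the dict form of `accelerator_config`; the kwargs are forwarded to accelerate's `GradientAccumulationPlugin`:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    gradient_accumulation_steps=4,
    # num_steps must not be repeated here; it comes from the argument above
    accelerator_config={"gradient_accumulation_kwargs": {"sync_each_batch": True}},
)
```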

- 27 Mar, 2024 1 commit

huismiling authored
* add Cambricon MLUs support * fix mlu device rng state * up for quality check * up mlu to support fp16 * fix mlu device dependency error * fix mlu device dependency error * enable mlu device for bf16 * fix mlu device memory tracker

- 26 Mar, 2024 1 commit

yunxiangtang authored
* replace 'decord' with 'av' in VideoClassificationPipeline * fix the check of backend in VideoClassificationPipeline * adjust the order of imports * format 'video_classification.py' * format 'video_classification.py' with ruff
---------
Co-authored-by: wanqiancheng <13541261013@163.com>
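The pipeline call itself is unchanged by the backend swap; a minimal sketch (PyAV must be installed, and the checkpoint and file path are illustrative):

```python
from transformers import pipeline

# requires `pip install av` now that PyAV replaces decord
classifier = pipeline("video-classification", model="MCG-NJU/videomae-base-finetuned-kinetics")
print(classifier("sample_video.mp4"))
```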

- 19 Mar, 2024 1 commit

Younes Belkada authored
* add galore v1 * add import * add tests and doc * fix doctest * forward contrib credits from discussions * forward contrib credits from discussions * Apply suggestions from code review Co-authored-by: Zach Mueller <muellerzr@gmail.com> * fix failing tests * switch to `optim_target_modules` and clarify docs * more clarification * enhance lookup logic * update a test to add peak memory * add regex, all-linear and single string support * add layer-wise optimization through DummyOptimizers and LRSchedulers * forward contrib credits from discussions and original idea * add a section about DDP not supported in layerwise * Update src/transformers/trainer.py Co-authored-by: Zach Mueller <muellerzr@gmail.com> * fix self * check only if layer_wise * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * oops * make use of intervals * clarify comment * add matching tests * GaLoRe -> GaLore * move to `get_scheduler` * add note on docs * add a warning * adapt a bit the docs * update docstring * support original API * Update docs/source/en/trainer.md * slightly refactor * Update docs/source/en/trainer.md Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix args parsing and add tests * remove warning for regex * fix type hint * add note about extra args * make `is_regex` return optional
---------
Co-authored-by: Maxime <maximegmd @users.noreply.github.com>
Co-authored-by: Wing Lian <winglian @users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: hiyouga <hiyouga@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
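The resulting Trainer API, sketched from the names in this message; `optim_target_modules` accepts regexes, a list of module names, or "all-linear":

```python
from transformers import TrainingArguments

# use "galore_adamw_layerwise" for the layer-wise variant
# (which, per the docs added here, does not support DDP)
args = TrainingArguments(
    output_dir="out",
    optim="galore_adamw",
    optim_target_modules=["attn", "mlp"],
)
```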

- 15 Mar, 2024 2 commits

Marc Sun authored
* start integration * fix * add and debug tests * update tests * make pytorch serialization work * compatible with device_map and offload * fix tests * make style * add ref * guard against safetensors * add float8 and style * fix is_serializable * Fix shard_checkpoint compatibility with quanto * more tests * docs * adjust memory * better * style * pass tests * Update src/transformers/modeling_utils.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * add is_safe_serialization instead * Update src/transformers/quantizers/quantizer_quanto.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * add QbitsTensor tests * fix tests * simplify activation list * Update docs/source/en/quantization.md Co-authored-by: David Corvoysier <david.corvoysier@gmail.com> * better comment * Update tests/quantization/quanto_integration/test_quanto.py Co-authored-by: David Corvoysier <david.corvoysier@gmail.com> * Update tests/quantization/quanto_integration/test_quanto.py Co-authored-by: David Corvoysier <david.corvoysier@gmail.com> * find and fix edge case * Update docs/source/en/quantization.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * pass weights_only_kwarg instead * fix shard_checkpoint loading * simplify update_missing_keys * Update tests/quantization/quanto_integration/test_quanto.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * recursion to get all tensors * block serialization * skip serialization tests * fix * change by cuda:0 for now * fix regression * update device_map * fix doc * add notebook * update torch_dtype * update doc * typo * typo * remove comm
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
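A minimal sketch of the integration (checkpoint illustrative; per this message, float8 weights are also supported):

```python
from transformers import AutoModelForCausalLM, QuantoConfig

quantization_config = QuantoConfig(weights="int8")

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative checkpoint
    device_map="cuda:0",
    quantization_config=quantization_config,
)
```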
bhack authored
* Extend import utils to cover "editable" torch versions * Re-add type hint * Remove whitespaces * Double quote strings * Update comment Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> * Restore package_exists * Revert "Restore package_exists" This reverts commit 66fd2cd5c33d1b9a26a8f3e8adef2e6ec1214868.
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

- 11 Mar, 2024 1 commit

Yitong Huang authored
* add USE_TORCH_XLA env * rename torch_tpu to torch_xla * better is_torch_xla_available; fix some fsdp and performance issues * fix format * fix bug when pjrt_device is cpu * fix bug * fix the deprecation handling
---------
Co-authored-by: anw90 <ang868@gmail.com>
Co-authored-by: wangang.wa <wangang.wa@alibaba-inc.com>
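A hedged sketch of the new switch; the variable has to be set before transformers is imported, and the "0"/"1" semantics are assumed:

```python
import os

os.environ["USE_TORCH_XLA"] = "0"  # opt out of XLA even if torch_xla is installed

from transformers.utils import is_torch_xla_available

print(is_torch_xla_available())
```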

- 05 Mar, 2024 1 commit

Arthur authored
* initial-commit * start cleaning * small nits * small nits * current updates * add kernels * small refactoring little step * add comments * styling * nit * nits * Style * Small changes * Push dummy mamba simple slow * nit * Use original names * Use original names and remove norm * Updates for inference params * Style and updates * nits * Match logits * Add a test * Add expected generated text * nits doc, imports and styling * style * oups * dont install kernels, invite users to install the required kernels * let us use the original packages * styling * nits * fix some copieds * update doc * fix-copies * styling done * nits * fix import check * run but wrong cuda ress * mamba CUDA works :) * fix the fast path * config naming nits * conversion script is not required at this stage * finish fixing the fast path: generation makes sense now! * nit * Let's start working on the CIs * style * better style * more nits * test nit * quick fix for now * nits * nit * nit * nit * nits * update test rest * fixup * update test * nit * some fixes * nits * update test values * fix styling * nit * support peft * integration tests require torch * also add slow markers * styling * chose forward wisely * nits * update tests * fix gradient checkpointing * fixup * nit * fix doc * check copies * fix the docstring * fix some more tests * style * fix beam search * add init scheme * update * nit * fix * fixup the doc * fix the doc * fixup * tentative update but slow is no longer good * nit * should we always use float32? * nits * revert wrong changes * res in float32 * cleanup * skip fmt for now * update generation values * update test values running original model * fixup * update tests + rename inference_params to cache_params + make sure training does not use cache_params * small nits * more nits * fix final CIs * style * nit doc * I hope final doc nits * nit * final touch! * fix torch import * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * Apply suggestions from code review * fix fix and fix * fix base model prefix! * nit * Update src/transformers/models/mamba/__init__.py * Update docs/source/en/model_doc/mamba.md Co-authored-by: Lysandre Debut <hi@lysand.re> * nit
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
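Once merged, the model loads through the auto classes like any other causal LM; a sketch with an illustrative checkpoint from the state-spaces org:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")  # illustrative
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```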

- 04 Mar, 2024 1 commit

Y4hL authored
* Add mlx support * Fix import order and use def instead of lambda * Another fix for ruff format :) * Add detecting mlx from repr, add is_mlx_array

- 26 Feb, 2024 2 commits

Joao Gante authored

Benjamin Muskalla authored
Cache `is_vision_available`
This check is used quite often during processing in image models and can take up a serious amount of time compared to the other processing steps.
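The technique is plain memoization of an import probe; a minimal sketch with functools (the PIL probe mirrors what the real check tests):

```python
import importlib.util
from functools import lru_cache

@lru_cache  # compute once, reuse on every later call
def is_vision_available() -> bool:
    return importlib.util.find_spec("PIL") is not None
```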

- 14 Feb, 2024 1 commit

Andrei Panferov authored
* aqlm init * calibration and dtypes * docs * Readme update * is_aqlm_available * Simpler link in docs * Test TODO real reference * init _import_structure fix * AqlmConfig autodoc * integration aqlm * integrations in tests * docstring fix * legacy typing * Less typings * More kernels information * Performance -> Accuracy * correct tests * removed multi-gpu test * Update docs/source/en/quantization.md Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/utils/quantization_config.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Brought back multi-gpu tests * Update src/transformers/integrations/aqlm.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/aqlm_integration/test_aqlm.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Andrei Panferov <blacksamorez@yandex-team.ru>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
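AQLM checkpoints ship pre-quantized, so loading is a plain from_pretrained; a sketch with an illustrative hub id (requires the `aqlm` package):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf",  # illustrative AQLM checkpoint
    device_map="auto",
)
```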

- 13 Feb, 2024 1 commit

Hiroshi Matsuda authored
* add sudachi_projection option * Upgrade sudachipy>=0.6.8 * add a test case for sudachi_projection * Compatible with older versions of SudachiPy * make fixup * make style * error message for unidic download * revert jumanpp test cases * format options for sudachi_projection Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * format options for sudachi_split_mode and sudachi_dict_type * comment * add tests for full_tokenizer kwargs * pass projection arg directly * require_sudachi_projection * make style * revert upgrade sudachipy * check is_sudachi_projection_available() * revert dependency_version_table and bugfix * style format * simply raise ImportError Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * simply raise ImportError
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
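Presumably this surfaces through the Japanese tokenizer's sudachi_kwargs; a sketch under that assumption (checkpoint illustrative, SudachiPy >= 0.6.8 required for projection):

```python
from transformers import BertJapaneseTokenizer

tokenizer = BertJapaneseTokenizer.from_pretrained(
    "cl-tohoku/bert-base-japanese",  # illustrative checkpoint
    word_tokenizer_type="sudachi",
    sudachi_kwargs={"sudachi_projection": "normalized"},  # assumed kwarg name
)
```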

- 01 Feb, 2024 1 commit

Yih-Dar authored
fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

- 11 Jan, 2024 1 commit

Yih-Dar authored
* fix * fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

- 08 Jan, 2024 1 commit

Hz, Ji authored

- 03 Jan, 2024 1 commit

Connor Henderson authored
* start - docs, SpeechT5 copy and rename * add relevant code from FastSpeech2 draft, have tests pass * make it an actual conformer, demo ex. * matching inference with original repo, includes debug code * refactor nn.Sequentials, start more desc. var names * more renaming * more renaming * vocoder scratchwork * matching vocoder outputs * hifigan vocoder conversion script * convert model script, rename some config vars * replace postnet with speecht5's implementation * passing common tests, file cleanup * expand testing, add output hidden states and attention * tokenizer + passing tokenizer tests * variety of updates and tests * g2p_en pckg setup * import structure edits * docstrings and cleanup * repo consistency * deps * small cleanup * forward signature param order * address comments except for masks and labels * address comments on attention_mask and labels * address second round of comments * remove old unneeded line * address comments part 1 * address comments pt 2 * rename auto mapping * fixes for failing tests * address comments part 3 (bart-like, train loss) * make style * pass config where possible * add forward method + tests to WithHifiGan model * make style * address arg passing and generate_speech comments * address Arthur comments * address Arthur comments pt2 * lint changes * Sanchit comment * add g2p-en to doctest deps * move up self.encoder * onnx compatible tensor method * fix is symbolic * fix paper url * move models to espnet org * make style * make fix-copies * update docstring * Arthur comments * update docstring w/ new updates * add model architecture images * header size * md wording update * make style
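The classes this lands are the FastSpeech2Conformer family; a hedged sketch of end-to-end text-to-speech with the HiFi-GAN head (checkpoints under the espnet org, g2p-en required):

```python
from transformers import FastSpeech2ConformerTokenizer, FastSpeech2ConformerWithHifiGan

tokenizer = FastSpeech2ConformerTokenizer.from_pretrained("espnet/fastspeech2_conformer")
model = FastSpeech2ConformerWithHifiGan.from_pretrained("espnet/fastspeech2_conformer_with_hifigan")

inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt")
output = model(input_ids=inputs["input_ids"], return_dict=True)
waveform = output["waveform"]  # assumed output key holding the audio tensor
```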

- 11 Dec, 2023 1 commit

NielsRogge authored
* Update import message * Update message

- 08 Dec, 2023 1 commit

fxmarty authored
* add sdpa * wip * cleaning * add ref * yet more cleaning * and more :) * wip llama * working llama * add output_attentions=True support * bigcode sdpa support * fixes * gpt-bigcode support, require torch>=2.1.1 * add falcon support * fix conflicts falcon * style * fix attention_mask definition * remove output_attentions from attnmaskconverter * support whisper without removing any Copied from statement * fix mbart default to eager renaming * fix typo in falcon * fix is_causal in SDPA * check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained * add warnings when falling back on the manual implementation * precise doc * wip replace _flash_attn_enabled by config.attn_implementation * fix typo * add tests * style * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace * obey to config.attn_implementation if a config is passed in from_pretrained * fix is_torch_sdpa_available when torch is not installed * remove dead code * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/bart/modeling_bart.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove duplicate pretraining_tp code * add dropout in llama * precise comment on attn_mask * add fmt: off for _unmask_unattended docstring * precise num_masks comment * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion * cleanup modeling_utils * backward compatibility * fix style as requested * style * improve documentation * test pass * style * add _unmask_unattended tests * skip meaningless tests for idefics * hard_check SDPA requirements when specifically requested * standardize the use of XXX_ATTENTION_CLASSES * fix SDPA bug with mem-efficient backend on CUDA when using fp32 * fix test * rely on SDPA is_causal parameter to handle the causal mask in some cases * fix FALCON_ATTENTION_CLASSES * remove _flash_attn_2_enabled occurrences * fix test * add OPT to the list of supported flash models * improve test * properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test * remove remaining _flash_attn_2_enabled occurrence * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/perf_infer_gpu_one.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove use_attn_implementation * fix docstring & slight bug * make attn_implementation internal (_attn_implementation) * typos * fix tests * deprecate use_flash_attention_2=True * fix test * add back llama that was removed by mistake * fix tests * remove _flash_attn_2_enabled occurrences bis * add check & test that passed attn_implementation is valid * fix falcon torchscript export * fix device of mask in tests * add tip about torch.jit.trace and move bt doc below sdpa * fix parameterized.expand order * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there * update sdpaattention class with the new cache * Update src/transformers/configuration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/bark/modeling_bark.py * address review comments * WIP torch.jit.trace fix. left: test both eager & sdpa * add test for torch.jit.trace for both eager/sdpa * fix falcon with torch==2.0 that needs to use sdpa * fix doc * hopefully last fix * fix key_value_length that has no default now in mask converter * is it flaky? * fix speculative decoding bug * tests do pass * fix following #27907
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
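After this change the attention backend is selectable at load time; a minimal sketch (checkpoint illustrative):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative checkpoint
    attn_implementation="sdpa",  # "eager" keeps the manual implementation
)
```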

- 04 Dec, 2023 1 commit

fxmarty authored
* support FA2 * fix typo * fix broken tests * fix more test errors * left/right * fix bug * more test * typo * fix layout flash attention falcon * do not support this case * use allclose instead of equal * fix various bugs with flash attention * bump * fix test * fix mistral * use skiptest instead of return that may be misleading * add fix causal arg flash attention * fix copies * more explicit comment * still use self.is_causal * fix causal argument * comment * fixes * update documentation * add link * wrong test * simplify FA2 RoCm requirements * update opt * make flash_attn_uses_top_left_mask attribute private and precise comment * better error handling * fix copy & mistral * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/utils/import_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * use is_flash_attn_greater_or_equal_2_10 instead of is_flash_attn_greater_or_equal_210 * fix merge * simplify * inline args
---------
Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

- 24 Nov, 2023 1 commit

Sourab Mangrulkar authored
* add code changes 1. Refactor FSDP 2. Add `--save_only_model` option: When checkpointing, whether to only save the model, or also the optimizer, scheduler & rng state. 3. Bump up the minimum `accelerate` version to `0.21.0` * quality * fix quality? * Revert "fix quality?" This reverts commit 149330a6abc078827be274db84c8a2d26a76eba1. * fix fsdp doc strings * fix quality * Update src/transformers/training_args.py Co-authored-by: Zach Mueller <muellerzr@gmail.com> * please fix the quality issue 😅 * Apply suggestions from code review Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com> * address comment * simplify conditional check as per the comment * update documentation
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
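A sketch of the new flag (note that a checkpoint saved this way cannot be resumed from, since the optimizer, scheduler and RNG state are skipped):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    save_strategy="steps",
    save_steps=500,
    save_only_model=True,  # save weights only, no optimizer/scheduler/rng state
)
```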

- 14 Nov, 2023 1 commit

Marc Sun authored
* put back import * switch to logger.warnings instead

- 02 Nov, 2023 1 commit

Roohollah Etemadi authored
* added bf16 support * added cuda availability check * applied make style, quality

- 01 Nov, 2023 1 commit

Younes Belkada authored
* working v1 * oops * Update src/transformers/modeling_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fixup * oops * push * more changes * add docs * some fixes * fix copies * add v1 doc * added installation guide * relax constraints * revert * attempt llm-awq * oops * oops * fixup * raise error when incorrect cuda compute capability * nit * add instructions for llm-awq * fixup * fix copies * fixup and docs * change * few changes + add demo * add v1 tests * add autoawq in dockerfile * finalize * Update tests/quantization/autoawq/test_awq.py * fix test * fix * fix issue * Update src/transformers/integrations/awq.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/main_classes/quantization.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/main_classes/quantization.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/integrations/awq.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/integrations/awq.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add link to example script * Update docs/source/en/main_classes/quantization.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add more content * add more details * add link to quantization docs * camel case + change backend class name * change to string * fixup * raise errors if libs not installed * change to `bits` and `group_size` * nit * nit * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * disable training * address some comments and fix nits * fix * final nits and fix tests * adapt to our new runners * make fix-copies * Update src/transformers/utils/quantization_config.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/utils/quantization_config.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/integrations/awq.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/integrations/awq.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * move to top * add conversion test * final nit * add more elaborated test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
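Loading a pre-quantized AWQ checkpoint then works out of the box; a sketch with an illustrative hub id (requires `autoawq`):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/zephyr-7B-alpha-AWQ",  # illustrative pre-quantized checkpoint
    device_map="cuda:0",
)
```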

- 26 Oct, 2023 1 commit

Younes Belkada authored
* pin FA-2 to `2.1` * fix on modeling

- 24 Oct, 2023 1 commit

Alex McKinney authored
* adds agnostic decorators and availability fns * renaming decorators and fixing imports * updating some representative example tests bloom, opt, and reformer for now * wip device agnostic functions * lru cache to device checking functions * adds `TRANSFORMERS_TEST_DEVICE_SPEC` if present, imports the target file and updates device to function mappings * comments `TRANSFORMERS_TEST_DEVICE_SPEC` code * extra checks on device name * `make style; make quality` * updates default functions for agnostic calls * applies suggestions from review * adds `is_torch_available` guard * Add spec file to docs, rename function dispatch names to backend_* * add backend import to docs example for spec file * change instances of to * Move register backend to before device check as per @statelesshz changes * make style * make opt test require fp16 to run
---------
Co-authored-by: arsalanu <arsalanu@graphcore.ai>
Co-authored-by: arsalanu <hzji210@gmail.com>
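A sketch of a device spec file under the assumed hook names from the testing docs; point `TRANSFORMERS_TEST_DEVICE_SPEC` at it:

```python
# spec.py -- custom backend registration for the transformers test suite
import torch
import torch_npu  # noqa: F401  (illustrative third-party backend)

# required: the device string tests should target
DEVICE_NAME = "npu"

# optional dispatch hooks; anything unset falls back to defaults
MANUAL_SEED_FN = torch.npu.manual_seed
EMPTY_CACHE_FN = torch.npu.empty_cache
DEVICE_COUNT_FN = torch.npu.device_count
```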

- 17 Oct, 2023 1 commit

jayfurmanek authored
Add tf-nightly-rocm to _is_tf_available check

- 13 Oct, 2023 1 commit

Younes Belkada authored
* fix fa-2 import * nit

- 11 Oct, 2023 1 commit

Yih-Dar authored
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

- 26 Sep, 2023 1 commit

NielsRogge authored
* Add conversion script * Add NougatImageProcessor * Add crop margin * More improvements * Add docs, READMEs * Remove print statements * Include model_max_length * Add NougatTokenizerFast * Fix imports * Improve postprocessing * Improve image processor * Fix image processor * Improve normalize method * More improvements * More improvements * Add processor, improve docs * Simplify fast tokenizer * Remove test file * Fix docstrings * Use NougatProcessor in conversion script * Add is_levenshtein_available * Add tokenizer tests * More improvements * Use numpy instead of opencv * Add is_cv2_available * Fix cv2_available * Add is_nltk_available * Add image processor tests, improve crop_margin * Add integration tests * Improve integration test * Use do_rescale instead of hacks, thanks Amy * Remove random_padding * Address comments * Address more comments * Add import * Address more comments * Address more comments * Address comment * Address comment * Set max_model_input_sizes * Add tests * Add requires_backends * Add Nougat to exotic tests * Use to_pil_image * Address comment regarding nltk * Add NLTK * Improve variable names, integration test * Add test * refactor, document, and test regexes * remove named capture groups, add comments * format * add non-markdown fixed tokenization * format * correct flakiness of args parse * add regex comments * test functionalities for crop_image, align long axis and expected output * add regex tests * remove cv2 dependency * test crop_margin equality between cv2 and python * refactor table regexes to markdown add newline * change print to log, improve doc * fix high count tables correction * address PR comments: naming, linting, asserts * Address comments * Add copied from * Update conversion script * Update conversion script to convert both small and base versions * Add inference example * Add more info * Fix style * Add require annotators to test * Define all keyword arguments explicitly * Move cv2 annotator * Add tokenizer init method * Transfer checkpoints * Add reference to Donut * Address comments * Skip test * Remove cv2 method * Add copied from statements * Use cached_property * Fix docstring * Add file to not doctested
---------
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
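A minimal inference sketch of the new model; Nougat runs through VisionEncoderDecoderModel (image path illustrative):

```python
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

image = Image.open("paper_page.png")  # illustrative scan of a document page
pixel_values = processor(image, return_tensors="pt").pixel_values

outputs = model.generate(pixel_values, max_new_tokens=512)
text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
markdown = processor.post_process_generation(text, fix_markdown=False)
```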

- 22 Sep, 2023 1 commit

Younes Belkada authored
* v1 * oops * working v1 * fixup * add some TODOs * fixup * padding support + try with module replacement * nit * alternative design * oops * add `use_cache` support for llama * v1 falcon * nit * a bit of refactor * nit * nits nits * add v1 padding support falcon (even though it seemed to work before) * nit * falcon works * fixup * v1 tests * nit * fix generation llama flash * update tests * fix tests + nits * fix copies * fix nit * test- padding mask * style * add more mem efficient support * Update src/transformers/modeling_utils.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * fixup * nit * fixup * remove it from config when saving * fixup * revert docstring * add more checks * use values * oops * new version * fixup * add same trick for falcon * nit * add another test * change tests * fix issues with GC and also falcon * fixup * oops * Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add init_rope * updates * fix copies * fixup * fixup * more clarification * fixup * right padding tests * add docs * add FA in docker image * more clarifications * add some figures * add todo * rectify comment * Change to FA2 * Update docs/source/en/perf_infer_gpu_one.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * split in two lines * change test name * add more tests * some clean up * remove `rearrange` deps * add more docs * revert changes on dockerfile * Revert "revert changes on dockerfile" This reverts commit 8d72a66b4b9b771abc3f15a9b9506b4246d62d8e. * revert changes on dockerfile * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * address some comments * docs * use inheritance * Update src/transformers/testing_utils.py Co-authored-by: Lysandre Debut <hi@lysand.re> * fixup * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_utils.py * final comments * clean up * style * add cast + warning for PEFT models * fixup
---------
Co-authored-by: Felix Marty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
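As merged, the feature is opt-in via a flag (later deprecated, per the SDPA entry above, in favor of attn_implementation="flash_attention_2"); checkpoint illustrative:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative checkpoint
    torch_dtype=torch.float16,   # FA2 requires fp16/bf16 weights
    use_flash_attention_2=True,
)
```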

- 20 Sep, 2023 2 commits

Sourab Mangrulkar authored
* fix name error when accelerate is not available * fix `is_fsdp_available`
Sourab Mangrulkar authored
* add fsdp tests * Update test_fsdp.py * Update test_fsdp.py * fixes * checks * Update trainer.py * fix * fixes for saving/resuming checkpoints * fixes * add tests and delete debug statements * fixing tests * Update test_fsdp.py * fix tests * fix tests * minor nits * fix code style and quality * refactor and modularize test code * reduce the time of tests * reduce the test time * fix test * reduce test time * reduce test time * fix failing tests * fix * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * resolve comments
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

- 14 Sep, 2023 1 commit

Matt authored
* First commit while I figure this out * make fixup * Remove unused method * Store prompt attrib * Fix prompt argument for tests * Make same changes in fast tokenizer * Remove global prompts from fast tokenizer too * stash commit * stash commit * Migrate PromptConfig to its True Final Location * Replace Conversation entirely with the new class * Import/dependency fixes * Import/dependency fixes * Change format for lots of default prompts * More default prompt fixups * Revert llama old methods so we can compare * Fix some default configs * Fix some default configs * Fix misspelled kwarg * Fixes for Blenderbot * make fixup * little rebase cleanup * Add basic documentation * Quick doc fix * Truncate docstring for now * Add handling for the case when messages is a single string * Quick llama merges * Update conversational pipeline and tests * Add a couple of legacy properties for backward compatibility * More legacy handling * Add docstring for build_conversation_input_ids * Restructure PromptConfig * Let's start T E M P L A T I N G * Refactor all default configs to use templates instead * Revert changes to the special token properties since we don't need them anymore * More class templates * Make the sandbox even sandier * Everything replaced with pure templating * Remove docs for PromptConfig * Add testing and optional requirement boilerplate * Fix imports and make fixup * Fix LLaMA tests and add Conversation docstring * Finally get LLaMA working with the template system * Finally get LLaMA working with the template system * make fixup * make fixup * fmt-off for the long lists of test tokens * Rename method to apply_chat_template for now * Start on documentation * Make chat_template a property that reads through to the default if it's not set * Expand docs * Expand chat templating doc some more * trim/lstrip blocks by default and update doc * Few doc tweaks * rebase cleanup * Clarify docstring * rebase cleanup * rebase cleanup * make fixup * Quick doc edit * Reformat the standard template to match ChatML * Re-add PEFT check * Update docs/source/en/chat_templating.md Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Add apply_chat_template to the tokenizer doc * make fixup * Add doc links * Fix chat links * Fix chat links * Explain system messages in the doc * Add chat template test * Proper save-loading for chat template attribute * Add test skips for layout models * Remove _build_conversation_input_ids, add default_chat_template to code_llama * Make sure all LLaMA models are using the latest template * Remove default_system_prompt block in code_llama because it has no default prompt * Update ConversationPipeline preprocess * Add correct #Copied from links to the default_chat_templates * Remove unneeded type checking line * Add a dummy mark_processed method * Reorganize Conversation to have **deprecated_kwargs * Update chat_templating.md * Quick fix to LLAMA tests * Small doc tweaks * Add proper docstrings and "copied from" statements to all default chat templates * Merge use_default_system_prompt support for code_llama too * Improve clarity around self.chat_template * Docstring fix * Fix blenderbot default template * More doctest fix * Break out some tokenizer kwargs * Update doc to explain default templates * Quick tweaks to tokenizer args * Cleanups for tokenizer args * Add note about caching * Quick tweak to the chat-templating doc * Update the LLaMA template with error checking and correct system message embedding * make fixup * make fixup * add requires_jinja * Cleanup to expected output formatting * Add caching * Fix typo in llama default template * Update LLaMA tests * Update documentation * Improved legacy handling in the Conversation class * Update Jinja template with proper error handling * Quick bugfix * Proper exception raising * Change caching behaviour so it doesn't try to pickle an entire Jinja env * make fixup * rebase cleanup
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
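The end state of all this templating; a minimal sketch (checkpoint illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # illustrative

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi there!"},
]

# renders the conversation through the tokenizer's Jinja chat_template
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
```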

- 08 Sep, 2023 1 commit

Angela Yi authored
* Ignore warning if tracing with dynamo * fix import error * separate to function * add test

- 05 Sep, 2023 1 commit

Abhilash Majumder authored
* patch with accelerate xpu * patch with accelerate xpu * formatting * fix tests * revert ruff unrelated fixes * revert ruff unrelated fixes * revert ruff unrelated fixes * fix test * review fixes * review fixes * black fixed * review commits * review commits * style fix * use pytorch_utils * revert markuplm test