Commits · 4c2538b863d8949a98d6b8dc1dea9ed4cf96a5df · chenpangpang / transformers

"vscode:/vscode.git/clone" did not exist on "de627f5a14a777d5cbf21edb78ed67ceb900f8dd"

09 Jul, 2024 1 commit
- Test loading generation config with safetensor weights (#31550) · 4c2538b8
  Joao Gante authored Jul 09, 2024
```
fix test
```
  4c2538b8
05 Jul, 2024 1 commit
- Fix serialization for offloaded model (#31727) · 8c5c180d
  Marc Sun authored Jul 05, 2024
```
* Fix serialization

* style

* add test
```
  8c5c180d
02 Jul, 2024 1 commit

Move some test files (`tets/test_xxx_utils.py`) to `tests/utils` (#31730) · 93cd94b7

Yih-Dar authored Jul 02, 2024



* move

* move

* move

* move

* Update tests/utils/test_image_processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

93cd94b7

27 Jun, 2024 1 commit
- [QoL] Allow dtype str for torch_dtype arg of from_pretrained (#31590) · 3a028101
  Billy Cao authored Jun 27, 2024
```
* Allow dtype str for torch_dtype in from_pretrained

* Update docstring

* Add tests for str torch_dtype
```
  3a028101
26 Jun, 2024 1 commit

Skip tests properly (#31308) · 1de7dc74

amyeroberts authored Jun 26, 2024

* Skip tests properly

* [test_all]

* Add 'reason' as kwarg for skipTest

* [test_all] Fix up

* [test_all]

1de7dc74

12 Jun, 2024 1 commit
- Use huggingface_hub helper function to split state dict (#31091) · 254b25ab
  Marc Sun authored Jun 12, 2024
```
* shard saving from hf hub

* index = None

* fix tests

* indent
```
  254b25ab
07 Jun, 2024 1 commit

Extend save_pretrained to offloaded models (#27412) · ff689f57

Benjamin Badger authored Jun 07, 2024



* added hidden subset

* debugged hidden subset contrastive search

* added contrastive search compression

* debugged compressed contrastive search

* memory reduction for contrastive search

* debugged mem red

* added low memory option feature

* debugged mem optmimization output stack

* debugged mem optmimization output stack

* debugged low mem

* added low mem cache

* fixed 2047 tensor view

* debugged 2042 past key val inputs

* reformatted tensors

* changed low mem output

* final clean

* removed subset hidden csearch

* fixed hidden device

* fixed hidden device

* changed compressor dtype

* removed hstate compression

* integrated csearch in generate

* test csearch integration into generation

exit()

* fixed csearch kwarg integration with generation

* final wrap and added doc

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* added debug print

* direct hstate cat

* direct hstate cat

* direct hstate cat debug

* direct hstate cat debug

* expanded full hidden state stack

* expanded full hidden state stack

* matched dims for hstates

* matched dims for hstates

* logits fix

* equality test

* equality hidden debug

* debug

* added prints for debug

* added prints for debug

* equality check

* switched squeeze dim

* input format debug

* tracing top_k_ids

* removed trace

* added test context

* added jitter

* added jitter

* added jitter

* returned state

* rebuilt past key value reconstruction

* debugged

* cleaned traces

* added selection for pkv

* changed output to dict

* cleaned

* cleaned

* cleaned up contrastive search test

* moved low_memory kwarg

* debugged

* changed low mem test batch size to 1

* removed output

* debugged test input shape

* reformatted csearch test

* added trace

* removed unsqueeze on final forward pass

* replaced unsqueeze with view

* removed traces

* cleaned

* debugged model kwargs

* removed special models from test

* ran make quality

* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* refactored

* refactored

* refactored

* make fixup

* renamed flag sequential

* renamed flag sequential

* iterative onloading

* black style and test utils

* added traces for integrated test

* debugged

* added traces

* make style

* removed traces, make style

* included suggestions and added test

* debugged test

* added offload module check and make style

* is_accelerate_available and make style

* added test decorator

* changed test model and config spec

* added offload condition

* added lazy loading for each shard

* debugged

* modified sharding

* debugged

* added traces

* removed safe serialization

* no index overload;

* trace on safe save ptrs

* added ptr condition

* debugged

* debugged ptr

* moved module map init

* remake shard only for offloaded modules

* refactored

* debugged

* refactored

* debugged

* cleaned and make style

* cleaned and make style

* added trace

* sparse module map

* debugged

* removed module map conditional

* refactored

* debug

* debugged

* added traces

* added shard mem trace

* added shard mem trace

* removed underlying storage check

* refactored

* memory leak removal and make style

* cleaned

* swapped test decs and make style

* added mem checks and make style

* added free mem warning

* implemented some suggestions

* moved onloading to accelerate

* refactored for accelerate integration

* cleaned test

* make style

* debugged offload map name

* cleaned and make style

* replaced meta device check for sharding

* cleaned and make style

* implemented some suggestions

* more suggestions

* update warning
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* more suggestions

* make style

* new make style

* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

ff689f57

28 May, 2024 1 commit

fix from_pretrained in offline mode when model is preloaded in cache (#31010) · 936ab7ba

oOraph authored May 28, 2024



* Unit test to verify fix
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

* fix from_pretrained in offline mode when model is preloaded in cache
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

* minor: fmt
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

---------
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com>

936ab7ba

13 May, 2024 1 commit

Llama: fix custom 4D masks, v2 (#30348) · a0779b9e

Poedator authored May 13, 2024



* 4d mask fixes

* Update custom 4D mask logic

* test moved to mixin

* extra tests 4d mask

* upd 4d mask and StaticCache handling

* added Mask4DTestHard to mistral tests

* post-rebase fixes

* test fixes for StaticCache

* make fix-copies

* upd 1 after #30476

* fix common tests

* rm elif attention_mask.dim() == 4:

* tests combined, fixed, mixtral supported

* bigbird style chg reverted

* rm if attention_mask.dim() == 2

* modeling_llama formatting chg

---------
Co-authored-by: Joao Gante <joao@huggingface.co>

a0779b9e

07 May, 2024 1 commit
- Add safetensors to model not found error msg for default use_safetensors value (#30602) · cf7bed98
  David Xue authored May 07, 2024
```
* add safetensors to model not found error for default use_safetensors=None case

* format code w/ ruff

* fix assert true typo
```
  cf7bed98
26 Apr, 2024 1 commit
- Update `dtype_byte_size` to handle torch.float8_e4m3fn/float8_e5m2 types (#30488) · 20081c74
  Michael Goin authored Apr 26, 2024
```
* Update modeling_utils/dtype_byte_size to handle float8 types

* Add a test for dtype_byte_size

* Format

* Fix bool
```
  20081c74
24 Apr, 2024 1 commit
- [tests] make test device-agnostic (#30444) · 16c8e176
  Fanli Lin authored Apr 24, 2024
```
* make device-agnostic

* clean code
```
  16c8e176
19 Apr, 2024 1 commit
- Fix config + attn_implementation in AutoModelForCausalLM.from_pretrained (#30299) · 21c912e7
  hoshi-hiyouga authored Apr 20, 2024
```
* Update modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py
```
  21c912e7
10 Apr, 2024 1 commit
- [tests] make 2 tests device-agnostic (#30008) · 18546378
  Fanli Lin authored Apr 10, 2024
```
add torch device
```
  18546378
02 Apr, 2024 1 commit

Hard error when ignoring tensors. (#27484) (#29906) · 9b0a8ea7

Nicolas Patry authored Apr 02, 2024



* Hard error when ignoring tensors. (#27484)

* [WIP] Hard error when ignoring tensors.

* Better selection/error when saving a checkpoint.

- Find all names we should normally drop (those are in the transformers
  config)
- Find all disjoint tensors (for those we can safely trigger a copy to
  get rid of the sharing before saving)
- Clone those disjoint tensors getting rid of the issue
- Find all identical names (those should be declared in the config
  but we try to find them all anyway.)
- For all identical names:
  - If they are in the config, just ignore them everything is fine
  - If they are not, warn about them.
- For all remainder tensors which are shared yet neither identical NOR
  disjoint. raise a hard error.

* Adding a failing test on `main` that passes here.

* We don't need to keep the subfolder logic in this test.

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add small tests.

* Dead variable.

* Fixup.

* Fixing tied_Weights_keys on generic models.

* Fixup + T5 encoder/decoder tying (with different layers)

* Code quality.

* Dynamic member.

* trigger

* Fixing encoder name for other types of encoder/decoder combos.

* Fix scoping.

* Update .github/workflows/self-scheduled.yml
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fixing the tied_weights after the call.

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

9b0a8ea7

27 Mar, 2024 1 commit

Reimplement "Automatic safetensors conversion when lacking these files" (#29846) · 4d8427f7

Lysandre Debut authored Mar 27, 2024

* Automatic safetensors conversion when lacking these files (#29390)

* Automatic safetensors conversion when lacking these files

* Remove debug

* Thread name

* Typo

* Ensure that raises do not affect the main thread

* Catch all errors

4d8427f7

25 Mar, 2024 1 commit

Remove static pretrained maps from the library's internals (#29112) · 39114c03

Lysandre Debut authored Mar 25, 2024



* [test_all] Remove static pretrained maps from the library's internals

* Deprecate archive maps instead of removing them

* Revert init changes

* [test_all] Deprecate instead of removing

* [test_all] PVT v2 support

* [test_all] Tests should all pass

* [test_all] Style

* Address review comments

* Update src/transformers/models/deprecated/_archive_maps.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/deprecated/_archive_maps.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [test_all] trigger tests

* [test_all] LLAVA

* [test_all] Bad rebase

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

39114c03

19 Mar, 2024 1 commit

Llama: partial 4d masks (#29731) · 4294f0c3

Joao Gante authored Mar 19, 2024



* partial 4d masks

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

4294f0c3

15 Mar, 2024 1 commit
- [tests] remove deprecated tests for model loading (#29450) · c1993e68
  Fanli Lin authored Mar 15, 2024
```
* gix

* fix style

* remove equivalent tests

* add back for image_processor

* remove again
```
  c1993e68
13 Mar, 2024 1 commit
- Llama: allow custom 4d masks (#29618) · 1e21c4fb
  Joao Gante authored Mar 13, 2024
  
  1e21c4fb
11 Mar, 2024 1 commit

Experimental loading of MLX files (#29511) · b382a09e

Pedro Cuenca authored Mar 11, 2024

* Experimental loading of MLX files

* Update exception message

* Add test

* Style

* Use model from hf-internal-testing

b382a09e

08 Mar, 2024 1 commit
- Make sliding window size inclusive in eager attention (#29519) · 608fa549
  Jonatan Kłosko authored Mar 08, 2024
```
* Make sliding window size inclusive in eager attention

* Fix tests
```
  608fa549
07 Mar, 2024 2 commits
- test_generation_config_is_loaded_with_model - fall back to pytorch model for now (#29521) · 4ed9ae62
  amyeroberts authored Mar 07, 2024
```
* Fall back to pytorch model for now

* Fix up
```
  4ed9ae62
- Revert "Automatic safetensors conversion when lacking these files (#2… (#29507) · f6133d76
  Lysandre Debut authored Mar 07, 2024
```
Revert "Automatic safetensors conversion when lacking these files (#29390)"

This reverts commit a69cbf4e.
```
  f6133d76
06 Mar, 2024 1 commit
- [FIX] `offload_weight()` takes from 3 to 4 positional arguments but 5 were given (#29457) · 00bf4427
  Fanli Lin authored Mar 06, 2024
```
* use require_torch_gpu

* enable on XPU

* fix
```
  00bf4427
05 Mar, 2024 1 commit

Automatic safetensors conversion when lacking these files (#29390) · a69cbf4e

Lysandre Debut authored Mar 05, 2024

* Automatic safetensors conversion when lacking these files

* Remove debug

* Thread name

* Typo

* Ensure that raises do not affect the main thread

a69cbf4e

16 Feb, 2024 1 commit
- Update all references to canonical models (#29001) · f497f564
  Lysandre Debut authored Feb 16, 2024
```
* Script & Manual edition

* Update
```
  f497f564
06 Feb, 2024 1 commit
- Revert "[WIP] Hard error when ignoring tensors." (#28898) · 76b4f666
  Yih-Dar authored Feb 06, 2024
```
Revert "[WIP] Hard error when ignoring tensors. (#27484)"

This reverts commit 2da28c4b.
```
  76b4f666
05 Feb, 2024 1 commit

[WIP] Hard error when ignoring tensors. (#27484) · 2da28c4b

Nicolas Patry authored Feb 05, 2024



* [WIP] Hard error when ignoring tensors.

* Better selection/error when saving a checkpoint.

- Find all names we should normally drop (those are in the transformers
  config)
- Find all disjoint tensors (for those we can safely trigger a copy to
  get rid of the sharing before saving)
- Clone those disjoint tensors getting rid of the issue
- Find all identical names (those should be declared in the config
  but we try to find them all anyway.)
- For all identical names:
  - If they are in the config, just ignore them everything is fine
  - If they are not, warn about them.
- For all remainder tensors which are shared yet neither identical NOR
  disjoint. raise a hard error.

* Adding a failing test on `main` that passes here.

* We don't need to keep the subfolder logic in this test.

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

2da28c4b

02 Feb, 2024 1 commit

Add missing None check for hf_quantizer (#28804) · ec29d25d

Juri Ganitkevitch authored Feb 02, 2024



* Add missing None check for hf_quantizer

* Add test, fix logic.

* make style

* Switch test model to Mistral

* Comment

* Update tests/test_modeling_utils.py

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

ec29d25d

18 Jan, 2024 1 commit

Use `LoggingLevel` context manager in 3 tests (#28575) · 0754217c

Yih-Dar authored Jan 18, 2024



* inside with LoggingLevel

* remove is_flaky

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

0754217c

16 Jan, 2024 2 commits
- Config: warning when saving generation kwargs in the model config (#28514) · f4f57f9d
  Joao Gante authored Jan 16, 2024
  
  f4f57f9d
- Fix mismatching loading in from_pretrained with/without accelerate (#28414) · 66db33dd
  fxmarty authored Jan 16, 2024
```
* fix mismatching behavior in from_pretrained with/without accelerate

* meaningful refactor

* remove added space

* add test

* fix model on the hub

* comment

* use tiny model

* style
```
  66db33dd
15 Jan, 2024 1 commit

[`core`/ FEAT] Add the possibility to push custom tags using `PreTrainedModel` itself (#28405) · 1b9a2e4c

Younes Belkada authored Jan 15, 2024



* v1 tags

* remove unneeded conversion

* v2

* rm unneeded warning

* add more utility methods

* Update src/transformers/utils/hub.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/hub.py
Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/utils/hub.py
Co-authored-by: Lucain <lucainp@gmail.com>

* more enhancements

* oops

* merge tags

* clean up

* revert unneeded change

* add extensive docs

* more docs

* more kwargs

* add test

* oops

* fix test

* Update src/transformers/modeling_utils.py
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Update src/transformers/utils/hub.py
Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/modeling_utils.py

* Update src/transformers/trainer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add more conditions

* more logic

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

1b9a2e4c

12 Jan, 2024 1 commit
- Mark two logger tests as flaky (#28458) · 4e36a6cd
  amyeroberts authored Jan 12, 2024
```
* Mark two logger tests as flaky

* Add description to is_flaky
```
  4e36a6cd
17 Dec, 2023 1 commit

4D `attention_mask` support (#27539) · f85a1e82

Poedator authored Dec 17, 2023



* edits to _prepare_4d_causal_attention_mask()

* initial tests for 4d mask

* attention_mask_for_sdpa support

* added test for inner model hidden

* added autotest decorators

* test mask dtype to torch.int64

* torch.testing.assert_close
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* torch_device and @torch_gpu in tests

* upd tests

* +torch decorators

* torch decorators fixed

* more decorators!

* even more decorators

* fewer decorators

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

f85a1e82

15 Dec, 2023 1 commit

[`FA-2`] Fix fa-2 issue when passing `config` to `from_pretrained` (#28043) · 1e209317

Younes Belkada authored Dec 15, 2023



* fix fa-2 issue

* fix test

* Update src/transformers/modeling_utils.py
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>

* clenaer fix

* up

* add more robust tests

* Update src/transformers/modeling_utils.py
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>

* fixup

* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* pop

* add test

---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

1e209317

08 Dec, 2023 2 commits

F.scaled_dot_product_attention support (#26572) · 80377eb0

fxmarty authored Dec 08, 2023



* add sdpa

* wip

* cleaning

* add ref

* yet more cleaning

* and more :)

* wip llama

* working llama

* add output_attentions=True support

* bigcode sdpa support

* fixes

* gpt-bigcode support, require torch>=2.1.1

* add falcon support

* fix conflicts falcon

* style

* fix attention_mask definition

* remove output_attentions from attnmaskconverter

* support whisper without removing any Copied from statement

* fix mbart default to eager renaming

* fix typo in falcon

* fix is_causal in SDPA

* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained

* add warnings when falling back on the manual implementation

* precise doc

* wip replace _flash_attn_enabled by config.attn_implementation

* fix typo

* add tests

* style

* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace

* obey to config.attn_implementation if a config is passed in from_pretrained

* fix is_torch_sdpa_available when torch is not installed

* remove dead code

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove duplicate pretraining_tp code

* add dropout in llama

* precise comment on attn_mask

* add fmt: off for _unmask_unattended docstring

* precise num_masks comment

* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion

* cleanup modeling_utils

* backward compatibility

* fix style as requested

* style

* improve documentation

* test pass

* style

* add _unmask_unattended tests

* skip meaningless tests for idefics

* hard_check SDPA requirements when specifically requested

* standardize the use if XXX_ATTENTION_CLASSES

* fix SDPA bug with mem-efficient backend on CUDA when using fp32

* fix test

* rely on SDPA is_causal parameter to handle the causal mask in some cases

* fix FALCON_ATTENTION_CLASSES

* remove _flash_attn_2_enabled occurences

* fix test

* add OPT to the list of supported flash models

* improve test

* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test

* remove remaining _flash_attn_2_enabled occurence

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove use_attn_implementation

* fix docstring & slight bug

* make attn_implementation internal (_attn_implementation)

* typos

* fix tests

* deprecate use_flash_attention_2=True

* fix test

* add back llama that was removed by mistake

* fix tests

* remove _flash_attn_2_enabled occurences bis

* add check & test that passed attn_implementation is valid

* fix falcon torchscript export

* fix device of mask in tests

* add tip about torch.jit.trace and move bt doc below sdpa

* fix parameterized.expand order

* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there

* update sdpaattention class with the new cache

* Update src/transformers/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bark/modeling_bark.py

* address review comments

* WIP torch.jit.trace fix. left: test both eager & sdpa

* add test for torch.jit.trace for both eager/sdpa

* fix falcon with torch==2.0 that needs to use sdpa

* fix doc

* hopefully last fix

* fix key_value_length that has no default now in mask converter

* is it flacky?

* fix speculative decoding bug

* tests do pass

* fix following #27907

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

80377eb0

[

⚠

️ removed a default argument] Make `AttentionMaskConverter` compatible with... · 307a7d0b

fxmarty authored Dec 08, 2023

[⚠️ removed a default argument] Make `AttentionMaskConverter` compatible with `torch.compile(..., fullgraph=True)` (#27868)

* remove bugged torch.float32 default

* add test

* fix tests

* fix test

* fix doc

307a7d0b

04 Dec, 2023 1 commit
- [`ModelOnTheFlyConversionTester`] Mark as slow for now (#27823) · c0b9db09
  Arthur authored Dec 04, 2023
```
* mark test as slow for now

* style
```
  c0b9db09