Commits · be3fd8a262fb1bfdbe2aaf1b00ab78e243632cba · chenpangpang / transformers

13 Mar, 2024 1 commit

[Flash Attention 2] Add flash attention 2 for GPT-J (#28295) · be3fd8a2

bytebarde authored Mar 13, 2024



* initial implementation of flash attention for gptj

* modify flash attention and overwrite test_flash_attn_2_generate_padding_right

* update flash attention support list

* remove the copy line in the `CodeGenBlock`

* address copy mechanism

* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add GPTJ attention classes

* add expected outputs in the gptj test

* Ensure repo consistency with 'make fix-copies'

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

be3fd8a2

31 Jan, 2024 1 commit

DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect... · beb2a096

Joao Gante authored Jan 31, 2024

DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760)

beb2a096

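
The commit title names the fix but not the failure mode. A minimal sketch of that failure mode, assuming (as the title suggests) that under DeepSpeed a float `torch.arange` may be created in a reduced-precision dtype such as bfloat16: integer positions produced directly in bfloat16 collide once they exceed 256, while positions generated as integers and cast to float afterwards stay exact. The `to_bfloat16` helper is hypothetical and emulates bfloat16 rounding with the standard library; it is not a PyTorch or DeepSpeed API. The PyTorch-side pattern this mirrors is, as far as we know, something like `torch.arange(n, dtype=torch.int64).float()`.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round half to even).

    Hypothetical helper: bfloat16 keeps the top 16 bits of an IEEE float32,
    so we pack to float32, round away the low 16 bits, and unpack again.
    Intended for finite, non-NaN inputs only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1  # last kept mantissa bit, used for ties-to-even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def positions_in_bfloat16(num_pos: int) -> list:
    # Analogous to torch.arange(num_pos) when the tensor ends up in bfloat16.
    return [to_bfloat16(float(i)) for i in range(num_pos)]


def positions_int_then_float(num_pos: int) -> list:
    # Analogous to torch.arange(num_pos, dtype=torch.int64).float():
    # exact integers first, then a cast to float32.
    return [struct.unpack("<f", struct.pack("<f", float(i)))[0] for i in range(num_pos)]


if __name__ == "__main__":
    n = 2048  # an illustrative context length
    print("distinct bfloat16 positions:", len(set(positions_in_bfloat16(n))))
    print("distinct int->float positions:", len(set(positions_int_then_float(n))))
```

bfloat16 has only 8 significant bits, so every integer up to 256 is exact but, for example, 257 rounds to 256; two tokens then share a position and their sinusoidal embeddings become identical.
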
22 Dec, 2023 1 commit

Fix ONNX export for causal LM sequence classifiers by removing reverse indexing (#28144) · 548a8f61

Dean Wyatte authored Dec 22, 2023

* normalize reverse indexing for causal lm sequence classifiers

* normalize reverse indexing for causal lm sequence classifiers

* normalize reverse indexing for causal lm sequence classifiers

* use modulo instead

* unify modulo-based sequence lengths

548a8f61
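
The bullets above replace reverse (negative) indexing with a modulo. A stdlib-only sketch of that idea, assuming padded `input_ids` and a known `pad_token_id`: take the index of the first pad token, subtract one, and wrap with `% seq_len`, so that a row without padding (where argmax returns 0) maps to the last position instead of -1. In PyTorch the first step would be an `argmax` over `input_ids == pad_token_id`; here lists stand in for tensors, and `last_token_indices` is a hypothetical name.

```python
from typing import List


def last_token_indices(input_ids: List[List[int]], pad_token_id: int) -> List[int]:
    """Index of the last non-pad token in each row (hypothetical helper).

    Mirrors argmax(input_ids == pad) - 1, followed by a modulo over the
    sequence length instead of relying on -1 as a reverse index.
    """
    seq_len = len(input_ids[0])
    indices = []
    for row in input_ids:
        is_pad = [int(tok == pad_token_id) for tok in row]
        # Like torch.argmax: position of the first maximal value (0 if no pads).
        first_pad = is_pad.index(max(is_pad))
        # -1 would mean "last position"; the modulo makes it a non-negative index.
        indices.append((first_pad - 1) % seq_len)
    return indices


batch = [
    [11, 12, 13, 0, 0],    # right-padded: last real token at index 2
    [21, 22, 23, 24, 25],  # no padding: last token at index 4
]
print(last_token_indices(batch, pad_token_id=0))  # [2, 4]
```

Every index the function produces is non-negative, which is the property the commit relies on to avoid reverse indexing during ONNX export.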

16 Nov, 2023 1 commit

Support ONNX export for causal LM sequence classifiers (#27450) · 1394e08c

Dean Wyatte authored Nov 16, 2023

support onnx for causal lm sequence classification

1394e08c

27 Oct, 2023 1 commit

[`core`/ `gradient_checkpointing`] Refactor GC - part 2 (#27073) · ffff9e70

Younes Belkada authored Oct 27, 2023



* fix

* more fixes

* fix other models

* fix long t5

* use `gradient_checkpointing_func` instead

* fix copies

* set `gradient_checkpointing_func` as a private attribute and retrieve previous behaviour

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* replace it with `is_gradient_checkpointing_set`

* remove default

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

ffff9e70

25 Oct, 2023 1 commit

[`core`] Refactor of `gradient_checkpointing` (#27020) · 06e782da

Younes Belkada authored Oct 25, 2023

* v1

* fix

* remove `create_custom_forward`

* fixup

* fixup

* add test and fix all failing GC tests

* remove all remaining `create_custom_forward` methods

* fix idefics bug

* fixup

* replace with `__call__`

* add comment

* quality

06e782da
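
The bullets above remove the per-call-site `create_custom_forward` closures and pass `__call__` directly. The stdlib sketch below contrasts the two call styles, assuming a checkpoint function shaped like `checkpoint(function, *args, **kwargs)`, as `torch.utils.checkpoint.checkpoint` is. `fake_checkpoint` and `Block` are hypothetical stand-ins: the fake simply calls the function, whereas a real checkpoint discards activations and recomputes them during the backward pass. Binding options once with `functools.partial` reflects, as we understand it, how the stored checkpointing function is configured after the follow-up refactor.

```python
import functools


def fake_checkpoint(function, *args, use_reentrant=True, **kwargs):
    """Hypothetical stand-in for torch.utils.checkpoint.checkpoint.

    It records the option it was given and calls `function` directly.
    """
    fake_checkpoint.last_use_reentrant = use_reentrant
    return function(*args, **kwargs)


class Block:
    """Hypothetical transformer block whose forward takes extra flags."""

    def __call__(self, hidden_states, attention_mask=None, use_cache=False, output_attentions=False):
        scale = 2 if attention_mask is None else 3
        return [h * scale for h in hidden_states], use_cache, output_attentions


def create_custom_forward(module):
    # Old style: a closure that freezes the non-tensor flags.
    def custom_forward(*inputs):
        return module(*inputs, use_cache=False, output_attentions=True)

    return custom_forward


block = Block()
hidden = [1.0, 2.0]

old = fake_checkpoint(create_custom_forward(block), hidden, None)

# New style: pass the module's __call__ and every argument positionally,
# with checkpoint options bound once via functools.partial.
checkpoint_func = functools.partial(fake_checkpoint, use_reentrant=False)
new = checkpoint_func(block.__call__, hidden, None, False, True)

print(old == new)  # True
```

Passing `block.__call__` keeps any hooks that `__call__` runs around `forward`, and it removes one closure definition per model.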

24 Oct, 2023 1 commit

Fix key dtype in GPTJ and CodeGen (#26836) · ede051f1

fxmarty authored Oct 24, 2023

* fix key dtype in gptj and codegen

* delay the key cast to a later point

* fix

ede051f1

11 Oct, 2023 1 commit

In assisted decoding, pass model_kwargs to model's forward call (fix... · dcc49d8a

Billy Bradley authored Oct 11, 2023

In assisted decoding, pass model_kwargs to model's forward call (fix prepare_input_for_generation in all models) (#25242)

* In assisted decoding, pass model_kwargs to model's forward call

Previously, assisted decoding would ignore any additional kwargs
that it doesn't explicitly handle. This was inconsistent with other
generation methods, which pass the model_kwargs through
prepare_inputs_for_generation and forward the returned dict to the
model's forward call.

The prepare_inputs_for_generation method needs to be amended in all
models, as previously it only kept the last input ID when a past_key_values
was passed.

* Improve variable names in _extend_attention_mask

* Refactor extending token_type_ids into a function

* Replace deepcopy with copy to optimize performance

* Update new persimmon model with llama changes for assisted generation

* Update new mistral model for assisted generation with prepare_inputs_for_generation

* Update position_ids creation in falcon prepare_inputs_for_generation to support assisted generation

dcc49d8a
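
The commit body explains that `prepare_inputs_for_generation` used to keep only the last input ID whenever `past_key_values` was present. That breaks assisted decoding, where several candidate tokens are appended in one step. A stdlib sketch of the corrected slicing as we understand it, assuming `past_length` is the number of positions already held in the cache; the helper name is hypothetical and a list stands in for one row of `input_ids`.

```python
from typing import List


def uncached_input_ids(input_ids: List[int], past_length: int) -> List[int]:
    """Return the tokens the model still has to process (hypothetical helper)."""
    if len(input_ids) > past_length:
        # Drop exactly the prefix the cache already covers; this can leave
        # several tokens, e.g. candidates proposed by an assistant model.
        remove_prefix_length = past_length
    else:
        # Fallback to the old behaviour: keep only the final token.
        remove_prefix_length = len(input_ids) - 1
    return input_ids[remove_prefix_length:]


# Regular decoding step: 7 cached positions, 1 new token.
print(uncached_input_ids([5, 8, 1, 4, 9, 2, 7, 3], past_length=7))  # [3]
# Assisted step: 7 cached positions, 3 candidate tokens appended at once.
print(uncached_input_ids([5, 8, 1, 4, 9, 2, 7, 3, 6, 0], past_length=7))  # [3, 6, 0]
```

With the old "last token only" rule, the second call would have returned just `[0]`, silently dropping two candidates.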

06 Oct, 2023 1 commit

Remove unnecessary `view`s of `position_ids` (#26059) · 8878eb1b

Ramiro Leal-Cavazos authored Oct 06, 2023

* Remove unnecessary `view` of `position_ids` in `modeling_llama`

When `position_ids` is `None`, its value is generated using
`torch.arange`, which creates a tensor of size `(seq_length +
past_key_values_length) - past_key_values_length = seq_length`. The
tensor is then unsqueezed, resulting in a tensor of shape `(1,
seq_length)`. This means that the last `view` to a tensor of shape
`(-1, seq_length)` is a no-op.

This commit removes the unnecessary view.

* Remove no-op `view` of `position_ids` in rest of transformer models

8878eb1b
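
The shape argument above can be checked with plain lists. The sketch assumes `past_key_values_length` cached positions and `seq_length` new ones; `view_as_rows` is a hypothetical stand-in for `view(-1, seq_length)` on a contiguous tensor.

```python
from typing import List


def view_as_rows(rows: List[List[int]], ncols: int) -> List[List[int]]:
    """Flatten `rows` and regroup it into rows of `ncols` items, like view(-1, ncols)."""
    flat = [x for row in rows for x in row]
    assert len(flat) % ncols == 0, "view(-1, ncols) needs a multiple of ncols elements"
    return [flat[i:i + ncols] for i in range(0, len(flat), ncols)]


past_key_values_length, seq_length = 4, 3

# torch.arange(past_key_values_length, seq_length + past_key_values_length)
position_ids = list(range(past_key_values_length, seq_length + past_key_values_length))
assert len(position_ids) == seq_length  # (seq_length + past) - past elements

# .unsqueeze(0) -> shape (1, seq_length)
position_ids = [position_ids]

# .view(-1, seq_length) leaves both the shape and the values unchanged.
print(view_as_rows(position_ids, seq_length) == position_ids)  # True
```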

08 Aug, 2023 1 commit

Add warning for missing attention mask when pad tokens are detected (#25345) · 5ea2595e

JB (Don) authored Aug 08, 2023

* Add attention mask and pad token warning to many of the models

* Remove changes under examples/research_projects

These files are not maintained by Hugging Face.

* Skip the warning check during torch.fx or JIT tracing

* Switch ordering for the warning and input shape assignment

This ordering is a little cleaner for some of the cases.

* Add missing line break in one of the files

5ea2595e
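
The first bullet adds a warning when pad tokens appear but no `attention_mask` is passed, and the third skips it while tracing. A stdlib sketch of such a check, assuming it only inspects the first and last token of each row (where left or right padding would show up). The function name, the `is_tracing` flag, and the message text are hypothetical, not the library's actual helper.

```python
import warnings
from typing import List, Optional


def warn_if_padding_without_mask(
    input_ids: List[List[int]],
    attention_mask: Optional[List[List[int]]],
    pad_token_id: Optional[int],
    is_tracing: bool = False,
) -> bool:
    """Warn (and return True) if padding is likely present but no mask was passed."""
    if is_tracing or attention_mask is not None or pad_token_id is None:
        return False
    # Padding usually sits at one end of a row, so check the last and first tokens.
    edge_tokens = {tok for row in input_ids for tok in (row[-1], row[0])}
    if pad_token_id in edge_tokens:
        warnings.warn(
            "Pad tokens were detected in input_ids but no attention_mask was passed; "
            "results may be incorrect. Consider passing an attention_mask."
        )
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned = warn_if_padding_without_mask([[5, 6, 0]], None, pad_token_id=0)
print(warned, len(caught))  # True 1
```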

07 Aug, 2023 1 commit

Loosen output shape restrictions on GPT-style models (#25188) · 65001cb1

calpt authored Aug 07, 2023

* Loosen output shape restrictions on GPT-style models

* Use more self-explanatory variables

* Revert "Use more self-explanatory variables"

This reverts commit 5fd9ab39119558b7e750f61aa4a19014dccc5ed5.

65001cb1

25 Jul, 2023 1 commit

[`ForSequenceClassification`] Support `left` padding (#24979) · f1045227

Arthur authored Jul 25, 2023

* support left padding

* nit

* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py

* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py

f1045227

27 Jun, 2023 1 commit

Clean load keys (#24505) · 8e5d1619

Sylvain Gugger authored Jun 27, 2023

* Preliminary work on some models

* Fix test load missing and make sure nonpersistent buffers are tested

* Always ignore nonpersistent buffers if in state_dict

* Treat models

* More models

* Treat remaining models

* Fix quality

* Fix tests

* Remove draft

* This test is not needed anymore

* Fix copies

* Fix last test

* Newly added models

* Fix last tests

* Address review comments

8e5d1619

22 Jun, 2023 1 commit

Revert "Fix gradient checkpointing + fp16 autocast for most models" (#24420) · 3ce3385c

Younes Belkada authored Jun 22, 2023

Revert "Fix gradient checkpointing + fp16 autocast for most models (#24247)"

This reverts commit 285a4801.

3ce3385c

21 Jun, 2023 1 commit

Fix gradient checkpointing + fp16 autocast for most models (#24247) · 285a4801

Younes Belkada authored Jun 21, 2023



* fix gc bug

* continue PoC on OPT

* fixes

* :exploding_head:

* fix tests

* remove pytest.mark

* fixup

* forward contrib credits from discussions

* forward contrib credits from discussions

* reverting changes on untouched files.

---------
Co-authored-by: zhaoqf123 <zhaoqf123@users.noreply.github.com>
Co-authored-by: 7eu7d7 <7eu7d7@users.noreply.github.com>

285a4801

13 Jun, 2023 1 commit

Tied params cleanup (#24211) · 695928e1

Sylvain Gugger authored Jun 13, 2023

* First test

* Add info for all models

* style

* Repo consistency

* Fix last model and cleanup prints

* Repo consistency

* Use consistent function for detecting tied weights

695928e1

31 May, 2023 1 commit

Skip device placement for past key values in decoder models (#23919) · fabe17a7

Sylvain Gugger authored May 31, 2023

fabe17a7

24 May, 2023 1 commit

fix gptj could not jit.trace in GPU (#23317) · 767e6b53

Wang, Yi authored May 24, 2023

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

767e6b53

04 May, 2023 1 commit

[`GPT-J`] Fix causal mask dtype (#23147) · 57ffd8ab

Younes Belkada authored May 04, 2023

* fix #23136

* better fix

* same fix for `masked_bias`

57ffd8ab

03 May, 2023 1 commit

GPTNeoForQuestionAnswering (#23057) · 78b7debf

peter-sk authored May 03, 2023



* first draft - gives index error in question_answering.py

* maturing

* no labels

* pipeline should know about QA

* fixing checks

* formatting

* fixed docstring

* initial commit

* formatting

* adding the class to many places

* towards less unhappy checks

* nearly there

* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* avoid error

* moving to device of start/end_logits

---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

78b7debf

20 Apr, 2023 1 commit

moved labels to the same device as logits for OPT, CodeGen, GPT-J and Pix2Struct models (#22872) · 91d6a593

SUSHMANTH REDDY authored Apr 20, 2023

* moved labels to the same device as logits for OPT model

* moved labels to the same device as logits for CODEGEN model

* Update modeling_codegen.py

* moved labels to the same device as logits for gptj and pix2struct model

* Update modeling_pix2struct.py

91d6a593

12 Apr, 2023 1 commit

Added parallel device usage for GPT-J (#22713) · 17503b00

jprivera44 authored Apr 12, 2023

17503b00

27 Mar, 2023 1 commit

Generate: support for left-padding on GPTNeoX and Llama (#22382) · 7dcd8703

Joao Gante authored Mar 27, 2023

7dcd8703

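
Left-padding support in generation hinges on deriving `position_ids` from the attention mask, so that the first real token of each row gets position 0 however many pad tokens precede it. A stdlib sketch of the cumulative-sum approach, as we understand it from the library's `prepare_inputs_for_generation` methods (in PyTorch: `attention_mask.long().cumsum(-1) - 1`, then filling padded slots with a dummy value of 1). Lists stand in for tensors, and the helper name is hypothetical.

```python
from itertools import accumulate
from typing import List


def position_ids_from_mask(attention_mask: List[int]) -> List[int]:
    """Positions counted over real tokens only (hypothetical helper)."""
    # cumsum(-1) - 1: the first real token gets 0, the next 1, and so on.
    positions = [c - 1 for c in accumulate(attention_mask)]
    # Padded slots get a dummy position (1 here); the mask hides them anyway.
    return [p if m == 1 else 1 for p, m in zip(positions, attention_mask)]


print(position_ids_from_mask([0, 0, 1, 1, 1]))  # [1, 1, 0, 1, 2]
print(position_ids_from_mask([1, 1, 1]))        # [0, 1, 2]
```

The real tokens in both rows receive positions 0, 1, 2, so a left-padded prompt is embedded the same way as an unpadded one.
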
23 Mar, 2023 1 commit

[gptj] support older pytorch version (#22325) · 61f79b29

Stas Bekman authored Mar 22, 2023



* [gptj] support older pytorch version

* contributor

* contributor

* make copies

---------
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>

61f79b29

22 Mar, 2023 1 commit

Fix position embeddings for GPT-J and CodeGen (#22069) · 4e94c6c0

Nick Hill authored Mar 22, 2023

* Revert "[GPT-J] add deprecation warning (#21869)"

This reverts commit fb76994c.

* Fix position embeddings for GPT-J and CodeGen

* Address review comments from @gante

* Fix "Copied from" comment referencing wrong function

* Fix copy/paste mistake

* Fix training path

* Hopefully make torch.fx happy

* Move position_ids long cast

* Revert "Hopefully make torch.fx happy"

This reverts commit e41a6f4cad3ff441124c7457b19cfb630d4ca025.

* Changes to help with torch.fx tracing

* Linter fix

* Correct position_ids tensor type hint

* Work-around torch.fx tracing issue

* Get the changes to work with torch.fx

* Address review comment from @michaelbenayoun

* Another small adjustment

* Add explanatory comment; small code tidyup

4e94c6c0
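
GPT-J uses rotary position embeddings, and the fix above concerns how the sinusoidal values are looked up for each token. A stdlib sketch of the approach as we understand it: build a table of sin/cos values once for all positions, then index it with explicit `position_ids`, so a token receives the same rotation wherever it sits in a padded batch. It rotates consecutive feature pairs, in the spirit of GPT-J's `rotate_every_two`; the function names, `base=10000`, and the tiny dimensions are illustrative assumptions.

```python
import math
from typing import List, Tuple


def create_sin_cos_table(num_pos: int, dim: int, base: float = 10000.0) -> List[Tuple[List[float], List[float]]]:
    """Precompute (sin, cos) rows for positions 0..num_pos-1; dim must be even."""
    inv_freq = [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
    table = []
    for pos in range(num_pos):
        angles = [pos * f for f in inv_freq]
        table.append(([math.sin(a) for a in angles], [math.cos(a) for a in angles]))
    return table


def apply_rotary(x: List[float], sin: List[float], cos: List[float]) -> List[float]:
    """Rotate each consecutive pair (x[2i], x[2i+1]) by the i-th angle."""
    out = []
    for i, (s, c) in enumerate(zip(sin, cos)):
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, b * c + a * s])
    return out


table = create_sin_cos_table(num_pos=16, dim=4)
query = [1.0, 0.0, 0.5, 0.5]

# Same position_id -> same rotation, whatever slot the token occupies.
position_ids = [1, 1, 0, 1, 2]  # e.g. derived from a left-padded attention mask
rotated = [apply_rotary(query, *table[p]) for p in position_ids]
```

Because the table is indexed rather than recomputed from a sequence offset, the lookup is a plain gather, which is also what keeps the computation traceable.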

02 Mar, 2023 1 commit

[GPT-J] add deprecation warning (#21869) · fb76994c

Arthur authored Mar 02, 2023

* add deprecation warning

* remove pos ids from args docstring

* fix failing test

fb76994c

28 Feb, 2023 1 commit

[GPTJ] Fix gradient checkpointing bug (#21794) · 31fa2b6c

Herumb Shandilya authored Feb 28, 2023



* If applied, this commit fixes generate bug in gptj

* Remove extra same code block

* formatting and test fix

* Conflict fix and declaration error fix

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

31fa2b6c

27 Feb, 2023 2 commits

introduce `logger.warning_once` and use it for grad checkpointing code (#21804) · c7f3abc2

Stas Bekman authored Feb 27, 2023

* logger.warning_once

* style

c7f3abc2

[torch] remove deprecated uint8 in favor of bool (#21384) · c51dc4f9

Arthur authored Feb 27, 2023



* uint8 -> bool

* fix copies

* style

* update test modeling common when checking attention buffers

* style

* use logical not on random mask instead of subtraction with 1

* remove torch uint8

* quality

* remove modified modeling utils

* Update based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>

---------
Co-authored-by: sgugger <sylvain.gugger@gmail.com>

c51dc4f9
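
The bullets above switch masks from `uint8` to `bool` and invert them with a logical not rather than `1 - mask`. A stdlib sketch of a boolean causal mask applied to attention scores, assuming masked positions are filled with the most negative finite float32 before the softmax; nested lists stand in for tensors and the helper names are hypothetical.

```python
import math
from typing import List

# Most negative finite float32, the value torch.finfo(torch.float32).min reports.
FLOAT32_MIN = -3.4028234663852886e38


def causal_mask(n: int) -> List[List[bool]]:
    """Lower-triangular boolean mask: query i may attend to keys 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def invert(mask: List[List[bool]]) -> List[List[bool]]:
    # Logical not: well defined for booleans, unlike 1 - mask.
    return [[not m for m in row] for row in mask]


def apply_mask(scores: List[List[float]], mask: List[List[bool]]) -> List[List[float]]:
    # Keep allowed scores; replace blocked ones with a huge negative value.
    return [[s if m else FLOAT32_MIN for s, m in zip(srow, mrow)] for srow, mrow in zip(scores, mask)]


def softmax(row: List[float]) -> List[float]:
    top = max(row)
    exps = [math.exp(x - top) for x in row]  # exp of a huge negative underflows to 0.0
    total = sum(exps)
    return [e / total for e in exps]


scores = [[1.0, 2.0, 3.0] for _ in range(3)]
masked = apply_mask(scores, causal_mask(3))
print(softmax(masked[0]))  # [1.0, 0.0, 0.0]
```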

22 Feb, 2023 1 commit

Apply ruff flake8-comprehensions (#21694) · 5e8c8eb5

Aaron Gokaslan authored Feb 22, 2023

5e8c8eb5

13 Feb, 2023 1 commit

Add `inputs_embeds` support when generating with GPT-J (#21575) · 93ed89bf

Dzmitry Pletnikau authored Feb 13, 2023

93ed89bf

07 Feb, 2023 2 commits

[CI] Remove `past` in favor of `past_key_values` (#21443) · 12eb528b

Arthur authored Feb 07, 2023

* fix past renamed to past_key_value

* update more `past` that were skipped

* fixup

* remove changes made to rag

* refactor `_reorder_cache` to use `past_key_values`

* fix git `prepare_inputs_for_generation` to pass tests when false is needed in use_cache

12eb528b

Deprecate parallelize API (#21448) · 5b493762

Sylvain Gugger authored Feb 06, 2023

* Deprecate parallelize API

* Add documentation

* Fix copies

5b493762

06 Feb, 2023 1 commit

Update quality tooling for formatting (#21480) · 6f79d264

Sylvain Gugger authored Feb 06, 2023

* Result of black 23.1

* Update target to Python 3.7

* Switch flake8 to ruff

* Configure isort

* Configure isort

* Apply isort with line limit

* Put the right black version

* adapt black in check copies

* Fix copies

6f79d264

23 Jan, 2023 1 commit

Models docstring (#21225) · fd5cdaee

Sylvain Gugger authored Jan 23, 2023

* Clean all models

* Style

* Last to remove

* address review comments

* Address review comments

fd5cdaee

20 Jan, 2023 1 commit

Fix `GPTJ` doctest (#21213) · ef530175

Yih-Dar authored Jan 20, 2023



Replace the checkpoint - the current one has shape issue
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

ef530175

19 Jan, 2023 1 commit

Add disclaimer for necessary fake models (#21178) · 862888a3

Sylvain Gugger authored Jan 19, 2023

* Add disclaimer for necessary fake models

* Address review comments

* Use for GPT-NeoX as well

862888a3

08 Jan, 2023 1 commit

Replace `past` with `past_key_values` (#20944) · f0577df6

Arthur authored Jan 08, 2023

* start cleanup

* more updates

* more models are affected

* more updates

* update generation utils

* style

* revert change that removed reorder cache

* update generation utils

* style

* style

* remove reorder cache

f0577df6

08 Dec, 2022 1 commit

Fix CIs for PyTorch 1.13 (#20686) · e3cc4487

Yih-Dar authored Dec 08, 2022



* fix 1

* fix 2

* fix 3

* fix 4
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

e3cc4487

23 Sep, 2022 1 commit

Fix incorrect comments about attention mask for pytorch backend (#18728) · ece76244

Tianqi Zhang (张天启) authored Sep 24, 2022



* fix incorrect comments about attention mask

* typo

* Update for CodeGen
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

ece76244