- 19 Oct, 2023 1 commit
-
-
Younes Belkada authored
Support FA-2 + right padding + forward
-
- 18 Oct, 2023 2 commits
-
-
Younes Belkada authored
revert
-
Younes Belkada authored
* final fix for FA2 dtype
* try
* oops
* Update src/transformers/models/falcon/modeling_falcon.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* apply fix everywhere
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 13 Oct, 2023 1 commit
-
-
Younes Belkada authored
* fix fa-2 import
* nit
-
- 11 Oct, 2023 1 commit
-
-
Billy Bradley authored
In assisted decoding, pass model_kwargs to model's forward call (fix prepare_input_for_generation in all models) (#25242)

* In assisted decoding, pass model_kwargs to model's forward call
  Previously, assisted decoding would ignore any additional kwargs that it doesn't explicitly handle. This was inconsistent with other generation methods, which pass the model_kwargs through prepare_inputs_for_generation and forward the returned dict to the model's forward call.
  The prepare_inputs_for_generation method needs to be amended in all models, as previously it only kept the last input ID when a past_key_values was passed.
* Improve variable names in _extend_attention_mask
* Refactor extending token_type_ids into a function
* Replace deepcopy with copy to optimize performance
* Update new persimmon model with llama changes for assisted generation
* Update new mistral model for assisted generation with prepare_inputs_for_generation
* Update position_ids creation in falcon prepare_inputs_for_generation to support assisted generation
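A rough, hedged sketch of the `prepare_inputs_for_generation` pattern this change moves the models towards (illustrative only, not the exact code merged for #25242): slice off the prefix already covered by the cache instead of keeping only the last token, and hand back a dict that keeps the extra generation kwargs instead of dropping them.

```python
import torch


def prepare_inputs_for_generation(
    input_ids: torch.Tensor,
    past_key_values=None,
    attention_mask: torch.Tensor = None,
    **kwargs,
):
    """Illustrative sketch of the pattern described above."""
    if past_key_values is not None:
        past_length = past_key_values[0][0].shape[2]
        # Assisted decoding forwards several candidate tokens at once, so keep every
        # token not yet covered by the cache rather than only the last one.
        if input_ids.shape[1] > past_length:
            input_ids = input_ids[:, past_length:]
        else:
            input_ids = input_ids[:, -1:]

    position_ids = kwargs.get("position_ids")
    if attention_mask is not None and position_ids is None:
        # Rebuild position ids from the padding mask on the fly.
        position_ids = attention_mask.long().cumsum(-1) - 1
        position_ids.masked_fill_(attention_mask == 0, 1)
        if past_key_values is not None:
            position_ids = position_ids[:, -input_ids.shape[1]:]

    # Unlike the old behaviour, the extra kwargs (e.g. use_cache) are forwarded, not dropped.
    return {
        "input_ids": input_ids,
        "position_ids": position_ids,
        "past_key_values": past_key_values,
        "attention_mask": attention_mask,
        "use_cache": kwargs.get("use_cache"),
    }
```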
-
- 06 Oct, 2023 1 commit
-
-
fxmarty authored
* remove unnecessary unsqueeze-squeeze in llama
* correct other models
* fix
* revert gpt_neox_japanese
* fix copies
* fix test
-
- 03 Oct, 2023 1 commit
-
-
Younes Belkada authored
* add FA-2 support for mistral
* fixup
* add sliding windows
* fixing few nits
* v1 slicing cache - logits do not match
* add comment
* fix bugs
* more mem efficient
* add warning once
* add warning once
* oops
* fixup
* more comments
* copy
* add safety checker
* fixup
* Update src/transformers/models/mistral/modeling_mistral.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copied from
* up
* raise when padding side is right
* fixup
* add doc + few minor changes
* fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
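A hedged usage sketch of the two user-facing constraints called out above (half-precision weights and left padding); the checkpoint name and the `use_flash_attention_2` flag match the API around this release and may differ in later versions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# FA-2 cannot recover per-token positions from right-padded batches during generation,
# hence the "raise when padding side is right" guard: pad on the left instead.
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # FA-2 kernels require fp16 or bf16
    use_flash_attention_2=True,   # flag name as of this release
    device_map="auto",
)
```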
-
- 27 Sep, 2023 1 commit
-
-
Chris Bamford authored
* [Mistral] Mistral-7B-v0.1 support
* fixing names
* slightly longer test
* fixups
* not_doctested
* wrongly formatted references
* make fixuped
---------
Co-authored-by: Timothee Lacroix <t@eugen.ai>
Co-authored-by: timlacroix <t@mistral.ai>
-
- 22 Sep, 2023 2 commits
-
-
Younes Belkada authored
* v1
* oops
* working v1
* fixup
* add some TODOs
* fixup
* padding support + try with module replacement
* nit
* alternative design
* oops
* add `use_cache` support for llama
* v1 falcon
* nit
* a bit of refactor
* nit
* nits nits
* add v1 padding support falcon (even though it seemed to work before)
* nit
* falcon works
* fixup
* v1 tests
* nit
* fix generation llama flash
* update tests
* fix tests + nits
* fix copies
* fix nit
* test- padding mask
* style
* add more mem efficient support
* Update src/transformers/modeling_utils.py
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fixup
* nit
* fixup
* remove it from config when saving
* fixup
* revert docstring
* add more checks
* use values
* oops
* new version
* fixup
* add same trick for falcon
* nit
* add another test
* change tests
* fix issues with GC and also falcon
* fixup
* oops
* Update src/transformers/models/falcon/modeling_falcon.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add init_rope
* updates
* fix copies
* fixup
* fixup
* more clarification
* fixup
* right padding tests
* add docs
* add FA in docker image
* more clarifications
* add some figures
* add todo
* rectify comment
* Change to FA2
* Update docs/source/en/perf_infer_gpu_one.md
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* split in two lines
* change test name
* add more tests
* some clean up
* remove `rearrange` deps
* add more docs
* revert changes on dockerfile
* Revert "revert changes on dockerfile"
  This reverts commit 8d72a66b4b9b771abc3f15a9b9506b4246d62d8e.
* revert changes on dockerfile
* Apply suggestions from code review
  Co-authored-by: Lysandre Debut <hi@lysand.re>
* address some comments
* docs
* use inheritance
* Update src/transformers/testing_utils.py
  Co-authored-by: Lysandre Debut <hi@lysand.re>
* fixup
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
* final comments
* clean up
* style
* add cast + warning for PEFT models
* fixup
---------
Co-authored-by: Felix Marty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
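The "safety checker" mentioned above boils down to a set of preconditions that must hold before the FA-2 code path is enabled. A minimal sketch of that kind of guard (the function name is illustrative, not the internal helper):

```python
import importlib.util

import torch


def check_flash_attn_2_preconditions(torch_dtype: torch.dtype, device: torch.device) -> None:
    # Illustrative guards only; the real checks in transformers are more involved.
    if importlib.util.find_spec("flash_attn") is None:
        raise ImportError("Flash Attention 2 requires the `flash-attn` package to be installed.")
    if torch_dtype not in (torch.float16, torch.bfloat16):
        raise ValueError("Flash Attention 2 only supports fp16 and bf16 weights; cast the model first.")
    if device.type != "cuda":
        raise ValueError("Flash Attention 2 kernels only run on CUDA devices.")
```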
-
Yih-Dar authored
fix doc CI
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 18 Sep, 2023 1 commit
-
-
Sanchit Gandhi authored
fix copies
-
- 12 Sep, 2023 1 commit
-
-
Arthur authored
* initial commit
* updates
* nits
* update conversion script
* update conversion script
* use path to load
* add tips etc
* some modeling logic
* modeling update
* more nits
* nits
* normal layer norm
* update config and doc
* nits
* update doc remove unused
* update
* fix inits and stuff
* fixup
* revert wrong changes
* updates
* more nits
* add default config values to the configuration file
* fixup happy
* update
* 2 tests left
* update readmes
* more nits
* slow test and more documentation
* update readme
* fix licences
* styling
* use fast if possible when saving tokenizer
* remove todo
* remove tokenization tests
* small last nits
* Apply suggestions from code review
  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* nits to skip the timeout doctest
* fix integration test
* fix test
* update eos token
* update to allow fast tokenization
* styling
* fix codeLlama as well for the update post processor
* Apply suggestions from code review
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more copied from statements
* update
* doc passes doctest
* remove `# final layer norm?`
* change docstring prompt
* update
* Update README.md
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* don't doctest the conversion script as it requires more packages
* don't init a model in the config
* oups
* fix doctest
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 25 Aug, 2023 1 commit
-
-
Arthur authored
* add all
* Revert "Delete .github directory"
  This reverts commit 9b0ff7b052e2b20b629a26fb13606b78a42944d1.
* make conversion script backward compatible
* fixup
* more styling
* copy to llama changes
* fix repo consistency
* nits
* document correct classes
* updates
* more fixes
* nits
* update auto mappings
* add readmes
* small updates
* llama-code replace with llama_code
* make fixup
* updates to the testing suite
* fix fast nits
* more small fixes
* fix decode
* fix template processing
* properly reset the normalizer
* nits processor
* tokenization tests pass
* styling
* last tests
* additional nits
* one test is left
* nits
  Co-authored-by: faabian <faabian@users.noreply.github.com>
* update failing test
* fixup
* remove decode infilling, users should handle it on their own after generation, padding can be a problem
* update
* make test slow and more meaningful
* fixup
* doc update
* fixup
* Apply suggestions from code review
* add kwargs doc
* tokenizer requires `requires_backend`
* type requires_backends
* CodeLlama instead of LlamaCode
* more name changes
* nits
* make doctests happy
* small pipeline nits
* last nit
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update
* add codellama to toctree
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 18 Aug, 2023 1 commit
-
-
Arthur authored
* nit
* update
* make sure use_default_system_prompt is saved
* update checkpointing
* consistency
* use_default_system_prompt for test
-
- 25 Jul, 2023 1 commit
-
-
Arthur authored
* support left padding
* nit
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
-
- 21 Jul, 2023 1 commit
-
-
Arthur authored
remove persistent tensor
-
- 19 Jul, 2023 1 commit
-
-
Younes Belkada authored
* add possibility to disable TP
* fixup
* adapt from offline discussions
-
- 18 Jul, 2023 1 commit
-
-
Arthur authored
* add llama
* add other readmes
* update padding id in readme
* add link to paper
* fix paths and tokenizer
* more nits
* styling
* fit operation in 2 lines when possible
* nits
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add form
* update readme
* update readme, we don't have a default pad token
* update test and tokenization
* LLaMA instead of Llama
* nits
* add expected text
* add greedy output
* styling
* Update src/transformers/models/llama/modeling_llama.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* sequential device map
* skip relevant changes
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 13 Jul, 2023 2 commits
-
-
Joao Gante authored
* add rope_scaling
* tmp commit
* add gptneox
* add tests
* GPTNeoX can now handle long inputs, so the pipeline test was wrong
* Update src/transformers/models/open_llama/configuration_open_llama.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove ntk
* remove redundant validation
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
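The `rope_scaling` option lands as a config dict with a scaling strategy and a factor. A small hedged example (the checkpoint name is only illustrative):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# rope_scaling takes a "type" ("linear" or "dynamic", since NTK was removed in this PR)
# and a "factor" > 1.0 that stretches the usable context window.
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
config.rope_scaling = {"type": "linear", "factor": 2.0}  # roughly 2x the original context
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", config=config)
```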
-
Liyang90 authored
* Update modeling_llama.py
  Removing unnecessary `device=device`
* fix in all occurrences of _make_causal_mask
-
- 04 Jul, 2023 1 commit
-
-
Prathik Rao authored
* open llama fp16 bug fix
* bug fix
* bug fixed
* make style
* Update modeling_llama.py
* apply formatting
* Address amy's comment
---------
Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: root <root@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
-
- 27 Jun, 2023 1 commit
-
-
Sylvain Gugger authored
* Preliminary work on some models
* Fix test load missing and make sure nonpersistent buffers are tested
* Always ignore nonpersistent buffers if in state_dict
* Treat models
* More models
* Treat remaining models
* Fix quality
* Fix tests
* Remove draft
* This test is not needed anymore
* Fix copies
* Fix last test
* Newly added models
* Fix last tests
* Address review comments
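For context, a non-persistent buffer is registered with `persistent=False`, so it never appears in `state_dict` and has to be ignored (not reported as missing) when loading. A minimal sketch, with a made-up module name:

```python
import torch
from torch import nn


class RotaryCache(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False keeps the buffer out of state_dict, so checkpoints stay
        # loadable even if the cached values change between library versions.
        self.register_buffer("inv_freq", inv_freq, persistent=False)


print("inv_freq" in RotaryCache(64).state_dict())  # False
```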
-
- 22 Jun, 2023 1 commit
-
-
Younes Belkada authored
Revert "Fix gradient checkpointing + fp16 autocast for most models (#24247)" This reverts commit 285a4801.
-
- 21 Jun, 2023 1 commit
-
-
Younes Belkada authored
* fix gc bug
* continue PoC on OPT
* fixes
* :exploding_head:
* fix tests
* remove pytest.mark
* fixup
* forward contrib credits from discussions
* forward contrib credits from discussions
* reverting changes on untouched files.
---------
Co-authored-by: zhaoqf123 <zhaoqf123@users.noreply.github.com>
Co-authored-by: 7eu7d7 <7eu7d7@users.noreply.github.com>
-
- 15 Jun, 2023 1 commit
-
-
Fei Wang authored
* Fix LLaMa beam search when using parallelize
  Same issue as T5 #11717
* fix code format in modeling_llama.py
* fix format of _reorder_cache in modeling_llama.py
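The fix follows the same pattern as T5 (#11717): under model parallelism the per-layer caches can live on different devices, so the beam indices must be moved to each cached tensor's device before reordering. A hedged sketch of such a `_reorder_cache`:

```python
import torch


def _reorder_cache(past_key_values, beam_idx: torch.Tensor):
    # beam_idx is moved to the device of each past state so index_select
    # never mixes tensors from different GPUs.
    return tuple(
        tuple(
            past_state.index_select(0, beam_idx.to(past_state.device))
            for past_state in layer_past
        )
        for layer_past in past_key_values
    )
```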
-
- 13 Jun, 2023 1 commit
-
-
Sylvain Gugger authored
* First test
* Add info for all models
* style
* Repo consistency
* Fix last model and cleanup prints
* Repo consistency
* Use consistent function for detecting tied weights
-
- 12 Jun, 2023 1 commit
-
-
fxmarty authored
* fix dtype init
* fix copies
* fix fixcopies mess
* edit forward as well
* copy
-
- 08 Jun, 2023 1 commit
-
-
Serge Panev authored
* Fix typo in Llama docstrings
  Signed-off-by: Serge Panev <spanev@nvidia.com>
* Update
  Signed-off-by: Serge Panev <spanev@nvidia.com>
* make style
  Signed-off-by: Serge Panev <spanev@nvidia.com>
---------
Signed-off-by: Serge Panev <spanev@nvidia.com>
-
- 31 May, 2023 1 commit
-
-
Sylvain Gugger authored
-
- 22 May, 2023 2 commits
-
-
Tim Dettmers authored
* Fixed bug where LLaMA layer norm would change input type.
* make fix-copies
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
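The bug was that the LLaMA RMS norm handed fp32 activations to the next fp16/bf16 layer. A hedged sketch of the fixed pattern: normalize in fp32 for stability, then cast back to the input dtype.

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Illustrative RMS norm that preserves the caller's dtype."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # The fix: cast back so downstream half-precision layers see their expected dtype.
        return self.weight * hidden_states.to(input_dtype)
```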
-
zspo authored
* Fix tensor device while attention_mask is not None
* Fix tensor device while attention_mask is not None
-
- 24 Apr, 2023 1 commit
-
-
othertea authored
-
- 17 Apr, 2023 2 commits
-
-
Kunhao ZHENG authored
fix-squeeze-tuple
-
fpgaminer authored
-
- 07 Apr, 2023 1 commit
-
-
Shikhar Chauhan authored
* (feat): Move labels to the same device as logits
* Trigger CI
* Trigger CI
* Trigger CI
* (feat): Making changes for Blip2
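With `device_map="auto"` the `lm_head` (and therefore the logits) can sit on a different GPU than the labels, so the loss computation moves the labels over first. A hedged sketch of the pattern:

```python
import torch
from torch.nn import CrossEntropyLoss


def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Move labels to the logits' device before the usual shift-by-one loss.
    labels = labels.to(logits.device)
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    loss_fct = CrossEntropyLoss()
    return loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
```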
-
- 31 Mar, 2023 1 commit
-
-
Nicolas Patry authored
* Making sure we can use safetensors to serialize all the time.
* Expanding the tests for increased coverage.
* Update the test.
* Getting current state of affairs.
* Tentative fix.
* Fixing black version.
* Fixing the worst offenders.
* Try to modify less files.
* Fixing blip_2 (Weird solution right now).
* Fixing deta.
* Fix blip ?
* Missing extra newline.
* No deta modification.
* Adding some comments.
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Addressing comments.
* Addressing comments.
* creating warn_once.
* Warning_once !
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
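In user terms, this work is what lets `save_pretrained` emit safetensors reliably. A small hedged example (the checkpoint name and output path are only illustrative):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
# safe_serialization=True writes model.safetensors instead of pytorch_model.bin;
# models with shared/tied tensors are the usual blocker, which the expanded tests cover.
model.save_pretrained("./gpt2-safetensors", safe_serialization=True)
reloaded = AutoModelForCausalLM.from_pretrained("./gpt2-safetensors")
```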
-
- 30 Mar, 2023 1 commit
-
-
Joao Gante authored
* Llama now supports max_position_embeddings
* Save config; Cosmetic edits
-
- 28 Mar, 2023 1 commit
-
-
Jeff Rasley authored
* ensure causal_mask is created directly on device
* add copy tag to opt, update bart implementation
* add device to all _make_causal_mask copies
* formatting fixes
* more manual fixes due to unlinked versions of _prepare_decoder_attention_mask
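"Created directly on device" means the mask is allocated with `device=` from the start rather than built on CPU and copied over on every forward pass. A hedged sketch of a `_make_causal_mask` in that style:

```python
import torch


def _make_causal_mask(input_ids_shape, dtype: torch.dtype, device: torch.device, past_key_values_length: int = 0):
    # Allocate the additive causal mask on the target device up front.
    bsz, tgt_len = input_ids_shape
    mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
    mask_cond = torch.arange(mask.size(-1), device=device)
    mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
    mask = mask.to(dtype)
    if past_key_values_length > 0:
        # Cached positions are always visible, so prepend zeros for them.
        mask = torch.cat(
            [torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device), mask], dim=-1
        )
    return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length)
```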
-
- 27 Mar, 2023 2 commits
-
-
Joao Gante authored
-
кѳѳsнī authored
balanced 8bit memory
-