"magic_pdf/resources/vscode:/vscode.git/clone" did not exist on "73f66af9bd8c4f861b4725d73baeb6b9905e49f9"
- 31 Oct, 2023 10 commits
-
-
Steven Liu authored
* first draft
* remove non-existent paths
* edits
* feedback
* feedback and optimum
* Apply suggestions from code review
  Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
  Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* redirect to correct doc
* _redirects.yml
---------
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
-
Susnato Dhar authored
fix dropout in modeling_gpt_bigcode.py
-
Matt authored
* Backward compatibility fix for the Conversation class
* Explain what's going on in the conditional
-
Younes Belkada authored
* add v1 neftune
* use `unwrap_model` instead
* add test + docs
* Apply suggestions from code review
  Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* more details
* fixup
* Update docs/source/en/main_classes/trainer.md
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* refactor a bit
* more elaborate test
* fix unwrap issue
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
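NEFTune adds uniform noise to the embedding outputs during fine-tuning and is switched on through a single trainer knob. A minimal sketch, assuming the `neftune_noise_alpha` argument this work introduces (`model` and `train_dataset` stand in for a real setup):

```python
from transformers import Trainer, TrainingArguments

# a minimal sketch, assuming `neftune_noise_alpha` is the argument added
# here; noise of this scale is injected into the embedding outputs during
# training, and the hook is removed again after `train()` returns
args = TrainingArguments(output_dir="out", neftune_noise_alpha=5.0)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```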
-
Vivek Khandelwal authored
* Add support for loading GPTQ models on CPU

  Right now, a GPTQ-quantized model can only be loaded on a CUDA device. The attribute `gptq_supports_cpu` checks whether the installed auto_gptq version has CPU support for the model. The larger variants of the model are hard to load/run/trace on the GPU, which is the rationale for adding this attribute.

  Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
* Update quantization.md
* Update quantization.md
* Update quantization.md
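A sketch of what this enables, assuming an auto_gptq build recent enough to carry CPU support; the checkpoint name is illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# illustrative GPTQ checkpoint; any GPTQ-quantized repo works the same way
model_id = "TheBloke/Llama-2-7B-GPTQ"

# with a CPU-capable auto_gptq installed (checked via `gptq_supports_cpu`),
# the quantized weights can now be placed on the CPU instead of a CUDA device
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```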
-
Nick Hill authored
A recent PR (https://github.com/huggingface/transformers/pull/26579) fixed an edge-case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff. However, after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently. Specifically, the docs state that it should include the "smallest set of tokens that add up to P or higher", so `last_ind` should actually be one more than the index of the last token satisfying (cumulative_probs < self.mass). We still need a max clamp in case that last token is the very last one in the tensor.
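A sketch of the cutoff logic argued for above, assuming `sorted_scores` has already been ordered by typicality as in the warper (an illustration of the reasoning, not the exact diff):

```python
import torch

def typical_cutoff_index(sorted_scores: torch.Tensor, mass: float) -> torch.Tensor:
    """Per-row index of the last kept token: the smallest set of tokens
    whose cumulative probability adds up to `mass` or higher."""
    cumulative_probs = sorted_scores.softmax(dim=-1).cumsum(dim=-1)
    # one more than the last index where cumulative_probs < mass ...
    last_ind = (cumulative_probs < mass).sum(dim=-1)
    # ... clamped in case that token is already the last one in the tensor
    return last_ind.clamp(max=sorted_scores.shape[-1] - 1)
```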
-
Susnato Dhar authored
* added flash attention for gpt_bigcode
* changed docs
* Update src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py
* add FA-2 docs
* oops
* Update docs/source/en/perf_infer_gpu_one.md

  Last nit
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* oops
* remove padding_mask
* change getattr->hasattr logic
* changed .md file
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
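A usage sketch, assuming the `use_flash_attention_2` loading flag of this release line (later versions expose it as `attn_implementation="flash_attention_2"`); the checkpoint is an illustrative GPTBigCode model:

```python
import torch
from transformers import AutoModelForCausalLM

# FA-2 kernels require a half-precision dtype and a CUDA device
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoderbase-1b",  # illustrative GPTBigCode checkpoint
    torch_dtype=torch.float16,
    use_flash_attention_2=True,
).to("cuda")
```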
-
Seungwoo, Jeong authored
Update configuration_blip.py: edit docstrings
-
Akshar Goyal authored
* [docstring] Fix docstring for AltCLIPVisionConfig, AltCLIPTextConfig + cleaned some docstring
* Removed entries from check_docstring.py
* Removed entries from check_docstring.py
* Removed entry from check_docstring.py
* [docstring] Fix docstring for AltCLIPTextConfig, AltCLIPVisionConfig and AltCLIPConfig
-
Hz, Ji authored
* get default device through `PartialState().default_device` as it has been officially released
* apply code review suggestion
* apply code review suggestion
  Co-authored-by: Zach Mueller <muellerzr@gmail.com>
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
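A minimal sketch of the accessor adopted here, assuming the accelerate release it ships in:

```python
from accelerate import PartialState

# `default_device` resolves the best available backend and falls back to
# CPU, replacing hand-rolled per-framework device checks
device = PartialState().default_device
print(device)  # e.g. cuda on a GPU machine, cpu otherwise
```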
-
- 30 Oct, 2023 8 commits
-
-
NielsRogge authored
* Fix import
* Apply suggestions from code review
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
-
Hz, Ji authored
-
Yih-Dar authored
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* Add KOSMOS-2 model
* update
* update
* update
* address review comment - 001
* address review comment - 002
* address review comment - 003
* style
* Apply suggestions from code review
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
* address review comment - 004
* address review comment - 005
* address review comment - 006
* address review comment - 007
* address review comment - 008
* address review comment - 009
* address review comment - 010
* address review comment - 011
* update readme
* fix
* fix
* fix
* [skip ci] fix
* revert the change in _decode
* fix docstring
* fix docstring
* Update docs/source/en/model_doc/kosmos-2.md
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* no more Kosmos2Tokenizer
* style
* remove "returned when being computed by the model"
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* UTM5 Atten
* fix attn mask
* use present_key_value_states instead of next_decoder_cache
* style
* conversion scripts
* conversion scripts
* conversion scripts
* Add _reorder_cache
* fix doctest and copies
* rename 1
* rename 2
* rename 3
* make fixup
* fix table
* fix docstring
* rename 4
* change repo_id
* remove tip
* update md file
* make style
* update md file
* put docs/source/en/model_doc/kosmos-2.md to slow
* update conversion script
* Use CLIPImageProcessor in Kosmos2Processor
* Remove Kosmos2ImageProcessor
* Remove to_dict in Kosmos2Config
* Remove files
* fix import
* Update conversion
* normalized=False
* Not using hardcoded values like <image>
* elt --> element
* Apply suggestion
* Not using hardcoded values like </image>
* No assert
* No nested functions
* Fix md file
* copy
* update doc
* fix docstring
* fix name
* Remove _add_remove_spaces_around_tag_tokens
* Remove dummy docstring of _preprocess_single_example
* Use `BatchEncoding`
* temp
* temp
* temp
* Update
* Update
* Make Kosmos2ProcessorTest a bit pretty
* Update gradient checkpointing
* Fix gradient checkpointing test
* Remove one liner remove_special_fields
* Simplify conversion script
* fix add_eos_token
* update readme
* update tests
* Change to microsoft/kosmos-2-patch14-224
* style
* Fix doc
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
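A usage sketch for the new model, grounded in the `microsoft/kosmos-2-patch14-224` repo_id named above; the grounding prompt and image path are illustrative:

```python
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

model = Kosmos2ForConditionalGeneration.from_pretrained("microsoft/kosmos-2-patch14-224")
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")

image = Image.open("snowman.png")   # any local image
prompt = "<grounding>An image of"   # grounding prefix, illustrative

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```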
-
Hz, Ji authored
* remove the obsolete code related to fairscale FSDP
* apply review suggestion
-
Younes Belkada authored
* add `gradient_checkpointing_kwargs` in trainer and training arguments
* add comment
* add test - currently failing
* now tests pass
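A sketch of the new argument, assuming the kwargs are forwarded to `torch.utils.checkpoint.checkpoint` as the name suggests:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
    # forwarded to torch.utils.checkpoint.checkpoint under the hood
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```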
-
Thien Tran authored
fix data2vec audio note about attention mask
-
Younes Belkada authored
Update modeling_mistral.py
-
- 27 Oct, 2023 9 commits
-
-
Daniil authored
fix docstring and type hint for resize
-
Patrick von Platen authored
* [FA2 Bart] Add FA2 to all Bart-like
* better
* Refactor attention mask
* remove all customized attention logic
* format
* mass rename
* replace _expand_mask
* replace _expand_mask
* mass rename
* add pt files
* mass replace & rename
* mass replace & rename
* mass replace & rename
* mass replace & rename
* Update src/transformers/models/idefics/modeling_idefics.py
* fix more
* clean more
* fix more
* make style
* fix again
* finish
* finish
* finish
* finish
* finish
* finish
* finish
* finish
* finish
* finish
* Apply suggestions from code review
* Apply suggestions from code review
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* small fix mistral
* finish
* finish
* finish
* finish
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
Marc Sun authored
* fix detr device map
* add comments
-
Younes Belkada authored
* fix
* more fixes
* fix other models
* fix long t5
* use `gradient_checkpointing_func` instead
* fix copies
* set `gradient_checkpointing_func` as a private attribute and retrieve previous behaviour
* Update src/transformers/modeling_utils.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* replace it with `is_gradient_checkpointing_set`
* remove default
* Update src/transformers/modeling_utils.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Marc Sun authored
* fix no split
* style
* remove comm
* Update src/transformers/modeling_utils.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* rename modules
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Lucain authored
-
Isaac Chung authored
* add early stopping logits processor
* black formatted
* indent
* follow method signature
* actual logic
* check for None
* address comments on docstrings and method signature
* add unit test under `LogitsProcessorTest` wip
* unit test passing
* black formatted
* condition per sample
* add to BarkModelIntegrationTests
* wip BarkSemanticModelTest
* rename and add to kwargs handling
* not add to BarkSemanticModelTest
* correct logic and assert last outputs tokens different in test
* doc-builder style
* read from kwargs as well
* assert len of with less than that of without
* ruff
* add back seed and test case
* add original impl default suggestion
* doc-builder
* rename and use softmax
* switch back to LogitsProcessor and update docs wording
* camelCase and spelling and saving compute
* assert strictly less than
* assert less than
* expand test_generate_semantic_early_stop instead
-
Arthur authored
* v4.35.dev.0
* nit: t5fast match t5 slow
-
- 26 Oct, 2023 9 commits
-
-
Zach Mueller authored
* Support runs/
* Upload runs folder as part of push to hub
* Add a test
* Add to test deps
* Update with proposed solution from Slack
* Ensure that repo gets deleted in tests
-
L. Yeung authored
* docs(training_args): correct docstrings

  Correct docstrings of these methods in `TrainingArguments`:
  - `set_save`
  - `set_logging`
* docs(training_args): adjust words in docstrings
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(trainer): correct a typo in comments
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
-
Marc Sun authored
* add exllamav2 arg
* add test
* style
* add check
* add doc
* replace by `use_exllama_v2`
* fix tests
* fix doc
* style
* better condition
* fix logic
* add deprecate msg
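A hedged sketch of the flag as this commit names it (`use_exllama_v2`); later releases may expose the same switch under a different name, and the checkpoint is illustrative:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# enable the exllama v2 kernels for GPTQ inference, per the arg added here
quantization_config = GPTQConfig(bits=4, use_exllama_v2=True)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # illustrative GPTQ checkpoint
    quantization_config=quantization_config,
    device_map="auto",
)
```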
-
Patrick von Platen authored
* clean
* clean llama
* fix more
* make style
* Apply suggestions from code review
* Apply suggestions from code review
* Update src/transformers/models/llama/modeling_llama.py
* Update src/transformers/models/llama/modeling_llama.py
* Apply suggestions from code review
* finish
* make style
-
Arthur authored
* fix
* update
* revert
* add docstring
* good to go
* update
* add a test
-
Younes Belkada authored
-
Younes Belkada authored
* pin FA-2 to `2.1`
* fix on modeling
-
Zach Mueller authored
* Working tests!
* Fix sampler
* Fix
* Update src/transformers/trainer.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix check
* Clean
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
corey hu authored
Handle all unshared model types
-
- 25 Oct, 2023 4 commits
-
-
Younes Belkada authored
* add `MaskGenerationPipeline` in docs
* Update __init__.py
* fix repo consistency and clarify docstring
* add on check docstrings
* actually we do have a tf sam
* oops
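A usage sketch of the documented pipeline; the task name is `mask-generation`, and the SAM checkpoint and image path are illustrative:

```python
from transformers import pipeline

generator = pipeline("mask-generation", model="facebook/sam-vit-base")
outputs = generator("path/to/image.png", points_per_batch=64)
masks = outputs["masks"]  # one binary mask per detected object
```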
-
Jing Hua authored
fix incorrect docstring: encoder -> decoder
-
Nick Hill authored
* Fix TypicalLogitsWarper tensor OOB indexing edge case

  This can be triggered fairly quickly with low precision, e.g. bfloat16 and typical_p = 0.99.
* Shift threshold index by one
* Use explicit named arg for clamp min
-
Younes Belkada authored
* v1
* fix
* remove `create_custom_forward`
* fixup
* fixup
* add test and fix all failing GC tests
* remove all remaining `create_custom_forward` methods
* fix idefics bug
* fixup
* replace with `__call__`
* add comment
* quality
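A self-contained sketch of the pattern change, under assumed names (`TinyBlock` and `TinyModel` are mine; `_gradient_checkpointing_func` follows the attribute introduced in the 27 Oct refactor): the layer's bound `__call__` is handed to the checkpointing function directly, instead of going through a locally defined `create_custom_forward` closure:

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class TinyBlock(nn.Module):
    def __init__(self, dim: int = 8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(hidden_states))

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = TinyBlock()
        # stands in for the attribute the refactor sets on the model
        self._gradient_checkpointing_func = checkpoint

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # new pattern: pass the module's `__call__` directly, no closure
        return self._gradient_checkpointing_func(
            self.block.__call__, hidden_states, use_reentrant=False
        )

x = torch.randn(2, 8, requires_grad=True)
TinyModel()(x).sum().backward()  # gradients flow through the checkpoint
```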
-