- 07 Apr, 2023 4 commits
-
-
Yih-Dar authored
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Seung-Moo Yang authored
-
Shikhar Chauhan authored
* (feat): Move labels to the same device as logits
* Trigger CI
* Trigger CI
* Trigger CI
* (feat): Making changes for Blip2
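A minimal sketch of the pattern this commit applies (not the exact diff): under model parallelism the logits and labels can sit on different devices, so the loss computation first moves the labels to the logits' device.

```python
import torch
from torch.nn import CrossEntropyLoss

def compute_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits may live on e.g. cuda:1 while labels were fed in on cuda:0
    labels = labels.to(logits.device)
    loss_fct = CrossEntropyLoss()
    return loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
```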
-
gabrielwithappy authored
Translate the autoclass_tutorial and fix a typo in the quicktour
-
- 06 Apr, 2023 13 commits
-
-
Sourab Mangrulkar authored
fix fsdp
-
Yih-Dar authored
* Update tiny model summary file for recent models

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Younes Belkada authored
fix slow tests and doctests
-
Nicolas Patry authored
-
Younes Belkada authored
add safety checker
-
Yih-Dar authored
* Auto. add and update pipeline_model_mapping
* Fix style and quality
* Finalize (comments)
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* Add TFBlipForConditionalGeneration
* update pipeline_model_mapping
* Add import
* Revert changes in GPTSanJapaneseTest

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Connor Henderson authored
fix broken link
-
Yih-Dar authored
* Final Tiny things

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
amyeroberts authored
* Add out_indices to backbones, deprecate out_features
* Update - can specify either out_features or out_indices, but not both
* Add backbone mixin tests
* Test tidy up
* Add test_backbone for convnext
* Remove redefinition of method
* Update for Dinat and Nat backbones
* Update tests
* Smarter indexing
* Add checks on config creation for backbone
* PR comments
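A hedged sketch of the new out_indices argument (the backbone class and indices are illustrative, and the model here is randomly initialized): selecting which stages a backbone returns feature maps from by index instead of by out_features names.

```python
import torch
from transformers import ConvNextBackbone, ConvNextConfig

# out_indices replaces out_features=["stage1", "stage2", "stage3"]; passing both is rejected
config = ConvNextConfig(out_indices=[1, 2, 3])
backbone = ConvNextBackbone(config)

pixel_values = torch.rand(1, 3, 224, 224)
outputs = backbone(pixel_values)
print([fmap.shape for fmap in outputs.feature_maps])  # one feature map per requested stage
```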
-
Joao Gante authored
-
Nicolas Patry authored
-
Nicolas Patry authored
* Adding Llama FastTokenizer support.
  - Requires the https://github.com/huggingface/tokenizers/pull/1183 version
  - Only support byte_fallback for llama, raise otherwise (safety net).
  - Lots of open questions remain around special tokens.

How to test:

```python
from transformers.convert_slow_tokenizer import convert_slow_tokenizer
from transformers import AutoTokenizer
from tokenizers import Tokenizer

# Load the slow (sentencepiece-based) tokenizer and convert it to the fast format
tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")

if False:
    new_tokenizer = Tokenizer.from_file("tok.json")
else:
    new_tokenizer = convert_slow_tokenizer(tokenizer)
    new_tokenizer.save("tok.json")

strings = [
    "This is a test",
    "生活的真谛是",
    "生活的真谛是[MASK]。",
    # XXX: This one is problematic because of special tokens
    # "<s> Something something",
]

# Check that the slow and converted fast tokenizers agree on ids and decoded text
for string in strings:
    encoded = tokenizer(string)["input_ids"]
    encoded2 = new_tokenizer.encode(string).ids
    assert encoded == encoded2, f"{encoded} != {encoded2}"

    decoded = tokenizer.decode(encoded)
    decoded2 = new_tokenizer.decode(encoded2)
    assert decoded.strip() == decoded2, f"{repr(decoded)} != {repr(decoded2)}"
```

The converter + some test script. The test script. Tmp save. Adding Fast tokenizer + tests. Adding the tokenization tests. Correct combination. Small fix. Fixing tests. Fixing with latest update. Rebased. fix copies + normalized added tokens + copies. Adding doc. TMP. Doc + split files. Doc. Versions + try import. Fix Camembert + warnings -> Error. Fix by ArthurZucker. Not a decorator.

* Fixing comments.
* Adding more to docstring.
* Doc rewriting.
-
- 05 Apr, 2023 13 commits
-
-
Kaustubh authored
feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart (#22591)
-
Matt authored
* Use native TF checkpoints for the TF tests
* Remove unneeded exceptions
-
Younes Belkada authored
* add deplot + matcha on `transformers`
* more docs
* correct path
* Update docs/source/en/model_doc/deplot.mdx (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* fix
* use auto processor
* Update docs/source/en/model_doc/matcha.mdx (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* make fixup
* Update docs/source/en/model_doc/deplot.mdx (Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>)
* add correct names

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
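A hedged usage sketch for the checkpoints added here: both DePlot and MatCha are Pix2Struct-based, so they load through the Pix2Struct classes. The checkpoint name, prompt, and image path are assumptions, not taken from the commit itself.

```python
from PIL import Image
from transformers import AutoProcessor, Pix2StructForConditionalGeneration

processor = AutoProcessor.from_pretrained("google/deplot")
model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")

image = Image.open("chart.png")  # any chart/plot image

# DePlot translates a plot into its underlying data table, conditioned on a text prompt
inputs = processor(
    images=image,
    text="Generate underlying data table of the figure below:",
    return_tensors="pt",
)
predictions = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(predictions[0], skip_special_tokens=True))
```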
-
Nicolas Patry authored
* Adding support for BPE merge creation from scores instead of ids.
* Revert warn -> raise.
* Update src/transformers/convert_slow_tokenizer.py
* Quality.
-
Matt authored
Fixes a typo in one of the BLIP pretrained checkpoint names
-
Mikel Penagarikano authored
* Update run_speech_recognition_ctc.py: make sure all processes wait until data is saved before loading the processor from the output_dir
* Make sure all processes wait until data is saved before loading the processor from the output_dir
* Update run_speech_recognition_ctc.py
* Update run_speech_recognition_seq2seq.py
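A hedged sketch of the synchronization pattern referenced above (paths and the file-writing step are illustrative, not the script's exact code): the main process writes the vocabulary/processor files first while the other ranks wait at a barrier, so every rank then loads identical artifacts from output_dir.

```python
from transformers import AutoProcessor, TrainingArguments

training_args = TrainingArguments(output_dir="./ctc-output")

# The main process runs the block first; replicas block until it finishes,
# so the saved files exist for everyone before anyone tries to load them.
with training_args.main_process_first(desc="vocab/processor saving"):
    if training_args.local_process_index == 0:
        pass  # build vocab.json and save the processor into training_args.output_dir here

processor = AutoProcessor.from_pretrained(training_args.output_dir)
```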
-
Wonhyeong Seo authored
Co-authored-by: gabrielwithappy <102908949+gabrielwithappy@users.noreply.github.com>
-
Quentin Meeus authored
The logger prints a summary at the beginning of training that displays some info such as the number of examples, number of parameters, total number of steps, etc. Those numbers can be quite large and difficult to read. I added a thousands separator to improve readability for the following:
- num_examples
- num_train_epochs
- per_device_train_batch_size
- total_train_batch_size
- max_steps
- num_trainable_params
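A small illustration of the formatting change (the numbers are made up): Python's `,` format specifier inserts thousands separators in the logged values.

```python
num_examples = 1_250_000
total_train_batch_size = 2_048

print(f"  Num examples = {num_examples:,}")                       # Num examples = 1,250,000
print(f"  Total train batch size = {total_train_batch_size:,}")   # Total train batch size = 2,048
```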
-
Matt authored
* Re-enable skipped test and fix the hidden state shape issue
* Actually fix the bug instead of just doing something wrong
-
Joao Gante authored
-
Sylvain Gugger authored
-
Joao Gante authored
-
Sylvain Gugger authored
-
- 04 Apr, 2023 10 commits
-
-
Matt authored
* Fix inverted conditional in TF common test!
* Make the same change in the PT tests file
* Make sure hidden states for GPT2 have the same output shape in PT/TF
* Minor fix to PT implementation of token classification loss
* Skip loss equivalence test for TFHubert because it keeps overflowing to inf
* Compute LM loss for TF the (weird) way it's computed in PT
* Skip loss equivalence test for Wav2Vec2 for the same reason as Hubert
* Fix - don't try to access the hidden states property when output is a tuple
-
Sourab Mangrulkar authored
-
Shubhamai authored
* initial commit
* review changes
* post model PR merge
* updating doc
-
Sun Haozhe authored
* corrected/clarified the code comment of find_pruneable_heads_and_indices
* have run make style
-
Matt authored
* Initial commit
* more stash commit
* Yet another stash commit
* yet more stash commit
* Mostly working except for docs / repo consistency
* Stop importing model list from torch file
* Add TF BLIP models to docs
* Add auto classes
* Move get_text_features and get_image_features
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip_text.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update tests/models/blip/test_modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update tests/models/blip/test_modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* Update tests/models/blip/test_modeling_tf_blip_text.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip_text.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Use channels_last convolutions in TF (better performance + compatibility)
* Remove _shape function
* Move multi-line statement to one line in PT + TF
* Specify tf.keras.layers instead of importing from it
* Remove test_gradient_checkpointing and empty test_training methods
* move some multi-line statements to one line
* Update docstring for generate
* Remove pruned heads set
* Remove self.seq_len_dim
* Fixed issues with loss computation, should resolve some tests. Also ensured that the PT version follows the config for output_attentions and output_hidden_states
* ensure original model follows config in more cases
* Skip the same cross-attention tests in the PT tests - didn't realize we did it twice!
* Add training args throughout the models and layers
* make fixup
* Fix docstring for inputs_embeds
* Add docstring for is_decoder
* Add docstrings to text models
* Remove redundant computation
* Add unpack_inputs / keras_serializable
* Add modeling_tf_blip to doctests
* Add config classes for keras serialization
* Changes to allow model porting with pt-to-tf
* Quick fix to decoder head and test tweaks
* Revert an issue with masking the embeddings outputs
* Allow missing keys in some equivalence tests (for unused layers)
* Add tf-pt equivalence tests back in
* Update src/transformers/models/blip/modeling_tf_blip.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip_text.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/models/blip/modeling_tf_blip_text.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* make fixup
* Refactor invert_attention_mask out into tf_utils
* Re-enable cross-tests on the PT side too

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
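A hedged usage sketch (not from the commit itself) of the TF BLIP port this entry adds, used for image captioning. The checkpoint name and image URL are assumptions; any BLIP captioning checkpoint with TF weights should work the same way.

```python
import requests
from PIL import Image
from transformers import AutoProcessor, TFBlipForConditionalGeneration

processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = TFBlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Conditional captioning: the text acts as a prefix for the generated caption
inputs = processor(images=image, text="a photo of", return_tensors="tf")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```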
-
Nicolas Patry authored
* Soft error whisper.
* Fix format.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-94.taildb5d.ts.net>
-
Maziyar Panahi authored
Add id2label and label2id to config in run_xnli
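A minimal sketch of the pattern this commit adds to the example script (the base model and label order are illustrative): storing the label mappings on the config so the fine-tuned checkpoint exposes readable class names.

```python
from transformers import AutoConfig

labels = ["entailment", "neutral", "contradiction"]  # XNLI classes, order illustrative
config = AutoConfig.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=len(labels),
    id2label={i: label for i, label in enumerate(labels)},
    label2id={label: i for i, label in enumerate(labels)},
)
```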
-
Younes Belkada authored
Update modeling_utils.py
-
Sylvain Gugger authored
-
Viktor Scherbakov authored
* implemented safetensors save/load
* remove duplicated file
* added tests
* more tests
* style fix
* fix tf tests
* change to list comprehension (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* review fixes + safe load for sharded checkpoint
* style fix
* remove rogue import
* remove partial to avoid undefined exception
* use naming alias instead of safetensors.torch
* fix safe sharding in tests
* grammar (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* update docs (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* update docs (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* minor corrections
* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
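A hedged illustration of the underlying safetensors save/load flow (not necessarily the exact code path this commit touches, and the model name and output directory are placeholders):

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Write model.safetensors instead of pytorch_model.bin (requires `safetensors` installed)
model.save_pretrained("./checkpoint", safe_serialization=True)

# from_pretrained picks up the safetensors weights from the directory automatically
reloaded = AutoModelForSequenceClassification.from_pretrained("./checkpoint")
```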
-