Commits · 98597725f1cbcd0cca81e06d23053bd581c86450 · chenpangpang / transformers

10 Apr, 2023 1 commit

Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575) · e0921c6b

Joel Lamy-Poirier authored Apr 10, 2023



* Add model with cli tool

* Remove unwanted stuff

* Add new code

* Remove inference runner

* Style

* Fix checks

* Test updates

* make fixup

* fix docs

* fix doc

* fix test

* hopefully fix pipeline tests

* refactor

* fix CIs

* add comment

* rename to `GPTBigCodeForCausalLM`

* correct readme

* make fixup + docs

* make fixup

* fixes

* fixes

* Remove pruning

* Remove import

* Doc updates

* More pruning removal

* Combine copies

* Single MQA implementation, remove kv cache pre-allocation and padding

* Update doc

* Revert refactor to match gpt2 style

* Merge back key and value caches, fix some type hints

* Update doc

* Fix position ids with padding (PR 21080)

* Add conversion script temporarily

* Update conversion script

* Remove checkpoint conversion

* New model

* Fix MQA test

* Fix copies

* try fix tests

* FIX TEST!!

* remove `DoubleHeadsModel`

* add MQA tests

* add slow tests

* clean up

* add CPU checker

* final fixes

* fixes

- fix GPU issue
- fixed slow tests
- skip disk offload

* fix final issue

* Simplify and comment baddbmm fix

* Remove unnecessary code

* Transpose tweaks

* Use beta=1 on cpu, improve tests

---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>

e0921c6b
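
The commit title describes GPTBigCode as GPT2 with multi-query attention (MQA). As a rough illustration only, and not the code from this commit, the idea can be sketched in plain Python: every query head keeps its own projection, while a single key/value head is shared by all of them. The function name, the non-causal setup, and the toy inputs are assumptions made for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_attention(queries, keys, values):
    """Toy, non-causal multi-query attention for one sequence.

    queries: [num_heads][seq_len][head_dim], one set of queries per head
    keys, values: [seq_len][head_dim], a single head shared by all query heads
    """
    head_dim = len(keys[0])
    scale = 1.0 / math.sqrt(head_dim)
    outputs = []
    for head in queries:
        head_out = []
        for q in head:
            # Every head scores against the same shared keys.
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
            weights = softmax(scores)
            head_out.append(
                [sum(w * v[d] for w, v in zip(weights, values)) for d in range(head_dim)]
            )
        outputs.append(head_out)
    return outputs


# Two query heads, two positions, head_dim = 2; K and V are stored only once.
queries = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
out = multi_query_attention(queries, keys, values)
```

Because keys and values are shared, a key/value cache in this layout holds one head instead of one per query head, which is the usual motivation given for MQA.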

07 Apr, 2023 2 commits

[OPT] Fix default attention mask size (#22649) · f3341926

Arthur authored Apr 07, 2023

* Fix default attention mask size

* fixup

* add a test to make sure it works even if the attention mask is not provided

* style

f3341926

Fix `MegaModel` CI (#22652) · 14d5b2b6

Yih-Dar authored Apr 07, 2023



* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

14d5b2b6

06 Apr, 2023 7 commits

Update tiny model summary file for recent models (#22637) · c7ec71ba

Yih-Dar authored Apr 06, 2023



* Update tiny model summary file for recent models

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

c7ec71ba

[`Blip`] Fix slow tests and doctests with correct values (#22632) · ed672864
Younes Belkada authored Apr 06, 2023
```
fix slow tests and doctests
```
ed672864

update_pip_test_mapping (#22606) · fa01127a

Yih-Dar authored Apr 06, 2023



* Add TFBlipForConditionalGeneration

* update pipeline_model_mapping

* Add import

* Revert changes in GPTSanJapaneseTest

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

fa01127a

Make tiny model creation + pipeline testing more robust (#22500) · 2c22bc79
Yih-Dar authored Apr 06, 2023
```
* Final Tiny things

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
2c22bc79

Backbone add mixin tests (#22542) · 12d51db2

amyeroberts authored Apr 06, 2023

* Add out_indices to backbones, deprecate out_features

* Update - can specify both out_features and out_indices but not both

* Add backbone mixin tests

* Test tidy up

* Add test_backbone for convnext

* Remove redefinition of method

* Update for Dinat and Nat backbones

* Update tests

* Smarter indexing

* Add checks on config creation for backbone

* PR comments

12d51db2
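
The bullets above introduce `out_indices` alongside `out_features` for backbones. A minimal sketch of how the two selectors might be kept in sync is shown here; `align_outputs` is a hypothetical helper, and both the default to the last stage and the error on disagreement are assumptions of this sketch rather than behaviour confirmed by the commit.

```python
def align_outputs(stage_names, out_features=None, out_indices=None):
    """Hypothetical helper: derive whichever of the two selectors is missing."""
    if out_features is None and out_indices is None:
        # Assumed default for this sketch: expose only the last stage.
        out_indices = [len(stage_names) - 1]
    if out_features is None:
        out_features = [stage_names[i] for i in out_indices]
    elif out_indices is None:
        out_indices = [stage_names.index(name) for name in out_features]
    elif [stage_names[i] for i in out_indices] != list(out_features):
        # Assumption: when both are given they must select the same stages.
        raise ValueError("out_features and out_indices select different stages")
    return list(out_features), list(out_indices)


stages = ["stem", "stage1", "stage2", "stage3"]
by_index = align_outputs(stages, out_indices=[1, 3])
by_name = align_outputs(stages, out_features=["stage2"])
default = align_outputs(stages)
```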

Revert error back into warning for byte fallback conversion. (#22607) · 0aa1153f
Nicolas Patry authored Apr 06, 2023

0aa1153f

Adding Llama FastTokenizer support. (#22264) · 1670be4b

Nicolas Patry authored Apr 06, 2023

* Adding Llama FastTokenizer support.

- Requires https://github.com/huggingface/tokenizers/pull/1183 version
- Only support byte_fallback for llama, raise otherwise (safety net).
- Lots of questions are special tokens

How to test:

```python

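# Compare the slow Llama tokenizer with its converted fast counterpart.
from tokenizers import Tokenizer
from transformers import AutoTokenizer
from transformers.convert_slow_tokenizer import convert_slow_tokenizer

tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")

# Set to True to reuse a previously saved "tok.json" instead of reconverting.
LOAD_FROM_FILE = False

if LOAD_FROM_FILE:
    new_tokenizer = Tokenizer.from_file("tok.json")
else:
    new_tokenizer = convert_slow_tokenizer(tokenizer)
    new_tokenizer.save("tok.json")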

strings = [
    "This is a test",
    "生活的真谛是",
    "生活的真谛是[MASK]。",
    # XXX: This one is problematic because of special tokens
    # "<s> Something something",
]

for string in strings:
    encoded = tokenizer(string)["input_ids"]
    encoded2 = new_tokenizer.encode(string).ids

    assert encoded == encoded2, f"{encoded} != {encoded2}"

    decoded = tokenizer.decode(encoded)
    decoded2 = new_tokenizer.decode(encoded2)

    assert decoded.strip() == decoded2, f"{repr(decoded)} != {repr(decoded2)}"
```

The converter + some test script.

The test script.

Tmp save.

Adding Fast tokenizer + tests.

Adding the tokenization tests.

Correct combination.

Small fix.

Fixing tests.

Fixing with latest update.

Rebased.

fix copies + normalized added tokens + copies.

Adding doc.

TMP.

Doc + split files.

Doc.

Versions + try import.

Fix Camembert + warnings -> Error.

Fix by ArthurZucker.

Not a decorator.

* Fixing comments.

* Adding more to docstring.

* Doc rewriting.

1670be4b

05 Apr, 2023 4 commits
- Use native TF checkpoints for the BLIP TF tests (#22593) · e577bd0f
  Matt authored Apr 05, 2023
```
* Use native TF checkpoints for the TF tests

* Remove unneeded exceptions
```
  e577bd0f
- Fix PT-TF equivalence test for GPT1 (#22586) · 2a91a9ef
  Matt authored Apr 05, 2023
```
* Re-enable skipped test and fix the hidden state shape issue

* Actually fix the bug instead of just doing something wrong
```
  2a91a9ef
- Generate: `TextIteratorStreamer` timeout (#22576) · 861ff890
  Joao Gante authored Apr 05, 2023
  
  861ff890
- Skip failing test · 11fd2c77
  Sylvain Gugger authored Apr 04, 2023
  
  11fd2c77

04 Apr, 2023 8 commits

Fix inverted conditional in TF common test! (#22540) · edb704b2

Matt authored Apr 04, 2023

* Fix inverted conditional in TF common test!

* Make the same change in the PT tests file

* Make sure hidden states for GPT2 have the same output shape in PT/TF

* Minor fix to PT implementation of token classification loss

* Skip loss equivalence test for TFHubert because it keeps overflowing to inf

* Compute LM loss for TF the (weird) way it's computed in PT

* Skip loss equivalence test for Wav2Vec2 for the same reason as Hubert

* Fix - don't try to access the hidden states property when output is a tuple

edb704b2

Flax Regnet (#21867) · 90067748

Shubhamai authored Apr 04, 2023

* initial commit

* review changes

* post model PR merge

* updating doc

90067748

Add TF port of BLIP (#22090) · 5f3ea66b

Matt authored Apr 04, 2023



* Initial commit

* more stash commit

* Yet another stash commit

* yet more stash commit

* Mostly working except for docs / repo consistency

* Stop importing model list from torch file

* Add TF BLIP models to docs

* Add auto classes

* Move get_text_features and get_image_features

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/blip/test_modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/blip/test_modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/models/blip/test_modeling_tf_blip_text.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use channels_last convolutions in TF (better performance + compatibility)

* Remove _shape function

* Move multi-line statement to one line in PT + TF

* Specify tf.keras.layers instead of importing from it

* Remove test_gradient_checkpointing and empty test_training methods

* move some multi-line statements to one line

* Update docstring for generate

* Remove pruned heads set

* Remove self.seq_len_dim

* Fixed issues with loss computation, should resolve some tests. Also ensured that the PT version follows the config for output_attentions and output_hidden_states

* ensure original model follows config in more cases

* Skip the same cross-attention tests in the PT tests - didn't realize we did it twice!

* Add training args throughout the models and layers

* make fixup

* Fix docstring for inputs_embeds

* Add docstring for is_decoder

* Add docstrings to text models

* Remove redundant computation

* Add unpack_inputs / keras_serializable

* Add modeling_tf_blip to doctests

* Add config classes for keras serialization

* Changes to allow model porting with pt-to-tf

* Quick fix to decoder head and test tweaks

* Revert an issue with masking the embeddings outputs

* Allow missing keys in some equivalence tests (for unused layers)

* Add tf-pt equivalence tests back in

* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make fixup

* Refactor invert_attention_mask out into tf_utils

* Re-enable cross-tests on the PT side too

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

5f3ea66b

Soft error whisper. (#22475) · a515d0a7

Nicolas Patry authored Apr 04, 2023



* Soft error whisper.

* Fix format.

---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-94.taildb5d.ts.net>

a515d0a7

Implemented safetensors checkpoints save/load for Trainer (#22498) · 871598be

Viktor Scherbakov authored Apr 04, 2023



* implemented safetensors save/load

* remove duplicated file

* added tests

* more tests

* style fix

* fix tf tests

* change to list comprehension
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* review fixes + safe load for sharded checkpoint

* style fix

* remove rogue import

* remove partial to avoid undefined exception

* use naming alias instead of safetensors.torch

* fix safe sharding in tests

* grammar
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* update docs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* update docs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor corrections

* style

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

871598be

🚨 `[NLLB Tokenizer]` Fix the prefix tokens 🚨 (#22313) · 00b5887b

Arthur authored Apr 04, 2023



* fix the prefix tokens

* update fast and test values

* add legacy behaviour
Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* update disclaimer, link issue PR and behavioral changes

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>

* styling

* make a quote

* quote this time

---------
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>

00b5887b
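
For readers unfamiliar with the change, the following is a minimal sketch of what "fix the prefix tokens" with a legacy switch can look like: the language code moves from a suffix after the end-of-sequence token to a prefix before the text. The function name and the token ids are made up for illustration, and the exact orderings are stated here as an assumption about the tokenizer rather than something this log spells out.

```python
def build_inputs(token_ids, lang_code_id, eos_id, legacy_behaviour=False):
    """Hypothetical helper showing the two special-token layouts."""
    if legacy_behaviour:
        # Assumed old layout: text, then EOS, then the language code.
        return token_ids + [eos_id, lang_code_id]
    # Assumed fixed layout: language code first, then text, then EOS.
    return [lang_code_id] + token_ids + [eos_id]


# Made-up ids purely for illustration.
TEXT = [101, 102, 103]
LANG, EOS = 9000, 2

fixed = build_inputs(TEXT, LANG, EOS)
legacy = build_inputs(TEXT, LANG, EOS, legacy_behaviour=True)
```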

[Roformer] Fixing a bug in RoFormerEncoder where it was ignoring the length of past_key_values when generating as a decoder (#22416) · ad5e9b6c

TheWall9 authored Apr 04, 2023

* fix RoFormerEncoder position embedding when generating as decoder

* make fixup

* add test case to check generation with past key values

* remove duplicating code

ad5e9b6c
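
The bug described in this commit can be pictured with a small sketch: during cached decoding only the newest token is fed in, so its position has to be offset by the number of cached steps. `position_ids` is a hypothetical helper for illustration, not a function from the RoFormer code.

```python
def position_ids(seq_len, past_key_values_length=0):
    """Positions for the tokens fed in this step, offset by the cached length."""
    start = past_key_values_length
    return list(range(start, start + seq_len))


# First step: a 4-token prompt with an empty cache.
prompt_positions = position_ids(4)

# Next step: one new token with 4 cached positions.
correct = position_ids(1, past_key_values_length=4)

# Ignoring the cache length restarts at 0, which is the kind of error the fix targets.
ignoring_cache = position_ids(1)
```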

Generate: Add text streamer decoding options (#22544) · 1905384f
Joao Gante authored Apr 04, 2023

1905384f

03 Apr, 2023 7 commits

Update test_image_processing_pix2struct.py (#22543) · 159ff334
Younes Belkada authored Apr 03, 2023

159ff334

Skip failing test · c14d3129
Sylvain Gugger authored Apr 03, 2023

c14d3129

fix LayoutLMv3TokenizerFast subword label after 'Ġ' token (#21695) · 4e441e52

Thibault Douzon authored Apr 03, 2023

LayoutLMv3TokenizerFast produces empty 'Ġ' token with `offset_mapping = (0, 0)`.
Next token is wrongly assumed to also be beginning of word and isn't
correctly assigned `pad_token_label`.
Modify test with text that produce 'Ġ' token.
Remove copy check from LayoutLMv2TokenizerFast for `_batch_encode_plus`.

solves issue: #19978

4e441e52
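
The failure mode described above can be reproduced with a small pure-Python sketch. It assumes labels are assigned per word from `offset_mapping`, with only the first subword receiving the word label; the helper name, the `fixed` switch, and the value of `PAD_TOKEN_LABEL` are assumptions made for illustration.

```python
PAD_TOKEN_LABEL = -100  # assumed id for positions that should not be labeled


def label_first_subwords(offsets, word_label, fixed=True):
    """Label the tokens of one word, given their (start, end) offsets."""
    labels = []
    previous_token_empty = False
    for start, end in offsets:
        # A token starts the word if its offset begins at 0, unless (with the
        # fix) the previous token was the empty 'Ġ' token at (0, 0).
        starts_word = start == 0 and not (fixed and previous_token_empty)
        labels.append(word_label if starts_word else PAD_TOKEN_LABEL)
        previous_token_empty = (start, end) == (0, 0)
    return labels


# One word tokenized as an empty 'Ġ' token followed by the real subword.
offsets = [(0, 0), (0, 5)]
with_fix = label_first_subwords(offsets, word_label=7)
without_fix = label_first_subwords(offsets, word_label=7, fixed=False)
```

Without the check, both tokens look like a word start and the label is duplicated; with it, only the first token keeps the label.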

Generate: `TextIteratorStreamer` (streamer for gradio) (#22501) · a55a822a
Joao Gante authored Apr 03, 2023
```
* haha text go brrr (but in gradio)
```
a55a822a
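
An iterator-style streamer of this kind is commonly built on a thread-safe queue: generation runs in a worker thread and pushes text chunks, while the UI code simply iterates. The class below is a hypothetical stand-in written with the standard library only, not the `TextIteratorStreamer` implementation; the `timeout` argument mirrors the idea of the timeout commit listed under 05 Apr.

```python
import queue
import threading


class IteratorStreamer:
    """Hypothetical stand-in: a producer pushes text chunks, a consumer iterates."""

    _STOP = object()  # sentinel marking the end of the stream

    def __init__(self, timeout=None):
        self._queue = queue.Queue()
        self._timeout = timeout

    def put(self, text):
        self._queue.put(text)

    def end(self):
        self._queue.put(self._STOP)

    def __iter__(self):
        return self

    def __next__(self):
        # Raises queue.Empty if nothing arrives within the timeout.
        item = self._queue.get(timeout=self._timeout)
        if item is self._STOP:
            raise StopIteration
        return item


def fake_generate(streamer):
    """Stands in for a model's generation loop running in a worker thread."""
    for chunk in ["Hello", ", ", "world"]:
        streamer.put(chunk)
    streamer.end()


streamer = IteratorStreamer(timeout=5.0)
worker = threading.Thread(target=fake_generate, args=(streamer,))
worker.start()
text = "".join(streamer)
worker.join()
```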

added biogpt token classifier (#22447) · 7d25c9c8

Mohammed Jabir authored Apr 03, 2023



* added biogpt token classifier

* fix reviews

* Updated modeling_biogpt.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

7d25c9c8

Fix llama tokenizer (#22402) · c0f99b4d

Arthur authored Apr 03, 2023

* draft

* update tokenization llama and conversion script

* more updates

* initial commit

* style

* default pad to None

* draft tokenization tests

* update test

* update tokenization tests

* nits

* update

* versioning test

* major fix

* fix more tests

* finish fixing special masks

* last nit

* more nits

* add encode decode tests

* add more

* fix token type ids

* style

c0f99b4d

[Time-Series] fix past_observed_mask type (#22076) · 9eae4aa5
Eli Simhayev authored Apr 03, 2023
```
added > 0.5 to `past_observed_mask`
```
9eae4aa5

31 Mar, 2023 2 commits

Test fetch v2 (#22367) · c6126280

Sylvain Gugger authored Mar 31, 2023



* Test fetcher v2

* Fix regexes

* Remove sanity check

* Fake modification to OPT

* Fixes some .sep issues

* Remove fake OPT change

* Fake modif for BERT

* Fake modif for init

* Exclude SageMaker tests

* Fix test and remove fake modif

* Fake setup modif

* Fake pipeline modif

* Remove all fake modifs

* Adds options to skip/force tests

* [test-all-models] Fake modif for BERT

* Try this way

* Does the command actually work?

* [test-all-models] Try again!

* [skip circleci] Remove fake modif

* Remove debug statements

* Add the list of important models

* Quality

* Update utils/tests_fetcher.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comments

* Address review comments

* Fix and add test

* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Address review comments

---------
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

c6126280

Making sure we can use safetensors to serialize all the time. (#22437) · d143087d

Nicolas Patry authored Mar 31, 2023



* Making sure we can use safetensors to serialize all the time.

* Expanding the tests for increased coverage.

* Update the test.

* Getting current state of affairs.

* Tentative fix.

* Fixing black version.

* Fixing the worst offenders.

* Try to modify less files.

* Fixing blip_2 (Weird solution right now).

* Fixing deta.

* Fix blip ?

* Missing extra newline.

* No deta modification.

* Adding some comments.

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Addressing comments.

* Addressing comments.

* creating warn_once.

* Warning_once !

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

d143087d

30 Mar, 2023 4 commits
- [NLLB-MoE] `model_type` update for auto mapping (#22470) · 349e1242
  Arthur authored Mar 30, 2023
```
edit default model type and testing path set to hf-internal-testing
```
  349e1242
- Generate: basic token streaming (#22449) · 228792a9
  Joao Gante authored Mar 30, 2023
```
* haha tokens go brrrr
```
  228792a9
- Skip flaky NLLB Moe test for now (#22463) · f0aeb1be
  amyeroberts authored Mar 30, 2023
```
Skip flaky test for now
```
  f0aeb1be
- Rescale image back if it was scaled during PIL conversion (#22458) · 154c6bb7
  amyeroberts authored Mar 30, 2023
```
* Rescale image back if it was scaled during PIL conversion

* do_rescale is defined if PIL image passed in
```
  154c6bb7

29 Mar, 2023 4 commits

[`Pix2Struct`] Fix slow test (#22448) · b844f8a9
Younes Belkada authored Mar 29, 2023
```
fix slow test
```
b844f8a9

Use real tokenizers if tiny version(s) creation has issue(s) (#22428) · 8894b817
Yih-Dar authored Mar 29, 2023
```
Fix some tiny model creation issues
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
8894b817

[`bnb`] fix bnb failing test (#22439) · 33f4cb10
Younes Belkada authored Mar 29, 2023
```
* fix bnb failing test

* fix

* fix

* fixup
```
33f4cb10

Add clean_up_tokenization_spaces to config (#22341) · 8d9c3836

Arthur authored Mar 29, 2023



* add draft changes

* fix failing wav2vec

* style

* make sure that the argument is saved + add tests

* style

* fixup

* update test

* default clean_up_tokenization_spaces to False for Bloom and Llama

* Update code based on review
Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>

* style

* quality

---------
Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>

8d9c3836
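
For context, `clean_up_tokenization_spaces` controls a post-decoding pass that removes spaces left in front of punctuation and contractions. The sketch below is a simplified approximation of that kind of cleanup, with a hypothetical function name and an illustrative replacement list; it is not copied from the library.

```python
def clean_up_spaces(text):
    """Simplified approximation of a decode-time whitespace cleanup."""
    replacements = [
        (" .", "."),
        (" ?", "?"),
        (" !", "!"),
        (" ,", ","),
        (" n't", "n't"),
        (" 'm", "'m"),
        (" 's", "'s"),
        (" 've", "'ve"),
        (" 're", "'re"),
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text


raw = "Hello , world ! I do n't know ."
cleaned = clean_up_spaces(raw)
```

Per the bullets above, the commit defaults the option to False for Bloom and Llama, so their decoded text would skip a pass like this one.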

27 Mar, 2023 1 commit

[WIP]`NLLB-MoE` Adds the moe model (#22024) · 19ade242

Arthur authored Mar 27, 2023

* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* ❗local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism that we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer_norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse layers are tested. Had to change route_tokens a little bit

* add support for unslip models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉



* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

19ade242