- 10 Apr, 2023 2 commits
-
-
Shahad Mahmud authored
BridgeTower model parallelism: move labels to the logits device for loss calculation
-
Joel Lamy-Poirier authored
* Add model with cli tool
* Remove unwanted stuff
* Add new code
* Remove inference runner
* Style
* Fix checks
* Test updates
* make fixup
* fix docs
* fix doc
* fix test
* hopefully fix pipeline tests
* refactor
* fix CIs
* add comment
* rename to `GPTBigCodeForCausalLM`
* correct readme
* make fixup + docs
* make fixup
* fixes
* fixes
* Remove pruning
* Remove import
* Doc updates
* More pruning removal
* Combine copies
* Single MQA implementation, remove kv cache pre-allocation and padding (see the sketch after this list)
* Update doc
* Revert refactor to match gpt2 style
* Merge back key and value caches, fix some type hints
* Update doc
* Fix position ids with padding (PR 21080)
* Add conversion script temporarily
* Update conversion script
* Remove checkpoint conversion
* New model
* Fix MQA test
* Fix copies
* try fix tests
* FIX TEST!!
* remove `DoubleHeadsModel`
* add MQA tests
* add slow tests
* clean up
* add CPU checker
* final fixes
* fixes
  - fix GPU issue
  - fixed slow tests
  - skip disk offload
* fix final issue
* Simplify and comment baddbmm fix
* Remove unnecessary code
* Transpose tweaks
* Use beta=1 on cpu, improve tests
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
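For readers unfamiliar with the MQA mentioned above: multi-query attention keeps many query heads but shares a single key/value head across all of them. A rough PyTorch sketch of the idea (illustrative only, not the actual GPTBigCode implementation) might be:

```python
import torch

def multi_query_attention(query, key, value):
    """Rough sketch of multi-query attention (MQA).
    query:      (batch, n_heads, seq_len, head_dim) -- many query heads
    key, value: (batch, 1, seq_len, head_dim)       -- one shared head
    """
    scale = query.size(-1) ** -0.5
    # The single key/value head broadcasts across the n_heads dimension.
    scores = torch.matmul(query, key.transpose(-1, -2)) * scale
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, value)  # (batch, n_heads, seq_len, head_dim)
```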
-
- 07 Apr, 2023 11 commits
-
-
Arun Brahma authored
Moved labels to the same device as logits for BLOOM, GPT Neo, GPT NeoX, RoBERTa and ViT models (#22663)
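Several commits in this log apply the same device-placement fix for model parallelism; a minimal sketch of the pattern (tensor names are illustrative, not the exact modeling code) could look like:

```python
import torch
from torch.nn import CrossEntropyLoss

def classification_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Under model parallelism the head that produces the logits may sit on a
    # different GPU than the labels, so move the labels to the logits' device
    # before computing the loss (the one-line change these commits make).
    labels = labels.to(logits.device)
    loss_fct = CrossEntropyLoss()
    return loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
```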
-
Sylvain Gugger authored
-
Joao Gante authored
add API warning
-
Arthur authored
* Fix default attention mask size * fixup * add a test to make sure that it works even if the attention mask is not provided * style
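As a rough illustration of the default-mask behavior being tested (the helper name and shapes here are hypothetical, not the library code):

```python
import torch

def default_attention_mask(input_ids: torch.Tensor, past_length: int = 0) -> torch.Tensor:
    # If the caller passes no attention mask, assume every position should be
    # attended to, covering past (cached) tokens plus the new tokens rather
    # than just the new tokens.
    batch_size, seq_length = input_ids.shape
    return torch.ones(batch_size, past_length + seq_length, dtype=torch.long, device=input_ids.device)
```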
-
Arthur authored
* do not push special file
* Update src/transformers/tokenization_utils_base.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Arthur authored
* Small nit, Fixes #21986 * Update src/transformers/pipelines/__init__.py
-
Wonhyeong Seo authored
docs: feat: Korean pipeline_tutorial
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Hyeonseo Yun <0525_hhgus@naver.com>
Co-authored-by: gabrielwithappy <102908949+gabrielwithappy@users.noreply.github.com>
Co-authored-by: Na Yeon Han <nayeon2.han@gmail.com>
-
Yih-Dar authored
* fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Seung-Moo Yang authored
-
Shikhar Chauhan authored
* (feat): Move labels to the same device as logits * Trigger CI * Trigger CI * Trigger CI * (feat): Making changes for Blip2
-
gabrielwithappy authored
translate the autoclass_tutorial and fix a typo in the quicktour
-
- 06 Apr, 2023 13 commits
-
-
Sourab Mangrulkar authored
fix fsdp
-
Yih-Dar authored
* Update tiny model summary file for recent models --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Younes Belkada authored
fix slow tests and doctests
-
Nicolas Patry authored
-
Younes Belkada authored
add safety checker
-
Yih-Dar authored
* Auto. add and update pipeline_model_mapping * Fix style and quality * Finalize (comments) * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* Add TFBlipForConditionalGeneration * update pipeline_model_mapping * Add import * Revert changes in GPTSanJapaneseTest --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Connor Henderson authored
fix broken link
-
Yih-Dar authored
* Final Tiny things --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
amyeroberts authored
* Add out_indices to backbones, deprecate out_features (see the usage sketch after this list)
* Update - can specify either out_features or out_indices, but not both
* Add backbone mixin tests
* Test tidy up
* Add test_backbone for convnext
* Remove redefinition of method
* Update for Dinat and Nat backbones
* Update tests
* Smarter indexing
* Add checks on config creation for backbone
* PR comments
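A hedged usage sketch of the out_indices idea, assuming a ConvNeXt backbone and that the argument name matches this PR (exact names may differ across versions):

```python
import torch
from transformers import ConvNextConfig, ConvNextBackbone

# Select returned stages by index rather than by name (out_features);
# per the PR, only one of out_features / out_indices should be given.
config = ConvNextConfig(out_indices=[1, 2, 3])
backbone = ConvNextBackbone(config)

pixel_values = torch.randn(1, 3, 224, 224)
outputs = backbone(pixel_values)
feature_maps = outputs.feature_maps  # one feature map per requested stage
```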
-
Joao Gante authored
-
Nicolas Patry authored
-
Nicolas Patry authored
* Adding Llama FastTokenizer support.
  - Requires https://github.com/huggingface/tokenizers/pull/1183 version
  - Only support byte_fallback for llama, raise otherwise (safety net).
  - Lots of questions are special tokens

How to test:

```python
from transformers.convert_slow_tokenizer import convert_slow_tokenizer
from transformers import AutoTokenizer
from tokenizers import Tokenizer

tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")

if False:
    new_tokenizer = Tokenizer.from_file("tok.json")
else:
    new_tokenizer = convert_slow_tokenizer(tokenizer)
    new_tokenizer.save("tok.json")

strings = [
    "This is a test",
    "生活的真谛是",
    "生活的真谛是[MASK]。",
    # XXX: This one is problematic because of special tokens
    # "<s> Something something",
]

for string in strings:
    encoded = tokenizer(string)["input_ids"]
    encoded2 = new_tokenizer.encode(string).ids
    assert encoded == encoded2, f"{encoded} != {encoded2}"
    decoded = tokenizer.decode(encoded)
    decoded2 = new_tokenizer.decode(encoded2)
    assert decoded.strip() == decoded2, f"{repr(decoded)} != {repr(decoded2)}"
```

The converter + some test script. The test script. Tmp save. Adding Fast tokenizer + tests. Adding the tokenization tests. Correct combination. Small fix. Fixing tests. Fixing with latest update. Rebased. fix copies + normalized added tokens + copies. Adding doc. TMP. Doc + split files. Doc. Versions + try import. Fix Camembert + warnings -> Error. Fix by ArthurZucker. Not a decorator.
* Fixing comments.
* Adding more to docstring.
* Doc rewriting.
-
- 05 Apr, 2023 13 commits
-
-
Kaustubh authored
feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart (#22591)
-
Matt authored
* Use native TF checkpoints for the TF tests * Remove unneeded exceptions
-
Younes Belkada authored
* add deplot + matcha on `transformers`
* more docs
* correct path
* Update docs/source/en/model_doc/deplot.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix
* use auto processor
* Update docs/source/en/model_doc/matcha.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make fixup
* Update docs/source/en/model_doc/deplot.mdx Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* add correct names
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
-
Nicolas Patry authored
* Adding support for BPE merge creation from scores instead of ids. * Revert warn -> raise. * Update src/transformers/convert_slow_tokenizer.py * Quality.
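For context on what "merge creation from scores" means, here is a toy sketch (not the converter's actual logic) that orders BPE merge pairs by their scores when token ids are unavailable:

```python
# Toy example: build a BPE merge list from SentencePiece-style pair scores.
# Higher score means the pair should be merged earlier.
pair_scores = {("l", "o"): -1.5, ("lo", "w"): -2.0, ("h", "i"): -0.5}

merges = [pair for pair, _ in sorted(pair_scores.items(), key=lambda item: item[1], reverse=True)]
print(merges)  # [('h', 'i'), ('l', 'o'), ('lo', 'w')]
```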
-
Matt authored
Fixes a typo in one of the BLIP pretrained checkpoint names
-
Mikel Penagarikano authored
* Update run_speech_recognition_ctc.py Make sure all processes wait until data is saved before loading the processor from the output_dir * Make sure all processes wait until data is saved before loading the processor from the output_dir * Update run_speech_recognition_ctc.py * Update run_speech_recognition_seq2seq.py
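The underlying synchronization pattern looks roughly like this (using a plain torch.distributed barrier for illustration; the example scripts may rely on a different helper):

```python
import torch.distributed as dist

def save_then_load(is_main_process: bool, save_fn, load_fn):
    # Only the main process writes the processor files to the output dir ...
    if is_main_process:
        save_fn()
    # ... and every process waits here until the write has finished,
    # so no rank tries to load files that do not exist yet.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()
    return load_fn()
```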
-
Wonhyeong Seo authored
Co-authored-by: gabrielwithappy <102908949+gabrielwithappy@users.noreply.github.com>
-
Quentin Meeus authored
The logger prints a summary at the beginning of training that displays some info such as number of examples, number of parameters, total number of steps, etc. Those numbers can be quite large and difficult to read. I added a thousand separator to improve readability for the following:
- num_examples
- num_train_epochs
- per_device_train_batch_size
- total_train_batch_size
- max_steps
- num_trainable_params
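The separator itself is just Python's `,` format specifier, for example:

```python
num_examples = 1234567
total_train_batch_size = 2048
# The ',' format spec inserts thousands separators, e.g. 1,234,567.
print(f"  Num examples = {num_examples:,}")
print(f"  Total train batch size = {total_train_batch_size:,}")
```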
-
Matt authored
* Re-enable skipped test and fix the hidden state shape issue * Actually fix the bug instead of just doing something wrong
-
Joao Gante authored
-
Sylvain Gugger authored
-
Joao Gante authored
-
Sylvain Gugger authored
-
- 04 Apr, 2023 1 commit
-
-
Matt authored
* Fix inverted conditional in TF common test!
* Make the same change in the PT tests file
* Make sure hidden states for GPT2 have the same output shape in PT/TF
* Minor fix to PT implementation of token classification loss
* Skip loss equivalence test for TFHubert because it keeps overflowing to inf
* Compute LM loss for TF the (weird) way it's computed in PT (see the sketch after this list)
* Skip loss equivalence test for Wav2Vec2 for the same reason as Hubert
* Fix - don't try to access the hidden states property when output is a tuple
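For reference, the PT-style causal LM loss mentioned above shifts logits and labels by one position so each token predicts the next one; a minimal sketch (not the exact TF code):

```python
import torch
from torch.nn import CrossEntropyLoss

def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Drop the last logit and the first label so position t predicts token t+1.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    loss_fct = CrossEntropyLoss()
    return loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
```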
-