- 29 Jun, 2022 6 commits
-
-
Crystina authored
* first draft adding Flax-t5-encoder and Flax-mt5-encoder * imports * after make fixup * flax t5 encoder test * black on test * make fix-copies * clean * all_model_classes -> tuple * clean test * is_encoder_decoder=False in t5-enc tester * remove file docstring before FlaxT5Encoder * black * isort * commit suggestions on src/transformers/models/t5/modeling_flax_t5.py Co-authored-by:
Suraj Patil <surajp815@gmail.com> * commit suggestions on src/transformers/models/t5/modeling_flax_t5.py Co-authored-by:
Suraj Patil <surajp815@gmail.com> * Apply suggestions from code review Co-authored-by:
Suraj Patil <surajp815@gmail.com> * remove _get_encoder_module * self.decoder_seq_length -> self.encoder_seq_length as t5-enc does not have decoder * bugfix - self.module_class is class itself, not instance; * docs for mt5 and t5 * call -> __call__ in t5 doc * FlaxMT5EncoderModel to TYPE_HINT * run doc-builder to allow change the files Co-authored-by:
Suraj Patil <surajp815@gmail.com>
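For reference, a minimal usage sketch of the encoder-only Flax class added above. The `t5-small` checkpoint name is an illustrative choice; the decoder weights of a full T5 checkpoint are simply not used by this class.

```python
# Sketch: encoder-only Flax T5; "t5-small" is a standard public checkpoint used here for illustration.
from transformers import AutoTokenizer, FlaxT5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = FlaxT5EncoderModel.from_pretrained("t5-small")  # decoder weights of the checkpoint are ignored

inputs = tokenizer("Studies have shown that owning a dog is good for you.", return_tensors="np")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, d_model)
```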
-
Matthijs Hollemans authored
* add MobileViT * fixup * Update README.md Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * remove empty line Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * use clearer variable names * rename to MobileViTTransformerLayer * no longer inherit from nn.Sequential * fixup * fixup * not sure why this got added twice * rename organization for checkpoints * fix it up * Update src/transformers/models/mobilevit/__init__.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/models/mobilevit/test_modeling_mobilevit.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * code style improvements * fixup * Update docs/source/en/model_doc/mobilevit.mdx Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.mdx Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * download labels from hub * rename layers * rename more layers * don't compute loss in separate function * remove some nn.Sequential * replace nn.Sequential with new MobileViTTransformer class * replace nn.Sequential with MobileViTMobileNetLayer * fix pruning since model structure changed * fixup * fix doc comment * remove custom resize from feature extractor * fix ONNX import * add to doc tests * use center_crop from image_utils * move RGB->BGR flipping into image_utils * fix broken tests * wrong type hint * small tweaks Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
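A usage sketch for the MobileViT classes added above. The checkpoint name is an assumption based on the "rename organization for checkpoints" note (the apple organization); image preprocessing (including the RGB->BGR flip) is handled by the feature extractor.

```python
# Sketch: MobileViT image classification; "apple/mobilevit-small" is an assumed checkpoint name.
import requests
from PIL import Image
from transformers import MobileViTFeatureExtractor, MobileViTForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = MobileViTFeatureExtractor.from_pretrained("apple/mobilevit-small")
model = MobileViTForImageClassification.from_pretrained("apple/mobilevit-small")

inputs = feature_extractor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```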
-
Younes Belkada authored
-
StevenTang1998 authored
* Add MVP model * Update README * Remove useless module * Update docs * Fix bugs in tokenizer * Remove useless test * Remove useless module * Update vocab * Remove specifying * Remove specifying * Add #Copied ... statement * Update paper link * Remove useless TFMvp * Add #Copied ... statement * Fix style in test mvp model * Fix some typos * Fix properties of unset special tokens in non verbose mode * Update paper link * Update MVP doc * Update MVP doc * Fix README * Fix typos in docs * Update docs
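A minimal generation sketch for the new MVP model; the `RUCAIBox/mvp` checkpoint name is an assumption about the released base checkpoint.

```python
# Sketch: conditional generation with MVP; "RUCAIBox/mvp" is an assumed checkpoint name.
from transformers import MvpForConditionalGeneration, MvpTokenizer

tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")

inputs = tokenizer(
    "Summarize: You may want to stick it to your boss, but don't quit your job over one bad day.",
    return_tensors="pt",
)
generated_ids = model.generate(**inputs, max_length=32)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```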
-
Aritra Roy Gosthipaty authored
* chore: initial commit Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets. * chore: porting the rest of the modules to tensorflow did not change the documentation yet, yet to try the playground on the model * Fix initilizations (#1) * fix: code structure in few cases. * fix: code structure to align tf models. * fix: layer naming, bn layer still remains. * chore: change default epsilon and momentum in bn. * chore: styling nits. * fix: cross-loading bn params. * fix: regnet tf model, integration passing. * add: tests for TF regnet. * fix: code quality related issues. * chore: added rest of the files. * minor additions.. * fix: repo consistency. * fix: regnet tf tests. * chore: reorganize dummy_tf_objects for regnet. * chore: remove checkpoint var. * chore: remov unnecessary files. * chore: run make style. * Update docs/source/en/model_doc/regnet.mdx Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * chore: PR feedback I. * fix: pt test. thanks to @ydshieh. * New adaptive pooler (#3) * feat: new adaptive pooler Co-authored-by: @Rocketknight1 * chore: remove image_size argument. Co-authored-by:
matt <rocketknight1@gmail.com> Co-authored-by:
matt <rocketknight1@gmail.com> * Empty-Commit * chore: remove image_size comment. * chore: remove playground_tf.py * chore: minor changes related to spacing. * chore: make style. * Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by:
amyeroberts <aeroberts4444@gmail.com> * Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by:
amyeroberts <aeroberts4444@gmail.com> * chore: refactored __init__. * chore: copied from -> taken from./g * adaptive pool -> global avg pool, channel check. * chore: move channel check to stem. * pr comments - minor refactor and add regnets to doc tests. * Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * minor fix in the xlayer. * Empty-Commit * chore: removed from_pt=True. Co-authored-by:
Sayak Paul <spsayakpaul@gmail.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
matt <rocketknight1@gmail.com> Co-authored-by:
amyeroberts <aeroberts4444@gmail.com> Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com>
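A usage sketch for the TensorFlow RegNet port above. The checkpoint name is an assumption (one of the PyTorch RegNet checkpoints that also received TF weights, per the "removed from_pt=True" note).

```python
# Sketch: TF RegNet image classification; "facebook/regnet-y-040" is an assumed checkpoint name.
import requests
import tensorflow as tf
from PIL import Image
from transformers import AutoFeatureExtractor, TFRegNetForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/regnet-y-040")
model = TFRegNetForImageClassification.from_pretrained("facebook/regnet-y-040")

inputs = feature_extractor(images=image, return_tensors="tf")
logits = model(**inputs).logits
predicted_class = int(tf.math.argmax(logits, axis=-1)[0])
print(model.config.id2label[predicted_class])
```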
-
Joao Gante authored
TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible (#17857) * working beam search
🎉 * XLA generation compatible with ALL classes * add xla generation slow test
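A sketch of the pattern this change enables: wrapping TF `generate` (now including beam search) in `tf.function(jit_compile=True)`. GPT-2 and the left-padding/pad-token setup are illustrative choices, not part of the PR itself.

```python
# Sketch: XLA-compiled TF generation via tf.function(jit_compile=True).
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")

xla_generate = tf.function(model.generate, jit_compile=True)

# Pad to a fixed length so the compiled function is not retraced for every new input shape.
inputs = tokenizer(["TensorFlow XLA generation is"], padding="max_length", max_length=16, return_tensors="tf")
outputs = xla_generate(**inputs, num_beams=4, max_new_tokens=24)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```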
-
- 28 Jun, 2022 1 commit
-
-
Jerry Jiarui XU authored
* add group vit and fixed test (except slow) * passing slow test * addressed some comments * fixed test * fixed style * fixed copy * fixed segmentation output * fixed test * fixed relative path * fixed copy * add ignore non auto configured * fixed docstring, add doc * fixed copies * Apply suggestions from code review merge suggestions Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * resolve comment, renaming model * delete unused attr * use fix copies * resolve comments * fixed attn * remove unused vars * refactor tests * resolve final comments * add demo notebook * fixed inconsitent default * Apply suggestions from code review Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Apply suggestions from code review Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * rename stage->stages * Create single GroupViTEncoderLayer class * Update conversion script * Simplify conversion script * Remove cross-attention class in favor of GroupViTAttention * Convert other model as well, add processor to conversion script * addressing final comment * fixed args * Update src/transformers/models/groupvit/modeling_groupvit.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
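A zero-shot classification sketch for the new GroupViT model, which follows the CLIP-style dual-encoder API; `nvidia/groupvit-gcc-yfcc` is an assumed name for the converted checkpoint.

```python
# Sketch: zero-shot image classification with GroupViT; checkpoint name is an assumption.
import requests
from PIL import Image
from transformers import CLIPProcessor, GroupViTModel

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = CLIPProcessor.from_pretrained("nvidia/groupvit-gcc-yfcc")
model = GroupViTModel.from_pretrained("nvidia/groupvit-gcc-yfcc")

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.logits_per_image.softmax(dim=-1))  # probabilities over the candidate texts
```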
-
- 27 Jun, 2022 3 commits
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Matt authored
* Add a TF in-graph tokenizer for BERT * Add from_pretrained * Add proper truncation, option handling to match other tokenizers * Add proper imports and guards * Add test, fix all the bugs exposed by said test * Fix truncation of paired texts in graph mode, more test updates * Small fixes, add a (very careful) test for savedmodel * Add tensorflow-text dependency, make fixup * Update documentation * Update documentation * make fixup * Slight changes to tests * Add some docstring examples * Update tests * Update tests and add proper lowercasing/normalization * make fixup * Add docstring for padding! * Mark slow tests * make fixup * Fall back to BertTokenizerFast if BertTokenizer is unavailable * Fall back to BertTokenizerFast if BertTokenizer is unavailable * make fixup * Properly handle tensorflow-text dummies
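A sketch of the new in-graph BERT tokenizer described above (it requires the tensorflow-text package). Because tokenization runs as TF ops, it can be exported together with the model in a single SavedModel.

```python
# Sketch: TFBertTokenizer runs tokenization inside the TF graph (needs tensorflow-text).
import tensorflow as tf
from transformers import TFAutoModel, TFBertTokenizer

tokenizer = TFBertTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModel.from_pretrained("bert-base-uncased")

texts = tf.constant(["Tokenization now happens inside the TF graph."])
tokenized = tokenizer(texts)  # dict of tensors: input_ids, token_type_ids, attention_mask
outputs = model(tokenized)
print(outputs.last_hidden_state.shape)
```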
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 24 Jun, 2022 7 commits
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* Use higher value for hidden_size in Flax BigBird test * remove 5e-5 Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
rooa authored
* Add CodeGen model * Add missing key and switch order of super() * Fix torch.ones init with uint8 instead of bool * Address comments: copy statements and doc * update tests * remove old model parallel * fix batch gen tests * fix batch gen test * update test_gpt2_sample_max_time * fix codgen test and revert gpt2 test change * Fix incorrect tie_word_embedding value, typo, URL * Fix model order in README and styling * Reorder model list alphabetically * Set tie_word_embedding to False by default * Apply suggestions from code review * Better attn mask name & remove attn masked_bias * add tokenizer for codegen * quality * doc tokenizer * fix-copies * add CodeGenTokenizer in converter * make truncation optional * add test for truncation * add copyright * fix-copies * fix fast tokenizer decode * Update src/transformers/models/codegen/tokenization_codegen.py Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * increase vocab_size in tests Co-authored-by:
patil-suraj <surajp815@gmail.com> Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com>
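A generation sketch for the new CodeGen model and tokenizer; `Salesforce/codegen-350M-mono` is assumed to be one of the smaller released checkpoints.

```python
# Sketch: program synthesis with CodeGen; the checkpoint name is an assumption.
from transformers import AutoTokenizer, CodeGenForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = CodeGenForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

prompt = "def hello_world():"
inputs = tokenizer(prompt, return_tensors="pt")
generated_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```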
-
Yih-Dar authored
* fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Suraj Patil authored
-
NielsRogge authored
* Improve vision models * Add a lot of improvements * Remove to_2tuple from swin tests * Fix TF Swin * Fix more tests * Fix copies * Improve more models * Fix ViTMAE test * Add channel check for TF models * Add proper channel check for TF models * Apply suggestion from code review * Apply suggestions from code review * Add channel check for Flax models, apply suggestion * Fix bug * Add tests for greyscale images * Add test for interpolation of pos encodigns Co-authored-by:Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
-
- 23 Jun, 2022 2 commits
-
-
Sijun He authored
* wip * rebase * all tests pass * rebase * ready for PR * address comments * fix styles * add require_torch to pipeline test * remove remote image to improve CI consistency * address comments; fix tf/flax tests * address comments; fix tf/flax tests * fix tests; add alias * repo consistency tests * Update src/transformers/pipelines/visual_question_answering.py Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * address comments * Update src/transformers/pipelines/visual_question_answering.py Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * merge * wip * wip * wip * most basic tests passes * all tests pass now * relative embedding * wip * running make fixup * remove bert changes * fix doc * fix doc * fix issues * fix doc * address comments * fix CI * remove redundant copied from * address comments * fix broken test Co-authored-by:
Sijun He <sijunhe@Sijuns-MacBook-Pro.local> Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com>
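A usage sketch for the new visual-question-answering pipeline. The ViLT checkpoint below is an assumed default, and (per the commit's note about CI consistency) a local image path works just as well as a URL.

```python
# Sketch: the visual-question-answering pipeline; the model name is an assumed default.
from transformers import pipeline

vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
result = vqa(
    image="http://images.cocodataset.org/val2017/000000039769.jpg",
    question="How many cats are there?",
)
print(result)  # e.g. [{"answer": "2", "score": 0.99}, ...]
```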
-
Younes Belkada authored
* few fixes: - hardcode tokenizer padding side - remove unused args * few fixes: - added new attribute on TokenizerTesterMixin - added new slow test - remove unused arg on tokenizer class * make style * Update src/transformers/models/bloom/tokenization_bloom_fast.py Co-authored-by:
SaulLu <55560583+SaulLu@users.noreply.github.com> * make quality * apply changes - remove new attribute - redefine test on the class * add comments Co-authored-by:
SaulLu <55560583+SaulLu@users.noreply.github.com>
-
- 21 Jun, 2022 4 commits
-
-
Thomas Wang authored
* Add final_layer_norm to OPT model * Add JAX and TF version * Fix Keras name * Woops * Allow for non breaking change * Apply suggestions from code review * add tests Co-authored-by:Patrick von Platen <patrick.v.platen@gmail.com>
-
Yih-Dar authored
* rename to check_pt_flax_outputs * update check_pt_flax_outputs * use 5e-5 for BigBird PT/Flax test Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Lysandre Debut authored
* Prepare CI for v0.8.0 * pin hfh (revert before merge) * Revert "pin hfh (revert before merge)" This reverts commit a0103140e1c77b810ffcb735192968bc03be3e1f. * Test rc3 * Test latest rc * Unpin to the RC Co-authored-by:Sylvain Gugger <Sylvain.gugger@gmail.com>
-
NielsRogge authored
* Fix docstrings and variable names * Rename x to something better * Improve messages * Fix docstrings and add test for greyscale images Co-authored-by:Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
-
- 20 Jun, 2022 3 commits
-
-
Yih-Dar authored
* Use torch.finfo(self.dtype).min * for GPTNeoX * for Albert * For Splinter * Update src/transformers/models/data2vec/modeling_data2vec_audio.py Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * fix -inf used in Bart-like models * Fix a few remaining -inf * more fix * clean up * For CLIP * For FSMT * clean up * fix test * Add dtype argument and use it for LayoutLMv3 * update FlaxLongT5Attention Co-authored-by:
ydshieh <ydshieh@users.noreply.github.com> Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com>
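A sketch of the masking pattern this change standardizes: building additive attention masks from `torch.finfo(dtype).min` instead of a hard-coded `-inf` or `-1e9`, which avoids overflow problems in fp16/bf16. The helper below is hypothetical, not the library's exact code.

```python
# Sketch (hypothetical helper): dtype-aware additive attention mask using torch.finfo(dtype).min.
import torch

def expand_attention_mask(mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    """mask: (batch, seq_len) with 1 for real tokens, 0 for padding."""
    inverted = 1.0 - mask[:, None, None, :].to(dtype)  # 1 where attention must be blocked
    return inverted * torch.finfo(dtype).min            # large negative, but finite in fp16/bf16

print(expand_attention_mask(torch.tensor([[1, 1, 1, 0]]), torch.float16))
```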
-
Sylvain Gugger authored
* Fix cache for GPT-Neo-X * Add more tests
-
Joao Gante authored
* Also propagate changes to blenderbot, blenderbot_small, marian, mbart, and pegasus
-
- 14 Jun, 2022 4 commits
-
-
Younes Belkada authored
-
Suraj Patil authored
-
Hailey Schoelkopf authored
* add new bloom classes * (feat) add bloom classification tests; make style * style: change import in test * add some typehints to bloom classes * merge main into branch * fix: input checking in bloom seq classification * fix tests * change model class tests * fix few tests - more tests should pass - one test left * make token classifier return hidden states * style: make BLOOM typehints consistent Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by:
younesbelkada <younesbelkada@gmail.com> Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
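A sketch of the new classification head on top of BLOOM added above; `bigscience/bloom-560m` is an assumed small checkpoint from the family, and `num_labels=2` is an arbitrary choice for illustration.

```python
# Sketch: BLOOM sequence classification; checkpoint name and label count are assumptions.
from transformers import AutoTokenizer, BloomForSequenceClassification

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = BloomForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer("This library keeps getting better.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (1, 2)
```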
-
Patrick von Platen authored
-
- 13 Jun, 2022 1 commit
-
-
Daniel Stancl authored
* Initial commit * Make some fixes * Make PT model full forward pass * Drop TF & Flax implementation, fix copies etc * Add Flax model and update some corresponding stuff * Drop some TF things * Update config and flax local attn * Add encoder_attention_type to config * . * Update docs * Do some cleansing * Fix some issues -> make style; add some docs * Fix position_bias + mask addition + Update tests * Fix repo consistency * Fix model consistency by removing flax operation over attn_mask * [WIP] Add PT TGlobal LongT5 * . * [WIP] Add flax tglobal model * [WIP] Update flax model to use the right attention type in the encoder * Fix flax tglobal model forward pass * Make the use of global_relative_attention_bias * Add test suites for TGlobal model * Fix minor bugs, clean code * Fix pt-flax equivalence though not convinced with correctness * Fix LocalAttn implementation to match the original impl. + update READMEs * Few updates * Update: [Flax] improve large model init and loading #16148 * Add ckpt conversion script accoring to #16853 + handle torch device placement * Minor updates to conversion script. * Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM * gpu support + dtype fix * Apply some suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * * Remove (de)parallelize stuff * Edit shape comments * Update README.md * make fix-copies * Remove caching logic for local & tglobal attention * Apply another batch of suggestions from code review * Add missing checkpoints * Format converting scripts * Drop (de)parallelize links from longT5 mdx * Fix converting script + revert config file change * Revert "Remove caching logic for local & tglobal attention" This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46. * Stash caching logic in Flax model * Make side relative bias used always * Drop caching logic in PT model * Return side bias as it was * Drop all remaining model parallel logic * Remove clamp statements * Move test files to the proper place * Update docs with new version of hf-doc-builder * Fix test imports * Make some minor improvements * Add missing checkpoints to docs * Make TGlobal model compatible with torch.onnx.export * Replace some np.ndarray with jnp.ndarray * Fix TGlobal for ONNX conversion + update docs * fix _make_global_fixed_block_ids and masked neg value * update flax model * style and quality * fix imports * remove load_tf_weights_in_longt5 from init and fix copies * add slow test for TGlobal model * typo fix * Drop obsolete is_parallelizable and one warning * Update __init__ files to fix repo-consistency * fix pipeline test * Fix some device placements * [wip]: Update tests -- need to generate summaries to update expected_summary * Fix quality * Update LongT5 model card * Update (slow) summarization tests * make style * rename checkpoitns * finish * fix flax tests Co-authored-by:
phungvanduy <pvduy23@gmail.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by:
patil-suraj <surajp815@gmail.com>
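A long-input summarization sketch for the new LongT5 model (the transient-global attention variant); the checkpoint name is assumed from the model documentation.

```python
# Sketch: LongT5 (TGlobal) summarization; "google/long-t5-tglobal-base" is an assumed checkpoint.
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

long_document = "summarize: " + " ".join(
    ["Long inputs are handled with local and transient-global attention."] * 200
)
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, max_length=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```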
-
- 10 Jun, 2022 6 commits
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Patrick von Platen authored
-
Patrick von Platen authored
* [BigBirdFlaxTests] Make tests slow * up * correct black with new version
-
Simon Brandeis authored
* Raise RepoNotFoundError in case of 401 * Include changes from revert-17646-skip_repo_not_found * Add a comment * 💄 Code quality * 💚 Update `get_from_cache` test * 💚 Code quality & skip failing test
-
amyeroberts authored
-
Sylvain Gugger authored
-
- 09 Jun, 2022 2 commits
-
-
Younes Belkada authored
* adding template * update model * model update * update conf for debug model * update conversion * update conversion script * update conversion script * fix missing keys check * add tests to test the tokenizer in the local machine * Change variable name * add tests on xnli dataset * add more description * add descriptions + clearer code * clearer code * adding new tests + skipping few tests because of env problems * change comment * add dtype on the configuration * add test embeddings * add hardcoded test * fix dtype issue * adding torch.float16 to config * adding more metrics (min, max, mean) * add sum * now the test passes with almost equal * add files for conversion - test passes on cpu gpu * add final changes * cleaning code * add new args in the docstring * fix one liner function * remove macros * remove forward attention * clean up init funtion * add comments on the issue * rm scale mask softmax * do make style * fix dtype in init * fixing for loop on att probs * fix style with black * fix style + doc error * fix and debug CI errors (docs + style) * some updates - change new operations - finally add scaled softmax - added new args in the config * make use cache working * add changes - save sharded models - final changes on the modeling script * add changes - comment on alibi - add TODO on seq length * test commit - added a text to test the commit Co-authored-by:
thomasw21 <24695242+thomasw21@users.noreply.github.com> * final changes - attention mask change - generation works on BS176b Co-authored-by:
thomasw21 <24695242+thomasw21@users.noreply.github.com> * changes - model + conversion * move to correct dir * put , * fex fixes * fix tokenizer autodoc * fix minor CI issues * fix minor CI issues * fix minor CI issues * fix style issue * fix minor import issues * fix few issues * remove def main on the test * add require torch * replace decorator with 'with' * fix style * change to bloom * add quick fix tokenizer * fix tokenizer file * fix tokenizer - merge tests - small fixes * fix import issue * add bloom to readme * fix consistency * Update docs/source/en/model_doc/bloom.mdx Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Apply suggestions from code review fix comment issues on file headers Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix doc issue * small fix - modeling test * some changes - refactor some code - taking into account reviews - more tests should pass - removed pruning tests * remove useless division * more tests should pass * more tests should pass * more tests should pass * let's try this one -add alibi offset - remove all permutes to make the grad operations work - finger crossed * refactor - refactor code - style changes - add new threshold for test * major changes - change BLOOM to Bloom - add quick doc on bloom.mdx - move embeddings test on modeling test * modify readme * small fixes * small fix - better threshold for a test * remove old test file from fetcher * fix small typo * major change - change BloomLMHead to BloomForCausalLM * remove onnx config * major changes - refactor the code - remove asserts - change tol for test * make style * small change * adding a slow test + commenting old ones for now * make style * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * make style * fix duplicates * cleaning comments on config * clean a bit conversion file * refacor a bit modeling file * refactor tokenizer file * fix tokenization test issue * fix tokenization issue #2 * fix tokenization issue second try * fix test issue * make style + add suggestions * change test fetcher * try this one - slow tests should pass - finger crossed * possible final changes * make style * try fix padding side issue * fix side * fix padding issue * fix ko-readme * fix config auto * cleaning modeling file * keep bloom in caps in ko * update config docs * remove pretraining_pp * remove model parallel * update config - add correct config files * fix duplicates * fix fetcher * fix refactor issue - remove divide function * try to remove alibi * small fixes - fix alibi - remove seq length - refactor a bit the code * put correct values - fix bos and eos token ids * fix attention mask loop Co-authored-by:
thomasw21 <24695242+thomasw21@users.noreply.github.com> * small fixes: - remove skip bias add * small fixes - fix typo in readme - fix typos in config * small changes - remove a test - add reconstruction test - change config * small changes - change Scaled Softmax to BloomScaledSoftmax * small fixes - fix alibi dtype * major changes - removing explicit dtype when loading modules - fixing test args (torch_dtype=auto) - add dosctring * fix readmes * major changes - now bloom supports alibi shifting - refactor a bit the code - better test tolerance now * refactor a bit * refactor a bit * put correct name on test * change docstring * small changes - fix docstring modeling - fix test tolerance * fix small nit - take dtype from tensors in the conversion script * minor fix - fix mdx issue * minor fix - change config docstring * forward contrib credits from PR14084 * Apply suggestions from code review Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * apply modifications Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * resolve softmax upcast * Apply suggestions from code review Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by:
Niklas Muennighoff <n.muennighoff@gmail.com> * final changes modeling Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * Merge commit 'd156898f ' * merge commit * Apply suggestions from code review Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * apply suggestions Apply suggestions from Stas comments Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * Fix gradient checkpointing Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * add slow but exact * add accelerate compatibility Co-authored-by:
Nicolas Patry <Narsil@users.noreply.github.com> * forward contrib credits Co-authored-by:
thomasw21 <thomasw21@users.noreply.github.com> Co-authored-by:
sgugger <sgugger@users.noreply.github.com> Co-authored-by:
patrickvonplaten <patrickvonplaten@users.noreply.github.com> Co-authored-by:
Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by:
LysandreJik <LysandreJik@users.noreply.github.com> * Apply suggestions from code review Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * fix torch device on tests * make style * Apply suggestions from code review Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * fix nits Co-authored-by: patrickvonplaten<patrickvonplaten@users.noreply.github.com> * remove final nits * fix doc - add more details on the doc - add links to checkpoints * Update src/transformers/__init__.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * apply suggestions Co-authored-by:
sgugger <sgugger@users.noreply.github.com> * put test torchscript to false * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by:
justheuristic <justheuristic@gmail.com> * fix alibi - create alibi only once * add small doc * make quality * replace torch.nn * remove token type emb * fix fused op + output bias * add fused op - now can control fused operation from config * remove fused op * make quality * small changes - remove unsed args on config - removed bias gelu file - make the model torchscriptable - add torchscript slow tests * Update src/transformers/models/bloom/modeling_bloom.py * fix slow * make style * add accelerate support * add bloom to deepspeed tests * minor changes * Apply suggestions from code review Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * minor change * slow tests pass * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update docs/source/en/model_doc/bloom.mdx Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * minor changes: - change docstring - add link to paper Co-authored-by:
Thomwolf <thomwolf@gmail.com> Co-authored-by:
Thomas Wolf <thomas@huggingface.co> Co-authored-by:
thomasw21 <24695242+thomasw21@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
sIncerass <sheng.s@berkeley.edu> Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> Co-authored-by:
Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by:
Nicolas Patry <Narsil@users.noreply.github.com> Co-authored-by:
thomasw21 <thomasw21@users.noreply.github.com> Co-authored-by:
sgugger <sgugger@users.noreply.github.com> Co-authored-by:
patrickvonplaten <patrickvonplaten@users.noreply.github.com> Co-authored-by:
LysandreJik <LysandreJik@users.noreply.github.com> Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by:
justheuristic <justheuristic@gmail.com> Co-authored-by:
Stas Bekman <stas@stason.org>
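A minimal generation sketch for the newly added BLOOM architecture; `bigscience/bloom-560m` is an assumed small member of the checkpoint family (the full `bigscience/bloom` model is far larger than most machines can load).

```python
# Sketch: causal generation with BLOOM; the checkpoint name is an assumption.
from transformers import AutoTokenizer, BloomForCausalLM

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = BloomForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The BLOOM language model was trained on", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```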
-
amyeroberts authored
-
- 08 Jun, 2022 1 commit
-
-
SaulLu authored
-