Commits · 6c8f4c9a938a09749ea1b19a5fa2a8dd27e99a29 · chenpangpang / transformers

28 Jun, 2022 5 commits

Adding GroupViT Models (#17313) · 6c8f4c9a

Jerry Jiarui XU authored Jun 28, 2022



* add group vit and fixed test (except slow)

* passing slow test

* addressed some comments

* fixed test

* fixed style

* fixed copy

* fixed segmentation output

* fixed test

* fixed relative path

* fixed copy

* add ignore non auto configured

* fixed docstring, add doc

* fixed copies

* Apply suggestions from code review

merge suggestions
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* resolve comment, renaming model

* delete unused attr

* use fix copies

* resolve comments

* fixed attn

* remove unused vars

* refactor tests

* resolve final comments

* add demo notebook

* fixed inconsitent default

* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* rename stage->stages

* Create single GroupViTEncoderLayer class

* Update conversion script

* Simplify conversion script

* Remove cross-attention class in favor of GroupViTAttention

* Convert other model as well, add processor to conversion script

* addressing final comment

* fixed args

* Update src/transformers/models/groupvit/modeling_groupvit.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

6c8f4c9a

Mrbean/codegen onnx (#17903) · b424f0b4
mrbean authored Jun 28, 2022

b424f0b4
Add ONNX support for DETR (#17904) · 76d13de5
regisss authored Jun 28, 2022

76d13de5
Fix `test_number_of_steps_in_training_with_ipex` (#17889) · f717d47f
Yih-Dar authored Jun 28, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
f717d47f
Update expected values in constrained beam search tests (#17887) · 0b0dd977
Yih-Dar authored Jun 28, 2022
```
* fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
0b0dd977

27 Jun, 2022 3 commits

fix (#17890) · 9a345384

Yih-Dar authored Jun 27, 2022


Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

9a345384

Add a TF in-graph tokenizer for BERT (#17701) · ee0d001d

Matt authored Jun 27, 2022

* Add a TF in-graph tokenizer for BERT

* Add from_pretrained

* Add proper truncation, option handling to match other tokenizers

* Add proper imports and guards

* Add test, fix all the bugs exposed by said test

* Fix truncation of paired texts in graph mode, more test updates

* Small fixes, add a (very careful) test for savedmodel

* Add tensorflow-text dependency, make fixup

* Update documentation

* Update documentation

* make fixup

* Slight changes to tests

* Add some docstring examples

* Update tests

* Update tests and add proper lowercasing/normalization

* make fixup

* Add docstring for padding!

* Mark slow tests

* make fixup

* Fall back to BertTokenizerFast if BertTokenizer is unavailable

* Fall back to BertTokenizerFast if BertTokenizer is unavailable

* make fixup

* Properly handle tensorflow-text dummies

ee0d001d

Fix TF GPT2 test_onnx_runtime_optimize (#17874) · 401fcca6
Yih-Dar authored Jun 27, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
401fcca6

24 Jun, 2022 7 commits

Fix `test_inference_instance_segmentation_head` (#17872) · b03be78a
Yih-Dar authored Jun 24, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
b03be78a
Skip `test_multi_gpu_data_parallel_forward` for `MaskFormer` (#17864) · 494aac65
Yih-Dar authored Jun 24, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
494aac65

Use higher value for hidden_size in Flax BigBird test (#17822) · 0e0f1f46

Yih-Dar authored Jun 24, 2022



* Use higher value for hidden_size in Flax BigBird test

* remove 5e-5
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

0e0f1f46

Add CodeGen model (#17443) · d6b6fb99

rooa authored Jun 24, 2022



* Add CodeGen model

* Add missing key and switch order of super()

* Fix torch.ones init with uint8 instead of bool

* Address comments: copy statements and doc

* update tests

* remove old model parallel

* fix batch gen tests

* fix batch gen test

* update test_gpt2_sample_max_time

* fix codgen test and revert gpt2 test change

* Fix incorrect tie_word_embedding value, typo, URL

* Fix model order in README and styling

* Reorder model list alphabetically

* Set tie_word_embedding to False by default

* Apply suggestions from code review

* Better attn mask name & remove attn masked_bias

* add tokenizer for codegen

* quality

* doc tokenizer

* fix-copies

* add CodeGenTokenizer in converter

* make truncation optional

* add test for truncation

* add copyright

* fix-copies

* fix fast tokenizer decode

* Update src/transformers/models/codegen/tokenization_codegen.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* increase vocab_size in tests
Co-authored-by: patil-suraj <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

d6b6fb99

Fix Splinter test (#17854) · 44749001
Yih-Dar authored Jun 24, 2022
```
* fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
44749001
[tests/VisionEncoderDecoder] import to_2tuple from test utils (#17865) · 73a0496c
Suraj Patil authored Jun 24, 2022

73a0496c

Improve vision models (#17731) · 09178705

NielsRogge authored Jun 24, 2022



* Improve vision models

* Add a lot of improvements

* Remove to_2tuple from swin tests

* Fix TF Swin

* Fix more tests

* Fix copies

* Improve more models

* Fix ViTMAE test

* Add channel check for TF models

* Add proper channel check for TF models

* Apply suggestion from code review

* Apply suggestions from code review

* Add channel check for Flax models, apply suggestion

* Fix bug

* Add tests for greyscale images

* Add test for interpolation of pos encodigns
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

09178705

23 Jun, 2022 4 commits

Nezha Pytorch implementation (#17776) · 7cf52a49

Sijun He authored Jun 24, 2022



* wip

* rebase

* all tests pass

* rebase

* ready for PR

* address comments

* fix styles

* add require_torch to pipeline test

* remove remote image to improve CI consistency

* address comments; fix tf/flax tests

* address comments; fix tf/flax tests

* fix tests; add alias

* repo consistency tests

* Update src/transformers/pipelines/visual_question_answering.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* address comments

* Update src/transformers/pipelines/visual_question_answering.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* merge

* wip

* wip

* wip

* most basic tests passes

* all tests pass now

* relative embedding

* wip

* running make fixup

* remove bert changes

* fix doc

* fix doc

* fix issues

* fix doc

* address comments

* fix CI

* remove redundant copied from

* address comments

* fix broken test
Co-authored-by: Sijun He <sijunhe@Sijuns-MacBook-Pro.local>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

7cf52a49

Fix broken test for models with batchnorm (#17841) · 1a7ef334

Matt authored Jun 23, 2022

* Fix tests that broke when models used batchnorm

* Initializing the model twice does not actually...
...give you the same weights each time.
I am good at machine learning.

* Fix speed regression

1a7ef334

BLOOM minor changes on tokenizer (#17823) · 18c263c4

Younes Belkada authored Jun 23, 2022



* few fixes:

- hardcode tokenizer padding side
- remove unused args

* few fixes:

- added new attribute on TokenizerTesterMixin
- added new slow test
- remove unused arg on tokenizer class

* make style

* Update src/transformers/models/bloom/tokenization_bloom_fast.py
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* make quality

* apply changes

- remove new attribute
- redefine test on the class

* add comments
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

18c263c4

Fix properties of unset special tokens in non verbose mode (#17797) · 3eed5530
Guillaume Klein authored Jun 23, 2022
```
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
```
3eed5530

22 Jun, 2022 2 commits
- Offload fixes (#17810) · df8e6804
  Sylvain Gugger authored Jun 22, 2022
```
* Offload fixes

* Add a test
```
  df8e6804
- Flax sharded (#17760) · 16c6eb7c
  Arthur authored Jun 22, 2022
  
  16c6eb7c
21 Jun, 2022 7 commits

Fix `top_k_top_p_filtering` having unexpected behavior (#17744) · 3b00b623

unifyh authored Jun 22, 2022

- Fix `top_k_top_p_filtering` not passing `filter_value` to
   `TopPLogitsWarper` causing any top-p filtered logits to be -inf
   instead of specified value

 - Add corresponding test

3b00b623

Add final_layer_norm to OPT model (#17785) · abc400b0

Thomas Wang authored Jun 21, 2022



* Add final_layer_norm to OPT model

* Add JAX and TF version

* Fix Keras name

* Woops

* Allow for non breaking change

* Apply suggestions from code review

* add tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

abc400b0

TF Sharded (#17713) · 7cced021

Arthur authored Jun 21, 2022



* initial commit

* update modeeling tf utils

* quality

* clean and update args

* update

* remove potential bug

* code quality

* update

* update max shard

* update tests for sharding from pretrained

* fix remaining test

* make style

* h5py if tf available

* update and fix test

* fix test

* style

* modified push to hub to support shard for TF

* quick fix

* update code

* merge branch main and style

* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update based on reviews

* update doc

* update and style

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update based on reviews

* fix typo

* style
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

7cced021

Use 5e-5 For BigBird PT/Flax equivalence tests (#17780) · f47afefb

Yih-Dar authored Jun 21, 2022



* rename to check_pt_flax_outputs

* update check_pt_flax_outputs

* use 5e-5 for BigBird PT/Flax test
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

f47afefb

Prepare transformers for v0.8.0 huggingface-hub release (#17716) · 6a5272b2

Lysandre Debut authored Jun 21, 2022



* Prepare CI for v0.8.0

* pin hfh (revert before merge)

* Revert "pin hfh (revert before merge)"

This reverts commit a0103140e1c77b810ffcb735192968bc03be3e1f.

* Test rc3

* Test latest rc

* Unpin to the RC
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>

6a5272b2

[ViTMAE] Fix docstrings and variable names (#17710) · b681e12d

NielsRogge authored Jun 21, 2022



* Fix docstrings and variable names

* Rename x to something better

* Improve messages

* Fix docstrings and add test for greyscale images
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

b681e12d

add onnx support for deberta and debertav2 (#17617) · eb16be41

mrbean authored Jun 21, 2022



* add onnx support for debertav2

* debertav2 -> deberta-v2 in onnx features file

* remove causal lm

* add deberta-v2-xlarge to onnx tests

* use self.type().dtype() in xsoftmax
Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>

* remove hack for deberta

* remove unused imports

* Update src/transformers/models/deberta_v2/configuration_deberta_v2.py
Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>

* use generate dummy inputs

* linter

* add imports

* add support for deberta v1 as well

* deberta does not support multiple choice

* Update src/transformers/models/deberta/configuration_deberta.py
Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>

* Update src/transformers/models/deberta_v2/configuration_deberta_v2.py
Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>

* one line ordered dict

* fire build
Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>

eb16be41

20 Jun, 2022 4 commits

Not use -1e4 as attn mask (#17306) · d3cb2888

Yih-Dar authored Jun 20, 2022



* Use torch.finfo(self.dtype).min

* for GPTNeoX

* for Albert

* For Splinter

* Update src/transformers/models/data2vec/modeling_data2vec_audio.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix -inf used in Bart-like models

* Fix a few remaining -inf

* more fix

* clean up

* For CLIP

* For FSMT

* clean up

* fix test

* Add dtype argument and use it for LayoutLMv3

* update FlaxLongT5Attention
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

d3cb2888

Fix cache for GPT-Neo-X (#17764) · fdb12080
Sylvain Gugger authored Jun 20, 2022
```
* Fix cache for GPT-Neo-X

* Add more tests
```
fdb12080
deprecate is_torch_bf16_available (#17738) · a2d34b7c
Stas Bekman authored Jun 20, 2022
```
* deprecate is_torch_bf16_available

* address suggestions
```
a2d34b7c
TF: BART compatible with XLA generation (#17479) · 132402d7
Joao Gante authored Jun 20, 2022
```
* Also propagate changes to blenderbot, blenderbot_small, marian, mbart, and pegasus
```
132402d7

16 Jun, 2022 1 commit
- Refine Bf16 test for deepspeed (#17734) · 36d46479
  Sylvain Gugger authored Jun 16, 2022
```
* Refine BF16 check in CPU/GPU

* Fixes

* Renames
```
  36d46479
14 Jun, 2022 5 commits

fix tolerance for a bloom slow test (#17634) · d453ea61
Younes Belkada authored Jun 14, 2022

d453ea61
[LongT5] disable model parallel test (#17702) · 120649bf
Suraj Patil authored Jun 14, 2022

120649bf

Add `BloomForSequenceClassification` and `BloomForTokenClassification` classes (#17639) · edb672ac

Hailey Schoelkopf authored Jun 14, 2022



* add new bloom classes

* (feat) add bloom classification tests; make style

* style: change import in test

* add some typehints to bloom classes

* merge main into branch

* fix: input checking in bloom seq classification

* fix tests

* change model class tests

* fix few tests

- more tests should pass
- one test left

* make token classifier return hidden states

* style: make BLOOM typehints consistent
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

edb672ac

[LongT5] Rename checkpoitns (#17700) · 53496ac5
Patrick von Platen authored Jun 14, 2022

53496ac5

Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference (#17153) · 3b29c9fd

jianan-gu authored Jun 14, 2022



* add jit mode option and model wrap

* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refine code

* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add ut and refine code

* code refine

* refine code

* add inference doc

* Update src/transformers/trainer.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/trainer.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add cpu inference performance doc

* Update perf_infer_cpu.mdx

* Update perf_infer_cpu.mdx

* Update performance.mdx

* Update _toctree.yml

* refine jit func naming

* Update _toctree.yml

* Delete perf_infer_gpu_one.mdx

* Update perf_infer_cpu.mdx

* Update docs/source/en/perf_infer_cpu.mdx
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add none check before jit

* Update docs/source/en/perf_infer_cpu.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/perf_infer_cpu.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

3b29c9fd

13 Jun, 2022 2 commits

Add `LongT5` model (#16792) · a72f1c9f

Daniel Stancl authored Jun 13, 2022



* Initial commit

* Make some fixes

* Make PT model full forward pass

* Drop TF & Flax implementation, fix copies etc

* Add Flax model and update some corresponding stuff

* Drop some TF things

* Update config and flax local attn

* Add encoder_attention_type to config

* .

* Update docs

* Do some cleansing

* Fix some issues -> make style; add some docs

* Fix position_bias + mask addition + Update tests

* Fix repo consistency

* Fix model consistency by removing flax operation over attn_mask

* [WIP] Add PT TGlobal LongT5

* .

* [WIP] Add flax tglobal model

* [WIP] Update flax model to use the right attention type in the encoder

* Fix flax tglobal model forward pass

* Make the use of global_relative_attention_bias

* Add test suites for TGlobal model

* Fix minor bugs, clean code

* Fix pt-flax equivalence though not convinced with correctness

* Fix LocalAttn implementation to match the original impl. + update READMEs

* Few updates

* Update: [Flax] improve large model init and loading #16148

* Add ckpt conversion script accoring to #16853 + handle torch device placement

* Minor updates to conversion script.

* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM

* gpu support + dtype fix

* Apply some suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* * Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies

* Remove caching logic for local & tglobal attention

* Apply another batch of suggestions from code review

* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx

* Fix converting script + revert config file change

* Revert "Remove caching logic for local & tglobal attention"

This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.

* Stash caching logic in Flax model

* Make side relative bias used always

* Drop caching logic in PT model

* Return side bias as it was

* Drop all remaining model parallel logic

* Remove clamp statements

* Move test files to the proper place

* Update docs with new version of hf-doc-builder

* Fix test imports

* Make some minor improvements

* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray

* Fix TGlobal for ONNX conversion + update docs

* fix _make_global_fixed_block_ids and masked neg  value

* update flax model

* style and quality

* fix imports

* remove load_tf_weights_in_longt5 from init and fix copies

* add slow test for TGlobal model

* typo fix

* Drop obsolete is_parallelizable and one warning

* Update __init__ files to fix repo-consistency

* fix pipeline test

* Fix some device placements

* [wip]: Update tests -- need to generate summaries to update expected_summary

* Fix quality

* Update LongT5 model card

* Update (slow) summarization tests

* make style

* rename checkpoitns

* finish

* fix flax tests
Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>

a72f1c9f

Add Visual Question Answering (VQA) pipeline (#17286) · 66336dc1

Sijun He authored Jun 13, 2022



* wip

* rebase

* all tests pass

* rebase

* ready for PR

* address comments

* fix styles

* add require_torch to pipeline test

* remove remote image to improve CI consistency

* address comments; fix tf/flax tests

* address comments; fix tf/flax tests

* fix tests; add alias

* repo consistency tests

* Update src/transformers/pipelines/visual_question_answering.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* address comments

* Update src/transformers/pipelines/visual_question_answering.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* merge

* Update src/transformers/models/auto/modeling_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* merge
Co-authored-by: Sijun He <sijunhe@Sijuns-MacBook-Pro.local>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

66336dc1