Commits · 31cfcbd3e291b7558a9857843a6caccd13fd26a6 · chenpangpang / transformers

15 Jul, 2021 3 commits
- [doc] performance: batch sizes (#12725) · 31cfcbd3
  Stas Bekman authored Jul 15, 2021
  
  31cfcbd3
- [doc] parallelism: Which Strategy To Use When (#12712) · 68605e9d
  Stas Bekman authored Jul 15, 2021
  
  68605e9d
- Remove framework mention (#12731) · eb4d7ef9
  Lysandre Debut authored Jul 15, 2021
  
  eb4d7ef9
14 Jul, 2021 1 commit
- non-native optimizers are mostly ok with zero-offload (#12690) · 5dd0c956
  Stas Bekman authored Jul 13, 2021
  
  5dd0c956
13 Jul, 2021 3 commits

[Deepspeed] adapt multiple models, add zero_to_fp32 tests (#12477) · 78f5fe14

Stas Bekman authored Jul 13, 2021



* zero_to_fp32 tests

* args change

* remove unnecessary work

* use transformers.trainer_utils.get_last_checkpoint

* document the new features

* cleanup

* wip

* fix fsmt

* add bert

* cleanup

* add xlm-roberta

* electra works

* cleanup

* sync

* split off the model zoo tests

* cleanup

* cleanup

* cleanup

* cleanup

* reformat

* cleanup

* casing

* deepspeed>=0.4.3

* adjust distilbert

* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

78f5fe14

[Flax Generation] Correct inconsistencies PyTorch/Flax (#12662) · cee2d213

Patrick von Platen authored Jul 13, 2021



* fix_torch_device_generate_test

* remove @

* correct greedy search

* save intermediate

* add final logits bias

* correct

* up

* add more tests

* fix another bug

* finish tests

* finish marian tests

* up
Co-authored-by: Patrick von Platen <patrick@huggingface.co>

cee2d213

Wrong model is used in example, should be character instead of subword model (#12676) · 9519f0cd

Jeroen Steggink authored Jul 13, 2021



* Wrong model is used, should be character instead of subword

In the original Google repo for CANINE there was a mix-up in the model names in the README.md, which was fixed 2 weeks ago. Since this Transformers model was created before that fix, the example probably ended up using the wrong model.

s = subword, c = character

* canine.rst style fix

* Update docs/source/model_doc/canine.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Styling canine.rst

* Added links to model cards.

* Fixed links to model cards.
Co-authored-by: Jeroen Steggink <978411+jsteggink@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

9519f0cd

12 Jul, 2021 2 commits
- fixed docs (#12646) · b90d4993
  Ahmed Khaled authored Jul 12, 2021
  
  b90d4993
- remove documentation (#12657) · da0e9ee6
  Philipp Schmid authored Jul 12, 2021
  
  da0e9ee6
10 Jul, 2021 2 commits

fix anchor (#12620) · 9ee66ada
Stas Bekman authored Jul 09, 2021

9ee66ada

[doc] DP/PP/TP/etc parallelism (#12524) · 0dcc3c86

Stas Bekman authored Jul 09, 2021



* wip

* complete the doc

* missing img

* improve

* correction

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

0dcc3c86

09 Jul, 2021 4 commits

Add TFHubertModel (#12206) · fb65f65e

Will Rice authored Jul 09, 2021

* TFHubert

* Update with TFWav2Vec Bug Fixes

* Add OOV Error

* Feedback changes

* Fix kwargs call

fb65f65e

[Flax] Fix marian docs 2 (#12615) · 934222e3
Patrick von Platen authored Jul 09, 2021
```
* fix_torch_device_generate_test

* remove @

* up
```
934222e3
[Flax Marian] Add marian flax example (#12614) · 165606e5
Patrick von Platen authored Jul 09, 2021
```
* fix_torch_device_generate_test

* remove @

* finish better examples for marian flax
```
165606e5

[Flax] Add flax marian (#12595) · 65e27215

Patrick von Platen authored Jul 09, 2021



* fix_torch_device_generate_test

* remove @

* add marian

* finish make style

* add model

* add docs

* add test

* add integration tests

* up

* solve bug

* correct tests

* correct some tests

* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct adapt marian

* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

65e27215

08 Jul, 2021 2 commits

[RFC] Laying down building stone for more flexible ONNX export capabilities (#11786) · 2aa3cd93

Funtowicz Morgan authored Jul 08, 2021

* Laying down building stone for more flexible ONNX export capabilities

* Ability to provide a map of config key to override before exporting.

* Makes it possible to export BART with/without past keys.

* Supports simple mathematical syntax for OnnxVariable.repeated

* Effectively apply value override from onnx config for model

* Supports export with additional features such as with-past for seq2seq

* Store the output path directly in the args for uniform usage across.

* Make BART_ONNX_CONFIG_* constants and fix imports.

* Support BERT model.

* Use tokenizer for more flexibility in defining the inputs of a model.

* Add TODO as reminder to provide the batch/sequence_length as CLI args

* Enable optimizations to be done on the model.

* Enable GPT2 + past

* Improve model validation with outputs containing nested structures

* Enable Roberta

* Enable Albert

* Albert requires opset >= 12

* BERT-like models require opset >= 12

* Remove double printing.

* Enable XLM-Roberta

* Enable DistilBERT

* Disable optimization by default

* Fix missing setattr when applying optimizer_features

* Add value field to OnnxVariable to define constant input (not from tokenizers)

* Add T5 support.

* Simplify model type retrieval

* Example exporting token_classification pipeline for DistilBERT.

* Refactoring to package `transformers.onnx`

* Solve circular dependency & __main__

* Remove unnecessary imports in `__init__`

* Licences

* Use @Narsil's suggestion to forward the model's configuration to the ONNXConfig to avoid interpolation.

* Onnx export v2 fixes (#12388)

* Tiny fixes
Remove `convert_pytorch` from onnxruntime-less runtimes
Correct reference to model

* Style

* Fix Copied from

* LongFormer ONNX config.

* Removed optimizations

* Remove bad merge relics.

* Remove unused constants.

* Remove some deleted constants from imports.

* Fix unittest to remove usage of PyTorch model for onnx.utils.

* Fix distilbert export

* Enable ONNX export test for supported model.

* Style.

* Fix lint.

* Enable all supported default models.

* GPT2 only has one output

* Fix bad property name when overriding config.

* Added unittests and docstrings.

* Disable with_past tests for now.

* Enable outputs validation for default export.

* Remove graph opt lvls.

* Last commit with on-going past commented.

* Style.

* Disabled `with_past` for now

* Remove unused imports.

* Remove framework argument

* Remove TFPreTrainedModel reference

* Add documentation

* Add onnxruntime tests to CircleCI

* Add test

* Rename `convert_pytorch` to `export`

* Use OrderedDict for dummy inputs

* WIP Wav2Vec2

* Revert "WIP Wav2Vec2"

This reverts commit f665efb04c92525c3530e589029f0ae7afdf603e.

* Style

* Use OrderedDict for I/O

* Style.

* Specify OrderedDict documentation.

* Style :)
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

2aa3cd93

Init pickle (#12567) · 0a6b9048

Sylvain Gugger authored Jul 08, 2021

* Try to pickle transformers

* Deal with special objs better

* Make picklable

0a6b9048

07 Jul, 2021 1 commit

[Flax] Add FlaxMBart (#12236) · 61400e1e

Daniel Stancl authored Jul 07, 2021



* Copy BART to MBart and rename some stuff

* Add copy statements pointing to FlaxBart

* Update/add some common files

* Update shift_tokens_right + fix imports

* Fix shift_tokens_right method according to MBart implementation

* Update shift_tokens_right in tests accordingly

* Fix the import issue and update docs file

* make style quality

* Do some minor changes according to patil-suraj suggestions

* Change the order of normalization layer and attention

* Add some copy statements

* Update generate method and add integration test for mBart

* Make a few updates after a review

Also add `lang_code_to_id` to MBartTokenizerFast

* fix-copies; make style quality

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* fix output type, style

* add copied from

* resolve conflicts
Co-authored-by: Suraj Patil <surajp815@gmail.com>

61400e1e

06 Jul, 2021 2 commits

FlaxGPTNeo (#12493) · 7a259c19

Suraj Patil authored Jul 06, 2021

* flax gpt neo

* fix query scaling

* update generation test

* use flax model for test

7a259c19

[RoFormer] Fix some issues (#12397) · 626a0a01

yujun authored Jul 06, 2021



* add RoFormerTokenizerFast into AutoTokenizer

* fix typo in roformer docs

* make onnx export happy

* update RoFormerConfig embedding_size

* use jieba not rjieba

* fix #12244 and make test_alignement pass

* update ARCHIVE_MAP

* make style & quality & fixup

* update

* make style & quality & fixup

* make style quality fixup

* update

* suggestion from LysandreJik
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* make style

* use rjieba
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

626a0a01

30 Jun, 2021 3 commits

[Flax] Add wav2vec2 (#12271) · 0d1f67e6

Patrick von Platen authored Jun 30, 2021



* fix_torch_device_generate_test

* remove @

* start flax wav2vec2

* save intermediate

* forward pass has correct shape

* add weight norm

* add files

* finish ctc

* make style

* finish gumbel quantizer

* correct docstrings

* correct some more files

* fix vit

* finish quality

* correct tests

* correct docstring

* correct tests

* start wav2vec2 pretraining script

* save intermediate

* start pretraining script

* finalize pretraining script

* finish

* finish

* small typo

* finish

* correct

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* make style

* push
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

0d1f67e6

Document patch release v4.8.2 · 89073a95
Lysandre authored Jun 30, 2021

89073a95

Add CANINE (#12024) · 6e685978

NielsRogge authored Jun 30, 2021



* First pass

* More progress

* Add support for local attention

* More improvements

* More improvements

* Conversion script working

* Add CanineTokenizer

* Make style & quality

* First draft of integration test

* Remove decoder test

* Improve tests

* Add documentation

* Mostly docs improvements

* Add CanineTokenizer tests

* Fix most tests on GPU, improve upsampling projection

* Address most comments by @dhgarrette

* Remove decoder logic

* Improve Canine tests, improve docs of CanineConfig

* All tokenizer tests passing

* Make fix-copies and fix tokenizer tests

* Fix test_model_outputs_equivalence test

* Apply suggestions from @sgugger's review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address some more comments

* Add support for hidden_states and attentions of shallow encoders

* Define custom CanineModelOutputWithPooling, tests pass

* Make conversion script work for Canine-c too

* Fix tokenizer tests

* Remove file
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

6e685978

29 Jun, 2021 1 commit

[models] respect dtype of the model when instantiating it (#12316) · 7682e977

Stas Bekman authored Jun 28, 2021



* [models] respect dtype of the model when instantiating it

* cleanup

* cleanup

* rework to handle non-float dtype

* fix

* switch to fp32 tiny model

* improve

* use dtype.is_floating_point

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix the doc

* recode to use explicit torch_dtype_auto_detect, torch_dtype args

* docs and tweaks

* docs and tweaks

* docs and tweaks

* merge 2 args, add docs

* fix

* fix

* better doc

* better doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

7682e977

25 Jun, 2021 1 commit
- remove extra white space from log format (#12360) · 4a872cae
  Stas Bekman authored Jun 25, 2021
  
  4a872cae
24 Jun, 2021 1 commit
- Document patch release v4.8.1 · 5b1b5635
  Sylvain Gugger authored Jun 24, 2021
  
  5b1b5635
23 Jun, 2021 4 commits

[Deepspeed] new docs (#12077) · 07ae6103

Stas Bekman authored Jun 23, 2021



* document sub_group_size

* style

* install + issues reporting

* style

* style

* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* indent 4

* restore

* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

07ae6103

v4.9.0.dev0 · 2150dfed
Sylvain Gugger authored Jun 23, 2021

2150dfed
Add mention of the huggingface_hub methods for offline mode (#12320) · ef3dceff
Lysandre Debut authored Jun 23, 2021

ef3dceff

Flax T5 (#12150) · e98233dd

Vasudev Gupta authored Jun 23, 2021



* copy pytorch-t5

* init

* boom boom

* forward pass same

* make generation work

* add more tests

* make test work

* finish normal tests

* make fix-copies

* finish quality

* correct slow example

* correct slow test

* version table

* upload models

* Update tests/test_modeling_flax_t5.py

* correct incorrectly deleted line
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>

e98233dd

22 Jun, 2021 5 commits

[docs] performance (#12258) · bfd5da8e

Stas Bekman authored Jun 22, 2021



* initial performance document

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* rewrites based on suggestions

* 8x multiple is for AMP only

* add contribute section
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

bfd5da8e

[trainer] 2 bug fixes and a rename (#12309) · ebe54135
Stas Bekman authored Jun 22, 2021
```
* bug fixes and a rename

* add extended DDP test
```
ebe54135
add FlaxAutoModelForImageClassification in main init (#12298) · 1498eb98
Suraj Patil authored Jun 22, 2021

1498eb98
[tests] multiple improvements (#12294) · 0d97ba8a
Stas Bekman authored Jun 21, 2021
```
* [tests] multiple improvements

* cleanup

* style

* todo to investigate

* fix
```
0d97ba8a

[trainer + examples] set log level from CLI (#12276) · dad414d5

Stas Bekman authored Jun 21, 2021



* set log level from CLI

* add log_level_replica + test + extended docs

* cleanup

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename datasets objects to allow datasets module

* improve the doc

* style

* doc improve
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

dad414d5

18 Jun, 2021 2 commits

[t5 doc] make the example work out of the box (#12239) · 2e5dbdf2

Stas Bekman authored Jun 18, 2021



* [run_clm.py] restore caching

* style

* [t5 doc] make the example work out of the box

This PR expands the training example to use the correct model class so that it works out of the box; with `T5Model`, for example, it breaks.

* Update docs/source/model_doc/t5.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* expand the other example
Co-authored-by: Suraj Patil <surajp815@gmail.com>

2e5dbdf2

[Flax] FlaxAutoModelForSeq2SeqLM (#12228) · f74655cd
Suraj Patil authored Jun 18, 2021
```
* add FlaxAutoModelForSeq2SeqLM
```
f74655cd

17 Jun, 2021 3 commits
- Docs for v4.8.0 · 0daadc19
  Lysandre authored Jun 17, 2021
  
  0daadc19
- Release: v4.7.0 · 7a6c9fab
  Lysandre authored Jun 17, 2021
  
  7a6c9fab
- Add link to the course (#12229) · afdd9e36
  Sylvain Gugger authored Jun 17, 2021
  
  afdd9e36