1. 22 Jun, 2021 2 commits
  2. 21 Jun, 2021 4 commits
  3. 17 Jun, 2021 2 commits
  4. 16 Jun, 2021 2 commits
  5. 15 Jun, 2021 2 commits
  6. 14 Jun, 2021 8 commits
    • [style] consistent nn. and nn.functional: part 3 `tests` (#12155) · 372ab9cd
      Stas Bekman authored
      * consistent nn. and nn.functional: p3 templates
      
      * restore
    • Flax Big Bird (#11967) · d9c0d08f
      Vasudev Gupta authored
      
      
      * add flax bert
      
      * bert -> bigbird
      
      * original_full ported
      
      * add debugger
      
      * init block sparse
      
      * fix copies ; gelu_fast -> gelu_new
      
      * block sparse port
      
      * fix block sparse
      
      * block sparse working
      
      * all ckpts working
      
      * fix-copies
      
      * make quality
      
      * init tests
      
      * temporary fix for FlaxBigBirdForMultipleChoice
      
      * skip test_attention_outputs
      
      * fix
      
      * gelu_fast -> gelu_new ; fix multiple choice model
      
      * remove nsp
      
      * fix sequence classifier
      
      * fix
      
      * make quality
      
      * make fix-copies
      
      * finish
      
      * Delete debugger.ipynb
      
      * Update src/transformers/models/big_bird/modeling_flax_big_bird.py
      
      * make style
      
      * finish
      
      * bye bye jit flax tests
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    • [Flax] Fix flax pt equivalence tests (#12154) · 007be9e4
      Patrick von Platen authored
      * fix_torch_device_generate_test
      
      * remove @
      
      * upload
    • Adding TFWav2Vec2Model (#11617) · d438eee0
      Will Rice authored
      
      
      * [WIP] Add TFWav2Vec2Model
      
      Work in progress for adding a tensorflow version of Wav2Vec2
      
      * feedback changes
      
      * small fix
      
      * Test Feedback Round 1
      
      * Add SpecAugment and CTC Loss
      
      * correct spec augment mask creation
      
      * docstring and correct copyright
      
      * correct bugs
      
      * remove bogus file
      
      * finish tests correction
      
      * del unnecessary layers
      
      * Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * make style
      
      * correct final bug
      
      * Feedback Changes
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    • [optim] implement AdafactorSchedule (#12123) · ff7c8168
      Stas Bekman authored
      
      
      * implement AdafactorSchedule
      
      * typo
      
      * fix
      
      * Update src/transformers/optimization.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
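      As a rough usage sketch of the AdafactorSchedule added in this PR: it acts as a proxy scheduler so code that expects a scheduler object (e.g. a trainer loop) can still read the learning rate that Adafactor computes internally. The toy model below is a placeholder; it assumes torch and a transformers version containing this PR are installed.

      ```python
      import torch
      from transformers.optimization import Adafactor, AdafactorSchedule

      # A toy model; any nn.Module works.
      model = torch.nn.Linear(4, 2)

      # Let Adafactor derive the learning rate internally
      # (lr=None is required when relative_step=True).
      optimizer = Adafactor(
          model.parameters(),
          scale_parameter=True,
          relative_step=True,
          warmup_init=True,
          lr=None,
      )

      # AdafactorSchedule is a proxy that fetches the lr Adafactor computed,
      # so external code can log it like any other scheduler.
      lr_scheduler = AdafactorSchedule(optimizer)

      loss = model(torch.randn(3, 4)).sum()
      loss.backward()
      optimizer.step()
      lr_scheduler.step()

      print(lr_scheduler.get_last_lr())  # one entry per parameter group
      ```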
    • Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810) · 476ba679
      SaulLu authored
      
      
      * feature for tokenizer without slow/legacy version
      
      * format
      
      * modify common test
      
      * add tests
      
      * add PreTrainedTokenizerFast to AutoTokenizer
      
      * format
      
      * change tokenizer common test in order to be able to run test without a slow version
      
      * update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`
      
      * add AutoTokenizer test
      
      * replace `if self.tokenizer_class is not None` with `if self.tokenizer_class is None`
      
      * remove obsolete change in comment
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/tokenization_utils_fast.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * change `get_main_tokenizer` into `get_tokenizers`
      
      * clarify `get_tokenizers` method
      
      * homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`
      
      * add `test_rust_tokenizer = False` to tokenizer which don't define a fast version
      
      * `test_rust_tokenizer = False` for BertJapaneseTokenizer
      
      * `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
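      The feature above means a `tokenizer.json` produced by the `tokenizers` library can be loaded directly into PreTrainedTokenizerFast with no slow/legacy counterpart. A minimal sketch, assuming tokenizers and a transformers version containing this PR are installed; the tiny word-level vocab is made up for illustration:

      ```python
      import os
      import tempfile

      from tokenizers import Tokenizer
      from tokenizers.models import WordLevel
      from tokenizers.pre_tokenizers import Whitespace
      from transformers import PreTrainedTokenizerFast

      # Build a tiny word-level tokenizer with the `tokenizers` library alone.
      vocab = {"[UNK]": 0, "hello": 1, "world": 2}
      tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
      tokenizer.pre_tokenizer = Whitespace()

      with tempfile.TemporaryDirectory() as tmp:
          tokenizer_file = os.path.join(tmp, "tokenizer.json")
          tokenizer.save(tokenizer_file)

          # No slow tokenizer class needed: load the file stand-alone.
          fast = PreTrainedTokenizerFast(
              tokenizer_file=tokenizer_file,
              unk_token="[UNK]",
          )
          ids = fast("hello world")["input_ids"]
          print(ids)  # ids of "hello" and "world"
      ```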
    • FlaxBart (#11537) · 4a51b1dd
      Daniel Stancl authored
      
      
      * Start working on FlaxBart
      
      * Create modeling_flax_bart.py
      
      * Write FlaxBartAttention
      
      * Add FlaxBartEncoderLayer
      
      * Add FlaxBartDecoderLayer and some typing
      
      * Add helper function for FlaxBart
      
      * shift_tokens_right
      
      * _make_causal_mask
      
      * _expand_mask
      
      * Add PositionalEmbedding and fix init_std naming
      
      * Add FlaxBartPretrainedModel
      
      * Add FlaxBartEncoder
      
      * Add FlaxBartEncoder
      
      * Add FlaxBartEncoder among modules to be imported
      
      * YET WE CANNOT INITIALIZE THAT!! :(
      
      * Make BartEncoder working
      
      Change BartEncoder to instance of nn.Module so far
      
      * Add FlaxBartDecoder
      
      * Add FlaxBartModel
      
      * TODO to make model run -> Prepare model inputs
      
      * Resolve padding
      
      * Add FlaxBartModel
      
      * Add FlaxBartModel into importable modules
      
      * Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
      
      * make style; not properly working
      
      * make style; make quality not pass due to some import I left
      
      * Remove TODO for padding_idx in nn.Embed so far
      
      * Add FlaxBartForConditionalGeneration
      
      * Incorporate Flax model output classes, i.e. return_dict
      
      * Add another models and incorporate use_cache arg
      
      * Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
      
      * Incorporate use_cache arg from PyTorch implementation
      
      * Add all necessary Flax output utils
      
      * Add FlaxBartForCausalLM; not working yet
      
      * Add minor improvements; still lacks some functionality
      
      * Update docs, src and tests
      
      * Add support of FlaxBart to docs/source
      
      * Fix some bugs in FlaxBart source code
      
      * Add some necessary tests for FlaxBart models - jit_compilation not passing
      
      * Fix tests and add test_head_masking
      
      * Fix tests for @jax.jit computation
      
      * Add test_head_masking
      
      * Migrate FlaxBart tests from jax.numpy to numpy
      
      * Remove FlaxBartForCausalLM
      
      * Clean repo
      
      * fix bart model weight structure
      
      * Fix FlaxBartForSequenceClassification
      
      Slicing is not possible to use below jit, therefore, selecting sentence
      representation from hidden_states must be changed.
      
      * Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
      
      * Allow testing for FlaxBartForQA for pt_flax equivalence
      
      * Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
      
      * remove past_key_values
      
      * remove inputs_embeds and make input_ids required
      
      * add position ids
      
      * re-write attention layer
      
      * fix dataclass
      
      * fix pos embeds and attention output
      
      * fix pos embeds
      
      * expose encode method
      
      * expose decode method
      
      * move docstring to top
      
      * add cache for causal attn layer
      
      * remove head masking for now
      
      * s2s greedy search first pass
      
      * boom boom
      
      * fix typos
      
      * fix greedy generate for bart
      
      * use encoder, decoder layers instead of num_hidden_layers
      
      * handle encoder_outputs
      
      * cleanup
      
      * simplify decoding
      
      * more clean-up
      
      * typos
      
      * Change header + add {decoder_,}position_ids into 2 models
      
      * add BartConfig
      
      * fix existing tests
      
      * add encode, decode methods
      
      * Fix shift_tokens_right for JIT compilation + clarify one condition
      
      * fix decode
      
      * encoder => encode
      
      * simplify generate
      
      * add tests for encode and decode
      
      * style
      
      * add tests for cache
      
      * fix equivalence tests
      
      * sample generate now works with seq2seq
      
      * generation tests
      
      * initialize dense layers
      
      * docstring and cleanup
      
      * quality
      
      * remove get/set input_embeddings
      
      * address Patrick's suggestions
      
      * decode for every model, remove encoder_outputs from call
      
      * update tests accordingly
      
      * decode returns only decoder outputs and logits
      
      * fix arguments
      
      * doc encode, decode methods
      
      * correct base_model_prefix
      
      * fix test for seq classif model
      
      * fix docs
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
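      A rough usage sketch of the Flax BART port, including the `encode`/`decode` methods this PR exposes. The deliberately tiny, randomly initialized config is made up so nothing is downloaded; it assumes jax, flax, and a transformers version containing this PR are installed.

      ```python
      import jax.numpy as jnp
      from transformers import BartConfig, FlaxBartForConditionalGeneration

      # Tiny random-weight config: shape-checking only, no checkpoint download.
      config = BartConfig(
          vocab_size=99,
          d_model=16,
          encoder_layers=1,
          decoder_layers=1,
          encoder_attention_heads=2,
          decoder_attention_heads=2,
          encoder_ffn_dim=32,
          decoder_ffn_dim=32,
          max_position_embeddings=64,
      )
      model = FlaxBartForConditionalGeneration(config)

      input_ids = jnp.ones((1, 8), dtype="i4")

      # encode/decode are exposed separately (useful for cached generation)...
      encoder_outputs = model.encode(input_ids=input_ids)
      decoder_outputs = model.decode(
          decoder_input_ids=input_ids, encoder_outputs=encoder_outputs
      )

      # ...or run the full seq2seq forward in one call.
      outputs = model(input_ids=input_ids, decoder_input_ids=input_ids)
      print(outputs.logits.shape)  # (batch, seq_len, vocab_size)
      ```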
    • Fix megatron_gpt2 attention block's causal mask (#12007) · ecd6efe7
      Guido Novati authored
      
      
      * Fix megatron_gpt2 attention block's causal mask.
      
      * compatibility with checkpoints created with recent versions of Megatron-LM
      
      * added integration test for the released Megatron-GPT2 model
      
      * code style changes
      
      * added option to megatron conversion script to read from config file
      Co-authored-by: Guido Novati <gnovati@nvidia.com>
  7. 11 Jun, 2021 1 commit
  8. 10 Jun, 2021 3 commits
  9. 09 Jun, 2021 5 commits
    • [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089) · bc6f51e5
      Patrick von Platen authored
      * fix_torch_device_generate_test
      
      * remove @
      
      * fix tests
    • rm require_version_examples (#12088) · 61e19198
      Stas Bekman authored
    • Wav2Vec2 Pretraining (#11306) · d472bd7b
      Anton Lozhkov authored
      
      
      * Working quantizer forward
      
      * Clean up unused model parts, test reproducibility
      
      * Remove custom outputs from the shared ones
      
      * correct conversion
      
      * correct bug
      
      * add first pretrain script
      
      * save intermediate
      
      * static shapes
      
      * save intermediate
      
      * finish first pretrain script version
      
      * more refactor
      
      * remove wandb
      
      * refactor more
      
      * improve test
      
      * correct perplexity compute bug
      
      * finish model implementation
      
      * add to docs
      
      * finish docs
      
      * finish pretraining script
      
      * finish pretraining script
      
      * remove wandb
      
      * finish PR for merge
      
      * finish config
      
      * finish
      
      * make deepspeed work
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * apply suggestions
      
      * fix flaky test
      Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    • [test] support more than 2 gpus (#12074) · b1a8aa94
      Stas Bekman authored
      * support more than 2 gpus
      
      * style
    • Add DETR (#11653) · d3eacbb8
      NielsRogge authored
      
      
      * Squash all commits of modeling_detr_v7 branch into one
      
      * Improve docs
      
      * Fix tests
      
      * Style
      
      * Improve docs some more and fix most tests
      
      * Fix slow tests of ViT, DeiT and DETR
      
      * Improve replacement of batch norm
      
      * Restructure timm backbone forward
      
      * Make DetrForSegmentation support any timm backbone
      
      * Fix name of output
      
      * Address most comments by @LysandreJik
      
      * Give better names for variables
      
      * Conditional imports + timm in setup.py
      
      * Address additional comments by @sgugger
      
      * Make style, add require_timm and require_vision to tests
      
      * Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
      
      * Add png files to fixtures
      
      * Fix type hint
      
      * Add timm to workflows
      
      * Add `BatchNorm2d` to the weight initialization
      
      * Fix retain_grad test
      
      * Replace model checkpoints by Facebook namespace
      
      * Fix name of checkpoint in test
      
      * Add user-friendly message when scipy is not available
      
      * Address most comments by @patrickvonplaten
      
      * Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
      
      * Better initialization
      
      * Scipy is necessary to get sklearn metrics
      
      * Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
      
      * Make style
      
      * Improve docs and add 2 community notebooks
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
  10. 08 Jun, 2021 6 commits
  11. 07 Jun, 2021 2 commits
  12. 04 Jun, 2021 1 commit
  13. 02 Jun, 2021 2 commits