- 25 Feb, 2021 7 commits
-
-
Lysandre Debut authored
-
Sehoon Kim authored
* IBertConfig, IBertTokenizer added
* I-BERT model names modified
* tokenizer bugfix
* embedding -> QuantEmbedding
* quant utils added
* quant_mode added to configuration
* QuantAct added, Embedding layer + QuantAct addition
* QuantAct added
* unused path removed, QKV quantized
* self-attention layer fully quantized, except softmax
* temporary commit
* all linear layers quantized
* quant_utils bugfix
* bugfix: requantization missing
* IntGELU added
* IntSoftmax added
* LayerNorm implemented
* LayerNorm implemented everywhere
* names changed: roberta -> ibert
* config no longer inherits from RoBERTa
* no support for CausalLM
* static quantization added, quantize_model.py removed
* import modules uncommented
* copyrights fixed
* minor bugfix
* quant_modules, quant_utils merged into one file
* import * fixed
* unused runfile removed
* make style run
* configuration.py docstring fixed
* refactoring: comments removed, function name fixed
* unused dependency removed
* typo fixed
* comments (Copied from), assertion string added
* refactoring: super(..) -> super(), etc.
* refactoring
* refactoring
* make style
* refactoring
* cuda -> to(x.device)
* weight initialization removed
* QuantLinear set_param removed
* QuantEmbedding set_param removed
* IntLayerNorm set_param removed
* assert string added
* assertion error message fixed
* is_decoder removed
* enc-dec arguments/functions removed
* Converter removed
* quant_modules docstring fixed
* convert_slow_tokenizer rolled back
* quant_utils docstring fixed
* unused arguments, e.g. use_cache, removed from config
* weight initialization condition fixed
* x_min, x_max initialized with small values to avoid div-zero exceptions
* testing code for ibert
* tests for emb, linear, gelu, softmax added
* tests for ln and act added
* style reformatted
* force_dequant added
* error tests overridden
* make style
* Style + Docs
* force dequant tests added
* Fix fast tokenizer in init
* Fix doc
* Remove space
* docstring, IBertConfig, chunk_size
* test_modeling_ibert refactoring
* quant_modules.py refactoring
* e2e integration test added
* tokenizers removed
* IBertConfig added to tokenizer_auto.py
* bugfix
* fix docs & test
* fix style num 2
* final fixes

Co-authored-by: Sehoon Kim <sehoonkim@berkeley.edu>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
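For context, a minimal usage sketch of the quantization-aware I-BERT model added in the commit above. The checkpoint name and the config overrides are assumptions for illustration; nothing below is taken from the commit itself.

```python
# Hypothetical sketch: load an I-BERT checkpoint with quantization enabled.
# The checkpoint name and config flags are assumptions, not part of the commit.
from transformers import AutoTokenizer, IBertConfig, IBertModel

config = IBertConfig.from_pretrained(
    "kssteven/ibert-roberta-base",   # assumed checkpoint name
    quant_mode=True,                 # run the integer-only kernels (QuantLinear, IntGELU, ...)
    force_dequant="none",            # or e.g. "gelu" to fall back to float for one op
)
tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = IBertModel.from_pretrained("kssteven/ibert-roberta-base", config=config)

inputs = tokenizer("Integer-only BERT", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```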
-
Patrick von Platen authored
[PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer (#10324)
* push to show
* small improvement
* small improvement
* Update src/transformers/feature_extraction_utils.py
* Update src/transformers/feature_extraction_utils.py
* implement base
* add common tests
* make all tests pass for wav2vec2
* make padding work & add more tests
* finalize feature extractor utils
* add call method to feature extraction
* finalize feature processor
* finish tokenizer
* finish general processor design
* finish tests
* typo
* remove bogus file
* finish docstring
* add docs
* finish docs
* small fix
* correct docs
* save intermediate
* load changes
* apply changes
* apply changes to doc
* change tests
* apply Suraj's recommendations
* final changes
* Apply suggestions from code review
* fix typo
* fix import
* correct docstring
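A rough usage sketch of the processor workflow this commit introduces. The facebook/wav2vec2-base-960h checkpoint and the synthetic one-second waveform are assumptions used only for illustration.

```python
# Hypothetical sketch of the Wav2Vec2 processing pipeline; the checkpoint and
# synthetic audio below are assumptions, not taken from the commit itself.
import numpy as np
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
inputs = processor(speech, sampling_rate=16000, return_tensors="pt", padding=True)

logits = model(inputs["input_values"]).logits
predicted_ids = logits.argmax(dim=-1)
print(processor.batch_decode(predicted_ids))
```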
-
abhishek thakur authored
-
mingruimingrui authored
* Assumption of padding_idx < 2 might not hold
* Use offset instead of 2
* Fix with black
* Change behavior to a warning instead, for backward compatibility
* Fix with black
* Remove warning
* Make padding_idx non-required
* padding_idx fix for blenderbot
* padding_idx fix for blenderbot_small
* padding_idx fix for led
* padding_idx fix for mbart
* Remove extra whitespace
* padding_idx fix for template
* Fix padding_idx mistakenly passed to nn.Embedding
* Fixed padding_idx passed to positional embedding in template
* Remove padding_idx from PyTorch learned positional embeddings
* Remove accidentally added quotes
* Remove padding_idx from TF learned positional embeddings
* Remove zeroing of weights in __init__

Co-authored-by: Wang Ming Rui <mingrui.wang@C02CJTUYMD6M.local>
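A hedged sketch of the offset-style learned positional embedding this commit moves the affected models toward. The class name, the offset value of 2, and the forward signature are assumptions written from scratch here, not the exact code in the repository.

```python
import torch
from torch import nn

class LearnedPositionalEmbeddingSketch(nn.Embedding):
    """Assumed sketch: shift position ids by a fixed offset instead of relying
    on padding_idx < 2, so any padding_idx value works for the word embeddings."""

    def __init__(self, num_embeddings: int, embedding_dim: int, offset: int = 2):
        self.offset = offset
        # Allocate extra rows so positions shifted by `offset` still fit.
        super().__init__(num_embeddings + offset, embedding_dim)

    def forward(self, input_ids_shape: torch.Size, past_key_values_length: int = 0):
        _, seq_len = input_ids_shape[:2]
        positions = torch.arange(
            past_key_values_length,
            past_key_values_length + seq_len,
            dtype=torch.long,
            device=self.weight.device,
        )
        return super().forward(positions + self.offset)

# Usage example: embeddings for a batch of shape (batch_size=2, seq_len=10).
emb = LearnedPositionalEmbeddingSketch(512, 64)
print(emb(torch.Size([2, 10])).shape)
```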
-
Lysandre Debut authored
-
Lysandre Debut authored
-
- 24 Feb, 2021 6 commits
-
-
Lysandre authored
-
Stas Bekman authored
* move secondary methods into a separate file
* cleanup
* style
-
Poedator authored
This fixes a deprecated reference to `tokenizer.max_len` by replacing it with `tokenizer.model_max_length`, similar to [issue 8739](https://github.com/huggingface/transformers/issues/8739) and [PR 8604](https://github.com/huggingface/transformers/pull/8604). Example [here](https://colab.research.google.com/gist/poedator/f8776349e5c625ce287fc6fcd312fa1e/tokenizer-max_len-error-in-transformers_glue.ipynb). The error happens when `glue_convert_examples_to_features` is called without the `max_length` parameter; in that case line 119, which contains the outdated reference, gets executed. This simple fix should do it.
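A tiny sketch of the attribute swap described above, using an assumed checkpoint purely for illustration.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint

# Deprecated attribute that triggers the error:
# max_length = tokenizer.max_len
# Replacement used by the fix:
max_length = tokenizer.model_max_length
print(max_length)
```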
-
Julien Plu authored
-
abhishek thakur authored
* convbert conversion test
* fin
* fin
* fin
* clean up tf<->pt conversion
* remove from_pt

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
-
Stas Bekman authored
* handle get_last_lr() before first step()
* abstract away the LR-getting logic
* cleanup
* add test
* move to utils
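A speculative sketch of the kind of guard this commit describes, not the Trainer's actual implementation; the helper name and fallback logic are assumptions.

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 0.95 ** step)

def current_lr() -> float:
    # Assumed guard: fall back to the optimizer's configured LR when the
    # scheduler cannot report one yet (e.g. before its first step() on some setups).
    try:
        return scheduler.get_last_lr()[0]
    except (AttributeError, IndexError):
        return optimizer.param_groups[0]["lr"]

print(current_lr())
```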
-
- 23 Feb, 2021 3 commits
-
-
Julien Chaumond authored
-
Akmal authored
-
Lysandre authored
-
- 22 Feb, 2021 9 commits
-
-
Sylvain Gugger authored
-
Stas Bekman authored
* make logging and saving trainer built-in
* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Tanmay Garg authored
Enhance the resume_from_checkpoint argument of Trainer.train to accept a bool. If True is given, the last checkpoint saved in self.args.output_dir will be loaded. (#10280)
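A sketch of the new boolean form, under the assumption that a fully configured Trainer already exists; the helper function below is hypothetical.

```python
from transformers import Trainer

def resume_training(trainer: Trainer) -> None:
    """Assumed helper: `trainer` is a fully configured transformers.Trainer whose
    args.output_dir already contains checkpoint-* directories."""
    # Old form: point at an explicit checkpoint directory.
    # trainer.train(resume_from_checkpoint="outputs/checkpoint-500")
    # New form (#10280): True loads the most recent checkpoint in args.output_dir.
    trainer.train(resume_from_checkpoint=True)
```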
-
Stas Bekman authored
* implement gradient_accumulation_steps support in DeepSpeed integration
* typo
* cleanup
* cleanup
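A hedged illustration of wiring gradient accumulation through the DeepSpeed integration. The config values, file name, and launcher command are assumptions; the only point is that gradient_accumulation_steps should agree between the DeepSpeed config and the training arguments.

```python
import json

# Assumed minimal DeepSpeed config; keep gradient_accumulation_steps consistent
# with the value passed to TrainingArguments / --gradient_accumulation_steps.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
}
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# Then launch, e.g. (script name is a placeholder):
#   deepspeed your_training_script.py --deepspeed ds_config.json --gradient_accumulation_steps 8 ...
```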
-
Stas Bekman authored
-
Sylvain Gugger authored
* Deprecate prepare_seq2seq_batch
* Fix last tests
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* More review comments
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
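A possible migration path away from prepare_seq2seq_batch, sketched with an assumed Marian checkpoint; using the as_target_tokenizer context manager for target-side tokenization is the assumed replacement here.

```python
from transformers import AutoTokenizer

# Assumed checkpoint, purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")

src_texts = ["Hello world"]
tgt_texts = ["Hallo Welt"]

# Deprecated:
# batch = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts=tgt_texts, return_tensors="pt")

# Replacement: call the tokenizer directly, tokenize targets under as_target_tokenizer().
batch = tokenizer(src_texts, padding=True, truncation=True, return_tensors="pt")
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, padding=True, truncation=True, return_tensors="pt")
batch["labels"] = labels["input_ids"]
print(batch.keys())
```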
-
Lysandre Debut authored
-
Julien Plu authored
* AMP
* Add LED
* Apply style
* Fix Longformer
-
Lysandre Debut authored
Co-authored-by: Pengcheng He <penhe@microsoft.com>
Co-authored-by: Pengcheng He <penhe@microsoft.com>
-
- 21 Feb, 2021 1 commit
-
-
tagucci authored
* fix typo in conversion script
* style

Co-authored-by: Stas Bekman <stas@stason.org>
-
- 20 Feb, 2021 3 commits
-
-
Stas Bekman authored
-
Sylvain Gugger authored
-
cronoik authored
-
- 19 Feb, 2021 11 commits
-
-
Pengcheng He authored
* Integrate DeBERTa v2 (the 1.5B model surpassed human performance on SuperGLUE); add DeBERTa-v2 900M and 1.5B models
* DeBERTa-v2
* Fix v2 model loading issue (#10129)
* Doc members
* Update src/transformers/models/deberta/modeling_deberta.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Address Sylvain's comments
* Address Patrick's comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
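A minimal usage sketch for the newly integrated DeBERTa-v2 checkpoints; the microsoft/deberta-v2-xlarge name (the roughly 900M-parameter model) is assumed for illustration.

```python
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint name for the ~900M DeBERTa-v2 model.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
model = AutoModel.from_pretrained("microsoft/deberta-v2-xlarge")

inputs = tokenizer("DeBERTa v2 surpassed human performance on SuperGLUE.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```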
-
Sylvain Gugger authored
-
Julien Plu authored
-
Joe Davison authored
-
Stas Bekman authored
-
Tanmay Garg authored
Introduce logging_strategy training argument in TrainingArguments and TFTrainingArguments. (#9838)
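A small sketch of the new argument; the output directory and step count below are arbitrary assumptions.

```python
from transformers import TrainingArguments

# logging_strategy controls when logs are emitted: "no", "steps", or "epoch".
args = TrainingArguments(
    output_dir="outputs",        # assumed path
    logging_strategy="steps",
    logging_steps=50,            # log every 50 optimizer steps
)
print(args.logging_strategy)
```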
-
Julien Plu authored
* Fix AMP and XLA
* Remove useless var
-
Julien Plu authored
* Fix AMP
* Apply style
* Remove unused import
-
Julien Plu authored
-
Julien Plu authored
* Fix XLA
* Rework cast
* Apply style
-
Julien Plu authored
* Fix AMP
* Trigger CI
* Rework cast
-