Commits · 45fc8c7951f978c0f8f13c8bab52c744cd5c4784 · chenpangpang / transformers

09 Apr, 2021 7 commits
- Make `get_special_tokens_mask` consider all tokens (#11163) · 45fc8c79
  Sylvain Gugger authored Apr 09, 2021
  
- Update README.md (#11161) · 60607465
  Saviour Owolabi authored Apr 09, 2021
```
Corrected a typo ('Downlowd' to 'Download')
```
- Fix LogitsProcessor documentation (#11130) · b9b60c16
  Keisuke Hirota authored Apr 09, 2021
```
* Change duplicated LogitsProcessor to LogitsWarper in LogitsProcessorList document

* Write more detailed information about LogitsProcessor's scores argument

* apply suggestion from review

* style
Co-authored-by: Suraj Patil <surajp815@gmail.com>
```
- [Community notebooks] Add Wav2Vec notebook for creating captions for YT Clips (#11142) · 8b78a32b
  Niklas Muennighoff authored Apr 09, 2021
```
* Add Wav2Vec Inference notebook

* Update docs/source/community.md
Co-authored-by: Suraj Patil <surajp815@gmail.com>
```
- typo (#11152) · 0311ba21
  Stas Bekman authored Apr 08, 2021
```
* typo

* style
```
- Merge branch 'master' of github.com:huggingface/transformers · 269c9638
  Sylvain Gugger authored Apr 08, 2021
  
- Skip Megatron tests for now · d31c7b10
  Sylvain Gugger authored Apr 08, 2021
  
08 Apr, 2021 14 commits

[setup] make fairscale and deepspeed setup extras (#11151) · c2e0fd52
Stas Bekman authored Apr 08, 2021

* make fairscale and deepspeed setup extras

* fix default

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* no reason not to ask for the good version

* update the CIs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Add support for multiple models for one config in auto classes (#11150) · ba8b1f47
Sylvain Gugger authored Apr 08, 2021
```
* Add support for multiple models for one config in auto classes

* Use get_values everywhere

* Prettier doc
```
[setup] extras[docs] must include 'all' (#11148) · 97ccf67b
Stas Bekman authored Apr 08, 2021
```
* extras[doc] must include 'all'

* fix

* better

* regroup
```

[tests] relocate core integration tests (#11146) · 66446909
Stas Bekman authored Apr 08, 2021

* relocate core integration tests

* add sys.path context manager

* cleanup

* try

* try2

* fix path

* doc

* style

* add dep

* add 2 more deps

Run mlm pad to multiple for fp16 (#11128) · 6c40e497
Andrea Cappelli authored Apr 08, 2021
```
* Add mlm collator pad to multiple option (#10627)

* Use padding to 8x in run mlm (#10627)
```
Don't duplicate logs in TensorBoard and handle --use_env (#11141) · dfed4ec2
Sylvain Gugger authored Apr 08, 2021

Updates SageMaker docs for updating DLCs (#11140) · 9c9b8e70
Philipp Schmid authored Apr 08, 2021

Add fairscale and deepspeed back to the CI (#11147) · ba2cf5f9
Lysandre Debut authored Apr 08, 2021
```
* Add fairscale and deepspeed back to the CI

* Add deepspeed to single GPU tests
```
[trainer] solve "scheduler before optimizer step" warning (#11144) · 1ed24afe
Stas Bekman authored Apr 08, 2021
```
* solve "scheduler before optimizer step" warning

* style

* correct the state evaluation test
```

Add nvidia megatron models (#10911) · 02ec02d6
Julien Demouth authored Apr 08, 2021

* Add support for NVIDIA Megatron models

* Add support for NVIDIA Megatron GPT2 and BERT

Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This
commit includes a script to convert a Megatron-GPT2 checkpoint downloaded
from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details.

Add the megatron_bert model. That model is implemented as a modification of
the existing BERT model in Transformers. This commit includes a script to
convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See
examples/megatron-models/README.md for details.

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove model.half in tests + add "# Copied ..."

Remove the model.half() instruction which makes tests fail on the CPU.

Add a comment "# Copied ..." before many classes in the model to enable automatic
tracking in CI between the new Megatron classes and the original Bert ones.

* Fix issues

* Fix Flax/TF tests

* Fix copyright

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/megatron_bert.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/megatron_gpt2.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Resolve most of 'sgugger' comments

* Fix conversion issue + Run make fix-copies/quality/docs

* Apply suggestions from code review

* Causal LM & merge

* Fix init

* Add CausalLM to last auto class
Co-authored-by: Julien Demouth <jdemouth@nvidia.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

[DeepSpeed] ZeRO Stage 3 (#10753) · c6d66484
Stas Bekman authored Apr 08, 2021

* synced gpus

* fix

* fix

* need to use t5-small for quality tests

* notes

* complete merge

* fix a disappearing std stream problem

* start zero3 tests

* wip

* tune params

* sorting out the pre-trained model loading

* reworking generate loop wip

* wip

* style

* fix tests

* split the tests

* refactor tests

* wip

* parameterized

* fix

* workout the resume from non-ds checkpoint pass + test

* cleanup

* remove no longer needed code

* split getter/setter functions

* complete the docs

* suggestions

* gpus and their compute capabilities link

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* style

* remove invalid paramgd

* automatically configure zero3 params that rely on hidden size

* make _get_resized_embeddings zero3-aware

* add test exercising resize_token_embeddings()

* add docstring
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

[run_clm] clarify why we get the tokenizer warning on long input (#11145) · acc851e1
Stas Bekman authored Apr 08, 2021

* clarify why we get the warning here

* Update examples/language-modeling/run_clm.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* wording

* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Typo fix of the name of BertLMHeadModel in BERT doc (#11133) · 5bf5d50c
Yusuke Mori authored Apr 08, 2021

Fix typing error in Trainer class (prediction_step) (#11138) · f8e90d6f
Jannis Born authored Apr 08, 2021

* fix: docstrings in prediction_step

* ci: Satisfy line length requirements

* ci: character length requirements

07 Apr, 2021 11 commits
- Fix and refactor check_repo (#11127) · ffe07617
  Sylvain Gugger authored Apr 07, 2021
  
- Adds use_auth_token with pipelines (#11123) · 3fd7eee1
  Philipp Schmid authored Apr 07, 2021
```
* added model_kwargs to infer_framework_from_model

* added model_kwargs to tokenizer

* added use_auth_token as named parameter

* added dynamic get for use_auth_token
```
- [versions] handle version requirement ranges (#11110) · 1c151283
  Stas Bekman authored Apr 07, 2021
```
* handle version requirement ranges

* add mixed requirement test

* cleanup
```
- fix tests (#11109) · 7442801d
  Vasudev Gupta authored Apr 07, 2021
  
- Adds a note to resize the token embedding matrix when adding special … (#11120) · c0d97cee
  Lysandre Debut authored Apr 07, 2021
```
* Adds a note to resize the token embedding matrix when adding special tokens

* Remove superfluous space
```
- Some styling of the training table in Notebooks (#11118) · 02f7c2fe
  Sylvain Gugger authored Apr 07, 2021
  
- Dummies multi backend (#11100) · 11505fa1
  Sylvain Gugger authored Apr 07, 2021
```
* Replaces requires_xxx by one generic method

* Quality and update check_dummies

* Fix inits check

* Post-merge cleanup
```
- [examples] fix white space (#11099) · 424419f5
  Stas Bekman authored Apr 07, 2021
```
these get concatenated without whitespace, so fix it
```
- fix: The 'warn' method is deprecated (#11105) · c9035e45
  Stas Bekman authored Apr 07, 2021
```
* The 'warn' method is deprecated

* fix test
```
- GPTNeo: handle padded wte (#11079) · 247bed38
  Leo Gao authored Apr 07, 2021
```
* GPTNeo: handle padded wte

* Switch to config.vocab_size

* apply review suggestion
Co-authored-by: Suraj Patil <surajp815@gmail.com>
```
- dead link fixed (#11103) · 083ad7d4
  cronoik authored Apr 07, 2021
  
06 Apr, 2021 8 commits

Style · fd338abd
Sylvain Gugger authored Apr 06, 2021

accelerate question answering examples with no trainer (#11091) · aef4cf8c
SHYAM SUNDER KUMAR authored Apr 07, 2021

* accelerate question answering examples with no trainer

* removed train and eval flags also fixed fill np array function

* Update examples/question-answering/run_qa_beam_search_no_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/question-answering/run_qa_no_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Auto feature extractor (#11097) · 403d530e
Sylvain Gugger authored Apr 06, 2021

* AutoFeatureExtractor

* Init and first tests

* Tests

* Damn you gitignore

* Quality

* Defensive test for when not all backends are here

* Use pattern for Speech2Text models

[doc] gpt-neo (#11098) · 520198f5
Stas Bekman authored Apr 06, 2021
```
make the example work
```
Development on v4.6.0dev0 · 9853c5dd
Lysandre authored Apr 06, 2021

Release v4.5.0 · 4906a29f
Lysandre authored Apr 06, 2021

[WIP] GPT Neo cleanup (#10985) · 2a8115f0
Suraj Patil authored Apr 06, 2021

* better names

* add attention mixin

* all slow tests in one class

* make helper methods static so we can test

* add local attention tests

* better names

* doc

* apply review suggestions

added new merged Trainer test (#11090) · 76800fb8
Philipp Schmid authored Apr 06, 2021
