1. 20 Jun, 2022 3 commits
  2. 18 Jun, 2022 2 commits
    • Attempt to change Push CI to workflow_run (#17753) · 6589e510
      Yih-Dar authored
      
      
      * Use workflow_run event for push CI
      
      * change to workflow_run
      
      * Add comments
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      6589e510
    • Added translation of index.mdx to Portuguese Issue #16824 (#17565) · 0d92798b
      Rafael Zimmer authored
      
      
      * Added translation of installation.mdx to Portuguese, as well
      as default templates of _toctree.yml and _config.py
      
      * [ build_documentation.yml ] - Updated doc_builder to build
      documentation in Portuguese.
      [ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.
      
      * [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.
      
      [ pipeline_tutorial.mdx ] - Grammar changes.
      
      * [ accelerate.mdx ] - Translated the accelerate tutorial to Portuguese.
      
      * [ multilingual.mdx ] - Added Portuguese translation for the multilingual tutorial.
      
      [ training.mdx ] - Added Portuguese translation for the training tutorial.
      
      * [ preprocessing.mdx ] - WIP
      
      * Update _toctree.yml
      
      * Adding Pré-processamento to _toctree.yml
      
      * Update accelerate.mdx
      
      * Nits and eliminate preprocessing file until it is ready
      
      * [ index.mdx ] - Translated the index presentation page to Portuguese.
      
      * [ docs/source/pt ] - Updated _toctree.yml to match newest translations.
      
      * Fix build_pr_documentation.yml
      
      * Fix index nits
      
      * nits in _toctree
      Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
      0d92798b
  3. 17 Jun, 2022 6 commits
  4. 16 Jun, 2022 5 commits
  5. 15 Jun, 2022 8 commits
  6. 14 Jun, 2022 10 commits
  7. 13 Jun, 2022 6 commits
    • Add `LongT5` model (#16792) · a72f1c9f
      Daniel Stancl authored
      
      
      * Initial commit
      
      * Make some fixes
      
      * Make PT model full forward pass
      
      * Drop TF & Flax implementation, fix copies etc
      
      * Add Flax model and update some corresponding stuff
      
      * Drop some TF things
      
      * Update config and flax local attn
      
      * Add encoder_attention_type to config
      
      * .
      
      * Update docs
      
      * Do some cleansing
      
      * Fix some issues -> make style; add some docs
      
      * Fix position_bias + mask addition + Update tests
      
      * Fix repo consistency
      
      * Fix model consistency by removing flax operation over attn_mask
      
      * [WIP] Add PT TGlobal LongT5
      
      * .
      
      * [WIP] Add flax tglobal model
      
      * [WIP] Update flax model to use the right attention type in the encoder
      
      * Fix flax tglobal model forward pass
      
      * Make use of global_relative_attention_bias
      
      * Add test suites for TGlobal model
      
      * Fix minor bugs, clean code
      
      * Fix pt-flax equivalence, though not fully convinced of correctness
      
      * Fix LocalAttn implementation to match the original impl. + update READMEs
      
      * Few updates
      
      * Update: [Flax] improve large model init and loading #16148
      
      * Add ckpt conversion script according to #16853 + handle torch device placement
      
      * Minor updates to conversion script.
      
      * Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
      
      * gpu support + dtype fix
      
      * Apply some suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * * Remove (de)parallelize stuff
      * Edit shape comments
      * Update README.md
      * make fix-copies
      
      * Remove caching logic for local & tglobal attention
      
      * Apply another batch of suggestions from code review
      
      * Add missing checkpoints
      * Format converting scripts
      * Drop (de)parallelize links from longT5 mdx
      
      * Fix converting script + revert config file change
      
      * Revert "Remove caching logic for local & tglobal attention"
      
      This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
      
      * Stash caching logic in Flax model
      
      * Make side relative bias always used
      
      * Drop caching logic in PT model
      
      * Return side bias as it was
      
      * Drop all remaining model parallel logic
      
      * Remove clamp statements
      
      * Move test files to the proper place
      
      * Update docs with new version of hf-doc-builder
      
      * Fix test imports
      
      * Make some minor improvements
      
      * Add missing checkpoints to docs
      * Make TGlobal model compatible with torch.onnx.export
      * Replace some np.ndarray with jnp.ndarray
      
      * Fix TGlobal for ONNX conversion + update docs
      
      * fix _make_global_fixed_block_ids and masked neg value
      
      * update flax model
      
      * style and quality
      
      * fix imports
      
      * remove load_tf_weights_in_longt5 from init and fix copies
      
      * add slow test for TGlobal model
      
      * typo fix
      
      * Drop obsolete is_parallelizable and one warning
      
      * Update __init__ files to fix repo-consistency
      
      * fix pipeline test
      
      * Fix some device placements
      
      * [wip]: Update tests -- need to generate summaries to update expected_summary
      
      * Fix quality
      
      * Update LongT5 model card
      
      * Update (slow) summarization tests
      
      * make style
      
      * rename checkpoints
      
      * finish
      
      * fix flax tests
      Co-authored-by: phungvanduy <pvduy23@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: patil-suraj <surajp815@gmail.com>
      a72f1c9f
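      A minimal usage sketch for the `LongT5` model added in the commit above (the checkpoint name `google/long-t5-tglobal-base`, the role of `encoder_attention_type`, and the input text are assumptions for illustration, not taken from the commit):
      
          from transformers import AutoTokenizer, LongT5ForConditionalGeneration
          
          # Hedged sketch: load a transient-global (TGlobal) LongT5 checkpoint and
          # summarize a long input; the config's encoder_attention_type is assumed to
          # select between local and transient-global encoder self-attention.
          tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
          model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")
          
          inputs = tokenizer("summarize: " + "a very long document ... " * 200, return_tensors="pt")
          summary_ids = model.generate(**inputs, max_new_tokens=64)
          print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))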
    • Add FP16 Support for SageMaker Model Parallel (#17386) · 1690094b
      haohanchen-yagao authored
      * Add FP16 support for SageMaker model parallel
      
      * minor fix
      
      * fix indentation
      
      * handle mixed precision exception for SMMP
      
      * minor fix
      
      * remove amp implementation on SMMP
      
      * remove redundant stuff
      
      * reformat trainer
      
      * restyling
      
      * reformat
      1690094b
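      For context on the FP16 / SageMaker Model Parallel commit above, the user-facing switch is the standard `fp16` training argument; the sketch below is an assumed typical usage, not code from the commit (the SageMaker model-parallel job itself is configured through the SageMaker launcher and is not shown):
      
          from transformers import TrainingArguments
          
          # Hedged sketch: with this change, fp16 mixed precision can be requested even
          # when running under SageMaker Model Parallel (SMMP); the Trainer then defers
          # mixed-precision handling to SMMP instead of using torch.cuda.amp directly.
          args = TrainingArguments(
              output_dir="out",
              fp16=True,
              per_device_train_batch_size=8,
          )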
    • Enable CPU distributed training using mpirun (#17570) · 4aabf9b5
      Wang, Yi authored
      
      
      * Enable CPU distributed training using mpirun
      
      Example command:
          mpirun -n 2 python3 run_qa.py --no_cuda --xpu_backend ccl xxxx
      
      MASTER_ADDR and MASTER_PORT should be set as environment variables, e.g.:
          export MASTER_ADDR=127.0.0.1
          export MASTER_PORT=29500
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      
      * fix according to the review comment
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      
      * Use Accelerate logic for CPU distributed training to set the "RANK", "LOCAL_RANK", and "WORLD_SIZE" environment variables
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      4aabf9b5
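      To make the environment handling in the mpirun commit above concrete, here is a small sketch of the idea; the helper name and the Open MPI variable names (`OMPI_COMM_WORLD_*`) are assumptions for illustration, not code from the commit:
      
          import os
          
          def mpi_env_to_torch_env() -> None:
              # Hypothetical helper: when a script is launched via Open MPI's mpirun,
              # map its OMPI_COMM_WORLD_* variables onto the names torch.distributed expects.
              if "OMPI_COMM_WORLD_SIZE" in os.environ:
                  os.environ.setdefault("WORLD_SIZE", os.environ["OMPI_COMM_WORLD_SIZE"])
                  os.environ.setdefault("RANK", os.environ["OMPI_COMM_WORLD_RANK"])
                  os.environ.setdefault("LOCAL_RANK", os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])
              # MASTER_ADDR / MASTER_PORT still have to be exported by the user, e.g.
              # export MASTER_ADDR=127.0.0.1 and export MASTER_PORT=29500 as in the commit message.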
    • Add Ray's scope to training arguments (#17629) · 457d4a32
      Bram Vanroy authored
      
      
      * allow scope from trainer arg
      
      * add ray_scope to training args
      
      * escape double quotes
      
      * make style && quality
      
      * attempt to solve doc style issues
      
      * splitting up URLs for style
      
      * make fixup
      
      * Update src/transformers/training_args.py
      Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
      
      * make style
      Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
      457d4a32
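      A hedged usage sketch for the `ray_scope` argument mentioned above; the value "last" and the surrounding setup are assumptions for illustration:
      
          from transformers import TrainingArguments
          
          # Hedged sketch: ray_scope controls which trial result Ray Tune's
          # get_best_trial() considers when ranking trials during hyperparameter search.
          args = TrainingArguments(
              output_dir="out",
              ray_scope="last",
          )
          # Typically consumed via trainer.hyperparameter_search(backend="ray", ...).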
    • Update modeling_gpt_neox.py (#17575) · 54833886
      Will Frey authored
      I'm guessing that the intention was for the `_no_split_modules` class attribute of `GPTNeoXPreTrainedModel` to be set to `["GPTNeoXLayer"]`, akin to how it's set to `["GPTJBlock"]` for `GPTJPreTrainedModel`.
      
      If this is incorrect, please feel free to just close the PR.
      
      Thanks!
      54833886
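      To illustrate the attribute the PR description above refers to, here is a minimal sketch; the `Sketch` suffix marks it as an illustration rather than the library's actual class:
      
          from transformers import GPTNeoXConfig, PreTrainedModel
          
          class GPTNeoXPreTrainedModelSketch(PreTrainedModel):
              # Sketch only: _no_split_modules tells accelerate's device_map="auto"
              # which submodules must never be split across devices, mirroring
              # ["GPTJBlock"] on GPTJPreTrainedModel.
              config_class = GPTNeoXConfig
              base_model_prefix = "gpt_neox"
              _no_split_modules = ["GPTNeoXLayer"]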
    • Fix dtype getter (#17668) · a1344dbf
      Sylvain Gugger authored
      * Fix dtype getters
      
      * Proper fix for dtype getter
      
      * Style and comment
      
      * Always use last for consistency
      
      * Quality
      a1344dbf
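      As a rough illustration of what a dtype getter does (and of the "always use last for consistency" note above), a minimal sketch follows; this is an assumption about the idea, not the library's actual implementation:
      
          import torch
          from torch import nn
          
          def get_parameter_dtype(module: nn.Module) -> torch.dtype:
              # Sketch: return the dtype of the first floating-point parameter; if no
              # parameter is floating point, fall back to the last parameter seen so the
              # answer is deterministic ("always use last for consistency").
              last_dtype = None
              for param in module.parameters():
                  last_dtype = param.dtype
                  if param.is_floating_point():
                      return param.dtype
              return last_dtype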