"vscode:/vscode.git/clone" did not exist on "37373ef2bb396f35a8899db117058bf10511299c"
- 04 Jul, 2022 8 commits
-
-
Matt authored
* Return scalar losses instead of per-sample means
* Make loss shape (1,) instead of scalar
* Allow scalar losses in test_loss_computation (x3)
* Remove XLA loss function for RAG
-
Matthijs Hollemans authored
-
regisss authored
-
regisss authored
-
amyeroberts authored
* Refactor to inherit from nn.Module instead of nn.ModuleList
* Fix typo
* Empty commit to trigger CI re-run

Blender Bot tests are failing (should be unrelated to this PR; they pass locally). I don't have sufficient permissions to re-run the CI workflow (fully or from failed).
-
amyeroberts authored
* Rough TF conversion outline
* Tidy up
* Fix padding differences between layers
* Add back embedder - whoops
* Match test file to main
* Match upstream test file
* Correctly pass and assign image_size parameter
  Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add in MainLayer
* Correctly name layer
* Tidy up AdaptivePooler
* Small tidy-up - more accurate type hints and remove whitespace
* Change AdaptiveAvgPool - use the AdaptiveAvgPool implementation by @Rocketknight1, which correctly pools when the output shape does not evenly divide the input shape, c.f. https://github.com/huggingface/transformers/pull/17554/files/9e26607e22aa8d069c86b50196656012ff0ce62a#r900109509
  Co-authored-by: matt <rocketknight1@gmail.com>
* Use updated AdaptiveAvgPool
  Co-authored-by: matt <rocketknight1@gmail.com>
* Make AdaptiveAvgPool compatible with CPU
* Remove image_size from configuration
* Fixup
* Tensorflow -> TensorFlow
* Fix pt references in tests
* Apply suggestions from code review - grammar and wording
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add TFResNet to doc tests
* PR comments - GlobalAveragePooling and clearer comments
* Remove unused import
* Add in keepdims argument
* Add num_channels check
* Grammar fix: by -> of
  Co-authored-by: matt <rocketknight1@gmail.com>
  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Remove transposes - keep NHWC throughout forward pass
* Fixup look sharp
* Add missing layer names
* Final tidy up - remove from_pt now that weights are on the hub

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
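For reference, the "GlobalAveragePooling" and "keepdims" bullets above boil down to the following pattern. A minimal sketch, assuming Keras' built-in pooling layer; the shapes are illustrative, not taken from the actual TFResNet code:

```python
# Sketch: NHWC global average pooling with keepdims, so downstream layers
# still see a 4D tensor. Shapes below are illustrative only.
import tensorflow as tf

pool = tf.keras.layers.GlobalAveragePooling2D(keepdims=True)
features = tf.random.uniform((2, 7, 7, 512))  # batch, H, W, channels (NHWC)
pooled = pool(features)
print(pooled.shape)  # (2, 1, 1, 512)
```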
-
Lysandre Debut authored
-
Dobatymo authored
-
- 01 Jul, 2022 14 commits
-
-
David Heryanto authored
* Exclude Databricks from notebook env only if the runtime is below 11.0
* Dummy commit to trigger CI
* Empty commit to trigger CI (x7)
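A hedged sketch of what such a runtime gate might look like; the DATABRICKS_RUNTIME_VERSION environment variable and the parsing are assumptions, not confirmed by the commit message:

```python
# Hypothetical sketch: only exclude Databricks from the notebook env when
# the runtime is below 11.0. The env var name is an assumption.
import os

runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")  # e.g. "10.4"
major = int(runtime.split(".")[0]) if runtime else None
exclude_from_notebook_env = major is not None and major < 11
```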
-
seungeunrho authored
* Shift labels for causal LM when using the label smoother

  When training a causal LM, the loss is computed inside the model's forward() function, where the labels are shifted internally. However, if label smoothing is applied, the loss is computed in the trainer's compute_loss function and the labels are not shifted, misaligning labels with their corresponding inputs. This commit resolves that misalignment. Resolves #17960

  Changes to be committed:
    modified: src/transformers/trainer.py
    modified: src/transformers/trainer_pt_utils.py
* Update trainer.py
* Update src/transformers/trainer.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
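A minimal sketch of the alignment the commit describes, assuming the usual causal-LM convention that position i's logits predict token i+1 (function name is illustrative):

```python
# Illustrative sketch: shift labels/logits before a smoothed loss, mirroring
# what the model's forward() does internally on the unsmoothed path.
import torch

def align_for_causal_lm(logits: torch.Tensor, labels: torch.Tensor):
    # logits: (batch, seq, vocab); labels: (batch, seq).
    # Position i's logits predict token i+1, so drop the last logit
    # and the first label before computing any (smoothed) loss.
    shifted_logits = logits[..., :-1, :].contiguous()
    shifted_labels = labels[..., 1:].contiguous()
    return shifted_logits, shifted_labels
```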
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
amyeroberts authored
-
Matt authored
* Copy inputs to train and test step before modifying them, as this breaks things
* Add XLA tests, fix our loss functions to be XLA-compatible
* make fixup
* Update loss computation test to expect vector of per-sample losses
* Patch loss for TFLED
* Patch loss for TFAlbert
* Add a tf_legacy_loss config flag that enables old loss functions
* Stop using config.get() because it's not a dict
* Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it
* make fixup
* Add XLA-compatible RAG loss
* Fix dtype of loss mask for TFAlbert
* Fix test for XLNet too because it overrides the default one
* make fixup
* Fix config test
* No more depending on GPU NaN behaviour
* Add test, avoid potential zero division
* Fix test item assignment
* Fix loss computation masking test
* make fixup
* Fix dtype bugs
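A sketch of the XLA-friendly pattern these bullets describe: mask by multiplication rather than boolean indexing, and return one loss per sample. Shapes and the -100 ignore-index convention are assumptions for illustration:

```python
# Sketch of an XLA-compatible masked loss returning per-sample values.
# Boolean indexing changes tensor shapes at runtime, which XLA cannot
# compile, so ignored positions are masked out by multiplication instead.
import tensorflow as tf

def per_sample_masked_loss(labels, logits):
    # labels: (batch, seq) with -100 marking ignored positions;
    # logits: (batch, seq, vocab).
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE
    )
    mask = tf.cast(labels != -100, logits.dtype)
    safe_labels = tf.where(labels == -100, tf.zeros_like(labels), labels)
    per_token = loss_fn(safe_labels, logits) * mask
    # Average over valid tokens only, yielding shape (batch,).
    return tf.reduce_sum(per_token, axis=-1) / tf.maximum(
        tf.reduce_sum(mask, axis=-1), 1.0
    )
```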
-
Sanchit Gandhi authored
* [Flax] Add remat (gradient checkpointing)
* fix variable naming in test
* flip: checkpoint using a method
* fix naming
* fix class naming
* apply PVP's suggestions from code review
* make fix-copies
* fix big-bird, electra, roberta
* cookie-cutter
* fix flax big-bird
* move test to common
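For context, a minimal illustration of the underlying Flax remat transform; the module below is illustrative, not the actual transformers wiring:

```python
# Minimal illustration of flax.linen.remat: the wrapped module recomputes
# its forward pass during backprop instead of storing activations.
import flax.linen as nn

class MLPBlock(nn.Module):  # illustrative module, not from transformers
    features: int

    @nn.compact
    def __call__(self, x):
        x = nn.Dense(self.features)(x)
        x = nn.relu(x)
        return nn.Dense(self.features)(x)

CheckpointedMLPBlock = nn.remat(MLPBlock)  # same call signature, less memory
```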
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Nouamane Tazi authored
* add onnx support for BLOOM
* use TYPE_CHECKING for type annotations
* fix past_shape for bloom (different from gpt2)
* use logical_or instead of `+` for onnx support
* bigger `atol_for_validation` for larger bloom models
* copied -> taken because it's no longer an exact copy
* remove "copied from" comment
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
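The logical_or bullet refers to combining boolean masks; a hedged sketch of the idea, with illustrative tensors rather than the actual BLOOM code:

```python
# Illustrative sketch: combine attention masks with logical_or rather than
# `+`, since bool addition does not trace cleanly for ONNX export.
import torch

seq_len = 4
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
padding = torch.tensor([True, True, True, False]).expand(seq_len, seq_len)
# A position is blocked if either mask blocks it.
blocked = torch.logical_or(~causal, ~padding)
```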
-
Sourab Mangrulkar authored
* fixing fsdp autowrap functionality
* update version and quality
* update torch version to latest stable version
-
Wissam Antoun authored
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Billy Cao authored
-
Yih-Dar authored
* skip some gpt_neox tests that require 80G RAM
* remove tests
* fix quality

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 30 Jun, 2022 8 commits
-
-
Aaron Pham authored
* feat: add pipeline registry abstraction
  - added `PipelineRegistry` abstraction
  - updated `add_new_pipeline.mdx` (English docs) to reflect the API addition
  - migrated `check_task` and `get_supported_tasks` from transformers/pipelines/__init__.py to transformers/pipelines/base.py#PipelineRegistry.{check_task,get_supported_tasks}
  Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* fix: update with upstream/main
  chore: Apply suggestions from sgugger's code review
  Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* chore: PR updates
  - reverted src/transformers/dependency_versions_table.py from upstream/main
  - updated pipeline registry to use global variables
  Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* tests: add tests for pipeline registry
  Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* tests: add test for output warning
  Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* chore: fmt and clean up unused imports
  Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* fix: move imports to top of the file and address comments
  Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
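A hedged sketch of the registry surface this commit describes; the import path and return shape follow the commit message and may differ in detail:

```python
# Hedged sketch of the PipelineRegistry surface described above; exact
# import path and return values are assumptions based on the commit text.
from transformers.pipelines import PIPELINE_REGISTRY

print(PIPELINE_REGISTRY.get_supported_tasks())         # e.g. ["fill-mask", ...]
task_info = PIPELINE_REGISTRY.check_task("fill-mask")  # resolves task metadata
```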
-
regisss authored
* Add ONNX support for LayoutLMv3
* Update docstrings
* Update empty description in docstring
* Fix imports and type hints
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Joao Gante authored
* sharded conversion; add flag to control max hidden error
* better hidden name matching
* Add test: load TF from PT shards
* fix test (PT data must be local)
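The new test exercises building a TF model from sharded PyTorch weights; a minimal usage sketch, with an illustrative checkpoint name:

```python
# Sketch: build a TF model from (possibly sharded) PyTorch weights.
# The checkpoint name below is illustrative.
from transformers import TFAutoModel

model = TFAutoModel.from_pretrained("some-org/pt-sharded-checkpoint", from_pt=True)
```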
-
Sylvain Gugger authored
-
Patrick von Platen authored
* trigger test failure
* upload revision poc
* Update src/transformers/pipelines/base.py
  Co-authored-by: Julien Chaumond <julien@huggingface.co>
* up
* add test
* correct some stuff
* Update src/transformers/pipelines/__init__.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct require flag

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
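A short usage sketch of the revision pinning the new test covers; the model and revision strings are illustrative:

```python
# Sketch: pin a pipeline to a specific model revision on the Hub.
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    revision="main",  # any branch name, tag, or commit hash
)
```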
-
Jannis Born authored
* doc: Unify training arg type annotations
* wip: extracting enum type from Union
* blackening
-
Jason Phang authored
* Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs
* 20B tests
-
- 29 Jun, 2022 10 commits
-
-
Crystina authored
* first draft adding Flax-t5-encoder and Flax-mt5-encoder
* imports
* after make fixup
* flax t5 encoder test
* black on test
* make fix-copies
* clean
* all_model_classes -> tuple
* clean test
* is_encoder_decoder=False in t5-enc tester
* remove file docstring before FlaxT5Encoder
* black
* isort
* commit suggestions on src/transformers/models/t5/modeling_flax_t5.py
  Co-authored-by: Suraj Patil <surajp815@gmail.com>
* commit suggestions on src/transformers/models/t5/modeling_flax_t5.py
  Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Apply suggestions from code review
  Co-authored-by: Suraj Patil <surajp815@gmail.com>
* remove _get_encoder_module
* self.decoder_seq_length -> self.encoder_seq_length, as t5-enc does not have a decoder
* bugfix - self.module_class is the class itself, not an instance
* docs for mt5 and t5
* call -> __call__ in t5 doc
* FlaxMT5EncoderModel to TYPE_HINT
* run doc-builder to allow changing the files

Co-authored-by: Suraj Patil <surajp815@gmail.com>
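A minimal usage sketch of the encoder-only class added here; the checkpoint is illustrative:

```python
# Sketch: run the encoder-only Flax T5 variant added by this PR.
from transformers import AutoTokenizer, FlaxT5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = FlaxT5EncoderModel.from_pretrained("t5-small")

inputs = tokenizer("Studies have shown that owning a dog is good for you",
                   return_tensors="np")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, d_model)
```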
-
Clémentine Fourrier authored
* Removed dead position_id code, fix #17893
* Removed unused var
* Now ignores removed (dead) dict key for backward compatibility
-
Matthijs Hollemans authored
* add MobileViT
* fixup
* Update README.md
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* remove empty line
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* use clearer variable names
* rename to MobileViTTransformerLayer
* no longer inherit from nn.Sequential
* fixup
* fixup
* not sure why this got added twice
* rename organization for checkpoints
* fix it up
* Update src/transformers/models/mobilevit/__init__.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/mobilevit/configuration_mobilevit.py (x3)
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/models/mobilevit/test_modeling_mobilevit.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/mobilevit/modeling_mobilevit.py (x4)
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* code style improvements
* fixup
* Update docs/source/en/model_doc/mobilevit.mdx (x2)
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/mobilevit/configuration_mobilevit.py (x2)
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* download labels from hub
* rename layers
* rename more layers
* don't compute loss in separate function
* remove some nn.Sequential
* replace nn.Sequential with new MobileViTTransformer class
* replace nn.Sequential with MobileViTMobileNetLayer
* fix pruning since model structure changed
* fixup
* fix doc comment
* remove custom resize from feature extractor
* fix ONNX import
* add to doc tests
* use center_crop from image_utils
* move RGB->BGR flipping into image_utils
* fix broken tests
* wrong type hint
* small tweaks

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
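A short classification sketch with the new model; the checkpoint name follows the renamed organization mentioned in the bullets above:

```python
# Sketch: image classification with the newly added MobileViT.
from PIL import Image
import requests
from transformers import MobileViTFeatureExtractor, MobileViTForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = MobileViTFeatureExtractor.from_pretrained("apple/mobilevit-small")
model = MobileViTForImageClassification.from_pretrained("apple/mobilevit-small")

inputs = feature_extractor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax(-1))])
```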
-
Matt authored
-
Bram Vanroy authored
* ExplicitEnum subclass str (JSON dump compatible)
* allow union if one of the types is str
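The point of subclassing str: enum members become JSON-serializable as their values. A minimal illustration; the enum name below is illustrative, mirroring ExplicitEnum's new base classes:

```python
# Illustration: an enum that also subclasses str serializes directly
# with json.dumps, which is what "JSON dump compatible" means here.
import json
from enum import Enum

class Strategy(str, Enum):  # illustrative name
    NO = "no"
    STEPS = "steps"

print(json.dumps({"save_strategy": Strategy.STEPS}))  # {"save_strategy": "steps"}
```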
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Younes Belkada authored
-
Yih-Dar authored
* use explicit torch version

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Stas Bekman authored
-
Zachary Mueller authored
* Fix all is_torch_tpu_available
-