1. 11 Jul, 2022 4 commits
  2. 10 Jul, 2022 1 commit
  3. 08 Jul, 2022 4 commits
  4. 07 Jul, 2022 5 commits
  5. 06 Jul, 2022 6 commits
  6. 05 Jul, 2022 4 commits
  7. 04 Jul, 2022 10 commits
  8. 01 Jul, 2022 6 commits
    • David Heryanto's avatar
      Exclude Databricks from notebook env only if the runtime is below 11.0 (#17988) · 49c8c67f
      David Heryanto authored
      * Exclude Databricks from notebook env only if the runtime is below 11.0
      
      * Dummy commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      49c8c67f
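
The commit above gates the Databricks exclusion on the runtime version: Databricks notebooks are only treated as a non-notebook environment when the runtime is older than 11.0. Below is a minimal sketch of such a version gate, assuming the DATABRICKS_RUNTIME_VERSION environment variable that Databricks clusters set; the helper names are hypothetical and this is not the actual transformers implementation.

```python
import os


def is_databricks_runtime_below(threshold=(11, 0)):
    """Return True when running on a Databricks runtime older than `threshold`.

    Hypothetical helper, not the actual transformers code. Assumes the
    DATABRICKS_RUNTIME_VERSION environment variable (e.g. "10.4" or "11.0")
    that Databricks clusters expose.
    """
    raw = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    if raw is None:
        return False  # not running on Databricks at all
    try:
        major, minor, *_ = raw.split(".") + ["0"]
        version = (int(major), int(minor))
    except ValueError:
        return False  # unparseable version string; do not exclude
    return version < threshold


# Sketch of how notebook detection could use the gate: only an old Databricks
# runtime is excluded from being treated as a notebook environment.
def in_notebook():
    if is_databricks_runtime_below((11, 0)):
        return False
    try:
        from IPython import get_ipython
        shell = get_ipython()
        return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"
    except ImportError:
        return False
```
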
    • seungeunrho's avatar
      Shifting labels for causal LM when using label smoother (#17987) · 6890d196
      seungeunrho authored
      * Shifting labels for causal LM when using label smoother
      
      When training a causal LM, the loss is normally computed inside the model's
      forward() function, where the labels are shifted internally. However, if label
      smoothing is applied, the loss is computed in the trainer's compute_loss
      function instead, and the labels are not shifted there. This leaves the labels
      misaligned with their corresponding inputs; this commit resolves that
      misalignment (see the sketch after this entry).
      
      Resolves #17960
      
      On branch shift_labels_for_causalLM
      Changes to be committed:
      	modified:   src/transformers/trainer.py
      	modified:   src/transformers/trainer_pt_utils.py
      
      * Update trainer.py
      
      * Update src/transformers/trainer.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      6890d196
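
The description above notes that with label smoothing enabled, the loss is computed in the trainer rather than in the model's forward(), so the labels must be shifted explicitly to stay aligned with a causal LM's next-token predictions. Below is a minimal, self-contained sketch of that shift combined with a label-smoothed loss; it is illustrative only, not the exact code added to src/transformers/trainer.py and src/transformers/trainer_pt_utils.py.

```python
import torch
import torch.nn.functional as F


def shifted_label_smoothed_loss(logits, labels, epsilon=0.1, ignore_index=-100):
    """Label-smoothed causal LM loss with explicit label shifting (sketch).

    Illustrative only: the actual PR adds a shift_labels option to
    LabelSmoother and uses it from Trainer.compute_loss.
    """
    # Causal LM alignment: position t predicts token t+1, so drop the last
    # time step of the logits and the first token of the labels.
    logits = logits[..., :-1, :].contiguous()
    labels = labels[..., 1:].contiguous()

    log_probs = F.log_softmax(logits, dim=-1)       # (batch, seq - 1, vocab)
    padding_mask = labels.eq(ignore_index)
    safe_labels = labels.clamp(min=0)               # keep gather indices valid

    nll = -log_probs.gather(dim=-1, index=safe_labels.unsqueeze(-1)).squeeze(-1)
    smooth = -log_probs.mean(dim=-1)                # uniform-target term

    nll = nll.masked_fill(padding_mask, 0.0)
    smooth = smooth.masked_fill(padding_mask, 0.0)

    num_valid = (~padding_mask).sum().clamp(min=1)
    return ((1.0 - epsilon) * nll.sum() + epsilon * smooth.sum()) / num_valid
```
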
    • Yih-Dar's avatar
      6f0723a9
    • amyeroberts's avatar
    • Matt's avatar
      XLA train step fixes (#17973) · d6cec458
      Matt authored
      * Copy inputs to train and test step before modifying them, as this breaks things
      
      * Add XLA tests, fix our loss functions to be XLA-compatible
      
      * make fixup
      
      * Update loss computation test to expect vector of per-sample losses
      
      * Patch loss for TFLED
      
      * Patch loss for TFAlbert
      
      * Add a tf_legacy_loss config flag that enables old loss functions
      
      * Stop using config.get() because it's not a dict
      
      * Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it
      
      * make fixup
      
      * Add XLA-compatible RAG loss
      
      * Fix dtype of loss mask for TFAlbert
      
      * Fix test for XLNet too because it overrides the default one
      
      * make fixup
      
      * Fix config test
      
      * No more depending on GPU NaN behaviour
      
      * Add test, avoid potential zero division
      
      * Fix test item assignment
      
      * Fix loss computation masking test
      
      * make fixup
      
      * Fix dtype bugs
      d6cec458
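
Several bullets above revolve around making the TF loss functions XLA-compatible: returning per-sample losses, masking by multiplication rather than boolean indexing (which produces dynamically shaped tensors XLA cannot compile), and guarding against zero division. Below is a minimal sketch of that pattern, assuming TensorFlow; it is not the exact hf_compute_loss code changed in this PR.

```python
import tensorflow as tf


@tf.function(jit_compile=True)  # XLA compilation needs static shapes throughout
def masked_causal_lm_loss(labels, logits, ignore_index=-100):
    """XLA-friendly masked LM loss (sketch).

    Instead of tf.boolean_mask, ignored positions are zeroed out by
    multiplying with a 0/1 mask, and the mask sum used as denominator is
    floored to avoid a potential zero division. Returns one loss per sample.
    """
    per_token_loss = tf.keras.losses.sparse_categorical_crossentropy(
        tf.maximum(labels, 0), logits, from_logits=True
    )
    mask = tf.cast(labels != ignore_index, per_token_loss.dtype)
    per_token_loss = per_token_loss * mask          # zero out ignored positions

    denom = tf.maximum(tf.reduce_sum(mask, axis=-1), 1.0)
    return tf.reduce_sum(per_token_loss, axis=-1) / denom
```
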
    • Sanchit Gandhi's avatar
      [Flax] Add remat (gradient checkpointing) (#17843) · 485bbe79
      Sanchit Gandhi authored
      * [Flax] Add remat (gradient checkpointing)
      
      * fix variable naming in test
      
      * flip: checkpoint using a method
      
      * fix naming
      
      * fix class naming
      
      * apply PVP's suggestions from code review
      
      * make fix-copies
      
      * fix big-bird, electra, roberta
      
      * cookie-cutter
      
      * fix flax big-bird
      
      * move test to common
      485bbe79
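
The commit above adds remat (gradient checkpointing) to the Flax models: wrapped layers recompute their activations during the backward pass instead of storing them, trading compute for memory. Below is a minimal sketch of the pattern with flax.linen.remat; the module names are made up for illustration and do not come from this PR.

```python
import flax.linen as nn


class MLPBlock(nn.Module):
    """A toy layer used to illustrate checkpointing (not from the PR)."""
    hidden_size: int

    @nn.compact
    def __call__(self, x):
        x = nn.Dense(self.hidden_size)(x)
        return nn.gelu(x)


class Encoder(nn.Module):
    """Stack of blocks with optional remat-based gradient checkpointing."""
    hidden_size: int = 256
    num_layers: int = 4
    gradient_checkpointing: bool = False

    @nn.compact
    def __call__(self, x):
        # nn.remat wraps the module class so activations are recomputed in the
        # backward pass rather than stored, trading compute for memory.
        block_cls = nn.remat(MLPBlock) if self.gradient_checkpointing else MLPBlock
        for _ in range(self.num_layers):
            x = block_cls(hidden_size=self.hidden_size)(x)
        return x
```

Setting gradient_checkpointing=True leaves the forward results unchanged; only the memory/compute trade-off of the backward pass differs.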