- 11 Jul, 2022 8 commits
-
-
Younes Belkada authored
* fix tolerance for a bloom slow test
* enhance alibi padding
  - get rid of for loops
  - deal better with padded batched input
  - avoid useless cpu/gpu communication when creating alibi
  Co-authored-by: justheuristic <justheuristic@gmail.com>
* optimize attention mask
* fix scaled softmax limit values
* optimize building alibi tensor
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix attention_mask shape when it's None
* minor fixes
  - fix docstring + arg names
* remove colons in docstring
* Apply suggestions from code review
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* apply suggestion
* remove unused arg
* refactor a bit
  - use [:, None] for consistency
* refactor attention block
  Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
* quick fixes
* first attempt
* refactor attention block and fix all tests except "test_simple_generation"
  - added comments to better explain the attention block
* remove debug lines and add TODO comment
* change `torch.bmm` to `torch.baddbmm`
  - fixes `test_simple_generation` but breaks `test_batch_generation_padd`
* styling
* all tests are passing now
  - use `bmm`
  - add explanation for `allow_fp16_reduced_precision_reduction`
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* styling
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix support for accelerate
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove attn softmax in fp32
* refactor comments
* refactor a bit
  - remove warning message
  - remove print in test
* refer to pytorch t5
* change the slow tests
  - do the tests in fp32
  - remove some comments
  - keep large comments
* update expected output for `test_simple_generation`
  - we now test using fp32
* make style + change comments a bit
* fix dtype pad test

Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
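As context for the alibi bullets above, a minimal, hedged sketch of building an ALiBi bias tensor without Python loops, directly from a padded attention mask. It assumes a power-of-two number of heads and is an illustration, not the exact code merged in this commit.

```python
import math
import torch

def build_alibi_tensor(attention_mask: torch.Tensor, num_heads: int, dtype: torch.dtype) -> torch.Tensor:
    """Build a per-head linear bias, vectorized over the batch (illustrative sketch)."""
    # One slope per head, following the geometric progression from the ALiBi paper
    # (power-of-two head counts only, for brevity).
    base = 2 ** (-(2 ** -(math.log2(num_heads) - 3)))
    slopes = torch.pow(base, torch.arange(1, num_heads + 1, dtype=torch.float32))  # (num_heads,)

    # Relative positions derived from the padding mask: padded tokens contribute 0,
    # real tokens get increasing positions, so padded batched inputs need no loops.
    positions = (attention_mask.cumsum(dim=-1) - 1) * attention_mask  # (batch, seq_len)

    # (batch, num_heads, 1, seq_len): broadcast each head's slope over the key positions.
    alibi = slopes[None, :, None, None] * positions[:, None, None, :]
    return alibi.to(dtype)

mask = torch.tensor([[0, 1, 1, 1], [1, 1, 1, 1]])  # left-padded batch of 2
bias = build_alibi_tensor(mask, num_heads=8, dtype=torch.float16)
print(bias.shape)  # torch.Size([2, 8, 1, 4])
```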
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Duong A. Nguyen authored
* Fix RESOURCE_EXHAUSTED error for large datasets on Flax example scripts
* use np.random.permutation for creating batch_idx
* train_samples_idx -> training_samples_idx
* fix type hints
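For illustration, a small sketch of the host-side shuffling idea referenced above: build shuffled batch indices with NumPy rather than materializing a permutation on device, then split them into fixed-size batches. The helper and variable names are hypothetical.

```python
import numpy as np

def generate_batch_splits(num_samples: int, batch_size: int, seed: int = 0) -> list:
    """Shuffle sample indices on the host and split them into batches (sketch)."""
    rng = np.random.default_rng(seed)
    training_samples_idx = rng.permutation(num_samples)  # host-side shuffle, no device memory
    # Drop the incomplete tail batch so every batch has a fixed shape (friendly for jit/pmap).
    num_full_batches = num_samples // batch_size
    training_samples_idx = training_samples_idx[: num_full_batches * batch_size]
    return np.split(training_samples_idx, num_full_batches)

batches = generate_batch_splits(num_samples=10, batch_size=4, seed=42)
print([b.tolist() for b in batches])  # two batches of 4 indices each
```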
-
Yih-Dar authored
* fix dtype issue in _attn
* fix RotaryEmbedding
* fix RotaryEmbedding 2
* clean up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yulv-git authored
* Fix some typos.
  Signed-off-by: Yulv-git <yulvchi@qq.com>
* Fix typo.
  Signed-off-by: Yulv-git <yulvchi@qq.com>
* make fixup.
-
- 10 Jul, 2022 1 commit
-
-
Stas Bekman authored
-
- 08 Jul, 2022 4 commits
-
-
neverix authored
* Make Trainer.predict call on_evaluate (#17952)
* Add on_predict
* Small fix
* Small and different fix
* Add tests
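As a usage illustration, a minimal callback that reacts to the `on_predict` event this change adds; treat the exact signature as an assumption.

```python
from transformers import TrainerCallback

class LogPredictMetrics(TrainerCallback):
    """Sketch of a callback reacting to the `on_predict` event."""

    def on_predict(self, args, state, control, metrics=None, **kwargs):
        # Called once Trainer.predict has finished; `metrics` holds the prediction metrics.
        if metrics is not None:
            print(dict(metrics))

# Hypothetical usage: pass the callback when building the Trainer.
# trainer = Trainer(model=model, args=training_args, callbacks=[LogPredictMetrics()])
# trainer.predict(test_dataset)
```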
-
Sylvain Gugger authored
-
BOSEOP KIM authored
* Fix type issue in using bucketing with Trainer
  - Fix type issues in LengthGroupedSampler, DistributedLengthGroupedSampler
  refs: #18003
* Change logging type in LengthGroupedSampler
  - Change `logger.warning` to `logger.info`
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Change logging type in DistributedLengthGroupedSampler
  - Change `logger.warning` to `logger.info`
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove redundant clause in LengthGroupedSampler
  - Use `elif`
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove redundant clause in DistributedLengthGroupedSampler
  - Use `elif`
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply black, isort to the modified code in the script

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Sylvain Gugger authored
* Fix slow CI by pinning resampy
* Actually put it in the speech dependencies
-
- 07 Jul, 2022 5 commits
-
-
Matt authored
* Drop columns after loading samples, rather than before, to avoid breaking transforms
* make fixup
* Add workaround so this PR can work with the current datasets version
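A hedged sketch of why column dropping has to happen after samples are loaded when a transform is attached: the lazy transform below still needs the raw column at access time. The dataset contents and column names are made up for illustration.

```python
from datasets import Dataset

ds = Dataset.from_dict({"text": ["hello world", "hi there"], "meta": [0, 1]})

def to_length_features(batch):
    # The transform relies on the raw "text" column being present when samples are loaded.
    return {"length": [len(t.split()) for t in batch["text"]]}

ds.set_transform(to_length_features)

# Works: the transform runs lazily on access and still sees "text",
# so unused columns are only dropped from its *output*, after loading.
print(ds[:1]["length"])  # [2]

# Would break the transform: physically removing "text" *before* access
# leaves nothing for the transform to read.
# ds = ds.remove_columns(["text"])
```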
-
Patrick von Platen authored
-
varshith authored
* Add command for Windows venv activation
* Change Linux and macOS specification
-
Sylvain Gugger authored
* Add script to sort doc ToC
* Style and fixes
* Add check to quality job
-
Sylvain Gugger authored
-
- 06 Jul, 2022 6 commits
-
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* Link to the Datasets doc
* Remove unwanted file
-
Matt authored
-
Joao Gante authored
-
ADAning authored
* Add ALL_LAYERNORM_LAYERS for LayerNorm
* Fix bug of appending layer norm
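For background, a plain-PyTorch sketch of the pattern this constant exists to support: keeping every LayerNorm parameter (and biases) out of weight decay when building optimizer parameter groups. This is illustrative, not the library's helper.

```python
import torch
from torch import nn

def weight_decay_param_groups(model: nn.Module, weight_decay: float):
    """Split parameters into decay / no-decay groups, keeping LayerNorm and biases decay-free (sketch)."""
    norm_param_ids = set()
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            norm_param_ids.update(id(p) for p in module.parameters())

    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if id(param) in norm_param_ids or name.endswith("bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
optimizer = torch.optim.AdamW(weight_decay_param_groups(model, weight_decay=0.01), lr=1e-3)
```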
-
NielsRogge authored
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
-
- 05 Jul, 2022 4 commits
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Matt authored
-
Sanchit Gandhi authored
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 04 Jul, 2022 10 commits
-
-
Joao Gante authored
* get the right slicing index for position_bias
-
Sreyan Ghosh authored
Co-authored-by: Sreyan-G@NVIDIA <sreyang@nvidia.com>
-
Matt authored
* Return scalar losses instead of per-sample means
* Make loss shape (1,) instead of scalar
* Allow scalar losses in test_loss_computation
* Allow scalar losses in test_loss_computation
* Allow scalar losses in test_loss_computation
* Remove XLA loss function for RAG
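A short sketch of the loss-shape idea above: reduce per-sample losses to a single value, returned with shape (1,) rather than as a bare scalar. Illustrative only, not the library's loss code.

```python
import tensorflow as tf

def scalar_loss(labels: tf.Tensor, logits: tf.Tensor) -> tf.Tensor:
    """Reduce per-sample losses to a single value with shape (1,) (illustrative sketch)."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE
    )
    per_sample = loss_fn(labels, logits)  # one loss value per sample
    # Mean over everything, then reshape so downstream code always sees shape (1,)
    # instead of a rank-0 scalar.
    return tf.reshape(tf.reduce_mean(per_sample), (1,))

labels = tf.constant([1, 0, 2])
logits = tf.random.normal((3, 5))
print(scalar_loss(labels, logits).shape)  # (1,)
```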
-
Matthijs Hollemans authored
-
regisss authored
-
regisss authored
-
amyeroberts authored
* Refactor to inherit from nn.Module instead of nn.ModuleList
* Fix typo
* Empty commit to trigger CI re-run
  The Blender Bot tests are failing (they should be unrelated to this PR and pass locally). I don't have sufficient permissions to re-run the CI workflow (fully or from failed).
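As a generic illustration of the refactor named in the first bullet (not this model's actual code): a container that subclasses nn.Module and stores its layers in an nn.ModuleList, rather than subclassing nn.ModuleList directly.

```python
import torch
from torch import nn

class Encoder(nn.Module):
    """Hypothetical stack of layers: an nn.Module that owns an nn.ModuleList."""

    def __init__(self, hidden_size: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden_size, hidden_size) for _ in range(num_layers))

    def forward(self, hidden_states):
        # Subclassing nn.Module gives the container its own forward() and hooks,
        # which a bare nn.ModuleList subclass does not provide.
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states

out = Encoder(hidden_size=8, num_layers=3)(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 8])
```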
-
amyeroberts authored
* Rough TF conversion outline
* Tidy up
* Fix padding differences between layers
* Add back embedder - whoops
* Match test file to main
* Match upstream test file
* Correctly pass and assign image_size parameter
  Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add in MainLayer
* Correctly name layer
* Tidy up AdaptivePooler
* Small tidy-up: more accurate type hints and remove whitespace
* Change AdaptiveAvgPool
  Use the AdaptiveAvgPool implementation by @Rocketknight1, which correctly pools if the output shape does not evenly divide the input shape, c.f. https://github.com/huggingface/transformers/pull/17554/files/9e26607e22aa8d069c86b50196656012ff0ce62a#r900109509
  Co-authored-by: matt <rocketknight1@gmail.com>
  Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Use updated AdaptiveAvgPool
  Co-authored-by: matt <rocketknight1@gmail.com>
* Make AdaptiveAvgPool compatible with CPU
* Remove image_size from configuration
* Fixup
* Tensorflow -> TensorFlow
* Fix pt references in tests
* Apply suggestions from code review - grammar and wording
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add TFResNet to doc tests
* PR comments - GlobalAveragePooling and clearer comments
* Remove unused import
* Add in keepdims argument
* Add num_channels check
* grammar fix: by -> of
  Co-authored-by: matt <rocketknight1@gmail.com>
  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Remove transposes - keep NHWC throughout forward pass
* Fixup look sharp
* Add missing layer names
* Final tidy up - remove from_pt now that weights are on the hub

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
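For the simple case where the target size evenly divides the input, adaptive average pooling over NHWC inputs can be sketched with a reshape and a mean; the implementation referenced above also handles the non-divisible case, which this sketch deliberately does not.

```python
import tensorflow as tf

def adaptive_avg_pool_2d(x: tf.Tensor, output_size: tuple) -> tf.Tensor:
    """Adaptive average pooling over NHWC inputs, divisible case only (sketch)."""
    batch, height, width, channels = x.shape
    out_h, out_w = output_size
    if height % out_h or width % out_w:
        raise ValueError("This sketch only covers output sizes that evenly divide the input.")
    # Group the spatial dims into (out_h, kernel_h) and (out_w, kernel_w), then average each group.
    x = tf.reshape(x, (batch, out_h, height // out_h, out_w, width // out_w, channels))
    return tf.reduce_mean(x, axis=[2, 4])

pooled = adaptive_avg_pool_2d(tf.random.normal((2, 8, 8, 3)), output_size=(2, 2))
print(pooled.shape)  # (2, 2, 2, 3)
```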
-
Lysandre Debut authored
-
Dobatymo authored
-
- 01 Jul, 2022 2 commits
-
-
David Heryanto authored
* Exclude Databricks from notebook env only if the runtime is below 11.0
* Dummy commit to trigger CI
* Empty commit to trigger CI (×8)
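A hedged sketch of the kind of runtime check described in the first bullet, assuming the DATABRICKS_RUNTIME_VERSION environment variable identifies the Databricks runtime; the helper name is made up.

```python
import os

def is_databricks_below_11() -> bool:
    """Return True when running on a Databricks runtime older than 11.0 (illustrative sketch)."""
    runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    if runtime is None:
        return False  # not on Databricks at all
    try:
        major = int(runtime.split(".")[0])
    except ValueError:
        return False  # unparsable version string, assume a recent runtime
    return major < 11

# Hypothetical usage: only exclude the notebook environment on old runtimes.
# if is_databricks_below_11(): treat_as_plain_shell()
```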
-
seungeunrho authored
* Shift labels for causal LM when using the label smoother
  When training a causal LM, the loss is computed within the model's forward() function and the labels are shifted internally. However, if label smoothing is applied, the loss is computed in the Trainer's compute_loss function and the labels are not shifted, which misaligns the labels with the corresponding inputs. This commit resolves that misalignment.
  Resolves #17960
  On branch shift_labels_for_causalLM
  Changes to be committed:
    modified: src/transformers/trainer.py
    modified: src/transformers/trainer_pt_utils.py
* Update trainer.py
* Update src/transformers/trainer.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
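For clarity, a minimal sketch of the shift described above: align labels one step ahead of the logits before computing a smoothed loss, so each token is predicted from the positions before it. The plain cross-entropy with label_smoothing stands in for the Trainer's label smoother.

```python
import torch
import torch.nn.functional as F

def causal_lm_smoothed_loss(logits: torch.Tensor, labels: torch.Tensor, epsilon: float = 0.1) -> torch.Tensor:
    """Shift labels one step left relative to logits, then apply label-smoothed cross-entropy (sketch)."""
    # The model predicts token t+1 from tokens <= t, so drop the last logit and the first label.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        label_smoothing=epsilon,
        ignore_index=-100,  # padding positions are masked out of the loss
    )

logits = torch.randn(2, 6, 100)
labels = torch.randint(0, 100, (2, 6))
print(causal_lm_smoothed_loss(logits, labels))
```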
-