- 13 Jul, 2022 2 commits
-
-
Sebastian Sosa authored
* supported python versions reference
* Update CONTRIBUTING.md removing commit hash from link

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Joao Gante authored
-
- 12 Jul, 2022 6 commits
-
-
Joao Gante authored
-
Niklas Muennighoff authored
* Add fp16 option
* Fix BLOOM dtype
* Formatting
* Remove torch_dtype arg
* Revert formatting
* Apply formatting
* Add n_embed backward compat
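For context, a minimal sketch of loading a BLOOM checkpoint in half precision through the standard `torch_dtype` argument (the checkpoint name is illustrative, not taken from the PR):

```python
# Sketch, not the PR code: load BLOOM weights directly in fp16.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",      # any BLOOM checkpoint
    torch_dtype=torch.float16,    # weights are loaded in half precision
)
print(model.dtype)  # torch.float16
```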
-
Joao Gante authored
-
wei zhao authored
* Report value for a step instead of epoch.

  Report an objective function value for a step instead of an epoch to optuna. If "eval_steps" is less than the number of steps per epoch, there may be warnings like: "optuna/trial/_trial.py:592: UserWarning: The reported value is ignored because this `step` 0 is already reported.", so "step" is more appropriate than "epoch" here.
* MOD: make style.

Co-authored-by: zhaowei01 <zhaowei01@yuanfudao.com>
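A minimal sketch of the per-step reporting pattern described above; `evaluate_model` is a hypothetical stand-in for the Trainer's evaluation loop:

```python
import optuna

def evaluate_model(step: int) -> float:
    # Hypothetical helper standing in for an actual evaluation pass.
    return 1.0 / (1.0 + step)

def objective(trial: optuna.Trial) -> float:
    eval_steps, max_steps = 100, 1000
    metric = 0.0
    for step in range(eval_steps, max_steps + 1, eval_steps):
        metric = evaluate_model(step)
        # Reporting per global step keeps every reported step unique,
        # so optuna never warns that a step "is already reported".
        trial.report(metric, step=step)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return metric

optuna.create_study(direction="minimize").optimize(objective, n_trials=1)
```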
-
Sijun He authored
-
jianan-gu authored
* enhance ipex import
* refine codes
* refine style
* add link
* style

Co-authored-by: Stas Bekman <stas@stason.org>
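A sketch of the kind of defensive IPEX import guard this describes (names are illustrative, not the exact code from the PR):

```python
import importlib.util

def is_ipex_available() -> bool:
    # Detect intel_extension_for_pytorch without importing it eagerly.
    return importlib.util.find_spec("intel_extension_for_pytorch") is not None

if is_ipex_available():
    import intel_extension_for_pytorch as ipex  # noqa: F401
else:
    ipex = None  # fall back to plain PyTorch code paths
```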
-
- 11 Jul, 2022 8 commits
-
-
Younes Belkada authored
* fix tolerance for a bloom slow test
* enhance alibi padding: get rid of for loops, deal better with padded batched input, avoid useless cpu/gpu communication when creating alibi
* optimize attention mask
* fix scaled softmax limit values
* optimize building alibi tensor
* fix attention_mask shape when it's None
* minor fixes: fix docstring + arg names
* remove colons in docstring
* Apply suggestions from code review
* apply suggestion
* remove unused arg
* refactor a bit: use [:, None] for consistency
* refactor attention block
* quick fixes
* first attempt
* refactor attention block and fix all tests except "test_simple_generation"; added comments to better explain the attention block
* remove debug lines and add TODO comment
* change `torch.bmm` to `torch.baddbmm`: fixes `test_simple_generation` but breaks `test_batch_generation_padd`
* styling
* all tests are passing now: use `bmm`, add explanation for `allow_fp16_reduced_precision_reduction`
* styling
* fix support for accelerate
* Apply suggestions from code review
* remove attn softmax in fp32
* refactor comments
* refactor a bit: remove warning message, remove print in test
* refer to pytorch t5
* change the slow tests: do the tests in fp32, remove some comments, keep large comments
* update expected output for `test_simple_generation`: we now test using fp32
* make style + change comments a bit
* fix dtype in the padd test

Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
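A simplified, loop-free alibi construction in the spirit of this refactor (a sketch, not the exact PR code; assumes `num_heads` is a power of two):

```python
import math
import torch

def build_alibi(attention_mask: torch.Tensor, num_heads: int) -> torch.Tensor:
    # One slope per head: a geometric sequence derived from num_heads.
    base = 2 ** (-(2 ** -(math.log2(num_heads) - 3)))
    slopes = torch.pow(torch.tensor(base), torch.arange(1, num_heads + 1))
    # Token positions from the attention mask, so padding does not shift them.
    positions = (attention_mask.cumsum(dim=-1) - 1) * attention_mask
    # (batch, num_heads, 1, seq_len), ready to add to attention scores.
    return slopes[None, :, None, None] * positions[:, None, None, :]

mask = torch.tensor([[0, 1, 1, 1], [1, 1, 1, 1]])  # left-padded batch
alibi = build_alibi(mask, num_heads=8)
print(alibi.shape)  # torch.Size([2, 8, 1, 4])
```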
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Duong A. Nguyen authored
* Fix RESOURCE_EXHAUSTED error for large datasets on Flax example scripts
* using np.random.permutation for creating batch_idx
* train_samples_idx -> training_samples_idx
* fix type hints
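A minimal sketch of permutation-based batch indexing of the kind described above (function and variable names are illustrative, not the exact example-script code):

```python
import numpy as np

def generate_batch_splits(num_samples: int, batch_size: int, rng: np.random.Generator) -> list:
    # Shuffle indices only; never materialise a shuffled copy of the data.
    training_samples_idx = rng.permutation(num_samples)
    num_full = num_samples // batch_size
    # Drop the trailing incomplete batch, then split into equal batches.
    return np.split(training_samples_idx[: num_full * batch_size], num_full)

batches = generate_batch_splits(10, 3, np.random.default_rng(0))
print(len(batches), batches[0].shape)  # 3 (3,)
```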
-
Yih-Dar authored
* fix dtype issue in _attn
* fix RotaryEmbedding
* fix RotaryEmbedding 2
* clean up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
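For illustration only, a simplified rotary-embedding sketch that computes the angles in float32 and casts back at the end, the kind of dtype handling this fix concerns (GPT-J-style interleaved pairing; not the exact model code):

```python
import torch

def apply_rotary(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (..., seq_len, dim) with even dim; angles computed in float32.
    seq_len, dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    freqs = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    cos, sin = freqs.cos(), freqs.sin()
    x1, x2 = x[..., 0::2].float(), x[..., 1::2].float()
    out = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return out.to(x.dtype)  # cast back only after the float32 math

q = torch.randn(2, 8, 16, dtype=torch.float16)
print(apply_rotary(q).dtype)  # torch.float16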
-
Yulv-git authored
* Fix some typos.
* Fix typo.
* make fixup.

Signed-off-by: Yulv-git <yulvchi@qq.com>
-
- 10 Jul, 2022 1 commit
-
-
Stas Bekman authored
-
- 08 Jul, 2022 4 commits
-
-
neverix authored
* Make Trainer.predict call on_evaluate (#17952)
* Add on_predict
* Small fix
* Small and different fix
* Add tests
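A minimal sketch of a callback consuming the new `on_predict` hook (the logging body is illustrative):

```python
from transformers import TrainerCallback

class PredictLogger(TrainerCallback):
    # Fires once when Trainer.predict() finishes, mirroring on_evaluate.
    def on_predict(self, args, state, control, metrics=None, **kwargs):
        print(f"predict metrics at step {state.global_step}: {metrics}")

# Usage sketch: trainer = Trainer(..., callbacks=[PredictLogger()])
```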
-
Sylvain Gugger authored
-
BOSEOP KIM authored
* Fix type issue in using bucketing with Trainer: fix type issues in LengthGroupedSampler and DistributedLengthGroupedSampler (refs: #18003)
* Change logging type in LengthGroupedSampler: change `logger.warning` to `logger.info`
* Change logging type in DistributedLengthGroupedSampler: change `logger.warning` to `logger.info`
* Remove redundant clause in LengthGroupedSampler: use `elif`
* Remove redundant clause in DistributedLengthGroupedSampler: use `elif`
* Apply black, isort to modified code in the script

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Sylvain Gugger authored
* Fix slow CI by pinning resampy
* Actually put it in the speech dependencies
-
- 07 Jul, 2022 5 commits
-
-
Matt authored
* Drop columns after loading samples, rather than before, to avoid breaking transforms
* make fixup
* Add workaround so this PR can work with current datasets version
-
Patrick von Platen authored
-
varshith authored
* Added command for Windows venv activation
* changed Linux and macOS specification
-
Sylvain Gugger authored
* Add script to sort doc ToC
* Style and fixes
* Add check to quality job
-
Sylvain Gugger authored
-
- 06 Jul, 2022 6 commits
-
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* Link to the Datasets doc
* Remove unwanted file
-
Matt authored
-
Joao Gante authored
-
ADAning authored
* Add ALL_LAYERNORM_LAYERS for LayerNorm
* fix bug when appending layer norms
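A simplified sketch of how a registry like `ALL_LAYERNORM_LAYERS` is typically consulted to exclude layer-norm parameters from weight decay (not the exact Trainer code):

```python
import torch.nn as nn

ALL_LAYERNORM_LAYERS = [nn.LayerNorm]  # models may append custom variants

def no_decay_parameter_names(model: nn.Module) -> list:
    # Collect full parameter names belonging to any registered layer-norm class.
    names = []
    for mod_name, module in model.named_modules():
        if isinstance(module, tuple(ALL_LAYERNORM_LAYERS)):
            names += [f"{mod_name}.{p}" for p, _ in module.named_parameters()]
    return names

model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
print(no_decay_parameter_names(model))  # ['1.weight', '1.bias']
```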
-
NielsRogge authored
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
-
- 05 Jul, 2022 4 commits
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Matt authored
-
Sanchit Gandhi authored
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 04 Jul, 2022 4 commits
-
-
Joao Gante authored
* get the right slicing index for position_bias
-
Sreyan Ghosh authored
Co-authored-by: Sreyan-G@NVIDIA <sreyang@nvidia.com>
-
Matt authored
* Return scalar losses instead of per-sample means
* Make loss shape (1,) instead of scalar
* Allow scalar losses in test_loss_computation
* Remove XLA loss function for RAG
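A minimal sketch of the shape-(1,) loss convention mentioned above (values are illustrative):

```python
import tensorflow as tf

per_sample = tf.constant([0.3, 0.7, 0.5])
# Reduce to a single value but keep one dimension: shape (1,), not a 0-d scalar.
loss = tf.reshape(tf.reduce_mean(per_sample), (1,))
print(loss.shape)  # (1,)
```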
-
Matthijs Hollemans authored
-