- 18 Jul, 2022 7 commits
-
-
Yih-Dar authored
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* Fix expected loss values
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Wang, Yi authored
* [HPO] update to the new sigopt experiment API
* follow https://docs.sigopt.com/experiments
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* [HPO] use the new API if sigopt version >= 8.0.0
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
-
gcheron authored
Co-authored-by: Guilhem Chéron <guilhemc@authentifier.com>
-
Lysandre Debut authored
* NLLB tokenizer
* Apply suggestions from code review - thanks Stefan!
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* Final touches
* Style :)
* Update docs/source/en/model_doc/nllb.mdx
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* PR reviews
* Auto models
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
John Giorgi authored
-
John Giorgi authored
-
- 15 Jul, 2022 2 commits
-
-
Nicolas Patry authored
* Adding support for `device_map` directly in the `pipeline(..)` function
* Updating the docstring
* Adding a better docstring
* Put back type hints
* Blacked (`make fixup` didn't work ??!!)
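A `device_map` of this kind is, conceptually, a mapping from submodule names to devices. As a rough, dependency-free sketch (hypothetical helper name; the real resolution logic in Accelerate/Transformers is more involved), lookup falls back to the longest matching module-name prefix:

```python
# Sketch (hypothetical names) of how a `device_map` assigns model submodules
# to devices: each entry maps a module-name prefix to a device, and lookup
# picks the longest prefix that matches the queried submodule.
def resolve_device(device_map, module_name):
    """Return the device for `module_name` using longest-prefix matching."""
    best = None
    for prefix, device in device_map.items():
        if module_name == prefix or module_name.startswith(prefix + "."):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, device)
    if best is None:
        raise KeyError(f"no device found for module {module_name!r}")
    return best[1]

device_map = {"transformer.h.0": 0, "transformer.h.1": 1, "lm_head": "cpu"}
print(resolve_device(device_map, "transformer.h.0.attn"))  # -> 0
print(resolve_device(device_map, "lm_head"))               # -> cpu
```

This is only meant to illustrate the shape of the mapping, not the library's actual dispatch code.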
-
Nicolas Patry authored
* Fixing a bug where the attention mask was not passed to generate
* Fixing zero-size prompts
* Comment on top
-
- 13 Jul, 2022 9 commits
-
-
amyeroberts authored
* Initial TF DeiT implementation
* Fix copies naming issues
* Fix up + docs
* Properly same main layer
* Name layers properly
* Fixup
* Fix import
* Fix weight loading for tests whilst not on hub
* Add doc tests and remove to_2tuple
* Add back to_2tuple - removing to_2tuple results in many downstream changes needed because of the copies checks
* Incorporate updates from the Improve vision models PR #17731
* Don't hard-code num_channels
* Copy PyTorch DeiT embeddings and remove PyTorch operations with mask
* Fix patch embeddings & tidy up
* Update PixelShuffle to move logic into class layer
* Update doc strings - remove PT references
* Use NHWC format in internal layers
* Fix up
* Use linear activation layer
* Remove unused import
* Apply suggestions from code review
* Move dataclass to top of file
* Remove from_pt now that weights are on hub
* Fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
-
Wei authored
* enable fx2trt
* Update perf_train_gpu_one.mdx
* add lib check
* fix import check
* improve doc
* refactor ctx manager
* formatting fixes (black, isort)
* update args
* cleanups
* code refactor to init
* remove redundancy
* replace self.args with args
Co-authored-by: Stas Bekman <stas@stason.org>
-
Sylvain Gugger authored
* Make sharded checkpoints work in offline mode
* Add test
-
Sylvain Gugger authored
This reverts commit 3564c657.
-
Sylvain Gugger authored
-
lmagne authored
* Added metadata to training summary
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
John Giorgi authored
* Add summarization name mapping for MultiNews
-
Sebastian Sosa authored
* Supported Python versions reference
* Update CONTRIBUTING.md, removing commit hash from link
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Joao Gante authored
-
- 12 Jul, 2022 6 commits
-
-
Joao Gante authored
-
Niklas Muennighoff authored
* Add fp16 option
* Fix BLOOM dtype
* Formatting
* Remove torch_dtype arg
* Revert formatting
* Apply formatting
* Add n_embed backward compat
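"n_embed backward compat" here means a legacy config key keeps working after a rename. A minimal, dependency-free sketch of that pattern (hypothetical class name; the real Transformers config machinery uses a similar `attribute_map` idea but is more general):

```python
# Sketch of keeping a legacy config key (`n_embed`) working as an alias for
# the new name (`hidden_size`) so old configs and old attribute accesses
# don't break.
class BloomConfigSketch:
    attribute_map = {"n_embed": "hidden_size"}  # legacy name -> new name

    def __init__(self, hidden_size=64, **kwargs):
        # Accept the legacy key on construction and route it to the new one.
        if "n_embed" in kwargs:
            hidden_size = kwargs.pop("n_embed")
        self.hidden_size = hidden_size

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for legacy names.
        mapped = type(self).attribute_map.get(name)
        if mapped is not None:
            return getattr(self, mapped)
        raise AttributeError(name)

cfg = BloomConfigSketch(n_embed=1024)
print(cfg.hidden_size, cfg.n_embed)  # -> 1024 1024
```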
-
Joao Gante authored
-
wei zhao authored
* Report the objective value for a step instead of an epoch.
Report an objective function value to optuna per step instead of per epoch, for the following reason: if "eval_steps" is less than the number of steps per epoch, there may be warnings like "optuna/trial/_trial.py:592: UserWarning: The reported value is ignored because this `step` 0 is already reported.". So "step" is more appropriate than "epoch" here.
* MOD: make style.
Co-authored-by: zhaowei01 <zhaowei01@yuanfudao.com>
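The collision the commit describes can be shown without optuna at all. A pure-Python sketch (hypothetical class; optuna's actual `Trial.report` behaves this way but carries much more state): a trial ignores a second report for the same step, which is exactly what happens when several evaluations land inside one epoch and the epoch number is used as the step.

```python
# Dependency-free sketch of step-keyed reporting: a value reported twice for
# the same step is dropped, mimicking optuna's "already reported" warning.
class TrialSketch:
    def __init__(self):
        self.reports = {}
        self.ignored = 0

    def report(self, value, step):
        if step in self.reports:
            self.ignored += 1  # second report for this step is discarded
            return
        self.reports[step] = value

# Two evaluations during epoch 0: reporting by *epoch* collides...
trial = TrialSketch()
for loss in (0.9, 0.7):
    trial.report(loss, step=0)
assert trial.ignored == 1

# ...while reporting by *global step* does not.
trial2 = TrialSketch()
for global_step, loss in ((50, 0.9), (100, 0.7)):
    trial2.report(loss, step=global_step)
assert trial2.ignored == 0
```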
-
Sijun He authored
-
jianan-gu authored
* Enhance ipex import
* Refine codes
* Refine style
* Add link
* Style
Co-authored-by: Stas Bekman <stas@stason.org>
-
- 11 Jul, 2022 8 commits
-
-
Younes Belkada authored
* Fix tolerance for a BLOOM slow test
* Enhance alibi padding: get rid of for loops, deal better with padded batched input, avoid useless cpu/gpu communication when creating alibi
Co-authored-by: justheuristic <justheuristic@gmail.com>
* Optimize attention mask
* Fix scaled softmax limit values
* Optimize building the alibi tensor
* Fix attention_mask shape when it's None
* Minor fixes: fix docstring + arg names, remove colons in docstring
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Apply suggestion, remove unused arg
* Refactor a bit: use [:, None] for consistency
* Refactor attention block
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
* Quick fixes, first attempt
* Refactor attention block and fix all tests except "test_simple_generation" - added comments to better explain the attention block
* Remove debug lines and add TODO comment
* Change `torch.bmm` to `torch.baddbmm` - fixes `test_simple_generation` but breaks `test_batch_generation_padd`
* Styling
* All tests are passing now - use `bmm`, add explanation for `allow_fp16_reduced_precision_reduction`
* Fix support for accelerate
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove attn softmax in fp32
* Refactor comments; remove warning message and print on test
* Refer to pytorch t5
* Change the slow tests: do the tests in fp32, remove some comments, keep large comments
* Update expected output for `test_simple_generation` - we now test using fp32
* Make style + change comments a bit
* Fix dtype padd test
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
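For context on "building the alibi tensor": ALiBi biases attention scores with per-head slopes. The commonly published closed form (from the ALiBi paper; this sketch is not necessarily line-for-line what the commit implements) gives head i the slope 2^(-8(i+1)/n) when the head count n is a power of two:

```python
# Sketch of the standard ALiBi per-head slope computation, dependency-free.
# The full implementation also tiles these slopes against token distances
# and handles padded batches, which is what the commit above optimizes.
def alibi_slopes(num_heads):
    if num_heads & (num_heads - 1) != 0:
        raise ValueError("sketch assumes a power-of-two head count")
    start = 2 ** (-8.0 / num_heads)   # slope of the first head
    return [start ** (i + 1) for i in range(num_heads)]

print(alibi_slopes(4))  # -> [0.25, 0.0625, 0.015625, 0.00390625]
```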
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Duong A. Nguyen authored
* Fix RESOURCE_EXHAUSTED error for large datasets on Flax example scripts
* Use np.random.permutation for creating batch_idx
* train_samples_idx -> training_samples_idx
* Fix type hints
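The batch-index construction the commit refers to can be sketched without numpy (the Flax examples use `np.random.permutation`; plain `random` is used here to keep the sketch dependency-free). Dropping the last partial batch keeps every batch the same shape, which matters on accelerators because a new shape triggers recompilation:

```python
import random

# Sketch of building per-epoch batch indices from a shuffled permutation.
def batch_indices(num_samples, batch_size, seed=0):
    rng = random.Random(seed)
    perm = list(range(num_samples))
    rng.shuffle(perm)                    # fresh permutation each epoch
    steps = num_samples // batch_size    # drop the incomplete final batch
    return [perm[i * batch_size:(i + 1) * batch_size] for i in range(steps)]

batches = batch_indices(num_samples=10, batch_size=4)
print(len(batches), [len(b) for b in batches])  # -> 2 [4, 4]
```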
-
Yih-Dar authored
* Fix dtype issue in _attn
* Fix RotaryEmbedding
* Fix RotaryEmbedding 2
* Clean up
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yulv-git authored
* Fix some typos.
Signed-off-by: Yulv-git <yulvchi@qq.com>
* Fix typo.
Signed-off-by: Yulv-git <yulvchi@qq.com>
* make fixup.
-
- 10 Jul, 2022 1 commit
-
-
Stas Bekman authored
-
- 08 Jul, 2022 4 commits
-
-
neverix authored
* Make Trainer.predict call on_evaluate (#17952)
* Add on_predict
* Small fix
* Small and different fix
* Add tests
-
Sylvain Gugger authored
-
BOSEOP KIM authored
* Fix type issue in using bucketing with Trainer - fix type issues in LengthGroupedSampler, DistributedLengthGroupedSampler (refs: #18003)
* Change logging type in LengthGroupedSampler - change `logger.warning` to `logger.info`
* Change logging type in DistributedLengthGroupedSampler - change `logger.warning` to `logger.info`
* Remove redundant clause in LengthGroupedSampler - use `elif`
* Remove redundant clause in DistributedLengthGroupedSampler - use `elif`
* Apply black, isort to modified codes in the script
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
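For readers unfamiliar with the sampler being fixed: the idea behind length-grouped sampling is to shuffle, then sort indices by sample length inside large "megabatches", so each batch holds similar-length samples (less padding waste) while the overall order stays random. A dependency-free sketch of that idea (hypothetical function name; the real `LengthGroupedSampler` differs in details such as megabatch size and tie-breaking):

```python
import random

# Sketch: shuffle all indices, then sort each megabatch by length
# (longest first) so batches drawn in order contain similar lengths.
def length_grouped_indices(lengths, batch_size, seed=0, mega_factor=50):
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)
    mega = batch_size * mega_factor
    grouped = []
    for i in range(0, len(indices), mega):
        chunk = indices[i:i + mega]
        grouped.extend(sorted(chunk, key=lambda j: lengths[j], reverse=True))
    return grouped

lengths = [5, 128, 7, 64, 6, 256, 8, 32]
order = length_grouped_indices(lengths, batch_size=2, mega_factor=2)
# Within each megabatch of 4 indices, longer samples come first.
```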
-
Sylvain Gugger authored
* Fix slow CI by pinning resampy
* Actually put it in the speech dependencies
-
- 07 Jul, 2022 3 commits
-
-
Matt authored
* Drop columns after loading samples, rather than before, to avoid breaking transforms
* make fixup
* Add workaround so this PR can work with the current datasets version
-
Patrick von Platen authored
-
varshith authored
* Added command for Windows venv activation
* Changed Linux and macOS specification
-