"docs/source/en/training.md" did not exist on "77321481247787c97568c3b9f64b19e22351bab8"
- 24 Aug, 2022 3 commits
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Constantin Hütterer authored
* Add minor doc-string change to include hp_name
* fix: missing type information for kwargs
* fix: missing white-space in hyperparameter_search doc-strings
-
Mishig Davaadorj authored
-
- 23 Aug, 2022 4 commits
-
-
Joao Gante authored
-
SaulLu authored
* improve add_tokens documentation
* format
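A short illustration of the `add_tokens` API the improved documentation covers (a generic usage sketch, not taken from the docs themselves):

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Register new tokens; returns how many were actually added.
num_added = tokenizer.add_tokens(["new_tok1", "my_new-tok2"])
print(f"Added {num_added} tokens")

# The embedding matrix must be resized to match the new vocabulary size.
model.resize_token_embeddings(len(tokenizer))
```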
-
Nicolas Patry authored
and friends.
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 22 Aug, 2022 3 commits
-
-
Atharva Ingle authored
-
tgadeliya authored
-
Yih-Dar authored
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 19 Aug, 2022 3 commits
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Patrick von Platen authored
* add first generation tutorial
* [Circle CI] Temporary fix for broken detectron2 import
* remove generation
-
Joao Gante authored
-
- 18 Aug, 2022 12 commits
-
-
Atharva Ingle authored
* `model.tie_weights()` should be applied after `accelerator.prepare`. Weight tying should be done after the model has been moved to the XLA device, as mentioned in the PyTorch/XLA troubleshooting guide [here](https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#xla-tensor-quirks)
* format code
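A minimal sketch of the ordering this fix enforces (not the exact code from the PR): with 🤗 Accelerate on TPU, weights tied on CPU become separate tensors once the model is moved to the XLA device, so tying must happen after `prepare()`:

```python
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained("gpt2")

model = accelerator.prepare(model)  # moves the model to the target device
model.tie_weights()  # re-tie input/output embeddings on that device
```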
-
Loubna Ben Allal authored
* add examples subfolder
* mention examples in codeparrot readme
* use Trainer optimizer and scheduler type and add output_dir as argument
* add example of text-to-python and python-to-text models
* mention the downstream examples in the readme
* fix typo
-
Younes Belkada authored
* fix bnb documentation - move bnb documentation to `infer_gpu_many`
* small refactoring - added text on infer_gpu_one, added a small note on infer_gpu_many, added a customized multi-GPU example on infer_gpu_many
* Update docs/source/en/perf_infer_gpu_many.mdx
* apply suggestions
* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
-
Zachary Mueller authored
-
Severin Simmler authored
* Fix quantization
* Save model
* Remove unused comments
* Fix formatting
-
lewtun authored
-
regisss authored
-
amyeroberts authored
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Joao Gante authored
-
Patrick von Platen authored
* add first generation tutorial
* [LongT5 Docs] Correct docs
* correct expected string
* remove incorrect file
-
Matt authored
* Allow users to force TF availability
* Correctly name the envvar!
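A hedged sketch of what forcing TF availability looks like in practice. The variable name `FORCE_TF_AVAILABLE` is an assumption here; check transformers' import utilities for the exact name:

```python
import os

# Assumed env var name; must be set before importing transformers.
os.environ["FORCE_TF_AVAILABLE"] = "1"

from transformers import is_tf_available

print(is_tf_available())  # reports TF as available even without a working install
```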
-
- 17 Aug, 2022 4 commits
-
-
amyeroberts authored
* Update methods to optionally rescale. This is necessary to allow casting our images/videos to numpy arrays within the feature extractors' `__call__`. We want to make sure the behaviour is as expected when certain flags are False: if some transformations aren't applied, the output type can be unexpected, e.g. a list of PIL images instead of numpy arrays.
* Cast images to numpy arrays in `__call__` to enable consistent behaviour across different configs
* Remove accidental CLIP changes
* Update tests to reflect the scaling logic. We write a generic function to handle rescaling of our arrays. In order for the API to be intuitive, we take some factor c and rescale the image values by it. This means the rescaling done in `normalize` and `to_numpy_array` is now done with `array * (1/255)` instead of `array / 255`. This leads to small differences in the resulting image; when testing, these were on the order of 1e-8 and so deemed OK.
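A minimal sketch of the generic rescaling described above (the helper name is illustrative, not the exact function from the PR): values are scaled by a factor c, so converting uint8 images to [0, 1] uses `image * (1/255)` rather than `image / 255`, which differs by roughly 1e-8 in float32:

```python
import numpy as np

def rescale(image: np.ndarray, scale: float) -> np.ndarray:
    # Multiply by the scale factor rather than dividing by its reciprocal.
    return image * scale

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
scaled = rescale(image.astype(np.float32), 1 / 255)
print(scaled.min(), scaled.max())  # values now lie in [0, 1]
```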
-
Jingya HUANG authored
-
Yih-Dar authored
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Stefan Schweter authored
* examples: add Bloom support for token classification (FLAX, PyTorch and TensorFlow)
* examples: remove support for Bloom in token classification (FLAX and TensorFlow currently have no support for it)
-
- 16 Aug, 2022 7 commits
-
-
Younes Belkada authored
* bnb minor modifications - refactor documentation, add troubleshooting README, add PyPI library to the DockerFile
* Apply suggestions from code review
* put bash instructions in one block
* update readme - refactor hardware requirements a bit
* change text a bit
* Apply suggestions from code review
* add link to paper
* Update tests/mixed_int8/README.md
* refactor a bit
* add instructions for Turing & Ampere
* add A6000
* clarify a bit
* remove small part

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
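A hedged sketch of the mixed-int8 feature these docs and tests cover: loading a model with bitsandbytes 8-bit weights through 🤗 Transformers. The checkpoint choice is illustrative; this requires `pip install bitsandbytes accelerate` and a supported GPU (Turing/Ampere, per the README above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # dispatch layers across available GPUs
    load_in_8bit=True,   # quantize linear layers to int8 with bitsandbytes
)
```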
-
zhoutang776 authored
* Update run_translation_no_trainer.py: found an error in selecting `no_decay` parameters, plus some small modifications for when the user continues training from a checkpoint
* fixes the `no_decay` and `resume_step` issues: 1. change the `no_decay` list 2. if the user continues training their model from a provided checkpoint, `resume_step` will not be initialized properly if `args.gradient_accumulation_steps != 1`
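A hedged sketch of the two fixes described above (simplified, not the exact diff from the script): exclude bias and LayerNorm weights from weight decay, and divide by `gradient_accumulation_steps` when computing the optimizer step to resume from:

```python
import torch
from torch import nn

model = nn.TransformerEncoderLayer(d_model=32, nhead=4)  # stand-in model

no_decay = ["bias", "norm"]  # parameter-name fragments exempt from decay
grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters()
                   if not any(nd in n for nd in no_decay)],
        "weight_decay": 0.01,
    },
    {
        "params": [p for n, p in model.named_parameters()
                   if any(nd in n for nd in no_decay)],
        "weight_decay": 0.0,
    },
]
optimizer = torch.optim.AdamW(grouped_parameters, lr=5e-5)

# Resuming: each epoch performs steps_per_epoch // accumulation optimizer steps,
# so the resume step must account for gradient accumulation.
steps_per_epoch, gradient_accumulation_steps, completed_epochs = 1000, 4, 2
resume_step = completed_epochs * (steps_per_epoch // gradient_accumulation_steps)
```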
-
flozi00 authored
-
Joao Gante authored
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Sourab Mangrulkar authored
* mac m1 `mps` integration
* Update docs/source/en/main_classes/trainer.mdx
* addressing comments
* Apply suggestions from code review
* resolve comment

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
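A hedged sketch of what the `mps` integration enables: selecting Apple Silicon's Metal backend when available. This is plain PyTorch device selection, not the Trainer's internal logic from the PR:

```python
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")  # Apple Silicon GPU via Metal
else:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 2, device=device)
print(x.device)
```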
-
- 14 Aug, 2022 1 commit
-
-
Karim Foda authored
* [Flax] Add remat (gradient checkpointing)
* fix variable naming in test
* flip: checkpoint using a method
* fix naming
* fix class naming
* apply PVP's suggestions from code review
* add gradient_checkpointing to examples
* Add gradient_checkpointing to run_mlm_flax
* Add remat to longt5
* Add gradient checkpointing test longt5
* Fix args errors
* Fix remaining tests
* Make fixup & quality fixes
* replace kwargs
* remove unnecessary kwargs
* Make fixup changes
* revert long_t5_flax changes
* Remove return_dict and copy to LongT5
* Remove test_gradient_checkpointing

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
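A hedged illustration of remat (gradient checkpointing) in Flax, the mechanism this PR wires into the Flax models: wrapping a module with `nn.remat` makes the backward pass recompute its activations instead of storing them. This is the generic Flax API, not the PR's exact code:

```python
import flax.linen as nn
import jax
import jax.numpy as jnp

class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Dense(128)(x)
        x = nn.relu(x)
        return nn.Dense(1)(x)

RematMLP = nn.remat(MLP)  # checkpointed variant of the same module

model = RematMLP()
params = model.init(jax.random.PRNGKey(0), jnp.ones((4, 16)))
loss_fn = lambda p, x: model.apply(p, x).sum()
# Activations inside the MLP are recomputed during this backward pass.
grads = jax.grad(loss_fn)(params, jnp.ones((4, 16)))
```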
-
- 12 Aug, 2022 3 commits
-
-
Younes Belkada authored
-
Stas Bekman authored
* [fsmt] deal with -100 indices in decoder ids. Fixes: https://github.com/huggingface/transformers/issues/17945. Decoder ids get the default index -100, which breaks the model; like t5 and many other models, add a fix to replace -100 with the correct pad index. For some reason this use case hadn't been exercised with this model until recently, so the issue seems to have been there since the beginning. Any suggestions on how to add a simple test here, or do we perhaps have something similar already? The user's script is quite massive.
* style
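A minimal sketch of the fix described above (the same pattern T5 uses, not necessarily the exact fsmt diff): replace the -100 ignore index in decoder input ids with the model's pad token id before the embedding lookup:

```python
import torch

pad_token_id = 1  # illustrative; the model reads this from its config
decoder_input_ids = torch.tensor([[2, 15, 7, -100, -100]])

decoder_input_ids = decoder_input_ids.masked_fill(
    decoder_input_ids == -100, pad_token_id
)
print(decoder_input_ids)  # tensor([[ 2, 15,  7,  1,  1]])
```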
-
Stas Bekman authored
The manual anchors end up being duplicated by automatically added anchors and no longer work.
-