Commits · 4c295a265b19ed5a423f5135d1db87714f06b82e · chenpangpang / transformers

29 Mar, 2023 12 commits
- Update release instructions (#22454) · 4c295a26
  Sylvain Gugger authored Mar 29, 2023
  
  4c295a26
- Avoid using personal HF token in CI (#22453) · 97440e9c
  Yih-Dar authored Mar 29, 2023
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
  97440e9c
- Update Neptune docs (#22452) · 173193cc
  Sabine authored Mar 29, 2023
  
  173193cc
- Revert "Fix --bf16 option support for Neuron after PR #22300" (#22451) · 5e89a435
  jeffhataws authored Mar 29, 2023
```
This reverts commit fd81746dbec5f17c8285a0fdc72ca4b4c025cc33.
```
  5e89a435
- [`Pix2Struct`] Fix slow test (#22448) · b844f8a9
  Younes Belkada authored Mar 29, 2023
```
fix slow test
```
  b844f8a9
- Revert "Error (also in original) model, scaling only q matrix not qk.T dot... · 55dae94c
  Sylvain Gugger authored Mar 29, 2023
```
Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" (#22444)

Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) (#21627)"

This reverts commit bad83008.
```
  55dae94c
- Use real tokenizers if tiny version(s) creation has issue(s) (#22428) · 8894b817
  Yih-Dar authored Mar 29, 2023
```
Fix some tiny model creation issues
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
  8894b817
- Don't hard error when cache version can't be converted to int (#22427) · 9b494a15
  Sylvain Gugger authored Mar 29, 2023
  
  9b494a15
- [`Generate`] Add conditional generation for multimodal models (#22424) · 8252e24a
  Younes Belkada authored Mar 29, 2023
```
* add conditional generation

* add comments
```
  8252e24a
- [`bnb`] fix bnb failing test (#22439) · 33f4cb10
  Younes Belkada authored Mar 29, 2023
```
* fix bnb failing test

* fix

* fix

* fixup
```
  33f4cb10
- Hyperparameter search reporting to W&B (#22440) · fab1de72
  Nolwenn Bernard authored Mar 29, 2023
```
Fixes #22429
```
  fab1de72
- Add clean_up_tokenization_spaces to config (#22341) · 8d9c3836
  Arthur authored Mar 29, 2023
```
* add draft changes

* fix failing wav2vec

* style

* make sure that the argument is saved + add tests

* style

* fixup

* update test

* default clean_up_tokenization_spaces to False for Bloom and Llama

* Update code based on review
Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>

* style

* quality

---------
Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>
```
  8d9c3836
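
The entry "Don't hard error when cache version can't be converted to int (#22427)" in the list above names a defensive-parsing pattern. A minimal sketch of that idea follows, assuming a hypothetical `read_cache_version` helper and a fallback value of 0. It is an illustration only, not the code from the commit.

```python
def read_cache_version(raw: str, default: int = 0) -> int:
    """Parse a cache version string, falling back instead of raising.

    `read_cache_version` and `default` are hypothetical names for illustration.
    """
    try:
        return int(raw.strip())
    except ValueError:
        # Malformed or empty content: fall back to the default version.
        return default


print(read_cache_version("1\n"))
print(read_cache_version("not-a-number"))
```

Returning a default keeps a corrupted version file from aborting start-up, at the cost of silently treating it as an old cache.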
28 Mar, 2023 4 commits

MBart: Fix docs and doctests (#22422) · b29fd697
Joao Gante authored Mar 28, 2023
```
Fix docs and doctests
```
b29fd697

[performance] ensure `causal_mask` is created directly on device (#22378) · ae5fc2db

Jeff Rasley authored Mar 28, 2023

* ensure causal_mask is created directly on device

* add copy tag to opt, update bart implementation

* add device to all _make_causal_mask copies

* formatting fixes

* more manual fixes due to unlinked versions of _prepare_decoder_attention_mask

ae5fc2db

Fix bug in perplexity guide calculations and update perplexity numbers. Fixes #22348 (#22411) · ed57c979
fpgaminer authored Mar 28, 2023
```
Fix bug in perplexity guide calculations and update perplexity numbers.
```
ed57c979
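
For context on the perplexity entry above: perplexity is conventionally defined as the exponential of the mean negative log-likelihood per token. The sketch below shows only that textbook definition on made-up numbers. It does not reproduce the bug or the fix from #22411, and the function name is an assumption.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative log-likelihood over all tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Four tokens, each assigned probability 0.25 by a hypothetical model.
uniform = [math.log(0.25)] * 4
print(perplexity(uniform))
```

With every token at probability 0.25 the result is approximately 4, matching the reading of perplexity as an effective branching factor.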

Bump redis from 4.1.4 to 4.5.3 in /examples/research_projects/decision_transformer (#22410) · 32ff0640

dependabot[bot] authored Mar 27, 2023

Bump redis in /examples/research_projects/decision_transformer

Bumps [redis](https://github.com/redis/redis-py) from 4.1.4 to 4.5.3.
- [Release notes](https://github.com/redis/redis-py/releases)
- [Changelog](https://github.com/redis/redis-py/blob/master/CHANGES)
- [Commits](https://github.com/redis/redis-py/compare/v4.1.4...v4.5.3)

---
updated-dependencies:
- dependency-name: redis
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

32ff0640

27 Mar, 2023 17 commits

[neptune] fix checkpoint bug with relative out_dir (#22102) · 3ec7a476

Kshiteej K authored Mar 28, 2023



* [neptune] fix checkpoint bug with relative out_dir

* update imports

* reformat with black

* check neptune without imports

* fix typing-related issue

* run black on code

* use os.path.sep instead of raw \

* simplify imports and remove type annotation

* make ruff happy

* apply review suggestions

---------
Co-authored-by: Aleksander Wojnarowicz <alwojnarowicz@gmail.com>

3ec7a476
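
The "use os.path.sep instead of raw \" step in the commit above reflects a general portability point: a hard-coded backslash only splits paths correctly on Windows. A minimal sketch of the idea, assuming a hypothetical `last_component` helper and an invented directory name (neither is taken from the Neptune integration):

```python
import os


def last_component(path: str) -> str:
    """Return the final path component by splitting on the platform separator."""
    return path.rstrip(os.path.sep).split(os.path.sep)[-1]


# os.path.join builds the path with the separator of the current platform.
relative_out_dir = os.path.join("runs", "checkpoint-500")
print(last_component(relative_out_dir))
```

In practice `os.path.basename` or `pathlib.Path(...).name` would usually be preferred over manual splitting.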

[WIP]`NLLB-MoE` Adds the moe model (#22024) · 19ade242

Arthur authored Mar 27, 2023

* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* ❗local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism that we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer_norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse are tested. Had to change route_tokens a little bit

* add support for unslip models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉



* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

19ade242

Fix quality · 057e1d74
Sylvain Gugger authored Mar 27, 2023

057e1d74

Hardware Auto-Setup for Examples (#22319) · f02e3a2b

Donny Greenberg authored Mar 27, 2023

* Add initial remote hardware auto-setup docs

* Fix a few typos and clarify some language

* Add missing dependency

* Update self-hosted launch script with Sylvain's comments.

* Formatting.

* Trigger CI

* Style

f02e3a2b

Trainer: missing None check (#22404) · 738944c9
Joao Gante authored Mar 27, 2023
```
missing None check
```
738944c9
Trainer: move Seq2SeqTrainer imports under the typing guard (#22401) · 53155b52
Joao Gante authored Mar 27, 2023

53155b52

[Pix2Struct] Add support to resize embeddings (#22394) · 0e708178

NielsRogge authored Mar 27, 2023

* First draft

* Fix integration test

* Remove script

* Fix test and typos

* Fix one more test

* Skip tied embeddings test

* Remove line

* Address comments

0e708178

Transformers env safetensors (#22400) · f6b80a01
Sylvain Gugger authored Mar 27, 2023
```
* Report safetensors version in transformers-cli env

* Styling

* Trigger CI maybe
```
f6b80a01
[`bnb`] Force `requires_grad` to be `False` (#22396) · d324b70f
Younes Belkada authored Mar 27, 2023
```
force `requires_grad` to be `False`
```
d324b70f
Generate: support for left-padding on GPTNeoX and Llama (#22382) · 7dcd8703
Joao Gante authored Mar 27, 2023

7dcd8703

Seq2seq trainer generation config arg (#22323) · 5506d049

Nathan Fradet authored Mar 27, 2023



* seq2seq trainer and training arguments accepting GenerationConfig arg

* seq2seq Trainer and training arguments docstring fixes

* Update training_args_seq2seq.py docstring
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixing trainer_seq2seq.py docstring
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* seq2seq trainer: legacy gen args back & GenerationConfig created at init

* Seq2seq trainer: fix in case gen_config.max_new_tokens is None
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding legacy arg retrocompatibility

* seq2seq trainer and training arguments accepting GenerationConfig arg

* seq2seq Trainer and training arguments docstring fixes

* Update training_args_seq2seq.py docstring
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixing trainer_seq2seq.py docstring
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* seq2seq trainer: legacy gen args back & GenerationConfig created at init

* Seq2seq trainer: fix in case gen_config.max_new_tokens is None
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding legacy arg retrocompatibility

* seq2seq trainer: evaluate and predict untouched

* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding init args, keeping IDEs hints

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

5506d049

Wav2Vec2ProcessorWithLM can return N best hypotheses now (#22235) · 03966cac

Vladislav Sokolovskii authored Mar 27, 2023



* Wav2Vec2ProcessorWithLM can return N best hypotheses now
Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>

* Wav2Vec2ProcessorWithLM n_best cannot be None
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Batch decoding can return N best hypotheses now

batch_decode was extended with the same functionality as decode
function, N best hypotheses per sample can be returned
Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>

---------
Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

03966cac

load_in_8bit now respects 'balanced' device maps in multi-gpu environments (#22377) · 66d1eee6
кѳѳsнī authored Mar 27, 2023
```
balanced 8bit memory
```
66d1eee6
Adapt find_tied_parameters to handle breaking change in Accelerate (#22360) · 8cfc6678
Sylvain Gugger authored Mar 27, 2023

8cfc6678
Translated documentation in italian (#22388) · 204737fc
Nicola Procopio authored Mar 27, 2023
```
* updated toctree

* added and translated mdx documents
```
204737fc

Changed world_size() to get_world_size() bugfix (#22381) · d5c2c71c

Charlie-Bell authored Mar 27, 2023

Edited one line in src/transformers/generation/utils.py: changed dist.world_size() to dist.get_world_size(), since world_size() doesn't exist in torch.distributed.

d5c2c71c
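
The fix above relies on `torch.distributed.get_world_size()` being the accessor for the number of processes. A minimal sketch of calling it defensively, assuming a hypothetical `safe_world_size` helper that is not part of transformers and that falls back to 1 when PyTorch is missing or no process group has been initialized:

```python
def safe_world_size() -> int:
    """Return the distributed world size, or 1 outside a distributed run.

    `safe_world_size` is a hypothetical helper for illustration.
    """
    try:
        import torch.distributed as dist
    except ImportError:
        # PyTorch is not installed: behave like a single process.
        return 1
    if dist.is_available() and dist.is_initialized():
        # The accessor is get_world_size(); there is no dist.world_size().
        return dist.get_world_size()
    return 1


print(safe_world_size())
```

Checking `is_initialized()` first matters because `get_world_size()` expects a default process group to exist.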

TensorFlow: additional missing `cmake` dependencies in CI (#22383) · c746eb16
Joao Gante authored Mar 27, 2023
```
* missing cmake

* more cmake
```
c746eb16

24 Mar, 2023 7 commits

[safetensors] don't use in `torch<1.10` (#22370) · cae78c46
Stas Bekman authored Mar 24, 2023
```
* [safetensors] don't use in pt<1.10

* better fix
```
cae78c46
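
Gating a feature on `torch<1.10`, as the commit above does, needs a numeric version comparison, because comparing version strings directly orders "1.10" before "1.9". A minimal sketch under stated assumptions: `parse_version` and `can_use_safetensors` are hypothetical names, and the parser only handles plain `major.minor[.patch][+local]` strings, not every format a real version library accepts.

```python
def parse_version(version: str) -> tuple:
    """Keep only (major, minor) from a version such as '1.10.2+cu113'."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def can_use_safetensors(torch_version: str) -> bool:
    # Hypothetical gate: allow safetensors only for torch >= 1.10.
    return parse_version(torch_version) >= (1, 10)


print(can_use_safetensors("1.9.1"))
print(can_use_safetensors("1.10.0"))
# String comparison gets the order wrong: "1.10.0" sorts before "1.9.1".
print("1.10.0" < "1.9.1")
```

A dedicated version parser is the more robust choice in real code; the tuple comparison here only illustrates why the numeric form is needed.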
Fix TF pipeline job · cfab34e1
Sylvain Gugger authored Mar 24, 2023

cfab34e1
[Trainer] add disclaimer that full_determinism is slow (#22368) · 500fce07
Stas Bekman authored Mar 24, 2023

500fce07

Resnet flax (#21472) · a0cbbba3

Shubhamai authored Mar 25, 2023



* [WIP] flax resnet

* added pretrained flax models, results reproducible

* Added pretrained flax models, results reproducible

* working on tests

* no real code change, just some comments

* [flax] adding support for batch norm layers

* fixing bugs related to pt+flax integration

* removing loss from modeling flax output class

* fixing classifier tests

* fixing comments, model output

* cleaning comments

* review changes

* review changes

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* renaming Flax to PyTorch

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

a0cbbba3

TensorFlow: pin maximum version to 2.12 (#22364) · 88dae78f
Joao Gante authored Mar 24, 2023

88dae78f
Improve error message (#22361) · 3a7f5fa9
Samuel Bubán authored Mar 24, 2023
```
* Improve error message

* Fix consistency
```
3a7f5fa9

Pin tensorflow-text to go with tensorflow (#22362) · 6587125c

Sylvain Gugger authored Mar 24, 2023

* Pin tensorflow-text to go with tensorflow

* Make it more convenient to pin TensorFlow

* setup doesn't like f-strings

6587125c