Commits · 5e89a435c86544ff77a7d51b0944e9ff1b68b5e7 · chenpangpang / transformers

29 Mar, 2023 9 commits
- Revert "Fix --bf16 option support for Neuron after PR #22300" (#22451) · 5e89a435
  jeffhataws authored Mar 29, 2023
```
This reverts commit fd81746dbec5f17c8285a0fdc72ca4b4c025cc33.
```
  5e89a435
- [`Pix2Struct`] Fix slow test (#22448) · b844f8a9
  Younes Belkada authored Mar 29, 2023
```
fix slow test
```
  b844f8a9
- Revert "Error (also in original) model, scaling only q matrix not qk.T dot... · 55dae94c
  Sylvain Gugger authored Mar 29, 2023
```
Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" (#22444)

Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) (#21627)"

This reverts commit bad83008.
```
  55dae94c
- Use real tokenizers if tiny version(s) creation has issue(s) (#22428) · 8894b817
  Yih-Dar authored Mar 29, 2023
```
Fix some tiny model creation issues
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
  8894b817
- Don't hard error when cache version can't be converted to int (#22427) · 9b494a15
  Sylvain Gugger authored Mar 29, 2023
  
  9b494a15
- [`Generate`] Add conditional generation for multimodal models (#22424) · 8252e24a
  Younes Belkada authored Mar 29, 2023
```
* add conditional generation

* add comments
```
  8252e24a
- [`bnb`] fix bnb failing test (#22439) · 33f4cb10
  Younes Belkada authored Mar 29, 2023
```
* fix bnb failing test

* fix

* fix

* fixup
```
  33f4cb10
- Hyperparameter search reporting to W&B (#22440) · fab1de72
  Nolwenn Bernard authored Mar 29, 2023
```
Fixes #22429
```
  fab1de72
- Add clean_up_tokenization_spaces to config (#22341) · 8d9c3836
  Arthur authored Mar 29, 2023
```
* add draft changes

* fix failing wav2vec

* style

* make sure that the argument is saved + add tests

* style

* fixup

* update test

* default clean_up_tokenization_spaces to False for Bloom and Llama

* Update code based on review
Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>

* style

* quality

---------
Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>
```
  8d9c3836
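
As a rough illustration of what the `clean_up_tokenization_spaces` option from the last commit above controls, the sketch below removes the space that naive detokenization leaves before punctuation. The helper names `clean_up_spaces` and `decode`, and the shortened replacement table, are assumptions made for this example and not the library's exact implementation.

```python
def clean_up_spaces(text: str) -> str:
    """Remove spaces before common punctuation and English contractions."""
    # Hypothetical, shortened replacement table (illustration only).
    replacements = [
        (" .", "."),
        (" ?", "?"),
        (" !", "!"),
        (" ,", ","),
        (" n't", "n't"),
        (" 's", "'s"),
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text


def decode(tokens, clean_up_tokenization_spaces=True):
    """Join tokens with spaces, optionally cleaning up the result."""
    text = " ".join(tokens)
    return clean_up_spaces(text) if clean_up_tokenization_spaces else text


print(decode(["Hello", ",", "world", "!"]))
print(decode(["Hello", ",", "world", "!"], clean_up_tokenization_spaces=False))
```

Storing the flag in the config lets a model whose vocabulary already encodes spacing (the commit names Bloom and Llama) default to `False`, so decoding leaves its output untouched.
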
28 Mar, 2023 4 commits

MBart: Fix docs and doctests (#22422) · b29fd697
Joao Gante authored Mar 28, 2023
```
Fix docs and doctests
```
b29fd697

[performance] ensure `causal_mask` is created directly on device (#22378) · ae5fc2db

Jeff Rasley authored Mar 28, 2023

* ensure causal_mask is created directly on device

* add copy tag to opt, update bart implementation

* add device to all _make_causal_mask copies

* formatting fixes

* more manual fixes due to unlinked versions of _prepare_decoder_attention_mask

ae5fc2db
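
The idea behind this change can be sketched as follows: allocate the mask on the target device from the start, rather than building it on the CPU and copying it over afterwards. The function name `make_causal_mask` and its signature are assumptions for this example. It is a simplified stand-in, not the repository's `_make_causal_mask`.

```python
import torch


def make_causal_mask(tgt_len: int, dtype: torch.dtype, device: torch.device) -> torch.Tensor:
    """Return a (tgt_len, tgt_len) additive causal mask allocated on `device`."""
    # Passing `device=` here avoids a CPU allocation followed by a host-to-device copy.
    mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, dtype=dtype, device=device)
    # Keep the large negative value only strictly above the diagonal (future positions).
    return torch.triu(mask, diagonal=1)


mask = make_causal_mask(3, torch.float32, torch.device("cpu"))
print(mask.device, mask.shape)
```
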

Fix bug in perplexity guide calculations and update perplexity numbers. Fixes #22348 (#22411) · ed57c979
fpgaminer authored Mar 28, 2023
```
Fix bug in perplexity guide calculations and update perplexity numbers.
```
ed57c979

Bump redis from 4.1.4 to 4.5.3 in /examples/research_projects/decision_transformer (#22410) · 32ff0640

dependabot[bot] authored Mar 27, 2023

Bump redis in /examples/research_projects/decision_transformer

Bumps [redis](https://github.com/redis/redis-py) from 4.1.4 to 4.5.3.
- [Release notes](https://github.com/redis/redis-py/releases)
- [Changelog](https://github.com/redis/redis-py/blob/master/CHANGES)
- [Commits](https://github.com/redis/redis-py/compare/v4.1.4...v4.5.3)

---
updated-dependencies:
- dependency-name: redis
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

32ff0640

27 Mar, 2023 17 commits

[neptune] fix checkpoint bug with relative out_dir (#22102) · 3ec7a476

Kshiteej K authored Mar 28, 2023



* [neptune] fix checkpoint bug with relative out_dir

* update imports

* reformat with black

* check neptune without imports

* fix typing-related issue

* run black on code

* use os.path.sep instead of raw \

* simplify imports and remove type annotation

* make ruff happy

* apply review suggestions

---------
Co-authored-by: Aleksander Wojnarowicz <alwojnarowicz@gmail.com>

3ec7a476
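
The "use os.path.sep instead of raw \" step above can be illustrated with a minimal sketch. The helper `to_namespace_key` is hypothetical and not part of the Neptune integration. It only shows how to split a relative output directory on the platform separator instead of a hard-coded backslash.

```python
import os


def to_namespace_key(relative_path: str) -> str:
    """Turn a relative path into a forward-slash key, independent of the OS separator."""
    normalized = os.path.normpath(relative_path)
    # Split on the platform separator rather than a literal "\\" or "/".
    parts = [part for part in normalized.split(os.path.sep) if part not in ("", ".")]
    return "/".join(parts)


print(to_namespace_key(os.path.join(".", "out", "checkpoint-500")))
```
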

[WIP]`NLLB-MoE` Adds the moe model (#22024) · 19ade242

Arthur authored Mar 27, 2023

* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* ❗local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism that we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse layers are tested. Had to change route_tokens a little bit

* add support for unslip models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉



* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

19ade242

Fix quality · 057e1d74
Sylvain Gugger authored Mar 27, 2023

057e1d74

Hardware Auto-Setup for Examples (#22319) · f02e3a2b

Donny Greenberg authored Mar 27, 2023

* Add initial remote hardware auto-setup docs

* Fix a few typos and clarify some language

* Add missing dependency

* Update self-hosted launch script with Sylvain's comments.

* Formatting.

* Trigger CI

* Style

f02e3a2b

Trainer: missing None check (#22404) · 738944c9
Joao Gante authored Mar 27, 2023
```
missing None check
```
738944c9
Trainer: move Seq2SeqTrainer imports under the typing guard (#22401) · 53155b52
Joao Gante authored Mar 27, 2023

53155b52
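
For readers unfamiliar with the "typing guard" mentioned in this commit title, here is a minimal sketch of the pattern. It uses the standard-library `decimal` module as a stand-in for the guarded import, so the module and the `double` function are illustrative assumptions only.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only by static type checkers; skipped at runtime,
    # which avoids import cost and circular-import problems.
    from decimal import Decimal


def double(value: "Decimal") -> "Decimal":
    """The string annotations keep the name unresolved at runtime."""
    return value * 2
```

At runtime `TYPE_CHECKING` is `False`, so the guarded module is never imported by this file, while type checkers and IDEs still see the annotation.
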

[Pix2Struct] Add support to resize embeddings (#22394) · 0e708178

NielsRogge authored Mar 27, 2023

* First draft

* Fix integration test

* Remove script

* Fix test and typos

* Fix one more test

* Skip tied embeddings test

* Remove line

* Address comments

0e708178

Transformers env safetensors (#22400) · f6b80a01
Sylvain Gugger authored Mar 27, 2023
```
* Report safetensors version in transformers-cli env

* Styling

* Trigger CI maybe
```
f6b80a01
[`bnb`] Force `requires_grad` to be `False` (#22396) · d324b70f
Younes Belkada authored Mar 27, 2023
```
force `requires_grad` to be `False`
```
d324b70f
Generate: support for left-padding on GPTNeoX and Llama (#22382) · 7dcd8703
Joao Gante authored Mar 27, 2023

7dcd8703

Seq2seq trainer generation config arg (#22323) · 5506d049

Nathan Fradet authored Mar 27, 2023



* seq2seq trainer and training arguments accepting GenerationConfig arg

* seq2seq Trainer and training arguments docstring fixes

* Update training_args_seq2seq.py docstring
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixing trainer_seq2seq.py docstring
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* seq2seq trainer: legacy gen args back & GenerationConfig created at init

* Seq2seq trainer: fix in case gen_config.max_new_tokens is None
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding legacy arg retrocompatibility

* seq2seq trainer and training arguments accepting GenerationConfig arg

* seq2seq Trainer and training arguments docstring fixes

* Update training_args_seq2seq.py docstring
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixing trainer_seq2seq.py docstring
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* seq2seq trainer: legacy gen args back & GenerationConfig created at init

* Seq2seq trainer: fix in case gen_config.max_new_tokens is None
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding legacy arg retrocompatibility

* seq2seq trainer: evaluate and predict untouched

* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding init args, keeping IDEs hints

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

5506d049

Wav2Vec2ProcessorWithLM can return N best hypotheses now (#22235) · 03966cac

Vladislav Sokolovskii authored Mar 27, 2023



* Wav2Vec2ProcessorWithLM can return N best hypotheses now
Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>

* Wav2Vec2ProcessorWithLM n_best cannot be None
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Batch decoding can return  N best hypotheses now

batch_decode was extended with the same functionality as decode
function, N best hypotheses per sample can be returned
Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>

---------
Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

03966cac

load_in_8bit now respects 'balanced' device maps in multi-gpu environments (#22377) · 66d1eee6
кѳѳsнī authored Mar 27, 2023
```
balanced 8bit memory
```
66d1eee6
Adapt find_tied_parameters to handle breaking change in Accelerate (#22360) · 8cfc6678
Sylvain Gugger authored Mar 27, 2023

8cfc6678
Translated documentation in italian (#22388) · 204737fc
Nicola Procopio authored Mar 27, 2023
```
* updated toctree

* added and translated mdx documents
```
204737fc

Changed world_size() to get_world_size() bugfix (#22381) · d5c2c71c

Charlie-Bell authored Mar 27, 2023

Edited one line in src/transformers/generation/utils.py. Changed dist.world_size() to dist.get_world_size(), since world_size() does not exist in torch.distributed.

d5c2c71c
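
A minimal sketch of the corrected call, assuming `dist` refers to `torch.distributed` as in the commit message. The wrapper `current_world_size` and its single-process fallback of 1 are assumptions for this example, not code from the repository.

```python
import torch.distributed as dist


def current_world_size() -> int:
    """Return the number of processes, or 1 when no process group is initialized."""
    if dist.is_available() and dist.is_initialized():
        # get_world_size() is the function that exists; dist.world_size() does not.
        return dist.get_world_size()
    return 1


print(current_world_size())
```
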

TensorFlow: additional missing `cmake` dependencies in CI (#22383) · c746eb16
Joao Gante authored Mar 27, 2023
```
* missing cmake

* more cmake
```
c746eb16

24 Mar, 2023 10 commits

[safetensors] don't use in `torch<1.10` (#22370) · cae78c46
Stas Bekman authored Mar 24, 2023
```
* [safetensors] don't use in pt<1.10

* better fix
```
cae78c46
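
A version gate like the one this commit describes can be sketched as below. The function name `supports_safetensors` and the parsing approach are assumptions for illustration. The repository's actual check may differ.

```python
def supports_safetensors(torch_version: str) -> bool:
    """Return True when the given torch version string is at least 1.10."""
    # Drop local build suffixes such as "+cu117" before parsing.
    release = torch_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    # Compare as integer tuples: as strings, "1.9" would sort after "1.10".
    return (major, minor) >= (1, 10)


for version in ("1.9.1", "1.10.0", "2.0.0+cu117"):
    print(version, supports_safetensors(version))
```
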
Fix TF pipeline job · cfab34e1
Sylvain Gugger authored Mar 24, 2023

cfab34e1
[Trainer] add disclaimer that full_determinism is slow (#22368) · 500fce07
Stas Bekman authored Mar 24, 2023

500fce07

Resnet flax (#21472) · a0cbbba3

Shubhamai authored Mar 25, 2023



* [WIP] flax resnet

* added pretrained flax models, results reproducible

* Added pretrained flax models, results reproducible

* working on tests

* no real code change, just some comments

* [flax] adding support for batch norm layers

* fixing bugs related to pt+flax integration

* removing loss from modeling flax output class

* fixing classifier tests

* fixing comments, model output

* cleaning comments

* review changes

* review changes

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* renaming Flax to PyTorch

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

a0cbbba3

TensorFlow: pin maximum version to 2.12 (#22364) · 88dae78f
Joao Gante authored Mar 24, 2023

88dae78f
Improve error message (#22361) · 3a7f5fa9
Samuel Bubán authored Mar 24, 2023
```
* Improve error message

* Fix consistency
```
3a7f5fa9

Pin tensorflow-text to go with tensorflow (#22362) · 6587125c

Sylvain Gugger authored Mar 24, 2023

* Pin tensorflow-text to go with tensorflow

* Make it more convenient to pin TensorFlow

* setup doesn't like f-strings

6587125c

Update docker files to use official torch 2.0.0 (#22357) · 01203475

Yih-Dar authored Mar 24, 2023



* update docker files to use official torch 2.0.0

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

01203475

Add Mega: Moving Average Equipped Gated Attention (#21766) · 57f25f4b

Mitch Naylor authored Mar 24, 2023



* add mega file structure and plain pytorch version of mega source code

* added config class with old naming conventions

* filled in mega documentation

* added config class and embeddings with optional token types

* updated notes

* starting the conversion process, deleted intermediate and added use_cache back to config

* renamed config attributes in modeling_mega.py

* checkpointing before refactoring incremental decoding functions

* removed stateful incremental key/values for EMA and self-attention

* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask

* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement

* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention

* bug fix in attention mask handling in MovingAverageGatedAttention

* removed incremental state from GatedCrossAttention and removed IncrementalState class

* finished gated cross attention and got MegaLayer working

* fixed causal masking in mega decoder

* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching

* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids

* added optional dense hidden layer for masked and causal LM classes

* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention

* removed before_attn_fn in Mega class and updated docstrings and comments up to there

* bug fix in MovingAverageGatedAttention masking

* working conversion of MLM checkpoint in scratchpad script -- perfect matches

* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters

* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint

* finished checkpoint conversion script

* cleanup old class in mega config script

* removed 'copied from' statements and passing integration tests

* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing

* fixed tuple output of megamodel

* all common tests passing after fixing issues in decoder, gradient retention, and initialization

* added mega-specific tests, ready for more documentation and style checks

* updated docstrings; checkpoint before style fixes

* style and quality checks, fixed initialization problem in float_tensor, ready for PR

* added mega to toctree

* removed unnecessary arg in megaconfig

* removed unused arg and fixed code samples with leftover roberta models

* Apply suggestions from code review

Applied all suggestions except the one renaming a class, as I'll need to update that throughout
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA

* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACT2FN, and added sequencenorm to layer norms

* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention

* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()

* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files

* variable names in NFFN

* manual Mega->MEGA changes in docs

* Mega->MEGA in config auto

* style and quality fixes

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments

* commit before dealing with merge conflicts

* made new attention activation functions available in ACT2FN and added generation test from OPT

* style and quality in activations and tests

* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings

* style and quality fixes after latest updates, before rotary position ids

* causal mask in MegaBlock docstring + added missing device passing

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR

* style and quality fixes + readme updates pointing to main

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

57f25f4b

Generate: Add GPTNeoX integration test (#22346) · 0fa46524
Joao Gante authored Mar 24, 2023

0fa46524