- 27 Mar, 2023 13 commits
-
-
Joao Gante authored
missing None check
-
Joao Gante authored
-
NielsRogge authored
* First draft * Fix integration test * Remove script * Fix test and typos * Fix one more test * Skip tied embeddings test * Remove line * Address comments
-
Sylvain Gugger authored
* Report safetensors version in transformers-cli env * Styling * Trigger CI maybe
-
Younes Belkada authored
for rg to be `False`
-
Joao Gante authored
-
Nathan Fradet authored
* seq2seq trainer and training arguments accepting GenerationConfig arg
* seq2seq Trainer and training arguments docstring fixes
* Update training_args_seq2seq.py docstring Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fixing trainer_seq2seq.py docstring Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* seq2seq trainer: legacy gen args back & GenerationConfig created at init
* Seq2seq trainer: fix in case gen_config.max_new_tokens is None Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* seq2seq trainer: adding legacy arg retrocompatibility
* seq2seq trainer: evaluate and predict untouched
* Apply suggestions from code review Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* seq2seq trainer: adding init args, keeping IDEs hints
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
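A minimal usage sketch of the change above (editor's illustration; the `generation_config` argument name on `Seq2SeqTrainingArguments` is assumed from the commit bullets, which only say a GenerationConfig arg is accepted):

    from transformers import GenerationConfig, Seq2SeqTrainingArguments

    # Build generation settings once instead of using the legacy
    # generation_max_length / generation_num_beams training arguments.
    gen_config = GenerationConfig(max_new_tokens=128, num_beams=4)

    training_args = Seq2SeqTrainingArguments(
        output_dir="out",
        predict_with_generate=True,
        generation_config=gen_config,  # assumed argument name, per the commit message
    )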
-
Vladislav Sokolovskii authored
* Wav2Vec2ProcessorWithLM can return N best hypotheses now Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
* Wav2Vec2ProcessorWithLM n_best cannot be None Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Batch decoding can return N best hypotheses now: batch_decode was extended with the same functionality as the decode function, so the N best hypotheses per sample can be returned Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
---------
Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
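A hedged sketch of the new behaviour (editor's illustration; the checkpoint and logits are placeholders, and `n_best` is assumed to be the parameter name referenced in the commit):

    import numpy as np
    from transformers import Wav2Vec2ProcessorWithLM

    # Requires pyctcdecode and kenlm; the checkpoint is an example LM-boosted processor.
    processor = Wav2Vec2ProcessorWithLM.from_pretrained("patrickvonplaten/wav2vec2-base-100h-with-lm")

    # Placeholder CTC logits of shape (batch, time, vocab) just to show the call;
    # real logits would come from a Wav2Vec2ForCTC forward pass.
    logits = np.random.randn(1, 200, 32).astype(np.float32)

    outputs = processor.batch_decode(logits, n_best=3)
    print(outputs.text)  # with n_best > 1, the top-3 hypotheses per sample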
-
кѳѳsнī authored
balanced 8bit memory
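Presumably this refers to balancing memory across GPUs when loading in 8-bit; a hedged sketch (checkpoint name is illustrative, requires `bitsandbytes`, `accelerate`, and at least two GPUs for balancing to matter):

    from transformers import AutoModelForCausalLM

    # Spread the 8-bit quantized weights evenly across the available GPUs.
    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-3b",   # example checkpoint, not from the commit
        device_map="balanced",
        load_in_8bit=True,
    )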
-
Sylvain Gugger authored
-
Nicola Procopio authored
* updated toctree * added and translated mdx documents
-
Charlie-Bell authored
Edited one line in src/transformers/generation/utils.py. Changed dist.world_size() to dist.get_world_size(), since world_size() doesn't exist in torch.distributed.
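For reference, the corrected call (with a guard for the non-distributed case; an editor's sketch, not the exact patched line):

    import torch.distributed as dist

    # torch.distributed exposes get_world_size(); there is no world_size() function.
    world_size = dist.get_world_size() if dist.is_available() and dist.is_initialized() else 1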
-
Joao Gante authored
* missing cmake * more cmake
-
- 24 Mar, 2023 12 commits
-
-
Stas Bekman authored
* [safetensors] don't use in pt<1.10 * better fix
-
Sylvain Gugger authored
-
Stas Bekman authored
-
Shubhamai authored
* [WIP] flax resnet
* added pretrained flax models, results reproducible
* Added pretrained flax models, results reproducible
* working on tests
* no real code change, just some comments
* [flax] adding support for batch norm layers
* fixing bugs related to pt+flax integration
* removing loss from modeling flax output class
* fixing classifier tests
* fixing comments, model output
* cleaning comments
* review changes
* review changes
* Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* renaming Flax to PyTorch
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
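A hedged usage sketch for the new Flax port (editor's illustration; requires `jax`/`flax`, and the checkpoint choice is an assumption):

    import numpy as np
    from transformers import FlaxResNetModel

    # Add from_pt=True if the repo only hosts PyTorch weights.
    model = FlaxResNetModel.from_pretrained("microsoft/resnet-50")

    pixel_values = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder image batch
    outputs = model(pixel_values=pixel_values)
    print(outputs.last_hidden_state.shape)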
-
Joao Gante authored
-
Samuel Bubán authored
* Improve error message * Fix consistency
-
Sylvain Gugger authored
* Pin tensorflow-text to go with tensorflow * Make it more convenient to pin TensorFlow * setup doesn't like f-strings
-
Yih-Dar authored
* update docker files to use official torch 2.0.0 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Mitch Naylor authored
* add mega file structure and plain pytorch version of mega source code
* added config class with old naming conventions
* filled in mega documentation
* added config class and embeddings with optional token types
* updated notes
* starting the conversion process, deleted intermediate and added use_cache back to config
* renamed config attributes in modeling_mega.py
* checkpointing before refactoring incremental decoding functions
* removed stateful incremental key/values for EMA and self-attention
* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
* bug fix in attention mask handling in MovingAverageGatedAttention
* removed incremental state from GatedCrossAttention and removed IncrementalState class
* finished gated cross attention and got MegaLayer working
* fixed causal masking in mega decoder
* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching
* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids
* added optional dense hidden layer for masked and causal LM classes
* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention
* removed before_attn_fn in Mega class and updated docstrings and comments up to there
* bug fix in MovingAverageGatedAttention masking
* working conversion of MLM checkpoint in scratchpad script -- perfect matches
* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters
* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint
* finished checkpoint conversion script
* cleanup old class in mega config script
* removed 'copied from' statements and passing integration tests
* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing
* fixed tuple output of megamodel
* all common tests passing after fixing issues in decoder, gradient retention, and initialization
* added mega-specific tests, ready for more documentation and style checks
* updated docstrings; checkpoint before style fixes
* style and quality checks, fixed initialization problem in float_tensor, ready for PR
* added mega to toctree
* removed unnecessary arg in megaconfig
* removed unused arg and fixed code samples with leftover roberta models
* Apply suggestions from code review. Applied all suggestions except the one renaming a class, as I'll need to update that throughout Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA
* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms
* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention
* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()
* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files
* variable names in NFFN
* manual Mega->MEGA changes in docs
* Mega->MEGA in config auto
* style and quality fixes
* Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments
* commit before dealing with merge conflicts
* made new attention activation functions available in ACT2FN and added generation test from OPT
* style and quality in activations and tests
* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings
* style and quality fixes after latest updates, before rotary position ids
* causal mask in MegaBlock docstring + added missing device passing
* Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update README.md Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR
* style and quality fixes + readme updates pointing to main
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
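A brief, hedged usage sketch for the newly added MEGA model (editor's illustration; the checkpoint name is an assumption and not taken from the commit):

    from transformers import AutoTokenizer, MegaModel

    # Hypothetical checkpoint; any MEGA checkpoint on the Hub would be used the same way.
    tokenizer = AutoTokenizer.from_pretrained("mnaylor/mega-base-wikitext")
    model = MegaModel.from_pretrained("mnaylor/mega-base-wikitext")

    inputs = tokenizer("Moving average equipped gated attention", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)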
-
Joao Gante authored
-
Ashwin Mathur authored
Fix typo in greedy search docs
-
James Reed authored
* [HFTracer] Make embeddings ops take on the dtype of the weight * fix bug
-
- 23 Mar, 2023 13 commits
-
-
Yih-Dar authored
* Automatically create or update tiny models * Skip failed tests * update workflow file * use revision --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
кѳѳsнī authored
* Llama - Move target tokens to final pipeline device if needed
* Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Joao Gante authored
-
jeffhataws authored
This PR fixes the "RuntimeError: No CUDA GPUs are available" when running with the --bf16 option on Neuron. Related PRs: https://github.com/huggingface/transformers/pull/20684 https://github.com/huggingface/transformers/pull/22300
-
Batese2001 authored
* Added type hints to TFDeiTModel * make style --------- Co-authored-by: Matt <rocketknight1@gmail.com>
-
Samuel Larkin authored
-
Sylvain Gugger authored
* Fix various imports * Fix copies * Fix import
-
Quentin Lhoest authored
* Mention why one needs to specify max_steps in Trainer * dummy change to trigger CI
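A minimal sketch of the documented requirement (editor's illustration; the usual reason is that a streaming/iterable dataset has no length, so the Trainer cannot derive the number of steps itself):

    from transformers import TrainingArguments

    # With an iterable dataset, the total training length must be given explicitly.
    args = TrainingArguments(output_dir="out", max_steps=10_000, per_device_train_batch_size=8)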
-
mollerup23 authored
* Fixed gradient checkpoint bug for this model * Updating PR indentation (maintainer feedback) * make fixup --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com>
-
Younes Belkada authored
add `accelerate` support for MBart
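A hedged sketch of what this enables (editor's illustration; requires `accelerate`, and the checkpoint name is illustrative, not from the commit):

    from transformers import AutoModelForSeq2SeqLM

    # Dispatch MBart across the available devices automatically.
    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50", device_map="auto")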
-
Stas Bekman authored
* [gptj] support older pytorch version
* contributor
* contributor
* make copies
---------
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
Sylvain Gugger authored
-
Sylvain authored
-
- 22 Mar, 2023 2 commits
-
-
Stas Bekman authored
* [deepspeed zero3] need generate(synced_gpus=True, ...)
* fix
* rework per Sylvain's suggestion
* Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
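A minimal sketch of the call being documented (editor's illustration; the model and checkpoint are placeholders, and in practice the model would be wrapped by DeepSpeed ZeRO-3):

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Under ZeRO-3, every rank must keep running forward passes until all ranks
    # have finished generating, hence synced_gpus=True.
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    inputs = tokenizer("translate English to German: Hello, world!", return_tensors="pt")
    outputs = model.generate(**inputs, synced_gpus=True, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))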
-
Yih-Dar authored
* check what tests fail * Skip failing tests * Skip failing tests * Skip failing tests * Skip failing tests * clean up * clean up --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-