Commits · 4c5c0af7e5280ad5c78d698e3808ee0a543b7262 · chenpangpang / transformers

16 Mar, 2023 7 commits

Update tiny model creation script (#22202) · 4c5c0af7

Yih-Dar authored Mar 16, 2023



* Update UNCONVERTIBLE_MODEL_ARCHITECTURES

* Deal with 2 model tester classes in single test file

* Deal with 2 model tester classes in single test file

* Deal with 2 model tester classes in single test file

* make style and quality

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

4c5c0af7

LLaMA Implementation (#21955) · 464d4207

Jason Phang authored Mar 16, 2023



* LLaMA

* sharding and docs

* tweak

* black

* inits

* ruff

* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP

* init

* no checkpoint

* docs

* ruff

* type_vocab_size

* tokenizer fixes

* tokenizer fixes

* Update tokenization_llama.py

* Update tokenization_llama.py

* Update configuration_llama.py

* Update modeling_llama.py

* tokenizer add_bos by default

* licenses

* remove decoder

* norms and mlp

* rope overhaul

* tweaks

* black

* mention OPT implementation

* off-by-one naming

* typo

* fix

* tokenization fix and slicing bug

* padding config

* cleanup

* black

* update tests

* undo typo

* fix vocab caching logic

* ruff

* docbuilder

* attn fix from BlackSamorez

* initial feedback

* typo

* docs

* llama case

* llama case

* load checkpoint docs

* comment about tokenizer

* tokenizer defaults

* clear past_key_values if use_cache=False

* last tweaks

* last tweaks

* last tweaks

* last tweaks

---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>

464d4207

LLaMA Implementation (#21955) · 0041be5b

Jason Phang authored Mar 16, 2023

* LLaMA

* sharding and docs

* tweak

* black

* inits

* ruff

* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP

* init

* no checkpoint

* docs

* ruff

* type_vocab_size

* tokenizer fixes

* tokenizer fixes

* Update tokenization_llama.py

* Update tokenization_llama.py

* Update configuration_llama.py

* Update modeling_llama.py

* tokenizer add_bos by default

* licenses

* remove decoder

* norms and mlp

* rope overhaul

* tweaks

* black

* mention OPT implementation

* off-by-one naming

* typo

* fix

* tokenization fix and slicing bug

* padding config

* cleanup

* black

* update tests

* undo typo

* fix vocab caching logic

* ruff

* docbuilder

* attn fix from BlackSamorez

* initial feedback

* typo

* docs

* llama case

* llama case

* load checkpoint docs

* comment about tokenizer

* tokenizer defaults

* clear past_key_values if use_cache=False

* last tweaks

* last...

0041be5b

Italian Translation of migration.mdx (#22183) · 09922da4

Baelish03 authored Mar 16, 2023

* Translation Italian: migration

* Update migration.mdx

minor fixes

* Update _toctree.yml

* Delete migration.mdx

* Add italian translation of migration.mdx

* Update of migration.mdx translation and toctree

09922da4

Update expected values in `MgpstrModelIntegrationTest` (#22195) · 52a57f7c
Yih-Dar authored Mar 16, 2023
```
Update values
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
52a57f7c
Fix typo in Align docs (#22199) · 1485bd9c
Alara Dirik authored Mar 16, 2023
```
Fix align docs typo
```
1485bd9c

Fix DeepSpeed CI (#22194) · 1c4a9acc

Yih-Dar authored Mar 16, 2023



* Deal with torch-tensorrt

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

1c4a9acc

15 Mar, 2023 5 commits

t5 remove data dependency (#22097) · 7c4999e4

Prathik Rao authored Mar 15, 2023



* t5 remove data dependency

* make style

* make fix-copies

---------
Co-authored-by: Prathik Rao <prathikrao@microsoft.com>

7c4999e4

Update BridgeTowerForContrastiveLearning (#22145) · 16121bae

Anahita Bhiwandiwalla authored Mar 15, 2023



* Use return_loss for BridgeTowerForContrastiveLearning, add example

* fix tests

* Update example in BridgeTowerForContrastiveLearning

* Update test_modeling_bridgetower.py

* update model output format

* minor update

* Update src/transformers/models/bridgetower/modeling_bridgetower.py

* make style

---------
Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

16121bae

Regression pipeline device (#22190) · 42ad693b
Sylvain Gugger authored Mar 15, 2023
```
* Fix regression in pipeline when device=-1 is passed

* Add regression test
```
42ad693b
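
For the `device=-1` regression fix in #22190 above: by convention, the `device` argument of a pipeline takes `-1` to mean CPU and a non-negative integer to mean a CUDA device ordinal. The sketch below only illustrates that convention. `resolve_device` is a hypothetical helper, not the function changed in the commit, and the device strings it returns are an assumption made for illustration.

```python
def resolve_device(device=None):
    """Hypothetical helper: map a pipeline-style device argument to a device string."""
    if device is None:
        # Assumption: no device given means CPU.
        return "cpu"
    if isinstance(device, str):
        # Strings such as "cpu" or "cuda:1" are passed through unchanged.
        return device
    if isinstance(device, int):
        # Negative ordinals (notably -1) select the CPU, others a CUDA device.
        return "cpu" if device < 0 else f"cuda:{device}"
    raise TypeError(f"unsupported device argument: {device!r}")


cpu_device = resolve_device(-1)
gpu_device = resolve_device(0)
```

A regression test in this spirit would pass `device=-1` explicitly and check that the result is the same as when no device is passed.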
Revert 22152 MaskedImageCompletionOutput changes (#22187) · 73768147
amyeroberts authored Mar 15, 2023
```
Revert changes
```
73768147

Fix: unfinished_sequences with correct device (#22184) · 7b0e2cfd

浮躁的小螃蟹 authored Mar 16, 2023

Fix: unfinished_sequences with correct device 

The original code was causing errors when running torch.jit.trace due to the tensor options being incorrect. I fixed this by using torch.ones to create a tensor with the correct device and dtype. This should resolve the issue with running torch.jit.trace.

7b0e2cfd
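
For the fix in #22184 above: the commit message says the tensor is now created with `torch.ones` using the correct device and dtype. A minimal sketch of that pattern follows. The helper name `init_unfinished_sequences` and the choice of `torch.long` are assumptions for illustration, and the sketch does not reproduce the original `torch.jit.trace` failure.

```python
import torch


def init_unfinished_sequences(input_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: one flag per batch row, 1 meaning 'still generating'."""
    # Device is taken explicitly from the input tensor rather than inherited
    # implicitly; dtype torch.long is an assumption of this sketch.
    return torch.ones(input_ids.shape[0], dtype=torch.long, device=input_ids.device)


input_ids = torch.zeros((3, 5), dtype=torch.long)
flags = init_unfinished_sequences(input_ids)
```

Passing `dtype` and `device` explicitly keeps the new tensor on the same device as `input_ids`, which is the property the commit message describes.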

14 Mar, 2023 13 commits

Run all tests by default (#22162) · f7329751
Sylvain Gugger authored Mar 14, 2023

f7329751
Load optimizer state on CPU to avoid CUDA OOM (#22159) · b7036f49
Sylvain Gugger authored Mar 14, 2023

b7036f49
v4.28.0.dev0 · ebdb185b
Sylvain Gugger authored Mar 14, 2023

ebdb185b
Revert "Enforce same behavior as PyTorch 2.0 for older versions" (#22163) · c52c5282
Sylvain Gugger authored Mar 14, 2023
```
Revert "Enforce same behavior as PyTorch 2.0 for older versions (#22136)"

This reverts commit 1c801d65.
```
c52c5282

[trainer] add `--optim adamw_torch_fused` for pt-2.0+ (#22144) · 085bf5c1

Stas Bekman authored Mar 14, 2023

* [trainer] add --optim adamw_torch_fused

* change optim default

* deal with non-torch

* revert default change; prep; add fp16/amp assert

* typo

* typo

085bf5c1

to_pil - don't rescale if int and in range 0-255 (#22158) · c6318c37

amyeroberts authored Mar 14, 2023

* Don't rescale if int and in range 0-255

* Raise value error if int values too large

* Update tests/test_image_transforms.py

* Update tests/test_image_transforms.py

c6318c37
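
For the `to_pil` change in #22158 above: the commit bullets describe two rules, namely that integer values already in the range 0-255 are not rescaled, and that integer values outside that range raise a `ValueError`. The sketch below expresses those rules on a plain list of pixel values. `needs_rescale` is a hypothetical helper, and the handling of float inputs (rescale when in [0, 1], error otherwise) is an assumption not stated in the commit.

```python
def needs_rescale(values):
    """Hypothetical helper: should pixel values be multiplied by 255 before PIL conversion?"""
    if all(isinstance(v, int) for v in values):
        if all(0 <= v <= 255 for v in values):
            # Integers already in the 8-bit range: leave them as they are.
            return False
        # Integers outside the 8-bit range cannot be converted safely.
        raise ValueError("integer pixel values must lie in [0, 255]")
    # Assumption of this sketch: floats are expected in [0, 1] and get rescaled.
    if all(0 <= v <= 1 for v in values):
        return True
    raise ValueError("float pixel values must lie in [0, 1]")


int_case = needs_rescale([0, 128, 255])
float_case = needs_rescale([0.0, 0.5, 1.0])
```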

Create MaskedImageCompletionOutput and fix ViT docs (#22152) · 3b22bfbc
Alara Dirik authored Mar 14, 2023
```
* create MaskedImageCompletionOutput

* fix bugs

* fix bugs
```
3b22bfbc

Fix big model inference for T5 models in float16 (#22095) · b45192ec

Sylvain Gugger authored Mar 14, 2023



* Fix big model inference for T5 models in float16

* Apply suggestions from code review
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Style

* Trigger CI with latest release

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

b45192ec

Translation Italian: perf_train_cpu and perf_train_cpu_many (#22151) · 7f5ad6c3
Nicola Procopio authored Mar 14, 2023
```
* added translated files

added perf_train_cpu and perf_train_cpu_many

* updated toctree
```
7f5ad6c3
Update 2 doctest expected values for torch 2.0.0 (#22148) · ff887035
Yih-Dar authored Mar 14, 2023
```
update values
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
ff887035

Add ConvNeXT V2 (#21679) · cdddfbff

Alara Dirik authored Mar 14, 2023

* Add ConvNeXt V2 to transformers
* TF model is separated from the PR to fix issues

cdddfbff

Move `is_pipeline_test_to_skip` to specific model test classes (#21999) · 6c2ad00c

Yih-Dar authored Mar 14, 2023



* Move `is_pipeline_test_to_skip` to specific model test classes

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

6c2ad00c

[🛠️] Fix-whisper-breaking-changes (#21965) · 2beabd24

Arthur authored Mar 14, 2023



* temp fix

* temporary fix

* update

* fix tests

* fixup

* update based on review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* update to fix tests

* update docstring

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

2beabd24

13 Mar, 2023 15 commits

docs: New terms and updates to glossary (#21982) · 101a6cd2

MichaelRipa authored Mar 13, 2023



* Updated glossary with new terms, added abbreviations for certain terms and merged autoencoding models, autoregressive models and causal language modeling into encoder and decoder models

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Added link to 'Pipeline for inference' tutorial

* Trigger CI

* Update docs/source/en/glossary.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Added entry for self supervised learning, added deleted entries + fixed broken links

* Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

101a6cd2

Prepare daily CI for torch 2.0.0 (#22135) · ba9e0191
Yih-Dar authored Mar 13, 2023
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
ba9e0191

[Safetensors] Add explicit flag to from pretrained (#22083) · f780557a

Patrick von Platen authored Mar 13, 2023



* [Safetensors] Add explicit  flag to from pretrained

* add test

* remove @

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

f780557a

Remove backend check for torch.compile (#22140) · 3a35937e

Sylvain Gugger authored Mar 13, 2023



* Remove backend enforcement for torch.compile

* Update error

* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Style

---------
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

3a35937e

[deepspeed docs] Activation Checkpointing (#22099) · 618697ef

Stas Bekman authored Mar 13, 2023



* [deepspeed docs] Activation Checkpointing

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update deepspeed.mdx

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

618697ef

[trainer] fix bug in grad accum with multiple epochs (#22098) · 5b85add7
Stas Bekman authored Mar 13, 2023
```
* [trainer] fix bug in grad accum

* comment out debug

* fix one-off

* rename counter
```
5b85add7
Enforce same behavior as PyTorch 2.0 for older versions (#22136) · 1c801d65
Sylvain Gugger authored Mar 13, 2023

1c801d65
Trainer: let generate pick its inputs (#22108) · e16cbe88
Joao Gante authored Mar 13, 2023
```
* Let generate pick its inputs

* fix squad seq2seq example
```
e16cbe88

[`Whisper`] add `get_input_embeddings` to `WhisperForAudioClassification` (#22133) · d979cf6e

Younes Belkada authored Mar 13, 2023



* add `get_input_embeddings` to `WhisperForAudioClassification`

* add common tests

* fix another common test

* Update tests/models/whisper/test_modeling_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix style

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

d979cf6e

Update configuration_align.py (projected_dim=640) (#22139) · 98797237
bishmdl76 authored Mar 13, 2023
```
Update configuration_align.py

updated projected_dim=640 from 512 in arguments of AlignConfig
```
98797237
Add a new script to check model testers' config (#22063) · 54ee56b1
Yih-Dar authored Mar 13, 2023
```
* Add script

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
54ee56b1
Adding Type Hints to TF_Pegasus model (#21941) · a096eaca
mollerup23 authored Mar 13, 2023
```
* Adding Type Hints to TF_Pegasus model

* Updated some parameters per maintainer comments
```
a096eaca
Fix doc link for MGP-STR (#22138) · 6cb5132a
Sylvain Gugger authored Mar 13, 2023

6cb5132a

Zero-shot image classification task guide (#22132) · 8def252d

Maria Khalusova authored Mar 13, 2023



* WIP

* WIP

* manual inference example

* make style

* Apply suggestions from code review
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

---------
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

8def252d

Fix gradient checkpointing bug in trocr (#22126) · e61081e7

Karim Foda authored Mar 13, 2023



* Fix gradient checkpointing bug in trocr

* Fix format

* Update src/transformers/models/trocr/modeling_trocr.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

e61081e7