1. 02 Nov, 2023 1 commit
  2. 24 Mar, 2023 1 commit
    • Add Mega: Moving Average Equipped Gated Attention (#21766) · 57f25f4b
      Mitch Naylor authored
      
      
      * add mega file structure and plain pytorch version of mega source code
      
      * added config class with old naming conventions
      
      * filled in mega documentation
      
      * added config class and embeddings with optional token types
      
      * updated notes
      
      * starting the conversion process, deleted intermediate and added use_cache back to config
      
      * renamed config attributes in modeling_mega.py
      
      * checkpointing before refactoring incremental decoding functions
      
      * removed stateful incremental key/values for EMA and self-attention
      
      * refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
      
      * MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
      
      * more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
      
      * bug fix in attention mask handling in MovingAverageGatedAttention
      
      * removed incremental state from GatedCrossAttention and removed IncrementalState class
      
      * finished gated cross attention and got MegaLayer working
      
      * fixed causal masking in mega decoder
      
      * fixed how padding and causal masks are passed through MegaLayer with and without k/v caching
      
      * finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids
      
      * added optional dense hidden layer for masked and causal LM classes
      
      * docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention
      
      * removed before_attn_fn in Mega class and updated docstrings and comments up to there
      
      * bug fix in MovingAverageGatedAttention masking
      
      * working conversion of MLM checkpoint in scratchpad script -- perfect matches
      
      * moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters
      
      * renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint
      
      * finished checkpoint conversion script
      
      * cleanup old class in mega config script
      
      * removed 'copied from' statements and passing integration tests
      
      * added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing
      
      * fixed tuple output of megamodel
      
      * all common tests passing after fixing issues in decoder, gradient retention, and initialization
      
      * added mega-specific tests, ready for more documentation and style checks
      
      * updated docstrings; checkpoint before style fixes
      
      * style and quality checks, fixed initialization problem in float_tensor, ready for PR
      
      * added mega to toctree
      
      * removed unnecessary arg in megaconfig
      
      * removed unused arg and fixed code samples with leftover roberta models
      
      * Apply suggestions from code review
      
      Applied all suggestions except the one renaming a class, as I'll need to update that throughout
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA
      
      * removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACT2FN, and added sequencenorm to layer norms
      
      * reformatted .forward() docstrings to match style and removed unused mask input in cross-attention
      
      * removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()
      
      * renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files
      
      * variable names in NFFN
      
      * manual Mega->MEGA changes in docs
      
      * Mega->MEGA in config auto
      
      * style and quality fixes
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments
      
      * commit before dealing with merge conflicts
      
      * made new attention activation functions available in ACT2FN and added generation test from OPT
      
      * style and quality in activations and tests
      
      * documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings
      
      * style and quality fixes after latest updates, before rotary position ids
      
      * causal mask in MegaBlock docstring + added missing device passing
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update README.md
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR
      
      * style and quality fixes + readme updates pointing to main
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
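
      For context, the moving-average component that gives MEGA its name is a damped exponential moving average over the sequence. Below is a minimal, single-feature sketch of that recurrence; the actual MultiHeadEMA expands each feature into several EMA dimensions and, per the commits above, evaluates the recurrence as an FFT convolution rather than a Python loop, so the name and shapes here are illustrative assumptions only.

          import torch

          def damped_ema(x: torch.Tensor, alpha: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
              # Illustrative sketch, not the library API.
              # x:     (seq_len, hidden) input sequence
              # alpha: (hidden,) smoothing factors in (0, 1)
              # delta: (hidden,) damping factors in (0, 1)
              hidden_state = torch.zeros_like(x[0])
              outputs = []
              for t in range(x.shape[0]):
                  # h_t = alpha * x_t + (1 - alpha * delta) * h_{t-1}
                  hidden_state = alpha * x[t] + (1 - alpha * delta) * hidden_state
                  outputs.append(hidden_state)
              return torch.stack(outputs)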
  3. 02 Feb, 2023 1 commit
  4. 14 Nov, 2022 1 commit
    • add MobileNetV2 model (#17845) · f711d683
      Matthijs Hollemans authored
      * add model files etc for MobileNetV2
      
      * rename files for MobileNetV1
      
      * initial implementation of MobileNetV1
      
      * fix conversion script
      
      * cleanup
      
      * write docs
      
      * tweaks
      
      * fix conversion script
      
      * extract hidden states
      
      * fix test cases
      
      * make fixup
      
      * fixup it all
      
      * rename V1 to V2
      
      * fix checkpoints
      
      * fixup
      
      * implement first block + weight conversion
      
      * add remaining layers
      
      * add output stride and dilation
      
      * fixup
      
      * add tests
      
      * add deeplabv3+ head
      
      * a bit of fixup
      
      * finish deeplab conversion
      
      * add link to doc
      
      * fix issue with JIT trace
      
      in_height and in_width would be Tensor objects during JIT trace, which caused Core ML conversion to fail on the remainder op. By making them ints, the result of the padding calculation becomes a constant value (see the sketch after this commit).
      
      * cleanup
      
      * fix order of models
      
      * fix rebase error
      
      * remove main from doc link
      
      * add image processor
      
      * remove old feature extractor
      
      * fix converter + other issues
      
      * fixup
      
      * fix unit test
      
      * add to onnx tests (but these appear broken now)
      
      * add post_process_semantic_segmentation
      
      * use google org
      
      * remove unused imports
      
      * move args
      
      * replace weird assert
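
      The JIT-trace fix above turns on the padding arithmetic. Here is a hedged Python sketch of TensorFlow-style "same" padding showing the int() cast; the helper name apply_tf_padding and the exact formula are assumptions for illustration, not necessarily the model's code.

          import torch
          import torch.nn.functional as F

          def apply_tf_padding(features: torch.Tensor, stride: int, kernel_size: int) -> torch.Tensor:
              # int() yields plain Python ints even under torch.jit.trace, so the
              # modulo arithmetic below folds into constants instead of Tensor ops
              # that the Core ML converter cannot handle.
              in_height = int(features.shape[-2])
              in_width = int(features.shape[-1])

              if in_height % stride == 0:
                  pad_along_height = max(kernel_size - stride, 0)
              else:
                  pad_along_height = max(kernel_size - in_height % stride, 0)
              if in_width % stride == 0:
                  pad_along_width = max(kernel_size - stride, 0)
              else:
                  pad_along_width = max(kernel_size - in_width % stride, 0)

              pad_left = pad_along_width // 2
              pad_top = pad_along_height // 2
              padding = (pad_left, pad_along_width - pad_left, pad_top, pad_along_height - pad_top)
              return F.pad(features, padding, "constant", 0.0)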
  5. 20 Oct, 2022 1 commit
  6. 18 Oct, 2022 1 commit
  7. 14 Sep, 2022 1 commit
  8. 03 Aug, 2022 1 commit
    • Fix torch version comparisons (#18460) · 02b176c4
      LSinev authored
      Comparisons like
      version.parse(torch.__version__) > version.parse("1.6")
      are True for torch==1.6.0+cu101 or torch==1.6.0+cpu, because the local
      version segment (+cu101, +cpu) sorts after the plain release.

      Comparisons of the form
      version.parse(version.parse(torch.__version__).base_version)
      are preferred (and available in pytorch_utils.py).
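
      A short sketch of the difference (the variable names are illustrative):

          import torch
          from packaging import version

          # Naive parse: "1.6.0+cu101" carries a local version segment, which sorts
          # after the plain release "1.6.0", so this is True even on torch 1.6.0.
          buggy_check = version.parse(torch.__version__) > version.parse("1.6")

          # Preferred: strip the local segment via base_version before comparing.
          torch_base = version.parse(version.parse(torch.__version__).base_version)
          correct_check = torch_base > version.parse("1.6")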
  9. 19 Apr, 2022 1 commit
  10. 22 Feb, 2022 1 commit
    • Gelu10 (#15676) · 32295b15
      Funtowicz Morgan authored
      * Add GeLU10 (a clipped version of GeLU) to transformers to improve quantization performance.
      
      * Add unittests.
      
      * Import tensorflow after `is_tf_available` check.
      
      * Fix wrong TensorFlow function: replace `tf.tensor` with `tf.constant`
      
      * style.
      
      * use `tf.math.max`
      
      * Fix tf tests.
      
      * style.
      
      * style style style style style style
      
      * style style style style style style
      
      * Address @sgugger comments.
      
      * Fix wrong operator for raising ValueError for ClippedGELUActivation.
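
      In substance, GeLU10 is a GELU whose output is clamped to [-10, 10] so activations fit a fixed quantization range. A PyTorch sketch consistent with the commits above; the class body is a reconstruction, not a verbatim copy.

          import torch
          from torch import nn

          class ClippedGELUActivation(nn.Module):
              """GELU with output clipped to [min, max]; GeLU10 uses min=-10, max=10."""

              def __init__(self, min: float, max: float):
                  super().__init__()
                  # The "wrong operator" fix above presumably concerns this check.
                  if min > max:
                      raise ValueError(f"min should be < max (got min: {min}, max: {max})")
                  self.min = min
                  self.max = max

              def forward(self, x: torch.Tensor) -> torch.Tensor:
                  return torch.clip(nn.functional.gelu(x), self.min, self.max)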
  11. 18 Feb, 2022 1 commit
  12. 16 Feb, 2022 1 commit
  13. 28 Oct, 2021 1 commit
  14. 18 Jun, 2021 1 commit
  15. 14 Jun, 2021 1 commit
  16. 12 May, 2021 1 commit
    • CLIP (#11445) · 8719afa1
      Suraj Patil authored
      
      
      * begin second draft
      
      * fix import, style
      
      * add loss
      
      * fix embeds, logits_scale, and projection
      
      * fix imports
      
      * add conversion script
      
      * add feature_extractor and processor
      
      * style
      
      * add tests for tokenizer, extractor and processor
      
      * add vision model tests
      
      * add weight init
      
      * add more tests
      
      * fix save_load test
      
      * model output, docstrings, causal mask
      
      * config doc
      
      * add clip model tests
      
      * return dict
      
      * begin integration test
      
      * add integration tests
      
      * fix-copies
      
      * fix init
      
      * Clip => CLIP
      
      * fix module name
      
      * docs
      
      * fix doc
      
      * output_dim => projection_dim
      
      * fix checkpoint names
      
      * remove fast tokenizer file
      
      * fix conversion script
      
      * fix tests, quality
      
      * put causal mask on device
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fix attribute test
      
      * style
      
      * address Sylvain's comments
      
      * style
      
      * fix docstrings
      
      * add quick_gelu in activations, docstrings
      
      * clean-up attention test
      
      * fix act fun
      
      * fix config
      
      * fix torchscript tests
      
      * even batch_size
      
      * remove comment
      
      * fix output to_tuple
      
      * fix save load tests
      
      * fix add tokens test
      
      * add fast tokenizer
      
      * update copyright
      
      * new processor API
      
      * fix docs
      
      * docstrings
      
      * docs
      
      * fix doc
      
      * fix doc
      
      * fix tokenizer
      
      * fix import in doc example
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * check types of config
      
      * valhalla => openai
      
      * load image using url
      
      * fix test
      
      * typo
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
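
      The quick_gelu added here is the sigmoid-based GELU approximation used by the original OpenAI CLIP weights; a minimal sketch (the class name follows later transformers conventions and is an assumption):

          import torch
          from torch import nn

          class QuickGELUActivation(nn.Module):
              # Cheap GELU approximation: x * sigmoid(1.702 * x).
              def forward(self, x: torch.Tensor) -> torch.Tensor:
                  return x * torch.sigmoid(1.702 * x)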
  17. 31 Mar, 2021 1 commit
  18. 07 Dec, 2020 1 commit
  19. 30 Oct, 2020 1 commit
    • Replace swish with silu (#8166) · 00112c35
      TFUsers authored
      
      
      * Replace swish with silu
      
      * revert nn.silu to nn.swish to support older torch versions
      
      * simplify optimized silu conditional and fix format
      
      * Update activations.py
      
      * Update activations_tf.py
      
      * Update modeling_flax_utils.py
      
      * Update modeling_openai.py
      
      * add swish testcase
      
      * add pytorch swish testcase
      
      * Add more robust python version check
      
      * more formatting fixes
      Co-authored-by: TFUsers <TFUsers@gmail.com>
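
      SiLU and swish name the same function, x * sigmoid(x); the rename tracks PyTorch adopting silu as the official name in 1.7. A sketch of the version-gated dispatch described above (helper names are illustrative):

          import torch
          from packaging import version

          def _silu_python(x: torch.Tensor) -> torch.Tensor:
              # SiLU, a.k.a. swish: x * sigmoid(x)
              return x * torch.sigmoid(x)

          # torch.nn.functional.silu only exists from PyTorch 1.7 on, hence the
          # fallback for the "older version" mentioned in the commits above.
          if version.parse(version.parse(torch.__version__).base_version) >= version.parse("1.7"):
              silu = torch.nn.functional.silu
          else:
              silu = _silu_python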
  20. 26 Oct, 2020 1 commit
    • Doc styling (#8067) · 08f534d2
      Sylvain Gugger authored
      * Important files
      
      * Styling them all
      
      * Revert "Styling them all"
      
      This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.
      
      * Styling them for realsies
      
      * Fix syntax error
      
      * Fix benchmark_utils
      
      * More fixes
      
      * Fix modeling auto and script
      
      * Remove new line
      
      * Fixes
      
      * More fixes
      
      * Fix more files
      
      * Style
      
      * Add FSMT
      
      * More fixes
      
      * More fixes
      
      * More fixes
      
      * More fixes
      
      * Fixes
      
      * More fixes
      
      * More fixes
      
      * Last fixes
      
      * Make sphinx happy
  21. 17 Oct, 2020 1 commit
  22. 30 Sep, 2020 1 commit
  23. 24 Sep, 2020 1 commit
  24. 22 Sep, 2020 1 commit
  25. 26 Aug, 2020 2 commits
  26. 11 May, 2020 1 commit
  27. 07 May, 2020 1 commit
    • Reformer (#3351) · dca34695
      Patrick von Platen authored
      * first copy & paste commit from Bert and Morgan's LSH code
      
      * add easy way to compare to trax original code
      
      * translate most of function
      
      * make trax lsh self attention deterministic with numpy seed + copy paste code
      
      * add same config
      
      * add same config
      
      * make layer init work
      
      * implemented hash_vectors function for lsh attention
      
      * continue reformer translation
      
      * hf LSHSelfAttentionLayer gives same output as trax layer
      
      * refactor code
      
      * refactor code
      
      * refactor code
      
      * refactor
      
      * refactor + add reformer config
      
      * delete bogus file
      
      * split reformer attention layer into two layers
      
      * save intermediate step
      
      * save intermediate step
      
      * make test work
      
      * add complete reformer block layer
      
      * finish reformer layer
      
      * implement causal and self mask
      
      * clean reformer test and refactor code
      
      * fix merge conflicts
      
      * fix merge conflicts
      
      * update init
      
      * fix device for GPU
      
      * fix chunk length init for tests
      
      * include Morgan's optimization
      
      * improve memory a bit
      
      * improve comment
      
      * factorize num_buckets
      
      * better testing parameters
      
      * make whole model work
      
      * make lm model work
      
      * add t5 copy paste tokenizer
      
      * add chunking feed forward
      
      * clean config
      
      * add improved assert statements
      
      * make tokenizer work
      
      * improve test
      
      * correct typo
      
      * extend config
      
      * add more complex test
      
      * add new axial position embeddings
      
      * add local block attention layer
      
      * clean tests
      
      * refactor
      
      * better testing
      
      * save intermediate progress
      
      * clean test file
      
      * make shorter input length work for model
      
      * allow variable input length
      
      * refactor
      
      * make forward pass for pretrained model work
      
      * add generation possibility
      
      * finish dropout and init
      
      * make style
      
      * refactor
      
      * add first version of RevNet Layers
      
      * make forward pass work and add convert file
      
      * make uploaded model forward pass work
      
      * make uploaded model forward pass work
      
      * refactor code
      
      * add namedtuples and cache buckets
      
      * correct head masks
      
      * refactor
      
      * made reformer more flexible
      
      * make style
      
      * remove set max length
      
      * add attention masks
      
      * fix up tests
      
      * fix lsh attention mask
      
      * make random seed optional for the moment
      
      * improve memory in reformer
      
      * add tests
      
      * make style
      
      * make sure masks work correctly
      
      * detach gradients
      
      * save intermediate
      
      * correct backprop through gather
      
      * make style
      
      * change back num hashes
      
      * rename to labels
      
      * fix rotation shape
      
      * fix detach
      
      * update
      
      * fix trainer
      
      * fix backward dropout
      
      * make reformer more flexible
      
      * fix conflict
      
      * fix
      
      * fix
      
      * add tests for fixed seed in reformer layer
      
      * fix trainer typo
      
      * fix typo in activations
      
      * add fp16 tests
      
      * add fp16 training
      
      * support fp16
      
      * correct gradient bug in reformer
      
      * add fast gelu
      
      * re-add dropout for embedding dropout
      
      * better naming
      
      * better naming
      
      * renaming
      
      * finalize test branch
      
      * finalize tests
      
      * add more tests
      
      * finish tests
      
      * fix
      
      * fix type trainer
      
      * fix fp16 tests
      
      * fix tests
      
      * fix tests
      
      * fix tests
      
      * fix issue with dropout
      
      * fix dropout seeds
      
      * correct random seed on gpu
      
      * finalize random seed for dropout
      
      * finalize random seed for dropout
      
      * remove duplicate line
      
      * correct half precision bug
      
      * make style
      
      * refactor
      
      * refactor
      
      * docstring
      
      * remove sinusoidal position encodings for reformer
      
      * move chunking to modeling_utils
      
      * make style
      
      * clean config
      
      * make style
      
      * fix tests
      
      * fix auto tests
      
      * pretrained models
      
      * fix docstring
      
      * update conversion file
      
      * Update pretrained_models.rst
      
      * fix rst
      
      * fix rst
      
      * update copyright
      
      * fix test path
      
      * fix test path
      
      * fix small issue in test
      
      * include reformer in generation tests
      
      * add docs for axial position encoding
      
      * finish docs
      
      * Update convert_reformer_trax_checkpoint_to_pytorch.py
      
      * remove isort
      
      * include Sam's comments
      
      * remove wrong comment in utils
      
      * correct typos
      
      * fix typo
      
      * Update reformer.rst
      
      * applied Morgan's optimization
      
      * make style
      
      * make gpu compatible
      
      * remove bogus file
      
      * big test refactor
      
      * add example for chunking
      
      * fix typo
      
      * add to README
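
      Of the pieces above, the chunked feed-forward ("add chunking feed forward", later "move chunking to modeling_utils") is the easiest to sketch: run the feed-forward over slices of the sequence and concatenate, giving the same output at lower peak memory. A simplified version of that helper, assuming the sequence length divides evenly by chunk_size:

          import torch

          def apply_chunking_to_forward(forward_fn, chunk_size, chunk_dim, *input_tensors):
              # chunk_size == 0 disables chunking entirely.
              if chunk_size == 0:
                  return forward_fn(*input_tensors)
              num_chunks = input_tensors[0].shape[chunk_dim] // chunk_size
              # Slice every input along chunk_dim, apply forward_fn chunk by chunk,
              # then concatenate; activations are only materialized per chunk.
              chunked_inputs = tuple(t.chunk(num_chunks, dim=chunk_dim) for t in input_tensors)
              outputs = [forward_fn(*chunk) for chunk in zip(*chunked_inputs)]
              return torch.cat(outputs, dim=chunk_dim)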
  28. 29 Apr, 2020 1 commit
  29. 23 Apr, 2020 1 commit
  30. 16 Apr, 2020 2 commits
  31. 03 Apr, 2020 1 commit
  32. 19 Mar, 2020 1 commit
  33. 21 Feb, 2020 1 commit
  34. 13 Feb, 2020 1 commit