Commits · daf53241d6276c0cd932ee8ce3e5b0a403f392b7 · chenpangpang / transformers

14 Apr, 2023 1 commit
- Fix word_ids hyperlink (#22765) · daf53241
  Mayank Agarwal authored Apr 14, 2023
```
* Fix word_ids hyperlink

* Add suggested fix
```
10 Apr, 2023 1 commit

Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575) · e0921c6b

Joel Lamy-Poirier authored Apr 10, 2023



* Add model with cli tool

* Remove unwanted stuff

* Add new code

* Remove inference runner

* Style

* Fix checks

* Test updates

* make fixup

* fix docs

* fix doc

* fix test

* hopefully fix pipeline tests

* refactor

* fix CIs

* add comment

* rename to `GPTBigCodeForCausalLM`

* correct readme

* make fixup + docs

* make fixup

* fixes

* fixes

* Remove pruning

* Remove import

* Doc updates

* More pruning removal

* Combine copies

* Single MQA implementation, remove kv cache pre-allocation and padding

* Update doc

* Revert refactor to match gpt2 style

* Merge back key and value caches, fix some type hints

* Update doc

* Fix position ids with padding (PR 21080)

* Add conversion script temporarily

* Update conversion script

* Remove checkpoint conversion

* New model

* Fix MQA test

* Fix copies

* try fix tests

* FIX TEST!!

* remove `DoubleHeadsModel`

* add MQA tests

* add slow tests

* clean up

* add CPU checker

* final fixes

* fixes

- fix GPU issue
- fixed slow tests
- skip disk offload

* fix final issue

* Simplify and comment baddbmm fix

* Remove unnecessary code

* Transpose tweaks

* Use beta=1 on cpu, improve tests

---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>


03 Apr, 2023 1 commit

added biogpt token classifier (#22447) · 7d25c9c8

Mohammed Jabir authored Apr 03, 2023



* added biogpt token classifier

* fix reviews

* Updated modeling_biogpt.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>


24 Mar, 2023 1 commit

Add Mega: Moving Average Equipped Gated Attention (#21766) · 57f25f4b

Mitch Naylor authored Mar 24, 2023



* add mega file structure and plain pytorch version of mega source code

* added config class with old naming conventions

* filled in mega documentation

* added config class and embeddings with optional token types

* updated notes

* starting the conversion process, deleted intermediate and added use_cache back to config

* renamed config attributes in modeling_mega.py

* checkpointing before refactoring incremental decoding functions

* removed stateful incremental key/values for EMA and self-attention

* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask

* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement

* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention

* bug fix in attention mask handling in MovingAverageGatedAttention

* removed incremental state from GatedCrossAttention and removed IncrementalState class

* finished gated cross attention and got MegaLayer working

* fixed causal masking in mega decoder

* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching

* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids

* added optional dense hidden layer for masked and causal LM classes

* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention

* removed before_attn_fn in Mega class and updated docstrings and comments up to there

* bug fix in MovingAverageGatedAttention masking

* working conversion of MLM checkpoint in scratchpad script -- perfect matches

* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters

* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint

* finished checkpoint conversion script

* cleanup old class in mega config script

* removed 'copied from' statements and passing integration tests

* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing

* fixed tuple output of megamodel

* all common tests passing after fixing issues in decoder, gradient retention, and initialization

* added mega-specific tests, ready for more documentation and style checks

* updated docstrings; checkpoint before style fixes

* style and quality checks, fixed initialization problem in float_tensor, ready for PR

* added mega to toctree

* removed unnecessary arg in megaconfig

* removed unused arg and fixed code samples with leftover roberta models

* Apply suggestions from code review

Applied all suggestions except the one renaming a class, as I'll need to update that throughout
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA

* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms

* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention

* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()

* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files

* variable names in NFFN

* manual Mega->MEGA changes in docs

* Mega->MEGA in config auto

* style and quality fixes

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments

* commit before dealing with merge conflicts

* made new attention activation functions available in ACT2FN and added generation test from OPT

* style and quality in activations and tests

* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings

* style and quality fixes after latest updates, before rotary position ids

* causal mask in MegaBlock docstring + added missing device passing

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR

* style and quality fixes + readme updates pointing to main

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


20 Mar, 2023 1 commit
- Fix doc links (#22274) · 8ac29fe0
  amyeroberts authored Mar 20, 2023
  
27 Feb, 2023 1 commit
- Fix en documentation typos (#21799) · ba2a5f13
  Thomas Paviot authored Feb 27, 2023
```
* fix wrong url

* typos in English documentation
```
15 Feb, 2023 1 commit

Add Ernie-M Model to huggingface (#21349) · 0c9c8472

Susnato Dhar authored Feb 15, 2023

* config and tokenization(fast too) changed and ErnieEncoder added

* Slow Tokenization Added

* Tokenizer(slow) is now working and Fast Tokenizer removed

* Added Config code

* Added Base Model and utils

* ErnieMModel is now working

* All added except tests

* All tests passed except ErnieUIEM

* All tests passed

* all fixes done

* all fixes done

* fixed MAP

* fixed check_code_quality

* fixed Build PR Documentation issue

* Added changes(comments) and also updated to the latest upstream/main

* Added fixup

* Added # Copied comments

* Added fixup

* Added more comments and some nits

* Added fixup

* Fixed README_hd.md

* Added more fixes

* ErnieMTokenizer (being sentencepiece) protected and other docs edited

* Added code_quality fix

* Fixed for

* Added more fix

* modified AZ

* ernie-m tokenization test added!

* attention mask part fixed(with 0->self.config.pad_token_id)

* applied make fixup


10 Feb, 2023 1 commit

Add X-MOD (#20939) · b0d539cc

Jannis Vamvas authored Feb 10, 2023



* Add X-MOD to Readme

* Add documentation for X-MOD

* Implement X-MOD

* Fix formatting of X-MOD docs

* Change signature of X-MOD forward methods to use lang_ids

* Minor changes

* Rebase with main and run make fix-copies

* Make suggested changes to docstrings

* Improve code readability
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Fix code style

* Conversion script: Remove asserts and type annotations

* Remove _TOKENIZER_FOR_DOC

* XMOD -> Xmod

* Update copyright note

* Fix doctests

* Fix docstring

* Add integration test for FillMaskPipeline

* Revert "Add integration test for FillMaskPipeline"

This reverts commit 4381eb3b1d0f5d85785f89caba83928e6efa6d1f.

* Add end-to-end integration test for mask fill

* make style

* Rebase with main and make fix-copies

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>


02 Feb, 2023 1 commit
- Fix task guide formatting (#21409) · 0a757176
  Steven Liu authored Feb 02, 2023
```
fix formatting
```
27 Jan, 2023 1 commit

Automated compatible models list for task guides (#21338) · 73a2ff69

Maria Khalusova authored Jan 27, 2023

* initial commit. added tip placeholders and a script

* removed unused imports, fixed paths

* fixed generated links

* make style

* split language modeling doc into two: causal language modeling and masked language modeling

* added check_task_guides.py to make fix-copies

* review feedback addressed


25 Jan, 2023 1 commit

Documentation code sample fixes (#21302) · 23844941

Maria Khalusova authored Jan 25, 2023

* Fixed the following:
pipe -> pipeline
out in pipe(data()) is a list of dict, not a dict

* Fixed the TypeError: __init__() missing 1 required positional argument: 'key'

* Added a tip: code sample requires additional libraries to run

* Fixed custom config's name

* added seqeval to the required libraries

* fixed a missing dependency,
fixed metric naming,
added checkpoint to fix the datacollator

* added checkpoint to fix the datacollator,
added missing dependency


21 Nov, 2022 1 commit

Add inference section to task guides (#18781) · d896029e

Steven Liu authored Nov 21, 2022

* 📝 start adding inference section to task guides

* ✨ make style

* 📝 add multiple choice

* add rest of inference sections

* make style

* add compute_metric, push_to_hub, pipeline

* make style

* add updated sequence and token classification

* make style

* make edits in token classification

* add audio classification

* make style

* add asr

* make style

* add image classification

* make style

* add summarization

* make style

* add translation

* make style

* add multiple choice

* add language modeling

* add qa

* make style

* review and edits

* apply reviews

* make style

* fix call to processor

* apply audio reviews

* update to better asr model

* make style


07 Sep, 2022 1 commit

Update TF fine-tuning docs (#18654) · 2b9513fd

Matt authored Sep 07, 2022



* Update TF fine-tuning docs

* Fix formatting

* Add some section headers so the right sidebar works better

* Squiggly it

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Explain things in the text, not the comments

* Make the two dataset creation methods into a list

* Move the advice about collation out of a <Tip>

* Edits for clarity

* Edits for clarity

* Edits for clarity

* Replace `to_tf_dataset` with `prepare_tf_dataset` in the fine-tuning pages

* Restructure the page a little bit

* Restructure the page a little bit

* Restructure the page a little bit
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


06 Jul, 2022 1 commit
- Doc to dataset (#18037) · 2e90c3df
  Sylvain Gugger authored Jul 06, 2022
```
* Link to the Datasets doc

* Remove unwanted file
```
04 Apr, 2022 1 commit

Enable doc in Spanish (#16518) · b9a768b3

Sylvain Gugger authored Apr 04, 2022

* Reorganize doc for multilingual support

* Fix style

* Style

* Toc trees

* Adapt templates


25 Mar, 2022 1 commit
- Rename master to main for notebooks links and leftovers (#16397) · 867f3950
  Sylvain Gugger authored Mar 25, 2022
  
22 Mar, 2022 1 commit

Adopt framework-specific blocks for content (#16342) · 77321481

Steven Liu authored Mar 22, 2022

* ✨ refactor code samples with framework-specific blocks

* ✨ update training.mdx

* 🖍 apply feedback


18 Mar, 2022 1 commit
- Fix links in guides (#16182) · ffc319e7
  Steven Liu authored Mar 18, 2022
```
* 🖍 fix links in guides

* 🖍 apply feedback
```
15 Mar, 2022 1 commit
- Framework split (#16030) · 4f4e5ddb
  Sylvain Gugger authored Mar 15, 2022
```
* First files

* More files

* Last files

* Style
```
10 Mar, 2022 1 commit
- updating fine-tune classifier documentation (#16063) · 96ac7549
  David S. Batista authored Mar 10, 2022
  
23 Feb, 2022 1 commit

🧼 NLP task guides (#15731) · fecb08c2

Steven Liu authored Feb 23, 2022

* clean commit of changes to NLP tasks

* 🖍 apply feedback

* 📝 move tf data collator in multiple choice
Co-authored-by: Steven <stevhliu@gmail.com>
