Commits · c9385976571a215197a62899ffece50b383f5137 · chenpangpang / transformers

06 Jun, 2023 1 commit

[i18n-KO] Translated `language-modeling.mdx` (#23969) · c9385976

Wonhyeong Seo authored Jun 06, 2023



* docs: ko: `language_modeling.mdx`

* feat: nmt draft

* fix: manual edits

* fix: add inline toc

* fix: typo in toc_tree.yml

* fix: resolve suggestions
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>

---------
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>

c9385976

02 Jun, 2023 1 commit

Add an option to reduce compile() console spam (#23938) · 167a0d8f

Matt authored Jun 02, 2023

* Add an option to reduce compile() console spam

* Add annotations to the example scripts

* Add notes to the quicktour docs as well

* minor fix

167a0d8f

09 May, 2023 1 commit

Add RWKV-4 (#22797) · b4d4d6fe

Sylvain Gugger authored May 09, 2023



* First draft of RWKV-4

* Add support for generate

* Style post-rebase

* Properly use state

* Write doc

* Fix doc

* More math

* Add model to README, dummies and clean config

* Fix init

* multiple fixes:

- fix common tests
- fix configuraion default values
- add CI test for checking state computation
- fix some CI tests

* correct tokenizer

* some tweaks

- fix config docstring
- fix failing tests

* fix CI tests

- add output_attention / output_hidden_states
- override test_initialization
- fix failing CIs

* fix conversion script

- fix sharded case
- add new arguments

* add slow tests + more fixes on conversion script

* add another test

* final fixes

* change single name variable

* add mock attention mask for pipeline to work

* correct eos token id

* fix nits

* add checkpoints

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add `tie_word_embeddings` in docstring

* change tensor name

* fix final nits

* Trigger CI

---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

b4d4d6fe

28 Apr, 2023 1 commit

add open-llama model with ckpt (#22795) · c2c99dc7

s-JoL authored Apr 28, 2023



* update Open-Llama model

* update

* update format

* update doc

* update

* update stable embedding test

* update test case

* update format

* update readme

* fix typo

* update name

* remove tokenizer and update format

* remove convert_open_llama_weights_to_hf

* update warning and doc_string

---------
Co-authored-by: songliang.bayesian <songliang.bayesian@bytedance.com>

c2c99dc7

12 Apr, 2023 1 commit

add model resources for CPMAnt (new) (#20906) · 523ca4e0

pioliverse authored Apr 12, 2023



* resolve conflicts

* rebase and make style

* test

* test

* test

* rebase and make style

* rebase and make style

* tests

* tests

* rewrite some functions

* rebase and make style

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* fix some bugs & docstring

* add models and tests

* solve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* tests

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* fix some bugs & docstring

* save resolution

* make style

* delete redefinition code

* reformat function

* reformat

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* tests

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* fix load_tf_weights_in_cpmant

* reformat some unrelated files

* upgrade quality

* resolve conflicts

* make style

* fix bugs and refactor

* modify docstrings and make style

* unify import format in __init__.py

* fix import-altclp bug

* fix copies to update index.md

* fix unused config parameters

* fix unused config parameters

* fix unused config parameters

* update README_ja.md

* dummy commit for unit test

* fix attention mask

* add CPMAntTokenizer&-Fast to auto-mapping

* drop redundant changes in README_ko

* fix  defaults in docstring

* fix use_cache and some docstring

* add missing args in tokenizer

* modify tester inheritance

* add is_jieba_available

* fix some bugs

* make style and fix-copies

* add doctests

* skip integration tests

* add is_jieba_available

* fix bugs in common tests

* adjust docstrings and make style

* add argument docstring

* adjust code to some specifications

* make style and fix-copies

* add fast tokenization test

* dummy commit for unit test

* dummy commit for unit test

* dummy commit for unit test

* normalize some comments and names

* Bert->CPMAnt

* camel names and drop redundant codes

* make style and fix-coies

* add CpmTokenizerFast _import_structure

* drop cpmanttokenizerfast in model_doc

* fix some problems

* fix CPMAnt tokenization for common test

* make style and fixup

* fix copies and fixup

* fix bugs in tokenization test

* dummy commit for connection failure in unittest

* fix copies

* drop trailing comma

* fix decorator in tests

* dummy commit for connection failure in unittest

---------
Co-authored-by: Gong Baitao <gongbaitao11@gmail.com>

523ca4e0

10 Apr, 2023 1 commit

Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575) · e0921c6b

Joel Lamy-Poirier authored Apr 10, 2023



* Add model with cli tool

* Remove unwanted stuff

* Add new code

* Remove inference runner

* Style

* Fix checks

* Test updates

* make fixup

* fix docs

* fix doc

* fix test

* hopefully fix pipeline tests

* refactor

* fix CIs

* add comment

* rename to `GPTBigCodeForCausalLM`

* correct readme

* make fixup + docs

* make fixup

* fixes

* fixes

* Remove pruning

* Remove import

* Doc updates

* More pruning removal

* Combine copies

* Single MQA implementation, remove kv cache pre-allocation and padding

* Update doc

* Revert refactor to match gpt2 style

* Merge back key and value caches, fix some type hints

* Update doc

* Fix position ids pith padding (PR 21080)

* Add conversion script temporarily

* Update conversion script

* Remove checkpoint conversion

* New model

* Fix MQA test

* Fix copies

* try fix tests

* FIX TEST!!

* remove  `DoubleHeadsModel`

* add MQA tests

* add slow tests

* clean up

* add CPU checker

* final fixes

* fixes

- fix GPU issue
- fixed slow tests
- skip disk offload

* fix final issue

* Simplify and comment baddbmm fix

* Remove unnecessary code

* Transpose tweaks

* Use beta=1 on cpu, improve tests

---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>

e0921c6b

24 Mar, 2023 1 commit

Add Mega: Moving Average Equipped Gated Attention (#21766) · 57f25f4b

Mitch Naylor authored Mar 24, 2023



* add mega file structure and plain pytorch version of mega source code

* added config class with old naming conventions

* filled in mega documentation

* added config class and embeddings with optional token types

* updated notes

* starting the conversion process, deleted intermediate and added use_cache back to config

* renamed config attributes in modeling_mega.py

* checkpointing before refactoring incremental decoding functions

* removed stateful incremental key/values for EMA and self-attention

* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask

* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement

* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention

* bug fix in attention mask handling in MovingAverageGatedAttention

* removed incremental state from GatedCrossAttention and removed IncrementalState class

* finished gated cross attention and got MegaLayer working

* fixed causal masking in mega decoder

* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching

* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids

* added optional dense hidden layer for masked and causal LM classes

* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention

* removed before_attn_fn in Mega class and updated docstrings and comments up to there

* bug fix in MovingAverageGatedAttention masking

* working conversion of MLM checkpoint in scratchpad script -- perfect matches

* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters

* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint

* finished checkpoint conversion script

* cleanup old class in mega config script

* removed 'copied from' statements and passing integration tests

* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing

* fixed tuple output of megamodel

* all common tests passing after fixing issues in decoder, gradient retention, and initialization

* added mega-specific tests, ready for more documentation and style checks

* updated docstrings; checkpoint before style fixes

* style and quality checks, fixed initialization problem in float_tensor, ready for PR

* added mega to toctree

* removed unnecessary arg in megaconfig

* removed unused arg and fixed code samples with leftover roberta models

* Apply suggestions from code review

Applied all suggestions except the one renaming a class, as I'll need to update that througout
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA

* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms

* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention

* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()

* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files

* variable names in NFFN

* manual Mega->MEGA changes in docs

* Mega->MEGA in config auto

* style and quality fixes

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments

* commit before dealing with merge conflicts

* made new attention activation functions available in ACT2FN and added generation test from OPT

* style and quality in activations and tests

* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings

* style and quality fixes after latest updates, before rotary position ids

* causal mask in MegaBlock docstring + added missing device passing

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR

* style and quality fixes + readme updates pointing to main

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

57f25f4b

16 Mar, 2023 1 commit

LLaMA Implementation (#21955) · 0041be5b

Jason Phang authored Mar 16, 2023



* LLaMA

* sharding and docs

* tweak

* black

* inits

* ruff

* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP

* init

* no checkpoint

* docs

* ruff

* type_vocab_size

* tokenizer fixes

* tokenizer fixes

* Update tokenization_llama.py

* Update tokenization_llama.py

* Update configuration_llama.py

* Update modeling_llama.py

* tokenizer add_bos by default

* licenses

* remove decoder

* norms and mlp

* rope overhaul

* tweaks

* black

* mention OPT implementation

* off-by-one naming

* typo

* fix

* tokenization fix and slicing bug

* padding config

* cleanup

* black

* update tests

* undo typo

* fix vocab caching logic

* ruff

* docbuilder

* attn fix from BlackSamorez

* initial feedback

* typo

* docs

* llama case

* llama case

* load checkpoint docs

* comment about tokenizer

* tokenizer defaults

* clear past_key_values if use_cache=False

* last tweaks

* last tweaks

* last tweaks

* last tweaks

---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>

0041be5b

06 Mar, 2023 1 commit

docs: improve clarity for language modeling (#21952) · 31e3c6c3

PD Hall authored Mar 06, 2023

* docs: improve clarity for clm/mlm

* docs: remove incorrect explanation

* docs: remove incorrect explanation

---------

Co-authored-by: pdhall99 <pdhall99>

31e3c6c3

10 Feb, 2023 1 commit

Add X-MOD (#20939) · b0d539cc

Jannis Vamvas authored Feb 10, 2023



* Add X-MOD to Readme

* Add documentation for X-MOD

* Implement X-MOD

* Fix formatting of X-MOD docs

* Change signature of X-MOD forward methods to use lang_ids

* Minor changes

* Rebase with main and run make fix-copies

* Make suggested changes to docstrings

* Improve code readability
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Fix code style

* Conversion script: Remove asserts and type annotations

* Remove _TOKENIZER_FOR_DOC

* XMOD -> Xmod

* Update copyright note

* Fix doctests

* Fix docstring

* Add integration test for FillMaskPipeline

* Revert "Add integration test for FillMaskPipeline"

This reverts commit 4381eb3b1d0f5d85785f89caba83928e6efa6d1f.

* Add end-to-end integration test for mask fill

* make style

* Rebase with main and make fix-copies

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

b0d539cc

02 Feb, 2023 1 commit
- Fix task guide formatting (#21409) · 0a757176
  Steven Liu authored Feb 02, 2023
```
fix formatting
```
  0a757176
27 Jan, 2023 1 commit

Automated compatible models list for task guides (#21338) · 73a2ff69

Maria Khalusova authored Jan 27, 2023

* initial commit. added tip placeholders and a script

* removed unused imports, fixed paths

* fixed generated links

* make style

* split language modeling doc into two: causal language modeling and masked language modeling

* added check_task_guides.py to make fix-copies

* review feedback addressed

73a2ff69

21 Nov, 2022 1 commit

Add inference section to task guides (#18781) · d896029e

Steven Liu authored Nov 21, 2022

* 📝 start adding inference section to task guides

* ✨ make style

* 📝 add multiple choice

* add rest of inference sections

* make style

* add compute_metric, push_to_hub, pipeline

* make style

* add updated sequence and token classification

* make style

* make edits in token classification

* add audio classification

* make style

* add asr

* make style

* add image classification

* make style

* add summarization

* make style

* add translation

* make style

* add multiple choice

* add language modeling

* add qa

* make style

* review and edits

* apply reviews

* make style

* fix call to processor

* apply audio reviews

* update to better asr model

* make style

d896029e

07 Sep, 2022 1 commit

Update TF fine-tuning docs (#18654) · 2b9513fd

Matt authored Sep 07, 2022



* Update TF fine-tuning docs

* Fix formatting

* Add some section headers so the right sidebar works better

* Squiggly it

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Explain things in the text, not the comments

* Make the two dataset creation methods into a list

* Move the advice about collation out of a <Tip>

* Edits for clarity

* Edits for clarity

* Edits for clarity

* Replace `to_tf_dataset` with `prepare_tf_dataset` in the fine-tuning pages

* Restructure the page a little bit

* Restructure the page a little bit

* Restructure the page a little bit
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

2b9513fd

06 Jul, 2022 1 commit
- Doc to dataset (#18037) · 2e90c3df
  Sylvain Gugger authored Jul 06, 2022
```
* Link to the Datasets doc

* Remove unwanted file
```
  2e90c3df
28 Jun, 2022 1 commit
- In `group_texts` function, drop last block if smaller than `block_size` (#17908) · bfcd5743
  Bill Ray authored Jun 28, 2022
  
  bfcd5743
04 Apr, 2022 1 commit

Enable doc in Spanish (#16518) · b9a768b3

Sylvain Gugger authored Apr 04, 2022

* Reorganize doc for multilingual support

* Fix style

* Style

* Toc trees

* Adapt templates

b9a768b3

25 Mar, 2022 1 commit
- Rename master to main for notebooks links and leftovers (#16397) · 867f3950
  Sylvain Gugger authored Mar 25, 2022
  
  867f3950
22 Mar, 2022 1 commit

Adopt framework-specific blocks for content (#16342) · 77321481

Steven Liu authored Mar 22, 2022

* ✨ refactor code samples with framework-specific blocks

* ✨ update training.mdx

* 🖍 apply feedback

77321481

18 Mar, 2022 1 commit
- Fix links in guides (#16182) · ffc319e7
  Steven Liu authored Mar 18, 2022
```
* 🖍 fix links in guides

* 🖍 apply feedback
```
  ffc319e7
15 Mar, 2022 1 commit
- Framework split (#16030) · 4f4e5ddb
  Sylvain Gugger authored Mar 15, 2022
```
* First files

* More files

* Last files

* Style
```
  4f4e5ddb
23 Feb, 2022 1 commit

🧼 NLP task guides (#15731) · fecb08c2

Steven Liu authored Feb 23, 2022

* clean commit of changes to NLP tasks

* 🖍 apply feedback

* 📝

 move tf data collator in multiple choice
Co-authored-by: Steven <stevhliu@gmail.com>

fecb08c2