1. 18 Jul, 2023 1 commit
    • NielsRogge's avatar
      Add DINOv2 (#24016) · 3ec10e6c
      NielsRogge authored
      * First draft
      
      * More improvements
      
      * Convert patch embedding layer
      
      * Convert all weights
      
      * Make conversion work
      
      * Improve conversion script
      
      * Fix style
      
      * Make all tests pass
      
      * Add image processor to auto mapping
      
      * Add swiglu ffn
      
      * Add image processor to conversion script
      
      * Fix conversion of giant model
      
      * Fix documentation
      
      * Fix style
      
      * Fix tests
      
      * Address comments
      
      * Address more comments
      
      * Remove unused arguments
      
      * Remove more arguments
      
      * Rename parameters
      
      * Include mask token
      
      * Address comments
      
      * Add docstring
      
      * Transfer checkpoints
      
      * Empty commit
      3ec10e6c
  2. 11 Jul, 2023 2 commits
  3. 10 Jul, 2023 1 commit
  4. 05 Jul, 2023 1 commit
  5. 03 Jul, 2023 1 commit
    • Arthur's avatar
      [`Umt5`] Add google's umt5 to `transformers` (#24477) · 799df10a
      Arthur authored
      
      
      * add tokenization template
      
      * update conversion script
      
      * update modeling code
      
      * update
      
      * update convert checkpoint
      
      * update modeling
      
      * revert changes on convert script
      
      * new conversion script for new format
      
      * correct position bias
      
      * cleaning a bit
      
      * Credit co authors
      Co-authored-by: default avataragemagician <ahmed.elnaggar@tum.de>
      
      Co-authored-by: stefan-it
      <>
      
      * styling
      
      * Add docq
      
      * fix copies
      
      * add co author
      
      * Other Author
      
      * Merge branch 'main' of https://github.com/huggingface/transformers
      
       into add-umt5
      
      * add testing
      
      * nit
      
      * Update docs/source/en/model_doc/umt5.mdx
      Co-authored-by: default avatarStefan Schweter <stefan@schweter.it>
      
      * fix t5
      
      * actual fix?
      
      * revert wrong changes
      
      * remove
      
      * update test
      
      * more fixes
      
      * revert some changes
      
      * add SPIECE_UNDERLINE
      
      * add a commone xample
      
      * upfate
      
      * fix copies
      
      * revert changes on t5 conversion script
      
      * revert bytefallback changes since there was no addition yet
      
      * fixup
      
      * fixup
      
      * ingore umt5 cutom testing folder
      
      * fix readmes
      
      * revertT5 changes
      
      * same outputs
      
      * fixup
      
      * update example
      
      * Apply suggestions from code review
      
      * style
      
      * draft addition of all new files
      
      * current update
      
      * fix attention and stuff
      
      * finish refactoring
      
      * auto config
      
      * fixup
      
      * more nits
      
      * add umt5 to init
      
      * use md format
      
      * Update README.md
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * revert changes on mt5
      
      * revert mt4 changes
      
      * update test
      
      * more fixes
      
      * add to mapping
      
      * fix-copies
      
      * fix copies
      
      * foix retain grad
      
      * fix some tests
      
      * nits
      
      * done
      
      * Update src/transformers/models/umt5/modeling_umt5.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/umt5.md
      
      * Update src/transformers/models/umt5/__init__.py
      
      * Update docs/source/en/model_doc/umt5.md
      Co-authored-by: default avatarStefan Schweter <stefan@schweter.it>
      
      * Update src/transformers/models/umt5/modeling_umt5.py
      
      * update conversion script + use google checkpoints
      
      * nits
      
      * update test and modelling
      
      * stash slow convert
      
      * update fixupd
      
      * don't change slow
      
      ---------
      
      Co-authored-by: stefan-it <>
      Co-authored-by: default avatarStefan Schweter <stefan@schweter.it>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      799df10a
  6. 29 Jun, 2023 3 commits
    • amyeroberts's avatar
      Removal of deprecated vision methods and specify deprecation versions (#24570) · b324557a
      amyeroberts authored
      * Removal of deprecated methods and specify versions
      
      * Fix tests
      b324557a
    • Sanchit Gandhi's avatar
      Add Musicgen (#24109) · 1c1c9075
      Sanchit Gandhi authored
      
      
      * Add Audiocraft
      
      * add cross attention
      
      * style
      
      * add for lm
      
      * convert and verify
      
      * introduce t5
      
      * split configs
      
      * load t5 + lm
      
      * clean conversion
      
      * copy from t5
      
      * style
      
      * start pattern provider
      
      * make generation work
      
      * style
      
      * fix pos embs
      
      * propagate shape changes
      
      * propagate shape changes
      
      * style
      
      * delay pattern: pad tokens at end
      
      * audiocraft -> musicgen
      
      * fix inits
      
      * add mdx
      
      * style
      
      * fix pad token in processor
      
      * override generate and add todos
      
      * add init to test
      
      * undo pattern delay mask after gen
      
      * remove cfg logits processor
      
      * remove cfg logits processor
      
      * remove logits processor in favour of mask
      
      * clean pos embs
      
      * make fix copies
      
      * update readmes
      
      * clean pos emb
      
      * refactor encoder/decoder
      
      * make fix copies
      
      * update conversion
      
      * fix config imports
      
      * update config docs
      
      * make style
      
      * send pattern mask to device
      
      * pattern mask with delay
      
      * recover prompted audio tokens
      
      * fix docstrings
      
      * laydown test file
      
      * pattern edge case
      
      * remove t5 ref
      
      * add processing class
      
      * config refactor
      
      * better pattern comment
      
      * check if mask is not present
      
      * check if mask is not present
      
      * refactor to auto class
      
      * remove encoder configs
      
      * fix processor
      
      * processor import
      
      * start updating conversion
      
      * start updating tests
      
      * make style
      
      * convert t5, encodec, lm
      
      * convert as composite
      
      * also convert processor
      
      * run generate
      
      * classifier free gen
      
      * comments and clean up
      
      * make style
      
      * docs for logit proc
      
      * docstring for uncond gen
      
      * start lm tests
      
      * work tests
      
      * let the lm generate
      
      * refactor: reshape inside forward
      
      * undo greedy loop changes
      
      * from_enc_dec -> from_sub_model
      
      * fix input id shapes in docstrings
      
      * Apply suggestions from code review
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * undo generate changes
      
      * from sub model config
      
      * Update src/transformers/models/musicgen/modeling_musicgen.py
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * make generate work again
      
      * generate uncond -> get uncond inputs
      
      * remove prefix allowed tokens fn
      
      * better error message
      
      * logit proc checks
      
      * Apply suggestions from code review
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      
      * make decoder only tests work
      
      * composite fast tests
      
      * make style
      
      * uncond generation
      
      * feat extr padding
      
      * make audio prompt work
      
      * fix inputs docstrings
      
      * unconditional inputs: dict -> model output
      
      * clean up tests
      
      * more clean up tests
      
      * make style
      
      * t5 encoder -> auto text encoder
      
      * remove comments
      
      * deal with frames
      
      * fix auto text
      
      * slow tests
      
      * nice mdx
      
      * remove can generate
      
      * todo - hub id
      
      * convert m/l
      
      * make fix copies
      
      * only import generation with torch
      
      * ignore decoder from tests
      
      * don't wrap uncond inputs
      
      * make style
      
      * cleaner uncond inputs
      
      * add example to musicgen forward
      
      * fix docs
      
      * ignore MusicGen Model/ForConditionalGeneration in auto mapping
      
      * add doc section to toctree
      
      * add to doc tests
      
      * add processor tests
      
      * fix push to hub in conversion
      
      * tips for decoder only loading
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fix conversion for s / m / l checkpoints
      
      * import stopping criteria from module
      
      * remove from pipeline tests
      
      * fix uncond docstring
      
      * decode audio method
      
      * fix docs
      
      * org: sanchit-gandhi -> facebook
      
      * fix max pos embeddings
      
      * remove auto doc (not compatible with shapes)
      
      * bump max pos emb
      
      * make style
      
      * fix doc
      
      * fix config doc
      
      * fix config doc
      
      * ignore musicgen config from docstring
      
      * make style
      
      * fix config
      
      * fix config for doctest
      
      * consistent from_sub_models
      
      * don't automap decoder
      
      * fix mdx save audio file
      
      * fix mdx save audio file
      
      * processor batch decode for audio
      
      * remove keys to ignore
      
      * update doc md
      
      * update generation config
      
      * allow changes for default generation config
      
      * update tests
      
      * make style
      
      * fix docstring for uncond
      
      * fix processor test
      
      * fix processor test
      
      ---------
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      1c1c9075
    • amyeroberts's avatar
      Update old existing feature extractor references (#24552) · ae454f41
      amyeroberts authored
      * Update old existing feature extractor references
      
      * Typo
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * Address comments from review - update 'feature extractor'
      Co-authored by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
      ae454f41
  7. 28 Jun, 2023 1 commit
  8. 27 Jun, 2023 1 commit
  9. 26 Jun, 2023 1 commit
  10. 20 Jun, 2023 1 commit
  11. 19 Jun, 2023 1 commit
  12. 14 Jun, 2023 1 commit
  13. 09 Jun, 2023 1 commit
  14. 02 Jun, 2023 2 commits
  15. 24 May, 2023 1 commit
  16. 12 May, 2023 1 commit
  17. 10 May, 2023 1 commit
  18. 09 May, 2023 1 commit
    • Sylvain Gugger's avatar
      Add RWKV-4 (#22797) · b4d4d6fe
      Sylvain Gugger authored
      
      
      * First draft of RWKV-4
      
      * Add support for generate
      
      * Style post-rebase
      
      * Properly use state
      
      * Write doc
      
      * Fix doc
      
      * More math
      
      * Add model to README, dummies and clean config
      
      * Fix init
      
      * multiple fixes:
      
      - fix common tests
      - fix configuraion default values
      - add CI test for checking state computation
      - fix some CI tests
      
      * correct tokenizer
      
      * some tweaks
      
      - fix config docstring
      - fix failing tests
      
      * fix CI tests
      
      - add output_attention / output_hidden_states
      - override test_initialization
      - fix failing CIs
      
      * fix conversion script
      
      - fix sharded case
      - add new arguments
      
      * add slow tests + more fixes on conversion script
      
      * add another test
      
      * final fixes
      
      * change single name variable
      
      * add mock attention mask for pipeline to work
      
      * correct eos token id
      
      * fix nits
      
      * add checkpoints
      
      * Apply suggestions from code review
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add `tie_word_embeddings` in docstring
      
      * change tensor name
      
      * fix final nits
      
      * Trigger CI
      
      ---------
      Co-authored-by: default avataryounesbelkada <younesbelkada@gmail.com>
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      b4d4d6fe
  19. 04 May, 2023 2 commits
  20. 03 May, 2023 1 commit
  21. 02 May, 2023 1 commit
  22. 01 May, 2023 1 commit
  23. 28 Apr, 2023 1 commit
  24. 27 Apr, 2023 2 commits
  25. 24 Apr, 2023 1 commit
  26. 23 Apr, 2023 1 commit
  27. 14 Apr, 2023 1 commit
  28. 12 Apr, 2023 1 commit
    • pioliverse's avatar
      add model resources for CPMAnt (new) (#20906) · 523ca4e0
      pioliverse authored
      
      
      * resolve conflicts
      
      * rebase and make style
      
      * test
      
      * test
      
      * test
      
      * rebase and make style
      
      * rebase and make style
      
      * tests
      
      * tests
      
      * rewrite some functions
      
      * rebase and make style
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * fix some bugs & docstring
      
      * add models and tests
      
      * solve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * tests
      
      * resolve conflicts
      
      * resolve conflicts
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * fix some bugs & docstring
      
      * save resolution
      
      * make style
      
      * delete redefinition code
      
      * reformat function
      
      * reformat
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * tests
      
      * resolve conflicts
      
      * resolve conflicts
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * resolve conflicts
      
      * make style
      
      * fix bugs and refactor
      
      * modify docstrings and make style
      
      * unify import format in __init__.py
      
      * fix import-altclp bug
      
      * fix copies to update index.md
      
      * fix unused config parameters
      
      * fix unused config parameters
      
      * fix unused config parameters
      
      * update README_ja.md
      
      * dummy commit for unit test
      
      * fix attention mask
      
      * add CPMAntTokenizer&-Fast to auto-mapping
      
      * drop redundant changes in README_ko
      
      * fix  defaults in docstring
      
      * fix use_cache and some docstring
      
      * add missing args in tokenizer
      
      * modify tester inheritance
      
      * add is_jieba_available
      
      * fix some bugs
      
      * make style and fix-copies
      
      * add doctests
      
      * skip integration tests
      
      * add is_jieba_available
      
      * fix bugs in common tests
      
      * adjust docstrings and make style
      
      * add argument docstring
      
      * adjust code to some specifications
      
      * make style and fix-copies
      
      * add fast tokenization test
      
      * dummy commit for unit test
      
      * dummy commit for unit test
      
      * dummy commit for unit test
      
      * normalize some comments and names
      
      * Bert->CPMAnt
      
      * camel names and drop redundant codes
      
      * make style and fix-coies
      
      * add CpmTokenizerFast _import_structure
      
      * drop cpmanttokenizerfast in model_doc
      
      * fix some problems
      
      * fix CPMAnt tokenization for common test
      
      * make style and fixup
      
      * fix copies and fixup
      
      * fix bugs in tokenization test
      
      * dummy commit for connection failure in unittest
      
      * fix copies
      
      * drop trailing comma
      
      * fix decorator in tests
      
      * dummy commit for connection failure in unittest
      
      ---------
      Co-authored-by: default avatarGong Baitao <gongbaitao11@gmail.com>
      523ca4e0
  29. 10 Apr, 2023 2 commits
    • Sugawara's avatar
      add GPTNeoXForSequenceClassification (#22671) · 6daa9cb5
      Sugawara authored
      * add GPTNeoXForSequenceClassification
      
      * move the labels to logits.device (ref: #22561)
      
      * fix
      6daa9cb5
    • Joel Lamy-Poirier's avatar
      Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575) · e0921c6b
      Joel Lamy-Poirier authored
      
      
      * Add model with cli tool
      
      * Remove unwanted stuff
      
      * Add new code
      
      * Remove inference runner
      
      * Style
      
      * Fix checks
      
      * Test updates
      
      * make fixup
      
      * fix docs
      
      * fix doc
      
      * fix test
      
      * hopefully fix pipeline tests
      
      * refactor
      
      * fix CIs
      
      * add comment
      
      * rename to `GPTBigCodeForCausalLM`
      
      * correct readme
      
      * make fixup + docs
      
      * make fixup
      
      * fixes
      
      * fixes
      
      * Remove pruning
      
      * Remove import
      
      * Doc updates
      
      * More pruning removal
      
      * Combine copies
      
      * Single MQA implementation, remove kv cache pre-allocation and padding
      
      * Update doc
      
      * Revert refactor to match gpt2 style
      
      * Merge back key and value caches, fix some type hints
      
      * Update doc
      
      * Fix position ids pith padding (PR 21080)
      
      * Add conversion script temporarily
      
      * Update conversion script
      
      * Remove checkpoint conversion
      
      * New model
      
      * Fix MQA test
      
      * Fix copies
      
      * try fix tests
      
      * FIX TEST!!
      
      * remove  `DoubleHeadsModel`
      
      * add MQA tests
      
      * add slow tests
      
      * clean up
      
      * add CPU checker
      
      * final fixes
      
      * fixes
      
      - fix GPU issue
      - fixed slow tests
      - skip disk offload
      
      * fix final issue
      
      * Simplify and comment baddbmm fix
      
      * Remove unnecessary code
      
      * Transpose tweaks
      
      * Use beta=1 on cpu, improve tests
      
      ---------
      Co-authored-by: default avataryounesbelkada <younesbelkada@gmail.com>
      e0921c6b
  30. 03 Apr, 2023 1 commit
  31. 27 Mar, 2023 1 commit
    • Arthur's avatar
      [WIP]`NLLB-MoE` Adds the moe model (#22024) · 19ade242
      Arthur authored
      * Initial commit
      
      * update modeling code
      
      * update doc
      
      * add functions necessary
      
      * fix impotrs
      
      * revert changes
      
      * fixup
      
      * more styling to get going
      
      * remove standalone encoder
      
      * update code
      
      * styling
      
      * fix config and model
      
      * update code and some refactoring
      
      * make more tests pass
      
      * Adding NLLB-200 - MoE - 54.5B for no language left behind
      Fixes #21300
      
      * fix mor common tests
      
      * styke
      
      * update testing file
      
      * update
      
      * update
      
      * Router2 doc
      
      * update check config with sparse layer
      
      * add dummy router
      
      * update current conversion script
      
      * create on the fly conversion script
      
      * Fixup
      
      * style
      
      * style 2
      
      * fix empty return
      
      * fix return
      
      * Update default config sparse layers
      
      * easier to create sparse layers
      
      * update
      
      * update conversion script
      
      * update modeling
      
      * add to toctree
      
      * styling
      
      * make ruff happy
      
      * update docstring
      
      * update conversion script
      
      * update, will break tests but impelemting top2
      
      * update
      
      * local groups are supported here
      
      * ️ Support for local groups is now removed ️
      
      This is because it has to work with model parallelism that we do not support
      
      * finish simplificaiton
      
      * Fix forward
      
      * style
      
      * fixup
      
      * Update modelling and test, refactoring
      
      * update tests
      
      * remove final layer)norm as it is done in the FF
      
      * routing works! Logits test added
      
      * nit in test
      
      * remove top1router
      
      * style
      
      * make sure sparse are tested. Had to change route_tokens a liottle bit
      
      * add support for unslip models when converting
      
      * fixup
      
      * style
      
      * update test s
      
      * update test
      
      * REFACTOR
      
      * encoder outputs match!
      
      * style
      
      * update testing
      
      * 🎉encoder and decoder logits match 🎉
      
      
      
      * styleing
      
      * update tests
      
      * cleanup tests
      
      * fix router test and CIs
      
      * cleanup
      
      * cleanup test styling
      
      * fix tests
      
      * Finally the generation tests match!
      
      * cleanup
      
      * update test
      
      * style testing file
      
      * remove script
      
      * cleanup
      
      * more cleanup
      
      * nits
      
      * update
      
      * NLLB tokenizer is wrong and will be fixed soon
      
      * use LongTensors
      
      * update tests
      
      * revert some small changes
      
      * fix second expert sampling and batch prioritized routing
      
      * update tests
      
      * finish last tests
      
      * make ruff happy
      
      * update
      
      * ruff again
      
      * style
      
      * Update docs/source/en/model_doc/nllb-moe.mdx
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Updates based on review
      
      * style and fix import issue
      
      * nit
      
      * more nits
      
      * cleanup
      
      * styling
      
      * update test_seconde_expert_policy
      
      * fix name
      
      * last nit on the markdown examples
      
      ---------
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      19ade242
  32. 24 Mar, 2023 1 commit
    • Mitch Naylor's avatar
      Add Mega: Moving Average Equipped Gated Attention (#21766) · 57f25f4b
      Mitch Naylor authored
      
      
      * add mega file structure and plain pytorch version of mega source code
      
      * added config class with old naming conventions
      
      * filled in mega documentation
      
      * added config class and embeddings with optional token types
      
      * updated notes
      
      * starting the conversion process, deleted intermediate and added use_cache back to config
      
      * renamed config attributes in modeling_mega.py
      
      * checkpointing before refactoring incremental decoding functions
      
      * removed stateful incremental key/values for EMA and self-attention
      
      * refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
      
      * MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
      
      * more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
      
      * bug fix in attention mask handling in MovingAverageGatedAttention
      
      * removed incremental state from GatedCrossAttention and removed IncrementalState class
      
      * finished gated cross attention and got MegaLayer working
      
      * fixed causal masking in mega decoder
      
      * fixed how padding and causal masks are passed through MegaLayer with and without k/v caching
      
      * finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids
      
      * added optional dense hidden layer for masked and causal LM classes
      
      * docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention
      
      * removed before_attn_fn in Mega class and updated docstrings and comments up to there
      
      * bug fix in MovingAverageGatedAttention masking
      
      * working conversion of MLM checkpoint in scratchpad script -- perfect matches
      
      * moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters
      
      * renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint
      
      * finished checkpoint conversion script
      
      * cleanup old class in mega config script
      
      * removed 'copied from' statements and passing integration tests
      
      * added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing
      
      * fixed tuple output of megamodel
      
      * all common tests passing after fixing issues in decoder, gradient retention, and initialization
      
      * added mega-specific tests, ready for more documentation and style checks
      
      * updated docstrings; checkpoint before style fixes
      
      * style and quality checks, fixed initialization problem in float_tensor, ready for PR
      
      * added mega to toctree
      
      * removed unnecessary arg in megaconfig
      
      * removed unused arg and fixed code samples with leftover roberta models
      
      * Apply suggestions from code review
      
      Applied all suggestions except the one renaming a class, as I'll need to update that througout
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA
      
      * removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms
      
      * reformatted .forward() docstrings to match style and removed unused mask input in cross-attention
      
      * removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()
      
      * renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files
      
      * variable names in NFFN
      
      * manual Mega->MEGA changes in docs
      
      * Mega->MEGA in config auto
      
      * style and quality fixes
      
      * Apply suggestions from code review
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments
      
      * commit before dealing with merge conflicts
      
      * made new attention activation functions available in ACT2FN and added generation test from OPT
      
      * style and quality in activations and tests
      
      * documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings
      
      * style and quality fixes after latest updates, before rotary position ids
      
      * causal mask in MegaBlock docstring + added missing device passing
      
      * Apply suggestions from code review
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update README.md
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR
      
      * style and quality fixes + readme updates pointing to main
      
      ---------
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      57f25f4b
  33. 20 Mar, 2023 1 commit