1. 17 Nov, 2023 1 commit
  2. 20 Jun, 2023 1 commit
  3. 25 Apr, 2023 1 commit
  4. 24 Apr, 2023 1 commit
  5. 24 Mar, 2023 1 commit
    • Add Mega: Moving Average Equipped Gated Attention (#21766) · 57f25f4b
      Mitch Naylor authored
      
      
      * add mega file structure and plain pytorch version of mega source code
      
      * added config class with old naming conventions
      
      * filled in mega documentation
      
      * added config class and embeddings with optional token types
      
      * updated notes
      
      * starting the conversion process, deleted intermediate and added use_cache back to config
      
      * renamed config attributes in modeling_mega.py
      
      * checkpointing before refactoring incremental decoding functions
      
      * removed stateful incremental key/values for EMA and self-attention
      
      * refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
      
      * MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
      
      * more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
      
      * bug fix in attention mask handling in MovingAverageGatedAttention
      
      * removed incremental state from GatedCrossAttention and removed IncrementalState class
      
      * finished gated cross attention and got MegaLayer working
      
      * fixed causal masking in mega decoder
      
      * fixed how padding and causal masks are passed through MegaLayer with and without k/v caching
      
      * finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids
      
      * added optional dense hidden layer for masked and causal LM classes
      
      * docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention
      
      * removed before_attn_fn in Mega class and updated docstrings and comments up to there
      
      * bug fix in MovingAverageGatedAttention masking
      
      * working conversion of MLM checkpoint in scratchpad script -- perfect matches
      
      * moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters
      
      * renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint
      
      * finished checkpoint conversion script
      
      * cleanup old class in mega config script
      
      * removed 'copied from' statements and passing integration tests
      
      * added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing
      
      * fixed tuple output of megamodel
      
      * all common tests passing after fixing issues in decoder, gradient retention, and initialization
      
      * added mega-specific tests, ready for more documentation and style checks
      
      * updated docstrings; checkpoint before style fixes
      
      * style and quality checks, fixed initialization problem in float_tensor, ready for PR
      
      * added mega to toctree
      
      * removed unnecessary arg in megaconfig
      
      * removed unused arg and fixed code samples with leftover roberta models
      
      * Apply suggestions from code review
      
      Applied all suggestions except the one renaming a class, as I'll need to update that throughout
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA
      
      * removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms
      
      * reformatted .forward() docstrings to match style and removed unused mask input in cross-attention
      
      * removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()
      
      * renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files
      
      * variable names in NFFN
      
      * manual Mega->MEGA changes in docs
      
      * Mega->MEGA in config auto
      
      * style and quality fixes
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments
      
      * commit before dealing with merge conflicts
      
      * made new attention activation functions available in ACT2FN and added generation test from OPT
      
      * style and quality in activations and tests
      
      * documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings
      
      * style and quality fixes after latest updates, before rotary position ids
      
      * causal mask in MegaBlock docstring + added missing device passing
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update README.md
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR
      
      * style and quality fixes + readme updates pointing to main
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  6. 06 Mar, 2023 1 commit
  7. 27 Feb, 2023 1 commit
  8. 10 Feb, 2023 1 commit
    • Add X-MOD (#20939) · b0d539cc
      Jannis Vamvas authored
      
      
      * Add X-MOD to Readme
      
      * Add documentation for X-MOD
      
      * Implement X-MOD
      
      * Fix formatting of X-MOD docs
      
      * Change signature of X-MOD forward methods to use lang_ids
      
      * Minor changes
      
      * Rebase with main and run make fix-copies
      
      * Make suggested changes to docstrings
      
      * Improve code readability
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      
      * Fix code style
      
      * Conversion script: Remove asserts and type annotations
      
      * Remove _TOKENIZER_FOR_DOC
      
      * XMOD -> Xmod
      
      * Update copyright note
      
      * Fix doctests
      
      * Fix docstring
      
      * Add integration test for FillMaskPipeline
      
      * Revert "Add integration test for FillMaskPipeline"
      
      This reverts commit 4381eb3b1d0f5d85785f89caba83928e6efa6d1f.
      
      * Add end-to-end integration test for mask fill
      
      * make style
      
      * Rebase with main and make fix-copies
      
      ---------
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
  9. 02 Feb, 2023 1 commit
  10. 27 Jan, 2023 1 commit
    • Automated compatible models list for task guides (#21338) · 73a2ff69
      Maria Khalusova authored
      * initial commit. added tip placeholders and a script
      
      * removed unused imports, fixed paths
      
      * fixed generated links
      
      * make style
      
      * split language modeling doc into two: causal language modeling and masked language modeling
      
      * added check_task_guides.py to make fix-copies
      
      * review feedback addressed
  11. 21 Nov, 2022 1 commit
    • Add inference section to task guides (#18781) · d896029e
      Steven Liu authored
      * 📝 start adding inference section to task guides
      
      * make style
      
      * 📝 add multiple choice
      
      * add rest of inference sections
      
      * make style
      
      * add compute_metric, push_to_hub, pipeline
      
      * make style
      
      * add updated sequence and token classification
      
      * make style
      
      * make edits in token classification
      
      * add audio classification
      
      * make style
      
      * add asr
      
      * make style
      
      * add image classification
      
      * make style
      
      * add summarization
      
      * make style
      
      * add translation
      
      * make style
      
      * add multiple choice
      
      * add language modeling
      
      * add qa
      
      * make style
      
      * review and edits
      
      * apply reviews
      
      * make style
      
      * fix call to processor
      
      * apply audio reviews
      
      * update to better asr model
      
      * make style
  12. 07 Sep, 2022 1 commit
  13. 06 Jul, 2022 1 commit
  14. 28 Jun, 2022 1 commit
  15. 04 Apr, 2022 1 commit
  16. 25 Mar, 2022 1 commit
  17. 22 Mar, 2022 1 commit
  18. 18 Mar, 2022 1 commit
  19. 15 Mar, 2022 1 commit
  20. 23 Feb, 2022 1 commit