1. 24 Mar, 2023 4 commits
    • Add Mega: Moving Average Equipped Gated Attention (#21766) · 57f25f4b
      Mitch Naylor authored
      * add mega file structure and plain pytorch version of mega source code
      
      * added config class with old naming conventions
      
      * filled in mega documentation
      
      * added config class and embeddings with optional token types
      
      * updated notes
      
      * starting the conversion process, deleted intermediate and added use_cache back to config
      
      * renamed config attributes in modeling_mega.py
      
      * checkpointing before refactoring incremental decoding functions
      
      * removed stateful incremental key/values for EMA and self-attention
      
      * refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
      
      * MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
      
      * more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
      
      * bug fix in attention mask handling in MovingAverageGatedAttention
      
      * removed incremental state from GatedCrossAtt...
    • Joao Gante's avatar
      0fa46524
    • Fix typo in Greedy Search Description (#22345) · b7960765
      Ashwin Mathur authored
      Fix typo in greedy search docs
    • [HFTracer] Make embeddings ops take on the dtype of the weight (#22347) · c0fa2aa0
      James Reed authored
      * [HFTracer] Make embeddings ops take on the dtype of the weight
      
      * fix bug
  2. 23 Mar, 2023 13 commits
  3. 22 Mar, 2023 16 commits
  4. 21 Mar, 2023 7 commits