1. 27 Mar, 2024 5 commits
    • Reimplement "Automatic safetensors conversion when lacking these files" (#29846) · 4d8427f7
      Lysandre Debut authored
      * Automatic safetensors conversion when lacking these files (#29390)
      
      * Automatic safetensors conversion when lacking these files
      
      * Remove debug
      
      * Thread name
      
      * Typo
      
      * Ensure that raised exceptions do not affect the main thread
      
      * Catch all errors
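
      The point of the bullets above is isolating the conversion work on a named background thread and catching every exception there, so a failed conversion can never break the caller's loading path. A minimal sketch of that pattern, assuming a hypothetical `_convert_to_safetensors` entry point (the thread name is likewise illustrative, not the PR's actual API):

      ```python
      import logging
      import threading

      logger = logging.getLogger(__name__)


      def _convert_to_safetensors(model_id: str) -> None:
          # Hypothetical stand-in for the PyTorch-bin -> safetensors conversion.
          ...


      def start_background_conversion(model_id: str) -> threading.Thread:
          def target() -> None:
              try:
                  _convert_to_safetensors(model_id)
              except Exception:
                  # "Catch all errors": a failed conversion must never crash
                  # the main thread; log it and move on.
                  logger.info("Could not convert %s to safetensors", model_id, exc_info=True)

          # "Thread name": a descriptive name makes the worker easy to spot in debuggers.
          thread = threading.Thread(target=target, name="Thread-autoconversion")
          thread.start()
          return thread
      ```
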
    • Fix #29807, sinusoidal positional encodings overwritten by post_init() (#29813) · a81cf9ee
      Hovnatan Karapetyan authored
      * Check for requires_grad when initializing weights
      
      * Add unit test
      
      * Move sinusoidal positional encoding generation after post_init()
      
      * Add modules to skip init list
      
      * Move create_sinusoidal_embeddings to _init_weights
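
      The failure mode behind this fix: `post_init()` calls `_init_weights`, which re-randomizes whatever `__init__` already wrote, so a sin/cos table created up front gets clobbered. A hedged sketch of such a table builder, filled in place so it can run from `_init_weights` (the signature mirrors common practice, not necessarily this PR's exact code):

      ```python
      import math

      import torch


      def create_sinusoidal_embeddings(n_pos: int, dim: int, out: torch.Tensor) -> None:
          # Fill `out` in place with the fixed sin/cos table. Because the values
          # are deterministic, recreating them inside _init_weights means
          # post_init() writes the correct table instead of overwriting it
          # with random initialization.
          position_enc = torch.tensor(
              [[pos / math.pow(10000, 2 * (j // 2) / dim) for j in range(dim)] for pos in range(n_pos)],
              dtype=torch.float,
          )
          with torch.no_grad():
              out[:, 0::2] = torch.sin(position_enc[:, 0::2])
              out[:, 1::2] = torch.cos(position_enc[:, 1::2])
      ```
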
    • Mamba `slow_forward` gradient fix (#29563) · cefb819f
      Anton Vlasjuk authored
      * FIX: Cached slow forward in mamba
      - additionally added mamba cached test
      - added unused test (mamba causal lm forward and backward)
      - fixed typo: "causl" --> "causal"
      
      * formatting
      
      * fix: use real `slow_forward` call instead of torch module's
      
      * add shape assertion for mixer block test
      
      * adjust shape assertion
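
      A hedged sketch of the kind of forward-and-backward regression test described above, assuming the standard Hugging Face causal-LM interface (`use_cache`, `outputs.logits`): if the cached slow path detaches its states, gradients silently stop flowing, and the assertion below catches that.

      ```python
      import torch


      def check_backward_through_cached_forward(model, input_ids: torch.Tensor) -> None:
          # Run a cached forward pass, backprop a scalar loss, and confirm
          # that gradients actually reach the trainable parameters.
          model.train()
          outputs = model(input_ids, use_cache=True)
          loss = outputs.logits.float().mean()
          loss.backward()
          assert any(
              p.grad is not None and p.grad.abs().sum() > 0
              for p in model.parameters()
              if p.requires_grad
          ), "no gradient reached the model parameters"
      ```
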
    • Add Qwen2MoE (#29377) · 1c39974a
      Bo Zheng authored
      * add support for qwen2 MoE models
      
      * update docs
      
      * update model name & test
      
      * update readme
      
      * update class names & readme & model_doc of Qwen2MoE.
      
      * update architecture name
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fix style
      
      * fix test when there are sparse and non-sparse layers
      
      * fixup
      
      * Update README.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixup
      
      * fixup
      
      * add archive back
      
      * fix integration test
      
      * fixup
      
      ---------
      Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
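
      Context for the "sparse and non-sparse layers" bullet: Qwen2MoE interleaves ordinary dense MLP layers with mixture-of-experts layers, where a learned gate picks the top-k experts for each token. A minimal sketch of such top-k routing (names and shapes are illustrative, not the modeling_qwen2_moe.py implementation):

      ```python
      import torch
      import torch.nn as nn
      import torch.nn.functional as F


      class TopKRouter(nn.Module):
          # Per-token gate of a sparse MoE block: score every expert, keep the
          # top-k, and renormalize so each token's expert mix sums to 1.
          def __init__(self, hidden_size: int, num_experts: int, top_k: int):
              super().__init__()
              self.gate = nn.Linear(hidden_size, num_experts, bias=False)
              self.top_k = top_k

          def forward(self, hidden_states: torch.Tensor):
              # hidden_states: (num_tokens, hidden_size)
              router_logits = self.gate(hidden_states)               # (num_tokens, num_experts)
              probs = F.softmax(router_logits, dim=-1)
              weights, selected_experts = torch.topk(probs, self.top_k, dim=-1)
              weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
              return weights, selected_experts
      ```
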
    • Support `num_attention_heads` != `num_key_value_heads` in Flax Llama Implementation (#29557) · 8e08acad
      Benjamin Minixhofer authored
      * fix tinyllama flax modelling
      
      * rename vars to minimize changes
      
      * move
      
      * formatting
      
      * remove unused var
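
      When `num_key_value_heads` is smaller than `num_attention_heads` (grouped-query attention, as in TinyLlama), each key/value head is shared by a group of query heads. A hedged sketch of the usual trick, tiling the KV heads up to the query-head count (the function name and layout follow common practice, not necessarily this PR's code):

      ```python
      import jax.numpy as jnp


      def repeat_kv(hidden: jnp.ndarray, n_rep: int) -> jnp.ndarray:
          # hidden: (batch, seq_len, num_kv_heads, head_dim).
          # Tile each KV head n_rep times so the result lines up with the
          # query heads: num_kv_heads * n_rep == num_attention_heads.
          if n_rep == 1:
              return hidden
          batch, seq_len, num_kv_heads, head_dim = hidden.shape
          hidden = jnp.broadcast_to(
              hidden[:, :, :, None, :],
              (batch, seq_len, num_kv_heads, n_rep, head_dim),
          )
          return hidden.reshape(batch, seq_len, num_kv_heads * n_rep, head_dim)
      ```
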
  2. 26 Mar, 2024 8 commits
  3. 25 Mar, 2024 6 commits
  4. 24 Mar, 2024 1 commit
    • model_summary.md - Restore link to Harvard's Annotated Transformer. (#29702) · 76a33a10
      gamepad_coder authored
      * model_summary.md - Add link to Harvard's Annotated Transformer.
      
      * model_summary.md - slight wording change + capitalize name of the paper
      
      * model_summary.md - moves the Annotated Transformer link into parentheses next to the link to the original paper (great idea, stevhliu!)
      
      * model_summary.md - moves the Annotated Transformer link into parentheses next to the link to the original paper (commit pt. 2, accidentally removed "has" in pt. 1)
  5. 23 Mar, 2024 1 commit
  6. 22 Mar, 2024 10 commits
  7. 21 Mar, 2024 9 commits