1. 07 Feb, 2024 2 commits
      [moe] support mixtral (#5309) · da39d21b
      Hongxin Liu authored
      * [moe] add mixtral block for single expert
      
      * [moe] mixtral block fwd support uneven ep
      
      * [moe] mixtral block bwd support uneven ep
      
      * [moe] add mixtral moe layer
      
      * [moe] simplify replace
      
      * [moe] support save sharded mixtral

      * [moe] support load sharded mixtral

      * [moe] support save sharded optim

      * [moe] integrate moe manager into plugin

      * [moe] fix optimizer load

      * [moe] fix mixtral layer
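The Mixtral MoE layer added in this commit routes each token to its top-2 experts and combines their outputs weighted by renormalized router probabilities. A minimal, dependency-free sketch of that routing, assuming a toy `Expert` class and a `moe_forward` helper (both illustrative names, not the ColossalAI or Mixtral API):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

class Expert:
    """Toy expert: scales its input by a fixed weight."""
    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return [xi * self.weight for xi in x]

def moe_forward(x, router_logits, experts, top_k=2):
    """Dispatch one token to its top-k experts and combine the
    outputs, weighted by renormalized router probabilities."""
    probs = softmax(router_logits)
    top = sorted(range(len(experts)), key=lambda i: -probs[i])[:top_k]
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        w = probs[i] / norm
        out = [o + w * yi for o, yi in zip(out, y)]
    return out
```

Under uneven expert parallelism, the experts list above would be partitioned unequally across ranks and the dispatch would become a collective; this sketch only shows the per-token routing math.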
      [moe] init mixtral impl · 7d8e0338
      Xuanlei Zhao authored
  2. 09 Jan, 2024 1 commit
  3. 09 Nov, 2023 1 commit
      [moe]: fix ep/tp tests, add hierarchical all2all (#4982) · 72444127
      Wenhao Chen authored
      * fix: add warning for EP different behavior
      
      * fix: use shard_data in ep & tp model
      
      * to: add used_capacity
      
      * fix: fix router test
      
      * feat: add create_ep_node_group
      
      * feat: add create_ep_hierarchical_group fn
      
      * feat: add HierarchicalAllToAll
      
      * test: add hierarchical all2all test
      
      * fix: fix test errors
      
      * fix: simplify create_ep_hierarchical_group
      
      * fix: add hierarchical_alltoall arg
      
      * fix: fix environ typo
      
      * revert: revert process mesh order
      
      * to: add todo mark
      
      * fix: skip hierarchical_comm if torch < 1.13.1
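The `HierarchicalAllToAll` added here splits one flat all-to-all into an intra-node phase followed by an inter-node phase, so each chunk crosses the slow inter-node link exactly once. A pure-Python simulation of the idea, assuming ranks are grouped into nodes of `node_size` consecutive ranks (a stand-in for the `torch.distributed` collective, not the actual implementation):

```python
def flat_all_to_all(data):
    """Baseline: data[src][dst] ends up at out[dst][src]."""
    n = len(data)
    return [[data[src][dst] for src in range(n)] for dst in range(n)]

def hierarchical_all_to_all(data, node_size):
    n = len(data)
    assert n % node_size == 0
    # Each rank starts holding (src, dst, chunk) triples for its own row.
    held = [[(src, dst, data[src][dst]) for dst in range(n)]
            for src in range(n)]

    # Phase 1 (intra-node): the chunk for dst = (node b, offset j) moves
    # from rank (a, i) to the node-local peer (a, j). No inter-node traffic.
    nxt = [[] for _ in range(n)]
    for r in range(n):
        node = r // node_size
        for (src, dst, chunk) in held[r]:
            j = dst % node_size
            nxt[node * node_size + j].append((src, dst, chunk))
    held = nxt

    # Phase 2 (inter-node): within each same-offset group, (a, j) forwards
    # the chunk to its final destination (b, j) -- one inter-node hop.
    nxt = [[] for _ in range(n)]
    for r in range(n):
        for (src, dst, chunk) in held[r]:
            nxt[dst].append((src, dst, chunk))
    held = nxt

    # Assemble each rank's output in source order, like a flat all-to-all.
    out = []
    for r in range(n):
        row = [None] * n
        for (src, dst, chunk) in held[r]:
            row[src] = chunk
        out.append(row)
    return out
```

Both functions produce identical results; the hierarchical variant only changes which links the traffic traverses, which is why the commit gates it behind a `hierarchical_alltoall` argument.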
  4. 08 Nov, 2023 1 commit
      [moe] support optimizer checkpoint (#5015) · f71e63b0
      Xuanlei Zhao authored
      * Refactor MoE Manager setup method
      
      * unshard optim ckpt
      
      * optim io
      
      * update transformer version
      
      * update requirements
      
      * update ckpt
      
      * update ckpt
      
      * update ckpt
      
      * fix engine
      
      * fix engine
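The optimizer-checkpoint support above saves each rank's slice of the optimizer state and merges the slices on load. A hedged sketch of that shape, assuming a round-robin partition over parameter ids and JSON shard files (the file layout, partition rule, and helper names are illustrative, not the ColossalAI checkpoint format):

```python
import json
import os

def save_sharded_optim(optim_state, rank, world_size, ckpt_dir):
    """Write only the parameter states owned by `rank` (round-robin
    over parameter ids) to a per-rank shard file."""
    shard = {pid: st for pid, st in optim_state.items()
             if int(pid) % world_size == rank}
    with open(os.path.join(ckpt_dir, f"optim.rank{rank}.json"), "w") as f:
        json.dump(shard, f)

def load_sharded_optim(world_size, ckpt_dir):
    """Merge all per-rank shard files back into one full state dict."""
    merged = {}
    for rank in range(world_size):
        with open(os.path.join(ckpt_dir, f"optim.rank{rank}.json")) as f:
            merged.update(json.load(f))
    return merged
```

The round trip (every rank saves its shard, then any process loads and merges them) reconstructs the unsharded optimizer state, which is what the "unshard optim ckpt" commit above refers to.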
  5. 02 Nov, 2023 1 commit