    [moe] support mixtral (#5309) · da39d21b
    Hongxin Liu authored
    * [moe] add mixtral block for single expert
    
    * [moe] mixtral block fwd support uneven ep
    
* [moe] mixtral block bwd support uneven ep (see the uneven-EP split sketch after this list)
    
* [moe] add mixtral moe layer (see the routing sketch after this list)
    
    * [moe] simplify replace
    
* [moe] support saving sharded mixtral checkpoints (see the sharded-save sketch after this list)

* [moe] support loading sharded mixtral checkpoints

* [moe] support saving sharded optimizer states

* [moe] integrate the moe manager into the plugin

* [moe] fix optimizer loading

* [moe] fix the mixtral layer
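A minimal sketch of the Mixtral-style sparse MoE block the first commits refer to, assuming a plain PyTorch setting with no expert parallelism; the class and parameter names below (`Expert`, `SparseMoEBlock`, the default sizes) are illustrative, not ColossalAI's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One SwiGLU feed-forward expert, as used in Mixtral."""
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.w1 = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w3 = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w2 = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class SparseMoEBlock(nn.Module):
    """Top-k gating: each token is processed by only k of the experts."""
    def __init__(self, hidden_size: int = 1024, intermediate_size: int = 4096,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            Expert(hidden_size, intermediate_size) for _ in range(num_experts))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        batch, seq, dim = hidden_states.shape
        x = hidden_states.reshape(-1, dim)                 # (tokens, dim)
        logits = self.gate(x)                              # (tokens, num_experts)
        weights, indices = torch.topk(logits, self.top_k)  # per-token top-k experts
        weights = F.softmax(weights, dim=-1)               # normalize over the k picks
        out = torch.zeros_like(x)
        for eid, expert in enumerate(self.experts):
            tok, k = torch.where(indices == eid)           # tokens routed to expert eid
            if tok.numel():
                out[tok] += weights[tok, k, None] * expert(x[tok])
        return out.reshape(batch, seq, dim)

# Usage: y = SparseMoEBlock()(torch.randn(2, 16, 1024))  # -> (2, 16, 1024)
```

Only the gate's top-k experts run for each token, which is what makes the block sparse and worth sharding across ranks.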
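The "uneven ep" commits concern expert parallelism (EP) where the expert count is not divisible by the EP world size, so some ranks own one more expert than others. A minimal sketch of such a split, with a hypothetical helper name (`local_expert_slice`):

```python
def local_expert_slice(num_experts: int, ep_size: int, ep_rank: int) -> tuple[int, int]:
    """Return the [start, end) range of global expert ids owned by ep_rank."""
    base, rem = divmod(num_experts, ep_size)
    # The first `rem` ranks each own one extra expert.
    start = ep_rank * base + min(ep_rank, rem)
    end = start + base + (1 if ep_rank < rem else 0)
    return start, end

# e.g. 8 experts over 3 ranks -> (0, 3), (3, 6), (6, 8)
assert [local_expert_slice(8, 3, r) for r in range(3)] == [(0, 3), (3, 6), (6, 8)]
```

EP implementations typically exchange tokens between ranks with all-to-all collectives; with an uneven split the forward and backward passes each need to handle per-rank counts that differ, which is presumably why the fwd and bwd cases are tracked as separate commits.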
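A minimal sketch of the sharded-save idea from the checkpoint commits: each EP rank persists only the experts it owns, while replicated weights are written once. The file layout and function name (`save_sharded_moe`) are hypothetical, not ColossalAI's checkpoint format:

```python
import torch
import torch.distributed as dist
import torch.nn as nn

def save_sharded_moe(local_experts: nn.ModuleList, gate: nn.Linear,
                     num_experts: int, out_dir: str) -> None:
    """Each rank saves the experts it owns; rank 0 also saves the gate."""
    rank, world = dist.get_rank(), dist.get_world_size()
    # Same uneven split as above: the first `rem` ranks own one extra expert.
    base, rem = divmod(num_experts, world)
    start = rank * base + min(rank, rem)
    end = start + base + (1 if rank < rem else 0)
    assert len(local_experts) == end - start
    # Key local experts by their *global* ids so loading can reassemble them.
    shard = {f"experts.{start + j}": e.state_dict()
             for j, e in enumerate(local_experts)}
    torch.save(shard, f"{out_dir}/experts_rank{rank}.pt")
    if rank == 0:
        # The gate/router is replicated across ranks; one copy suffices.
        torch.save(gate.state_dict(), f"{out_dir}/gate.pt")
```

Loading reverses this: each rank reads only the shard files covering its own expert range, which keeps peak memory bounded by the local expert count rather than the full model.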
mixtral_policy.py (22.4 KB)