    [moe]: fix ep/tp tests, add hierarchical all2all (#4982) · 72444127
    Wenhao Chen authored
    * fix: add warning for EP different behavior
    
    * fix: use shard_data in ep & tp model
    
    * to: add used_capacity
    
    * fix: fix router test
    
    * feat: add create_ep_node_group
    
    * feat: add create_ep_hierarchical_group fn
    
    * feat: add HierarchicalAllToAll
    
    * test: add hierarchical all2all test
    
    * fix: fix test errors
    
    * fix: simplify create_ep_hierarchical_group
    
    * fix: add hierarchical_alltoall arg
    
    * fix: fix environ typo
    
    * revert: revert process mesh order
    
    * to: add todo mark
    
    * fix: skip hierarchical_comm if torch < 1.13.1
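The HierarchicalAllToAll added in this commit stages the expert-parallel exchange instead of doing one flat all-to-all across every rank. The following is a pure-Python simulation of one common leader-based hierarchical scheme, not ColossalAI's implementation. It assumes ranks are numbered node-major (rank = node * ranks_per_node + local_rank), that one rank per node acts as the leader, and that the three stages are an intra-node gather, an inter-node all-to-all among leaders, and an intra-node scatter. `hierarchical_all_to_all`, `flat_all_to_all`, `ranks_per_node` and the string "chunks" are hypothetical names used only for illustration. The check at the end confirms that the staged version delivers exactly what a flat all-to-all would.

```python
from typing import Dict, List, Tuple

Chunk = str  # stand-in for a tensor slice, e.g. "3->5" = data rank 3 sends to rank 5


def flat_all_to_all(inputs: List[List[Chunk]]) -> List[List[Chunk]]:
    """Reference semantics: outputs[dst][src] == inputs[src][dst]."""
    world = len(inputs)
    return [[inputs[src][dst] for src in range(world)] for dst in range(world)]


def hierarchical_all_to_all(inputs: List[List[Chunk]], ranks_per_node: int) -> List[List[Chunk]]:
    world = len(inputs)
    if world % ranks_per_node != 0:
        raise ValueError("world size must be a multiple of ranks_per_node")
    num_nodes = world // ranks_per_node

    def rank(node: int, local: int) -> int:
        # Assumed node-major numbering: ranks 0..L-1 on node 0, L..2L-1 on node 1, ...
        return node * ranks_per_node + local

    # Stage 1: intra-node gather. Each node's leader collects the full send
    # buffer (one chunk per destination rank) of every local rank.
    leader_buf: List[Dict[int, List[Chunk]]] = [
        {rank(n, l): inputs[rank(n, l)] for l in range(ranks_per_node)}
        for n in range(num_nodes)
    ]

    # Stage 2: inter-node all-to-all among leaders. Node n sends node m one
    # message holding every chunk whose source is on n and destination is on m,
    # so cross-node messages scale with num_nodes**2 rather than world**2.
    leader_recv: List[Dict[Tuple[int, int], Chunk]] = [{} for _ in range(num_nodes)]
    for n in range(num_nodes):
        for m in range(num_nodes):
            message = {
                (src, rank(m, j)): row[rank(m, j)]
                for src, row in leader_buf[n].items()
                for j in range(ranks_per_node)
            }
            leader_recv[m].update(message)

    # Stage 3: intra-node scatter. Each leader hands every local rank its
    # received chunks, ordered by source rank as a flat all-to-all would.
    outputs: List[List[Chunk]] = [[] for _ in range(world)]
    for m in range(num_nodes):
        for j in range(ranks_per_node):
            dst = rank(m, j)
            outputs[dst] = [leader_recv[m][(src, dst)] for src in range(world)]
    return outputs


if __name__ == "__main__":
    world, per_node = 8, 4  # 2 nodes x 4 ranks
    inputs = [[f"{s}->{d}" for d in range(world)] for s in range(world)]
    result = hierarchical_all_to_all(inputs, per_node)
    assert result == flat_all_to_all(inputs)
    print(result[5][:3])  # ['0->5', '1->5', '2->5']
```

The design trades extra hops over fast intra-node links for fewer, larger messages across the slower inter-node network. In a real PyTorch version the stages would plausibly map onto `torch.distributed.gather`, `torch.distributed.all_to_all_single` and `torch.distributed.scatter` over separate intra-node and inter-node process groups; that mapping is an assumption, not a description of this commit's code.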