    [shardformer/sequence parallel] Cherry pick commit to new branch (#4450) · 424629fe
    Bin Jia authored
    * [shardformer/sequence parallel] Support sequence parallel for gpt2 (#4384)
    
    * [sequence parallel] add sequence parallel linear col/row support (#4336)
    
    * add sequence parallel linear col/row support
    
    * add annotation
    
    * add annotation
    
    * add support for gpt2 fused qkv linear layer
    
    * support sequence parallel in GPT2
    
    * add docstring and note
    
    * add requirements
    
    * remove unused flash-attn
    
    * modify flash attn test
    
    * modify flash attn setting
    
    * modify flash attn code
    
    * add assert before divide, rename forward function
    
    * [shardformer/test] fix gpt2 test with seq-parallel
    
    * [shardformer/sequence parallel] Overlap input gather and grad computation during col backward (#4401)
    
    * overlap gather input / grad computing during col backward
    
    * modify test for overlap
    
    * simplify code
    
    * fix code and modify CUDA stream synchronization
    
    * [shardformer/sequence parallel] polish code
_utils.py 10.8 KB