[shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460)
* support gpt2 seq parallel with pp/dp/tp * fix a bug when waiting for stream done * delete unused gpt2_seq file
Showing
Please register or sign in to comment
* support gpt2 seq parallel with pp/dp/tp * fix a bug when waiting for stream done * delete unused gpt2_seq file