[Model] Support Multi-GPU for Transformer model (#356)
* multi-process version of transformer
* lots of fix
* fix bugs and accum gradients for multiple batches
* many fixes
* minor
* upd
* set torch device
* fix bugs
* fix and minor
* comments and clean up
* uncomment viz code