Improve performance running on multiple GPUs (#3347)
* Use multiple streams to broadcast positions
* Use multiple streams to reduce forces
* Add sync between default stream and peer-copy streams
* Minor cleanup
Co-authored-by: David Clark <daclark@nvidia.com>
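
For readers unfamiliar with the pattern, the position broadcast and the stream sync named in this commit might look roughly like the sketch below. It is an illustration under assumptions, not the code from this change: `posq`, `ready`, `copyStreams`, and the buffer size are hypothetical names and values, and a `cudaMemsetAsync` stands in for whatever kernel produces the positions on the default stream. Each peer gets its own non-blocking stream, and an event makes every peer copy wait for the work queued on the default stream.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Print the CUDA error and exit main() with a failure code.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    int numDevices = 0;
    CHECK(cudaGetDeviceCount(&numDevices));
    if (numDevices < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    const size_t numAtoms = 1 << 20;                 // hypothetical system size
    const size_t bytes = numAtoms * sizeof(float4);  // x, y, z, charge per atom

    // One position buffer per device (hypothetical layout).
    std::vector<float4*> posq(numDevices, nullptr);
    for (int d = 0; d < numDevices; d++) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaMalloc(&posq[d], bytes));
    }

    // Device 0 owns the up-to-date positions. The memset stands in for the
    // kernel that would normally write them on the default stream.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemsetAsync(posq[0], 0, bytes, 0));

    // Record an event on the default stream so the copy streams can wait on it.
    cudaEvent_t ready;
    CHECK(cudaEventCreateWithFlags(&ready, cudaEventDisableTiming));
    CHECK(cudaEventRecord(ready, 0));

    // One non-blocking stream per peer, so the copies can overlap each other.
    // Non-blocking streams do not implicitly synchronize with the default
    // stream, which is why the explicit event wait is needed.
    std::vector<cudaStream_t> copyStreams(numDevices);
    for (int d = 1; d < numDevices; d++) {
        CHECK(cudaStreamCreateWithFlags(&copyStreams[d], cudaStreamNonBlocking));
        CHECK(cudaStreamWaitEvent(copyStreams[d], ready, 0));
        CHECK(cudaMemcpyPeerAsync(posq[d], d, posq[0], 0, bytes, copyStreams[d]));
    }

    // Wait for every broadcast copy to finish, then release the streams.
    for (int d = 1; d < numDevices; d++) {
        CHECK(cudaStreamSynchronize(copyStreams[d]));
        CHECK(cudaStreamDestroy(copyStreams[d]));
    }
    CHECK(cudaEventDestroy(ready));

    for (int d = 0; d < numDevices; d++) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaFree(posq[d]));
    }
    printf("Broadcast positions to %d peer device(s).\n", numDevices - 1);
    return 0;
}
```

The force reduction would presumably mirror this in the opposite direction: each peer's force buffer is copied back to device 0 on its own stream, and the default stream waits on an event recorded on each copy stream before summing the contributions.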