Tkurth/distributed disco (#30)
* initial implementation of distributed DISCO layer
* working distributed convolution
* working refactored serial conv transpose with torch kernel
* working distributed conv and transposed conv when using the python kernel
* working distributed convolution with torch kernel
* fixed triton kernel tests
* adding print statement to debug CI
* adjusting tolerances in local convolution unittest
---------
Co-authored-by: Boris Bonev <bbonev@nvidia.com>