supporting different hidden dimensions (#559)
* supporting different hidden dimensions
* add support for larger hidden dimensions (greater than 8K)
* remove empty line
* add loop unrolling factor for dropout kernels
* update different kernels based on the reviews
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
csrc/includes/custom_cuda_layers.h (file mode 100644 → 100755)