Adding Mish activation function (#1938)
* Adding Mish activation function
* Bug fixed
* Added test for Mish
* Removed unwanted comments
* Simplified calculation and removed comments
* Kernel added and gradient computation simplified
* Gradient simplified
* Corrected gradient calculations
* Compute output when input greater than 8
* Minor correction
* Remove unnecessary pgrad for Mish
* Removed CUDNN calls
* Add standalone CUDA implementation of the Mish activation function
* Fix in-place gradient in the CUDA version; refactor a little
* Swap delta and omega
* Need to have src (=x) (and not dest) available for Mish
* Add test case that makes sure that cuda::mish and cpu::mish return the same results
* Minor tweaking to keep the previous behaviour
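
For readers unfamiliar with the function, here is a minimal sketch of the math the bullets above refer to. It is not the code from this change: the function names are hypothetical, and the cutoff at 8 only echoes the "input greater than 8" note, so what the real implementation returns there is an assumption. With e = exp(x), delta = e^2 + 2e + 2 and omega = 4(x + 1) + 4e^2 + e^3 + e(4x + 6), Mish is x*tanh(ln(1 + e)) = x - 2x/delta and its derivative is e*omega/delta^2.

```cpp
#include <cmath>

// Reference definition: mish(x) = x * tanh(softplus(x)).
// Hypothetical helper, not part of any library API.
inline double mish_ref(double x)
{
    return x * std::tanh(std::log1p(std::exp(x)));
}

// Algebraically equivalent form that avoids tanh and log:
// tanh(ln(1 + e)) = 1 - 2/delta, where delta = e^2 + 2e + 2.
inline double mish_simplified(double x)
{
    const double e = std::exp(x);
    const double delta = 2*e + e*e + 2;
    return x - 2*x/delta;
}

// Derivative of mish with respect to x: e*omega/delta^2.
inline double mish_gradient(double x)
{
    // Assumption: for large x the derivative is within about 1e-5 of 1,
    // and evaluating the closed form would eventually overflow to inf/inf.
    if (x >= 8)
        return 1.0;
    const double e = std::exp(x);
    const double delta = 2*e + e*e + 2;
    const double omega = 4*(x + 1) + 4*e*e + e*e*e + e*(4*x + 6);
    return e*omega/(delta*delta);
}
```

A backward pass would multiply mish_gradient(x) by the incoming gradient. Note that x here is the layer's input, not its output, which is presumably why the source tensor has to stay available.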
Co-authored-by: Juha Reunanen <juha.reunanen@tomaattinen.com>