[Common] Added JIT-compiled fused cast transpose kernels (#903)
* Merged CT+dbias+dact into a single template

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Moved gated activations from the cast_transpose_fused into a separate cpp file

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Code clean up

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Code clean up

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Code clean up

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Code clean up

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Update transformer_engine/common/transpose/cast_transpose_fusion.cu

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* Update transformer_engine/common/transpose/cast_transpose_fusion.cu

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* Reverted the change with the file split

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Implemented JIT-compiled kernels

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Code clean up

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Replaced aligned statically compiled kernels with JIT kernels. Added support for various activation functions in JIT kernels. Cleaned up the code per the code review

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Code clean up

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

---------

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>