Commit 2af8f32a authored by zjing14, committed by GitHub

Update include/ck/tensor_operation/gpu/element/unary_element_wise_operation.hpp


Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
parent 13515923
@@ -12,6 +12,9 @@
namespace ck {
// Fast int4x4 to half8_t data type conversion based on paper
// [Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production]
// (https://arxiv.org/abs/2211.10017) and implementation:
// https://github.com/NVIDIA/FasterTransformer/blob/main/src/fastertransformer/cutlass_extensions/include/cutlass_extensions/interleaved_numeric_conversion.h
__host__ __device__ inline half4_t pki4_to_half4(int q)
{
...
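
For context, the conversion trick cited in the comment above can be sketched on the host in plain C++. This is a minimal illustration, not the body of `pki4_to_half4` (which is elided in this diff). The helper names `half_bits_to_float` and `unpack_low4_nibbles` are hypothetical, and the sketch assumes that each 4-bit field stores a signed value with a +8 offset and that fields are read in plain low-to-high order; the real kernel may use an interleaved layout. The idea is that OR-ing a nibble into the mantissa of the binary16 pattern `0x6400` (1024.0) yields exactly `1024 + nibble`, so a single subtraction recovers the integer without an int-to-float conversion instruction.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Decode a normal (finite, non-subnormal) IEEE binary16 bit pattern to float.
inline float half_bits_to_float(std::uint16_t h)
{
    const int sign     = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    const float value  = std::ldexp(1.0f + mantissa / 1024.0f, exponent - 15);
    return sign ? -value : value;
}

// Hypothetical helper: convert the four lowest 4-bit fields of q to floats.
// Assumption: each field holds (value + 8), so the result range is [-8, 7].
inline std::array<float, 4> unpack_low4_nibbles(std::uint32_t q)
{
    constexpr std::uint16_t kMagic = 0x6400; // binary16 bit pattern of 1024.0
    std::array<float, 4> out{};
    for(int i = 0; i < 4; ++i)
    {
        const std::uint16_t nibble = (q >> (4 * i)) & 0xF;
        // At 1024.0 the binary16 spacing is exactly 1, so placing the nibble
        // in the low mantissa bits produces the value 1024 + nibble.
        const std::uint16_t bits = kMagic | nibble;
        // Subtract the 1024 bias and the assumed +8 storage offset together.
        out[i] = half_bits_to_float(bits) - 1032.0f;
    }
    return out;
}
```

Under these assumptions, `unpack_low4_nibbles(0x0000F980)` yields the values -8, 0, 1, and 7. A GPU version would apply the same OR and subtract to two packed halves per 32-bit register at once.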