"test/vscode:/vscode.git/clone" did not exist on "2205cc9e1ab15eee51ae5c6c54c258de8afefe26"
Optimizing fp8_fp16 mixedprec gemm (#1150)
* add delayed cvt * extend fp16 gemm_splitk instances for fp8_fp16 gemm * add f8 example * add 128 kperblk instances for fp8 * add kpb128 instance * added more instances into kpb128 * clean code * clean code * fix * fix * fixed * Update example/35_splitK_gemm/splitK_gemm_xdl_fp16_fp8.cpp Co-authored-by:Bartłomiej Kocot <barkocot@amd.com> * Update include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer.hpp Co-authored-by:
Bartłomiej Kocot <barkocot@amd.com> * Update library/src/tensor_operation_instance/gpu/gemm_splitk/device_gemm_xdl_splitk_f16_fp8_f16_mk_nk_mn_kpb128_instance.cpp Co-authored-by:
Bartłomiej Kocot <barkocot@amd.com> --------- Co-authored-by:
Jing Zhang <jizha@amd.com> Co-authored-by:
Bartłomiej Kocot <barkocot@amd.com>
Showing
Please register or sign in to comment