Unverified Commit abf4bdb9 authored by Adam Osewski, committed by GitHub

Common forward convolution utility refactor. (#141)



* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensions.

* Rename conv nd example directory to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent conflicts.

* Split convnd instance into separate files for 1d/2d

* Address review comments.

* Fix data type for flops/gbytes calculations.

* Assign example number 11.

* 3D cases for convolution utility functions.

* 3D reference convolution.

* Add support for 3D convolution.

* Check for inputs bigger than 2 GB.

* Formatting

* Support for bf16/f16/f32/i8 - conv instances + UT.

* Use check_err from test_util.hpp.

* Split convnd test into separate files for each dim.

* Fix data generation and use proper instances.

* Formatting

* Skip tensor initialization if not necessary.

* Fix CMakefiles.

* Remove redundant conv2d_fwd test.

* Lower problem size for conv3D UT.

* 3D case for convnd example.

* Remove leftovers after merge.

* Add Conv Specialization string to GetTypeString

* Skip instance causing numerical errors.

* Small fixes.

* Remove redundant includes.

* Fix namespace name error.

* Script for automatic testing and logging convolution fwd UTs

* Comment out numactl cmd.

* Refine weights initialization and relax rtol for fp16.

* Move test_util.hpp to check_err.hpp

* Refine weights initialization and relax rtol for fp16.

* Refactor common part of test conv utils.

* Move utility function to single common place.

* Add additional common functions to utility.

* Refactor convnd_fwd_xdl examples.

* Remove redundant files.
* Unify structure.

* Add constructor to ConvParams.

* Add input parameter validation.

* Modify conv examples to use single utility file.

* Remove check_error from host_tensor.hpp

* Get rid of check_indices function.

* Remove bf16_to_f32 function overload for scalars.

* Fix namespace.

* Add half_float::half for check_err.

* Fix conv params size in UT.

* Fix weights initialization for int8.

* Fix weights initialization for int8.

* Add type_convert when storing output in ref conv 1D.

* Restore old conv2d_fwd_xdl operation.

* Silence conv debug print.

* format

* clean

* clean

* Fix merge.

* Fix namespace for check_err

* Formatting.

* Fix merge artifacts.

* Remove deleted header.

* Fix some includes and use ck::utils::check_err.

* Remove unused check_indices restored by previous merge.

* Fix namespaces after merge.

* Fix compilation error.

* Small fixes.

* Use common functions.
* Fix filename
* Fix namespaces.

* Fix merge artifact - restore function removed by accident.

* Fix ConvForwardSpecialization.

* Adhere to coding style rules.

* Fix merge artifacts.
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
parent 6717168c
@@ -4,6 +4,8 @@
 #include <cstdlib>
 #include <stdlib.h>
 #include <half.hpp>
+#include "check_err.hpp"
 #include "config.hpp"
 #include "print.hpp"
 #include "device.hpp"
@@ -284,6 +286,6 @@ int main(int argc, char* argv[])
         LogRangeAsType<float>(std::cout << "wei_host : ", wei_k_c_y_x_host_result.mData, ",")
             << std::endl;
         }
-        check_error(wei_k_c_y_x_host_result, wei_k_c_y_x_device_result);
+        ck::utils::check_err(wei_k_c_y_x_device_result.mData, wei_k_c_y_x_host_result.mData);
     }
 }
@@ -4,6 +4,8 @@
 #include <cstdlib>
 #include <getopt.h>
 #include <half.hpp>
+#include "check_err.hpp"
 #include "config.hpp"
 #include "print.hpp"
 #include "device.hpp"
@@ -371,12 +373,13 @@ int main(int argc, char* argv[])
     if(args.do_verification)
     {
         out_dev.FromDevice(out.mData.data());
-        check_error(out_ref, out);
+        ck::utils::check_err(out.mData, out_ref.mData);
         if(NeedIndices)
         {
             out_indices_dev.FromDevice(out_indices.mData.data());
-            check_indices(out_indices_ref, out_indices);
+            ck::utils::check_err(out_indices.mData, out_indices_ref.mData);
+            ;
         };
     };
 }
@@ -3,6 +3,8 @@
 #include <initializer_list>
 #include <cstdlib>
 #include <stdlib.h>
+#include "check_err.hpp"
 #include "config.hpp"
 #include "print.hpp"
 #include "device.hpp"
@@ -300,13 +302,14 @@ int main(int argc, char* argv[])
         out_device_buf.FromDevice(out_n_c_ho_wo_device.mData.data());
-        check_error(out_n_c_ho_wo_host, out_n_c_ho_wo_device);
+        ck::utils::check_err(out_n_c_ho_wo_device.mData, out_n_c_ho_wo_host.mData);
         if constexpr(NeedIndices)
        {
             out_indices_device_buf.FromDevice(out_indices_n_c_ho_wo_device.mData.data());
-            // check_indices(out_indices_n_c_ho_wo_host, out_indices_n_c_ho_wo_device);
+            // ck::utils::check_err(out_indices_n_c_ho_wo_device.mData,
+            //                      out_indices_n_c_ho_wo_host.mData);;
         };
     }
 }
@@ -4,6 +4,8 @@
 #include <cstdlib>
 #include <stdlib.h>
 #include <half.hpp>
+#include "check_err.hpp"
 #include "config.hpp"
 #include "print.hpp"
 #include "device.hpp"
@@ -225,7 +227,7 @@ int main(int argc, char* argv[])
         ref_invoker.Run(ref_argument);
-        check_error(c_m_n_host_result, c_m_n_device_result);
+        ck::utils::check_err(c_m_n_device_result.mData, c_m_n_host_result.mData);
     }
     return 0;
......
@@ -4,6 +4,8 @@
 #include <cstdlib>
 #include <stdlib.h>
 #include <half.hpp>
+#include "check_err.hpp"
 #include "config.hpp"
 #include "print.hpp"
 #include "device.hpp"
@@ -225,8 +227,7 @@ int main(int argc, char* argv[])
                                           c_element_op);
         ref_invoker.Run(ref_argument);
-        check_error(c_host_tensors[i], c_device_tensors[i]);
+        ck::utils::check_err(c_device_tensors[i].mData, c_host_tensors[i].mData);
     }
 }
......
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
#include <half.hpp> #include <half.hpp>
#include "config.hpp" #include "config.hpp"
#include "conv_utils.hpp" #include "conv_fwd_util.hpp"
#include "print.hpp" #include "print.hpp"
#include "device.hpp" #include "device.hpp"
#include "host_tensor.hpp" #include "host_tensor.hpp"
...@@ -99,10 +99,10 @@ void print_use_msg() ...@@ -99,10 +99,10 @@ void print_use_msg()
<< " <right padding>, (ie RightPy, RightPx for 2D)\n" << " <right padding>, (ie RightPy, RightPx for 2D)\n"
<< std::endl; << std::endl;
} }
ck::conv_util::ConvParams parse_conv_params(int num_dim_spatial, char* argv[]) ck::utils::conv::ConvParams parse_conv_params(int num_dim_spatial, char* argv[])
{ {
// (N, K, C) + num_dim_spatial * 6 (filter, input, strides, dilations, pad left, pad right) // (N, K, C) + num_dim_spatial * 6 (filter, input, strides, dilations, pad left, pad right)
ck::conv_util::ConvParams params; ck::utils::conv::ConvParams params;
int arg_idx = 5; int arg_idx = 5;
params.num_dim_spatial = num_dim_spatial; params.num_dim_spatial = num_dim_spatial;
...@@ -144,72 +144,6 @@ ck::conv_util::ConvParams parse_conv_params(int num_dim_spatial, char* argv[]) ...@@ -144,72 +144,6 @@ ck::conv_util::ConvParams parse_conv_params(int num_dim_spatial, char* argv[])
return params; return params;
} }
HostTensorDescriptor get_input_host_tensor_descriptor(const std::vector<std::size_t>& dims,
int num_dim_spatial = 2)
{
namespace tl = ck::tensor_layout::convolution;
switch(num_dim_spatial)
{
case 3: {
return ck::conv_util::GetHostTensorDescriptor(dims, tl::NDHWC{});
}
case 2: {
return ck::conv_util::GetHostTensorDescriptor(dims, tl::NHWC{});
}
case 1: {
return ck::conv_util::GetHostTensorDescriptor(dims, tl::NWC{});
}
default: {
throw std::runtime_error("Unsupported number of spatial dimensions provided!");
}
}
}
HostTensorDescriptor get_filters_host_tensor_descriptor(const std::vector<std::size_t>& dims,
int num_dim_spatial = 2)
{
namespace tl = ck::tensor_layout::convolution;
switch(num_dim_spatial)
{
case 3: {
return ck::conv_util::GetHostTensorDescriptor(dims, tl::KZYXC{});
}
case 2: {
return ck::conv_util::GetHostTensorDescriptor(dims, tl::KYXC{});
}
case 1: {
return ck::conv_util::GetHostTensorDescriptor(dims, tl::KXC{});
}
default: {
throw std::runtime_error("Unsupported number of spatial dimensions provided!");
}
}
}
HostTensorDescriptor get_output_host_tensor_descriptor(const std::vector<std::size_t>& dims,
int num_dim_spatial = 2)
{
namespace tl = ck::tensor_layout::convolution;
switch(num_dim_spatial)
{
case 3: {
return ck::conv_util::GetHostTensorDescriptor(dims, tl::NDHWK{});
}
case 2: {
return ck::conv_util::GetHostTensorDescriptor(dims, tl::NHWK{});
}
case 1: {
return ck::conv_util::GetHostTensorDescriptor(dims, tl::NWK{});
}
default: {
throw std::runtime_error("Unsupported number of spatial dimensions provided!");
}
}
}
DeviceConvBwdDataBasePtr get_conv_instance(int num_dim_spatial) DeviceConvBwdDataBasePtr get_conv_instance(int num_dim_spatial)
{ {
switch(num_dim_spatial) switch(num_dim_spatial)
...@@ -236,7 +170,7 @@ int main(int argc, char* argv[]) ...@@ -236,7 +170,7 @@ int main(int argc, char* argv[])
int nrepeat = 5; int nrepeat = 5;
int num_dim_spatial = 2; int num_dim_spatial = 2;
ck::conv_util::ConvParams params; ck::utils::conv::ConvParams params;
params.C = 128; params.C = 128;
if(argc == 4) if(argc == 4)
...@@ -288,13 +222,13 @@ int main(int argc, char* argv[]) ...@@ -288,13 +222,13 @@ int main(int argc, char* argv[])
std::end(output_spatial_lengths)); std::end(output_spatial_lengths));
Tensor<InDataType> in_n_c_hi_wi_host_result( Tensor<InDataType> in_n_c_hi_wi_host_result(
get_input_host_tensor_descriptor(input_dims, num_dim_spatial)); ck::utils::conv::get_input_host_tensor_descriptor(input_dims, num_dim_spatial));
Tensor<InDataType> in_n_c_hi_wi_device_result( Tensor<InDataType> in_n_c_hi_wi_device_result(
get_input_host_tensor_descriptor(input_dims, num_dim_spatial)); ck::utils::conv::get_input_host_tensor_descriptor(input_dims, num_dim_spatial));
Tensor<WeiDataType> wei_k_c_y_x( Tensor<WeiDataType> wei_k_c_y_x(
get_filters_host_tensor_descriptor(filter_dims, num_dim_spatial)); ck::utils::conv::get_filters_host_tensor_descriptor(filter_dims, num_dim_spatial));
Tensor<OutDataType> out_n_k_ho_wo( Tensor<OutDataType> out_n_k_ho_wo(
get_output_host_tensor_descriptor(output_dims, num_dim_spatial)); ck::utils::conv::get_output_host_tensor_descriptor(output_dims, num_dim_spatial));
std::cout << "in_n_c_hi_wi: " << in_n_c_hi_wi_host_result.mDesc << std::endl; std::cout << "in_n_c_hi_wi: " << in_n_c_hi_wi_host_result.mDesc << std::endl;
std::cout << "wei_k_c_y_x: " << wei_k_c_y_x.mDesc << std::endl; std::cout << "wei_k_c_y_x: " << wei_k_c_y_x.mDesc << std::endl;
...@@ -352,15 +286,15 @@ int main(int argc, char* argv[]) ...@@ -352,15 +286,15 @@ int main(int argc, char* argv[])
float ave_time = invoker->Run(argument.get(), nrepeat); float ave_time = invoker->Run(argument.get(), nrepeat);
std::size_t flop = ck::conv_util::GetFlops( std::size_t flop = ck::utils::conv::get_flops(
params.N, params.C, params.K, params.filter_spatial_lengths, output_spatial_lengths); params.N, params.C, params.K, params.filter_spatial_lengths, output_spatial_lengths);
std::size_t num_btype = std::size_t num_btype = ck::utils::conv::get_btype<InDataType, WeiDataType, OutDataType>(
ck::conv_util::GetBtype<InDataType, WeiDataType, OutDataType>(params.N, params.N,
params.C, params.C,
params.K, params.K,
params.input_spatial_lengths, params.input_spatial_lengths,
params.filter_spatial_lengths, params.filter_spatial_lengths,
output_spatial_lengths); output_spatial_lengths);
float tflops = static_cast<float>(flop) / 1.E9 / ave_time; float tflops = static_cast<float>(flop) / 1.E9 / ave_time;
float gb_per_sec = num_btype / 1.E6 / ave_time; float gb_per_sec = num_btype / 1.E6 / ave_time;
......
@@ -13,6 +13,7 @@ include_directories(BEFORE
     ${PROJECT_SOURCE_DIR}/library/include/ck/library/host_tensor
     ${PROJECT_SOURCE_DIR}/library/include/ck/library/reference_tensor_operation/cpu
     ${PROJECT_SOURCE_DIR}/library/include/ck/library/reference_tensor_operation/gpu
+    ${PROJECT_SOURCE_DIR}/library/include/ck/library/utility
     ${PROJECT_SOURCE_DIR}/external/include/half
 )
@@ -29,10 +30,8 @@ add_subdirectory(01_gemm)
 add_subdirectory(02_gemm_alpha_beta)
 add_subdirectory(03_gemm_bias_relu)
 add_subdirectory(04_gemm_bias_relu_add)
-add_subdirectory(05_conv2d_fwd)
 add_subdirectory(06_conv2d_fwd_bias_relu)
 add_subdirectory(07_conv2d_fwd_bias_relu_add)
-add_subdirectory(08_conv3d_fwd)
 add_subdirectory(09_convnd_fwd)
 add_subdirectory(10_conv2d_bwd_data)
 add_subdirectory(11_conv2d_bwd_weight)
......
#ifndef CONVOLUTION_UTILITY_HPP
#define CONVOLUTION_UTILITY_HPP
#include <vector>
namespace ck {
namespace tensor_operation {
struct ConvolutionUtility
{
static std::vector<ck::index_t>
ComputeOutputSpatialLengths(std::vector<ck::index_t> input_spatial_lengths,
std::vector<ck::index_t> filter_spatial_lengths,
std::vector<ck::index_t> conv_strides,
std::vector<ck::index_t> conv_dilations,
std::vector<ck::index_t> in_left_pads,
std::vector<ck::index_t> in_right_pads)
{
if(input_spatial_lengths.size() == 2)
{
assert(filter_spatial_lengths.size() == 2);
assert(conv_strides.size() == 2);
assert(conv_dilations.size() == 2);
assert(in_left_pads.size() == 2);
assert(in_right_pads.size() == 2);
const index_t YEff = (filter_spatial_lengths[0] - 1) * conv_dilations[0] + 1;
const index_t XEff = (filter_spatial_lengths[1] - 1) * conv_dilations[1] + 1;
const index_t Hi = input_spatial_lengths[0];
const index_t Wi = input_spatial_lengths[1];
const index_t Ho =
(Hi + in_left_pads[0] + in_right_pads[0] - YEff) / conv_strides[0] + 1;
const index_t Wo =
(Wi + in_left_pads[1] + in_right_pads[1] - XEff) / conv_strides[1] + 1;
return {Ho, Wo};
}
else if(input_spatial_lengths.size() == 3)
{
assert(filter_spatial_lengths.size() == 3);
assert(conv_strides.size() == 3);
assert(conv_dilations.size() == 3);
assert(in_left_pads.size() == 3);
assert(in_right_pads.size() == 3);
const index_t ZEff = (filter_spatial_lengths[0] - 1) * conv_dilations[0] + 1;
const index_t YEff = (filter_spatial_lengths[1] - 1) * conv_dilations[1] + 1;
const index_t XEff = (filter_spatial_lengths[2] - 1) * conv_dilations[2] + 1;
const index_t Di = input_spatial_lengths[0];
const index_t Hi = input_spatial_lengths[1];
const index_t Wi = input_spatial_lengths[2];
const index_t Do =
(Di + in_left_pads[0] + in_right_pads[0] - ZEff) / conv_strides[0] + 1;
const index_t Ho =
(Hi + in_left_pads[1] + in_right_pads[1] - YEff) / conv_strides[1] + 1;
const index_t Wo =
(Wi + in_left_pads[2] + in_right_pads[2] - XEff) / conv_strides[2] + 1;
return {Do, Ho, Wo};
}
else
{
return {};
}
}
};
} // namespace tensor_operation
} // namespace ck
#endif
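
The output-length formula in the deleted ConvolutionUtility helper is the same one that carries over into ck::utils::conv::ConvParams::GetOutputSpatialLengths in the new conv_fwd_util.hpp. A minimal sketch of the per-dimension arithmetic, using the ConvParams default sizes as a worked example:

```cpp
#include <cassert>

// Illustrative only: output length for one spatial dimension, matching
//   XEff = (X - 1) * dilation + 1
//   Wo   = (Wi + pad_left + pad_right - XEff) / stride + 1
int conv_out_length(int in_len, int filter_len, int stride, int dilation,
                    int pad_left, int pad_right)
{
    const int eff = (filter_len - 1) * dilation + 1;
    return (in_len + pad_left + pad_right - eff) / stride + 1;
}

int main()
{
    // Hi = Wi = 71, 3x3 filter, stride 2, dilation 1, padding 1/1
    // (the ConvParams defaults): eff = 3, (71 + 1 + 1 - 3) / 2 + 1 = 36.
    assert(conv_out_length(71, 3, 2, 1, 1, 1) == 36);
    return 0;
}
```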
...@@ -4,7 +4,7 @@ ...@@ -4,7 +4,7 @@
#include <iostream> #include <iostream>
#include <memory> #include <memory>
#include <sstream> #include <sstream>
#include "convolution_utility.hpp" #include "conv_fwd_util.hpp"
#include "device.hpp" #include "device.hpp"
#include "device_conv_fwd.hpp" #include "device_conv_fwd.hpp"
#include "common_header.hpp" #include "common_header.hpp"
...@@ -53,36 +53,30 @@ struct DeviceConv3dFwdNaive_Input_N_Di_Hi_Wi_C_Weight_K_Z_Y_X_C_Output_N_Do_Ho_W ...@@ -53,36 +53,30 @@ struct DeviceConv3dFwdNaive_Input_N_Di_Hi_Wi_C_Weight_K_Z_Y_X_C_Output_N_Do_Ho_W
InElementwiseOperation in_element_op, InElementwiseOperation in_element_op,
WeiElementwiseOperation wei_element_op, WeiElementwiseOperation wei_element_op,
OutElementwiseOperation out_element_op) OutElementwiseOperation out_element_op)
: N_{N}, : params_{3,
K_{K}, N,
C_{C}, K,
in_spatial_lengths_{input_spatial_lengths}, C,
filter_spatial_lengths_{filter_spatial_lengths}, filter_spatial_lengths,
input_spatial_lengths,
conv_filter_strides,
conv_filter_dilations,
input_left_pads,
input_right_pads},
out_spatial_lengths_{output_spatial_lengths}, out_spatial_lengths_{output_spatial_lengths},
conv_filter_strides_{conv_filter_strides},
conv_filter_dilations_{conv_filter_dilations},
in_left_pads_{input_left_pads},
in_right_pads_{input_right_pads},
p_in_{p_in}, p_in_{p_in},
p_wei_{p_wei}, p_wei_{p_wei},
p_out_{p_out}, p_out_{p_out},
in_element_op_{in_element_op}, in_element_op_{in_element_op},
wei_element_op_{wei_element_op}, wei_element_op_{wei_element_op},
out_element_op_{out_element_op} out_element_op_{out_element_op}
{ {
} }
// private: // private:
index_t N_; utils::conv::ConvParams params_;
index_t K_;
index_t C_;
std::vector<index_t> in_spatial_lengths_;
std::vector<index_t> filter_spatial_lengths_;
std::vector<index_t> out_spatial_lengths_; std::vector<index_t> out_spatial_lengths_;
std::vector<index_t> conv_filter_strides_;
std::vector<index_t> conv_filter_dilations_;
std::vector<index_t> in_left_pads_;
std::vector<index_t> in_right_pads_;
const InDataType* p_in_; const InDataType* p_in_;
const WeiDataType* p_wei_; const WeiDataType* p_wei_;
...@@ -157,13 +151,7 @@ struct DeviceConv3dFwdNaive_Input_N_Di_Hi_Wi_C_Weight_K_Z_Y_X_C_Output_N_Do_Ho_W ...@@ -157,13 +151,7 @@ struct DeviceConv3dFwdNaive_Input_N_Di_Hi_Wi_C_Weight_K_Z_Y_X_C_Output_N_Do_Ho_W
static bool IsSupportedArgument(const Argument& arg) static bool IsSupportedArgument(const Argument& arg)
{ {
std::vector<index_t> out_spatial_lengths = std::vector<index_t> out_spatial_lengths = arg.params_.GetOutputSpatialLengths();
ConvolutionUtility::ComputeOutputSpatialLengths(arg.in_spatial_lengths_,
arg.filter_spatial_lengths_,
arg.conv_filter_strides_,
arg.conv_filter_dilations_,
arg.in_left_pads_,
arg.in_right_pads_);
bool out_lengths_are_consistent = out_spatial_lengths[0] == arg.out_spatial_lengths_[0] && bool out_lengths_are_consistent = out_spatial_lengths[0] == arg.out_spatial_lengths_[0] &&
out_spatial_lengths[1] == arg.out_spatial_lengths_[1] && out_spatial_lengths[1] == arg.out_spatial_lengths_[1] &&
......
@@ -300,9 +300,6 @@ HostTensorDescriptor::HostTensorDescriptor(const std::vector<X>& lens,
 void ostream_HostTensorDescriptor(const HostTensorDescriptor& desc, std::ostream& os = std::cout);
 #if 1
-// FIXME: remove
-float bf16_to_f32_(ck::bhalf_t src_val);
-
 // FIXME: remove
 void bf16_to_f32_(const Tensor<ck::bhalf_t>& src, Tensor<float>& dst);
 #endif
@@ -353,28 +350,4 @@ float check_error(const Tensor<T>& ref, const Tensor<T>& result)
     return linf_error;
 }
-
-template <typename T>
-void check_indices(const Tensor<T>& ref, const Tensor<T>& result)
-{
-    bool has_error  = false;
-    int error_count = 0;
-    for(int i = 0; i < ref.mData.size(); ++i)
-    {
-        if(ref.mData[i] != result.mData[i])
-        {
-            std::cerr << std::endl
-                      << "Indices different at position " << i << " (ref: " << ref.mData[i]
-                      << ", result: " << result.mData[i] << ")" << std::endl;
-            has_error = true;
-            error_count++;
-            if(error_count == 20)
-                break;
-        };
-    }
-
-    if(!has_error)
-        std::cout << std::endl << "Indices result is completely acccurate!" << std::endl;
-}
 #endif
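
The removed check_indices helper is superseded by the integral overload of ck::utils::check_err introduced in the next file, which compares elements exactly. A minimal sketch of the replacement call (the index data below is made up):

```cpp
#include <cstdint>
#include <vector>

#include "check_err.hpp" // provides ck::utils::check_err

int main()
{
    // Index tensors are integral, so the integral check_err overload is used;
    // the tolerance arguments are ignored and elements must match exactly.
    std::vector<int32_t> device_indices{0, 3, 7};
    std::vector<int32_t> reference_indices{0, 3, 7};

    bool ok = ck::utils::check_err(device_indices, reference_indices,
                                   "Error: incorrect pooling indices!");
    return ok ? 0 : 1;
}
```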
#ifndef TEST_UTIL_HPP #ifndef CHECK_ERR_HPP
#define TEST_UTIL_HPP #define CHECK_ERR_HPP
#include <algorithm> #include <algorithm>
#include <cmath> #include <cmath>
#include <cstdlib> #include <cstdlib>
#include <half.hpp>
#include <iostream> #include <iostream>
#include <iomanip> #include <iomanip>
#include <iterator> #include <iterator>
...@@ -13,16 +14,17 @@ ...@@ -13,16 +14,17 @@
#include "data_type.hpp" #include "data_type.hpp"
namespace test { namespace ck {
namespace utils {
template <typename T> template <typename T>
typename std::enable_if<std::is_floating_point<T>::value && !std::is_same<T, ck::half_t>::value, typename std::enable_if<std::is_floating_point<T>::value && !std::is_same<T, half_t>::value,
bool>::type bool>::type
check_err(const std::vector<T>& out, check_err(const std::vector<T>& out,
const std::vector<T>& ref, const std::vector<T>& ref,
const std::string& msg, const std::string& msg = "Error: Incorrect results!",
double rtol = 1e-5, double rtol = 1e-5,
double atol = 1e-8) double atol = 1e-8)
{ {
if(out.size() != ref.size()) if(out.size() != ref.size())
{ {
...@@ -60,13 +62,12 @@ check_err(const std::vector<T>& out, ...@@ -60,13 +62,12 @@ check_err(const std::vector<T>& out,
} }
template <typename T> template <typename T>
typename std::enable_if<std::is_same<T, ck::bhalf_t>::value || std::is_same<T, ck::half_t>::value, typename std::enable_if<std::is_same<T, bhalf_t>::value, bool>::type
bool>::type
check_err(const std::vector<T>& out, check_err(const std::vector<T>& out,
const std::vector<T>& ref, const std::vector<T>& ref,
const std::string& msg, const std::string& msg = "Error: Incorrect results!",
double rtol = 1e-5, double rtol = 1e-3,
double atol = 1e-8) double atol = 1e-3)
{ {
if(out.size() != ref.size()) if(out.size() != ref.size())
{ {
...@@ -77,14 +78,15 @@ check_err(const std::vector<T>& out, ...@@ -77,14 +78,15 @@ check_err(const std::vector<T>& out,
} }
bool res{true}; bool res{true};
int err_count = 0; int err_count = 0;
double err = 0; double err = 0;
double max_err = ck::type_convert<float>(ck::NumericLimits<T>::Min()); // TODO: This is a hack. We should have proper specialization for bhalf_t data type.
double max_err = std::numeric_limits<float>::min();
for(std::size_t i = 0; i < ref.size(); ++i) for(std::size_t i = 0; i < ref.size(); ++i)
{ {
float o = ck::type_convert<float>(out[i]); double o = type_convert<float>(out[i]);
float r = ck::type_convert<float>(ref[i]); double r = type_convert<float>(ref[i]);
err = std::abs(o - r); err = std::abs(o - r);
if(err > atol + rtol * std::abs(r) || !std::isfinite(o) || !std::isfinite(r)) if(err > atol + rtol * std::abs(r) || !std::isfinite(o) || !std::isfinite(r))
{ {
max_err = err > max_err ? err : max_err; max_err = err > max_err ? err : max_err;
...@@ -105,11 +107,14 @@ check_err(const std::vector<T>& out, ...@@ -105,11 +107,14 @@ check_err(const std::vector<T>& out,
return res; return res;
} }
bool check_err(const std::vector<ck::half_t>& out, template <typename T>
const std::vector<ck::half_t>& ref, typename std::enable_if<std::is_same<T, half_t>::value || std::is_same<T, half_float::half>::value,
const std::string& msg, bool>::type
ck::half_t rtol = static_cast<ck::half_t>(1e-3f), check_err(const std::vector<T>& out,
ck::half_t atol = static_cast<ck::half_t>(1e-3f)) const std::vector<T>& ref,
const std::string& msg = "Error: Incorrect results!",
double rtol = 1e-3,
double atol = 1e-3)
{ {
if(out.size() != ref.size()) if(out.size() != ref.size())
{ {
...@@ -122,20 +127,20 @@ bool check_err(const std::vector<ck::half_t>& out, ...@@ -122,20 +127,20 @@ bool check_err(const std::vector<ck::half_t>& out,
bool res{true}; bool res{true};
int err_count = 0; int err_count = 0;
double err = 0; double err = 0;
double max_err = std::numeric_limits<ck::half_t>::min(); double max_err = std::numeric_limits<T>::min();
for(std::size_t i = 0; i < ref.size(); ++i) for(std::size_t i = 0; i < ref.size(); ++i)
{ {
double out_ = double(out[i]); double o = type_convert<float>(out[i]);
double ref_ = double(ref[i]); double r = type_convert<float>(ref[i]);
err = std::abs(out_ - ref_); err = std::abs(o - r);
if(err > atol + rtol * std::abs(ref_) || !std::isfinite(out_) || !std::isfinite(ref_)) if(err > atol + rtol * std::abs(r) || !std::isfinite(o) || !std::isfinite(r))
{ {
max_err = err > max_err ? err : max_err; max_err = err > max_err ? err : max_err;
err_count++; err_count++;
if(err_count < 5) if(err_count < 5)
{ {
std::cout << std::setw(12) << std::setprecision(7) << "out[" << i << "] != ref[" std::cout << std::setw(12) << std::setprecision(7) << "out[" << i << "] != ref["
<< i << "]: " << out_ << "!=" << ref_ << std::endl << i << "]: " << o << " != " << r << std::endl
<< msg << std::endl; << msg << std::endl;
} }
res = false; res = false;
...@@ -149,13 +154,12 @@ bool check_err(const std::vector<ck::half_t>& out, ...@@ -149,13 +154,12 @@ bool check_err(const std::vector<ck::half_t>& out,
} }
template <typename T> template <typename T>
typename std::enable_if<std::is_integral<T>::value && !std::is_same<T, ck::bhalf_t>::value, typename std::enable_if<std::is_integral<T>::value && !std::is_same<T, bhalf_t>::value, bool>::type
bool>::type
check_err(const std::vector<T>& out, check_err(const std::vector<T>& out,
const std::vector<T>& ref, const std::vector<T>& ref,
const std::string& msg, const std::string& msg = "Error: Incorrect results!",
double = 0, double = 0,
double = 0) double = 0)
{ {
if(out.size() != ref.size()) if(out.size() != ref.size())
{ {
...@@ -178,7 +182,8 @@ check_err(const std::vector<T>& out, ...@@ -178,7 +182,8 @@ check_err(const std::vector<T>& out,
return true; return true;
} }
} // namespace test } // namespace utils
} // namespace ck
template <typename T> template <typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& v) std::ostream& operator<<(std::ostream& os, const std::vector<T>& v)
......
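
For reference, a minimal usage sketch of the relocated checker (the vectors below are made up; the namespace, default message, and default tolerances come from the hunk above):

```cpp
#include <vector>

#include "check_err.hpp" // ck::utils::check_err, relocated from test_util.hpp

int main()
{
    // Device result first, then the host/reference result -- the argument
    // order used throughout the updated examples.
    std::vector<float> device_out{1.0f, 2.0f, 3.0f};
    std::vector<float> host_ref{1.0f, 2.0f, 3.0f};

    // Defaults: msg = "Error: Incorrect results!", rtol = 1e-5, atol = 1e-8.
    bool ok = ck::utils::check_err(device_out, host_ref);

    // Tolerances can also be passed explicitly when the defaults are too strict.
    ok = ok && ck::utils::check_err(device_out, host_ref, "relaxed check", 1e-3, 1e-3);

    return ok ? 0 : 1;
}
```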
#ifndef CONV_UTILS_HPP #ifndef CONV_FWD_UTIL_HPP
#define CONV_UTILS_HPP #define CONV_FWD_UTIL_HPP
#include <cstdlib> #include <algorithm>
#include <functional> #include <cstdlib>
#include <iterator> #include <functional>
#include <numeric> #include <iterator>
#include <sstream> #include <numeric>
#include <type_traits> #include <sstream>
#include <vector> #include <random>
#include <tuple>
#include "config.hpp" #include <type_traits>
#include "host_tensor.hpp" #include <vector>
#include "tensor_layout.hpp"
#include "check_err.hpp"
namespace ck { #include "config.hpp"
namespace conv_util { #include "device.hpp"
#include "device_conv_fwd.hpp"
/** #include "device_tensor.hpp"
* @brief Calculate number of FLOPs for Convolution #include "element_wise_operation.hpp"
* #include "host_tensor.hpp"
* @param[in] N Batch size. #include "reference_conv_fwd.hpp"
* @param[in] C Number of input channels. #include "tensor_layout.hpp"
* @param[in] K Number of output channels.
* @param[in] filter_spatial_lengths Filter spatial dimensions lengths. namespace ck {
* @param[in] output_spatial_lengths Convolution output spatial dimensions namespace utils {
* lengths. namespace conv {
*
* @return The number of flops. using DeviceConvFwdNoOpPtr =
*/ ck::tensor_operation::device::DeviceConvFwdPtr<ck::tensor_operation::element_wise::PassThrough,
std::size_t GetFlops(ck::index_t N, ck::tensor_operation::element_wise::PassThrough,
ck::index_t C, ck::tensor_operation::element_wise::PassThrough>;
ck::index_t K,
const std::vector<ck::index_t>& filter_spatial_lengths, /**
const std::vector<ck::index_t>& output_spatial_lengths) * @brief Calculate number of FLOPs for Convolution
{ *
// 2 * N * K * <output spatial lengths product> * C * <filter spatial lengths product> * @param[in] N Batch size.
return static_cast<std::size_t>(2) * N * K * * @param[in] C Number of input channels.
std::accumulate(std::begin(output_spatial_lengths), * @param[in] K Number of output channels.
std::end(output_spatial_lengths), * @param[in] filter_spatial_lengths Filter spatial dimensions lengths.
static_cast<std::size_t>(1), * @param[in] output_spatial_lengths Convolution output spatial dimensions
std::multiplies<std::size_t>()) * * lengths.
C * *
std::accumulate(std::begin(filter_spatial_lengths), * @return The number of flops.
std::end(filter_spatial_lengths), */
static_cast<std::size_t>(1), std::size_t get_flops(ck::index_t N,
std::multiplies<std::size_t>()); ck::index_t C,
} ck::index_t K,
const std::vector<ck::index_t>& filter_spatial_lengths,
/** const std::vector<ck::index_t>& output_spatial_lengths)
* @brief Calculate number of bytes read/write by convolution algorithm. {
* // 2 * N * K * <output spatial lengths product> * C * <filter spatial lengths product>
* @param[in] N Batch size. return static_cast<std::size_t>(2) * N * K *
* @param[in] C Number of input channels. std::accumulate(std::begin(output_spatial_lengths),
* @param[in] K Number of output channels. std::end(output_spatial_lengths),
* @param[in] input_spatial_lengths Input spatial dimensions lengths. static_cast<std::size_t>(1),
* @param[in] filter_spatial_lengths Filter spatial dimensions lengths. std::multiplies<std::size_t>()) *
* @param[in] output_spatial_lengths Output spatial dimensions lengths C *
* std::accumulate(std::begin(filter_spatial_lengths),
* @tparam InDataType Input tensor data type. std::end(filter_spatial_lengths),
* @tparam WeiDataType Weights tensor data type. static_cast<std::size_t>(1),
* @tparam OutDataType Output tensor data type. std::multiplies<std::size_t>());
* }
* @return The number of used bytes.
*/ /**
template <typename InDataType = float, * @brief Calculate number of bytes read/write by convolution algorithm.
typename WeiDataType = InDataType, *
typename OutDataType = InDataType> * @param[in] N Batch size.
std::size_t GetBtype(ck::index_t N, * @param[in] C Number of input channels.
ck::index_t C, * @param[in] K Number of output channels.
ck::index_t K, * @param[in] input_spatial_lengths Input spatial dimensions lengths.
const std::vector<ck::index_t>& input_spatial_lengths, * @param[in] filter_spatial_lengths Filter spatial dimensions lengths.
const std::vector<ck::index_t>& filter_spatial_lengths, * @param[in] output_spatial_lengths Output spatial dimensions lengths
const std::vector<ck::index_t>& output_spatial_lengths) *
{ * @tparam InDataType Input tensor data type.
// sizeof(InDataType) * (N * C * <input spatial lengths product>) + * @tparam WeiDataType Weights tensor data type.
// sizeof(WeiDataType) * (K * C * <filter spatial lengths product>) + * @tparam OutDataType Output tensor data type.
// sizeof(OutDataType) * (N * K * <output spatial lengths product>); *
return sizeof(InDataType) * (N * C * * @return The number of used bytes.
std::accumulate(std::begin(input_spatial_lengths), */
std::end(input_spatial_lengths), template <typename InDataType = float,
static_cast<std::size_t>(1), typename WeiDataType = InDataType,
std::multiplies<std::size_t>())) + typename OutDataType = InDataType>
sizeof(WeiDataType) * (K * C * std::size_t get_btype(ck::index_t N,
std::accumulate(std::begin(filter_spatial_lengths), ck::index_t C,
std::end(filter_spatial_lengths), ck::index_t K,
static_cast<std::size_t>(1), const std::vector<ck::index_t>& input_spatial_lengths,
std::multiplies<std::size_t>())) + const std::vector<ck::index_t>& filter_spatial_lengths,
sizeof(OutDataType) * (N * K * const std::vector<ck::index_t>& output_spatial_lengths)
std::accumulate(std::begin(output_spatial_lengths), {
std::end(output_spatial_lengths), // sizeof(InDataType) * (N * C * <input spatial lengths product>) +
static_cast<std::size_t>(1), // sizeof(WeiDataType) * (K * C * <filter spatial lengths product>) +
std::multiplies<std::size_t>())); // sizeof(OutDataType) * (N * K * <output spatial lengths product>);
} return sizeof(InDataType) * (N * C *
std::accumulate(std::begin(input_spatial_lengths),
struct ConvParams std::end(input_spatial_lengths),
{ static_cast<std::size_t>(1),
ConvParams() std::multiplies<std::size_t>())) +
: num_dim_spatial(2), sizeof(WeiDataType) * (K * C *
N(128), std::accumulate(std::begin(filter_spatial_lengths),
K(256), std::end(filter_spatial_lengths),
C(192), static_cast<std::size_t>(1),
filter_spatial_lengths(2, 3), std::multiplies<std::size_t>())) +
input_spatial_lengths(2, 71), sizeof(OutDataType) * (N * K *
conv_filter_strides(2, 2), std::accumulate(std::begin(output_spatial_lengths),
conv_filter_dilations(2, 1), std::end(output_spatial_lengths),
input_left_pads(2, 1), static_cast<std::size_t>(1),
input_right_pads(2, 1) std::multiplies<std::size_t>()));
{ }
}
ConvParams(ck::index_t n_dim_spatial, struct ConvParams
ck::index_t n, {
ck::index_t k, ConvParams()
ck::index_t c, : num_dim_spatial(2),
std::vector<ck::index_t> filter_lengths, N(128),
std::vector<ck::index_t> input_lengths, K(256),
std::vector<ck::index_t> conv_strides, C(192),
std::vector<ck::index_t> conv_dilations, filter_spatial_lengths(2, 3),
std::vector<ck::index_t> left_pads, input_spatial_lengths(2, 71),
std::vector<ck::index_t> right_pads) conv_filter_strides(2, 2),
: num_dim_spatial(n_dim_spatial), conv_filter_dilations(2, 1),
N(n), input_left_pads(2, 1),
K(k), input_right_pads(2, 1)
C(c), {
filter_spatial_lengths(filter_lengths), }
input_spatial_lengths(input_lengths),
conv_filter_strides(conv_strides), ConvParams(ck::index_t n_dim,
conv_filter_dilations(conv_dilations), ck::index_t n_batch,
input_left_pads(left_pads), ck::index_t n_out_channels,
input_right_pads(right_pads) ck::index_t n_in_channels,
{ const std::vector<ck::index_t>& filters_len,
} const std::vector<ck::index_t>& input_len,
const std::vector<ck::index_t>& strides,
ck::index_t num_dim_spatial; const std::vector<ck::index_t>& dilations,
ck::index_t N; const std::vector<ck::index_t>& left_pads,
ck::index_t K; const std::vector<ck::index_t>& right_pads)
ck::index_t C; : num_dim_spatial(n_dim),
N(n_batch),
std::vector<ck::index_t> filter_spatial_lengths; K(n_out_channels),
std::vector<ck::index_t> input_spatial_lengths; C(n_in_channels),
filter_spatial_lengths(filters_len),
std::vector<ck::index_t> conv_filter_strides; input_spatial_lengths(input_len),
std::vector<ck::index_t> conv_filter_dilations; conv_filter_strides(strides),
conv_filter_dilations(dilations),
std::vector<ck::index_t> input_left_pads; input_left_pads(left_pads),
std::vector<ck::index_t> input_right_pads; input_right_pads(right_pads)
{
std::vector<ck::index_t> GetOutputSpatialLengths() const if(filter_spatial_lengths.size() != num_dim_spatial ||
{ input_spatial_lengths.size() != num_dim_spatial ||
std::vector<ck::index_t> out_spatial_len(num_dim_spatial, 0); conv_filter_strides.size() != num_dim_spatial ||
for(ck::index_t i = 0; i < num_dim_spatial; ++i) conv_filter_dilations.size() != num_dim_spatial ||
{ input_left_pads.size() != num_dim_spatial || input_right_pads.size() != num_dim_spatial)
// XEff = (X - 1) * conv_dilation_w + 1; {
// Wo = (Wi + in_left_pad_w + in_right_pad_w - XEff) / conv_stride_w + 1; throw(std::runtime_error(
const ck::index_t idx_eff = "ConvParams::GetOutputSpatialLengths: "
(filter_spatial_lengths[i] - 1) * conv_filter_dilations[i] + 1; "parameter size is different from number of declared dimensions!"));
out_spatial_len[i] = }
(input_spatial_lengths[i] + input_left_pads[i] + input_right_pads[i] - idx_eff) / }
conv_filter_strides[i] +
1; ck::index_t num_dim_spatial;
} ck::index_t N;
return out_spatial_len; ck::index_t K;
} ck::index_t C;
};
std::vector<ck::index_t> filter_spatial_lengths;
/** std::vector<ck::index_t> input_spatial_lengths;
* @brief Gets the host tensor descriptor.
* std::vector<ck::index_t> conv_filter_strides;
* @param[in] dims The tensor dimensions lengths. Always in NCHW format. std::vector<ck::index_t> conv_filter_dilations;
* @param[in] layout The tensor data layout.
* std::vector<ck::index_t> input_left_pads;
* @tparam TensorLayout Layout type. std::vector<ck::index_t> input_right_pads;
*
* @return The host tensor descriptor object. std::vector<ck::index_t> GetOutputSpatialLengths() const
*/ {
template <typename TensorLayout> if(filter_spatial_lengths.size() != num_dim_spatial ||
HostTensorDescriptor GetHostTensorDescriptor(const std::vector<std::size_t>& dims, input_spatial_lengths.size() != num_dim_spatial ||
const TensorLayout& layout) conv_filter_strides.size() != num_dim_spatial ||
{ conv_filter_dilations.size() != num_dim_spatial ||
std::size_t C = dims[1]; input_left_pads.size() != num_dim_spatial || input_right_pads.size() != num_dim_spatial)
// 1D {
if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NCW>::value || throw(std::runtime_error(
std::is_same<TensorLayout, ck::tensor_layout::convolution::KCX>::value || "ConvParams::GetOutputSpatialLengths: "
std::is_same<TensorLayout, ck::tensor_layout::convolution::NKW>::value) "parameter size is different from number of declared dimensions!"));
{ }
return HostTensorDescriptor(dims, std::vector<std::size_t>({C * dims[2], dims[2], 1})); std::vector<ck::index_t> out_spatial_len(num_dim_spatial, 0);
} for(ck::index_t i = 0; i < num_dim_spatial; ++i)
else if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NWC>::value || {
std::is_same<TensorLayout, ck::tensor_layout::convolution::KXC>::value || // XEff = (X - 1) * conv_dilation_w + 1;
std::is_same<TensorLayout, ck::tensor_layout::convolution::NWK>::value) // Wo = (Wi + in_left_pad_w + in_right_pad_w - XEff) / conv_stride_w + 1;
{ const ck::index_t idx_eff =
return HostTensorDescriptor(dims, std::vector<std::size_t>({C * dims[2], 1, C})); (filter_spatial_lengths[i] - 1) * conv_filter_dilations[i] + 1;
} out_spatial_len[i] =
// 2D (input_spatial_lengths[i] + input_left_pads[i] + input_right_pads[i] - idx_eff) /
else if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NCHW>::value || conv_filter_strides[i] +
std::is_same<TensorLayout, ck::tensor_layout::convolution::KCYX>::value || 1;
std::is_same<TensorLayout, ck::tensor_layout::convolution::NKHW>::value) }
{ return out_spatial_len;
}
return HostTensorDescriptor( };
dims, std::vector<std::size_t>{C * dims[2] * dims[3], dims[2] * dims[3], dims[3], 1});
} /**
else if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NHWC>::value || * @brief Gets the host tensor descriptor.
std::is_same<TensorLayout, ck::tensor_layout::convolution::KYXC>::value || *
std::is_same<TensorLayout, ck::tensor_layout::convolution::NHWK>::value) * @param[in] dims The tensor dimensions lengths. Always in NCHW format.
{ * @param[in] layout The tensor data layout.
return HostTensorDescriptor( *
dims, std::vector<std::size_t>{C * dims[2] * dims[3], 1, dims[3] * C, C}); * @tparam TensorLayout Layout type.
} *
// 3D * @return The host tensor descriptor object.
else if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NCDHW>::value || */
std::is_same<TensorLayout, ck::tensor_layout::convolution::KCZYX>::value || template <typename TensorLayout>
std::is_same<TensorLayout, ck::tensor_layout::convolution::NKDHW>::value) HostTensorDescriptor get_host_tensor_descriptor(const std::vector<std::size_t>& dims,
{ const TensorLayout& layout)
{
return HostTensorDescriptor(dims, std::size_t C = dims[1];
std::vector<std::size_t>{C * dims[2] * dims[3] * dims[4], // 1D
dims[2] * dims[3] * dims[4], if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NCW>::value ||
dims[3] * dims[4], std::is_same<TensorLayout, ck::tensor_layout::convolution::KCX>::value ||
dims[4], std::is_same<TensorLayout, ck::tensor_layout::convolution::NKW>::value)
1}); {
}
else if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NDHWC>::value || return HostTensorDescriptor(dims, std::vector<std::size_t>({C * dims[2], dims[2], 1}));
std::is_same<TensorLayout, ck::tensor_layout::convolution::KZYXC>::value || }
std::is_same<TensorLayout, ck::tensor_layout::convolution::NDHWK>::value) else if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NWC>::value ||
{ std::is_same<TensorLayout, ck::tensor_layout::convolution::KXC>::value ||
return HostTensorDescriptor( std::is_same<TensorLayout, ck::tensor_layout::convolution::NWK>::value)
dims, {
std::vector<std::size_t>{ return HostTensorDescriptor(dims, std::vector<std::size_t>({C * dims[2], 1, C}));
C * dims[2] * dims[3] * dims[4], 1, dims[3] * dims[4] * C, dims[4] * C, C}); }
} // 2D
else if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NCHW>::value ||
std::stringstream err_msg; std::is_same<TensorLayout, ck::tensor_layout::convolution::KCYX>::value ||
err_msg << "Unsupported data layout provided: " << layout << "!"; std::is_same<TensorLayout, ck::tensor_layout::convolution::NKHW>::value)
throw std::runtime_error(err_msg.str()); {
}
return HostTensorDescriptor(
} // namespace conv_util dims, std::vector<std::size_t>{C * dims[2] * dims[3], dims[2] * dims[3], dims[3], 1});
} // namespace ck }
else if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NHWC>::value ||
#endif std::is_same<TensorLayout, ck::tensor_layout::convolution::KYXC>::value ||
std::is_same<TensorLayout, ck::tensor_layout::convolution::NHWK>::value)
{
return HostTensorDescriptor(
dims, std::vector<std::size_t>{C * dims[2] * dims[3], 1, dims[3] * C, C});
}
// 3D
else if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NCDHW>::value ||
std::is_same<TensorLayout, ck::tensor_layout::convolution::KCZYX>::value ||
std::is_same<TensorLayout, ck::tensor_layout::convolution::NKDHW>::value)
{
return HostTensorDescriptor(dims,
std::vector<std::size_t>{C * dims[2] * dims[3] * dims[4],
dims[2] * dims[3] * dims[4],
dims[3] * dims[4],
dims[4],
1});
}
else if constexpr(std::is_same<TensorLayout, ck::tensor_layout::convolution::NDHWC>::value ||
std::is_same<TensorLayout, ck::tensor_layout::convolution::KZYXC>::value ||
std::is_same<TensorLayout, ck::tensor_layout::convolution::NDHWK>::value)
{
return HostTensorDescriptor(
dims,
std::vector<std::size_t>{
C * dims[2] * dims[3] * dims[4], 1, C * dims[3] * dims[4], C * dims[4], C});
}
std::stringstream err_msg;
err_msg << "Unsupported data layout provided: " << layout << "!";
throw std::runtime_error(err_msg.str());
}
template <typename InDataType = float,
typename WeiDataType = float,
typename OutDataType = float,
typename InLayout = ck::tensor_layout::convolution::NHWC,
typename WeiLayout = ck::tensor_layout::convolution::KYXC,
typename OutLayout = ck::tensor_layout::convolution::NHWK>
auto get_host_tensors(const ConvParams& params, bool init = true)
{
std::vector<std::size_t> input_dims{static_cast<std::size_t>(params.N),
static_cast<std::size_t>(params.C)};
input_dims.insert(std::end(input_dims),
std::begin(params.input_spatial_lengths),
std::end(params.input_spatial_lengths));
std::vector<std::size_t> filter_dims{static_cast<std::size_t>(params.K),
static_cast<std::size_t>(params.C)};
filter_dims.insert(std::end(filter_dims),
std::begin(params.filter_spatial_lengths),
std::end(params.filter_spatial_lengths));
const std::vector<ck::index_t>& output_spatial_lengths = params.GetOutputSpatialLengths();
std::vector<std::size_t> output_dims{static_cast<std::size_t>(params.N),
static_cast<std::size_t>(params.K)};
output_dims.insert(std::end(output_dims),
std::begin(output_spatial_lengths),
std::end(output_spatial_lengths));
Tensor<InDataType> input(ck::utils::conv::get_host_tensor_descriptor(input_dims, InLayout{}));
Tensor<WeiDataType> weights(
ck::utils::conv::get_host_tensor_descriptor(filter_dims, WeiLayout{}));
Tensor<OutDataType> host_output(
ck::utils::conv::get_host_tensor_descriptor(output_dims, OutLayout{}));
Tensor<OutDataType> device_output(
ck::utils::conv::get_host_tensor_descriptor(output_dims, OutLayout{}));
if(init)
{
std::mt19937 gen(11939);
if constexpr(std::is_same<InDataType, uint8_t>::value)
{
std::uniform_int_distribution<> dis(-5, 5);
std::generate(
input.begin(), input.end(), [&dis, &gen]() { return InDataType(dis(gen)); });
std::generate(
weights.begin(), weights.end(), [&dis, &gen]() { return WeiDataType(dis(gen)); });
}
else
{
std::uniform_real_distribution<> dis(0.f, 1.f);
std::generate(
input.begin(), input.end(), [&dis, &gen]() { return InDataType(dis(gen)); });
std::generate(
weights.begin(), weights.end(), [&dis, &gen]() { return WeiDataType(dis(gen)); });
}
std::fill(host_output.begin(), host_output.end(), OutDataType(0.f));
std::fill(device_output.begin(), device_output.end(), OutDataType(0.f));
}
return std::make_tuple(input, weights, host_output, device_output);
}
HostTensorDescriptor get_output_host_tensor_descriptor(const std::vector<std::size_t>& dims,
int num_dim_spatial = 2)
{
namespace tl = ck::tensor_layout::convolution;
switch(num_dim_spatial)
{
case 3: {
return ck::utils::conv::get_host_tensor_descriptor(dims, tl::NDHWK{});
}
case 2: {
return ck::utils::conv::get_host_tensor_descriptor(dims, tl::NHWK{});
}
case 1: {
return ck::utils::conv::get_host_tensor_descriptor(dims, tl::NWK{});
}
default: {
throw std::runtime_error("Unsupported number of spatial dimensions provided!");
}
}
}
HostTensorDescriptor get_filters_host_tensor_descriptor(const std::vector<std::size_t>& dims,
int num_dim_spatial = 2)
{
namespace tl = ck::tensor_layout::convolution;
switch(num_dim_spatial)
{
case 3: {
return ck::utils::conv::get_host_tensor_descriptor(dims, tl::KZYXC{});
}
case 2: {
return ck::utils::conv::get_host_tensor_descriptor(dims, tl::KYXC{});
}
case 1: {
return ck::utils::conv::get_host_tensor_descriptor(dims, tl::KXC{});
}
default: {
throw std::runtime_error("Unsupported number of spatial dimensions provided!");
}
}
}
HostTensorDescriptor get_input_host_tensor_descriptor(const std::vector<std::size_t>& dims,
int num_dim_spatial = 2)
{
namespace tl = ck::tensor_layout::convolution;
switch(num_dim_spatial)
{
case 3: {
return ck::utils::conv::get_host_tensor_descriptor(dims, tl::NDHWC{});
}
case 2: {
return ck::utils::conv::get_host_tensor_descriptor(dims, tl::NHWC{});
}
case 1: {
return ck::utils::conv::get_host_tensor_descriptor(dims, tl::NWC{});
}
default: {
throw std::runtime_error("Unsupported number of spatial dimensions provided!");
}
}
}
template <ck::index_t NDim,
typename InDataType = float,
typename WeiDataType = float,
typename OutDataType = float>
void run_reference_convolution_forward(const ConvParams& params,
const Tensor<InDataType>& input,
const Tensor<WeiDataType>& weights,
Tensor<OutDataType>& output)
{
using PassThrough = ck::tensor_operation::element_wise::PassThrough;
auto ref_conv = ck::tensor_operation::host::ReferenceConvFwd<InDataType,
WeiDataType,
OutDataType,
PassThrough,
PassThrough,
PassThrough,
NDim>();
auto ref_invoker = ref_conv.MakeInvoker();
auto ref_argument = ref_conv.MakeArgument(input,
weights,
output,
params.conv_filter_strides,
params.conv_filter_dilations,
params.input_left_pads,
params.input_right_pads,
PassThrough{},
PassThrough{},
PassThrough{});
ref_invoker.Run(ref_argument);
}
template <ck::index_t NDim,
typename InDataType = float,
typename WeiDataType = float,
typename OutDataType = float,
template <ck::index_t, typename, typename, typename>
class DeviceConvNDFwdInstance>
void run_convolution_forward(const ConvParams& params,
const Tensor<InDataType>& input,
const Tensor<WeiDataType>& weights,
Tensor<OutDataType>& output)
{
using PassThrough = ck::tensor_operation::element_wise::PassThrough;
DeviceMem in_device_buf(sizeof(InDataType) * input.mDesc.GetElementSpace());
DeviceMem wei_device_buf(sizeof(WeiDataType) * weights.mDesc.GetElementSpace());
DeviceMem out_device_buf(sizeof(OutDataType) * output.mDesc.GetElementSpace());
in_device_buf.ToDevice(input.mData.data());
wei_device_buf.ToDevice(weights.mData.data());
const std::vector<ck::index_t>& output_spatial_lengths = params.GetOutputSpatialLengths();
auto conv = DeviceConvNDFwdInstance<NDim, InDataType, WeiDataType, OutDataType>();
auto invoker = conv.MakeInvoker();
auto argument = conv.MakeArgument(static_cast<InDataType*>(in_device_buf.GetDeviceBuffer()),
static_cast<WeiDataType*>(wei_device_buf.GetDeviceBuffer()),
static_cast<OutDataType*>(out_device_buf.GetDeviceBuffer()),
params.N,
params.K,
params.C,
params.input_spatial_lengths,
params.filter_spatial_lengths,
output_spatial_lengths,
params.conv_filter_strides,
params.conv_filter_dilations,
params.input_left_pads,
params.input_right_pads,
PassThrough{},
PassThrough{},
PassThrough{});
if(!conv.IsSupportedArgument(argument))
{
throw std::runtime_error(
"Error! device_conv with the specified compilation parameters does "
"not support this Conv problem");
}
invoker.Run(argument);
out_device_buf.FromDevice(output.mData.data());
}
template <ck::index_t NDim,
typename InDataType = float,
typename WeiDataType = float,
typename OutDataType = float>
bool run_convolution_forward_instances(const ConvParams& params,
const std::vector<DeviceConvFwdNoOpPtr>& conv_ptrs,
const Tensor<InDataType>& input,
const Tensor<WeiDataType>& weights,
Tensor<OutDataType>& output,
const Tensor<OutDataType>& host_output)
{
using PassThrough = ck::tensor_operation::element_wise::PassThrough;
DeviceMem in_device_buf(sizeof(InDataType) * input.mDesc.GetElementSpace());
DeviceMem wei_device_buf(sizeof(WeiDataType) * weights.mDesc.GetElementSpace());
DeviceMem out_device_buf(sizeof(OutDataType) * output.mDesc.GetElementSpace());
in_device_buf.ToDevice(input.mData.data());
wei_device_buf.ToDevice(weights.mData.data());
const std::vector<ck::index_t>& output_spatial_lengths = params.GetOutputSpatialLengths();
bool res{true};
for(auto& conv_ptr : conv_ptrs)
{
auto invoker = conv_ptr->MakeInvokerPointer();
auto argument = conv_ptr->MakeArgumentPointer(
static_cast<InDataType*>(in_device_buf.GetDeviceBuffer()),
static_cast<WeiDataType*>(wei_device_buf.GetDeviceBuffer()),
static_cast<OutDataType*>(out_device_buf.GetDeviceBuffer()),
params.N,
params.K,
params.C,
params.input_spatial_lengths,
params.filter_spatial_lengths,
output_spatial_lengths,
params.conv_filter_strides,
params.conv_filter_dilations,
params.input_left_pads,
params.input_right_pads,
PassThrough{},
PassThrough{},
PassThrough{});
if(conv_ptr->IsSupportedArgument(argument.get()))
{
float atol{1e-5f};
float rtol{1e-4f};
if constexpr(std::is_same_v<InDataType, ck::half_t>)
{
atol = 1e-4f;
rtol = 2.5e-3f;
}
invoker->Run(argument.get());
out_device_buf.FromDevice(output.mData.data());
res = res &&
ck::utils::check_err(
output.mData, host_output.mData, "Error: incorrect results!", atol, rtol);
hipGetErrorString(
hipMemset(out_device_buf.GetDeviceBuffer(), 0, out_device_buf.mMemSize));
}
}
return res;
}
} // namespace conv
} // namespace utils
} // namespace ck
#endif
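
Taken together, a hedged end-to-end sketch of how the new header is meant to be driven by an example or unit test (the problem sizes here are made up, and the device-instance list is left empty because the instance factory functions live outside this header):

```cpp
#include <vector>

#include "conv_fwd_util.hpp" // the new ck::utils::conv helpers added by this commit

int main()
{
    namespace conv = ck::utils::conv;

    // Hypothetical 2D problem: N=4, K=16, C=8, 3x3 filter on a 16x16 input,
    // stride 1, dilation 1, symmetric padding 1. Sizes are illustrative only.
    conv::ConvParams params(
        2, 4, 16, 8, {3, 3}, {16, 16}, {1, 1}, {1, 1}, {1, 1}, {1, 1});

    // Host tensors for input, weights, reference output and device output,
    // randomly initialized by get_host_tensors.
    auto [input, weights, host_output, device_output] =
        conv::get_host_tensors<float>(params);

    // CPU reference result.
    conv::run_reference_convolution_forward<2>(params, input, weights, host_output);

    // Device instances would normally be collected from the instance library;
    // the list stays empty here since those factories are not part of this header.
    std::vector<conv::DeviceConvFwdNoOpPtr> conv_ptrs;

    bool ok = conv::run_convolution_forward_instances<2>(
        params, conv_ptrs, input, weights, device_output, host_output);

    return ok ? 0 : 1;
}
```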
@@ -65,21 +65,10 @@ void ostream_HostTensorDescriptor(const HostTensorDescriptor& desc, std::ostream
 }
 #if 1
-// FIXME: remove
-float bf16_to_f32_(ck::bhalf_t src_val)
-{
-    union
-    {
-        uint32_t int32;
-        float fp32;
-    } u = {uint32_t(src_val) << 16};
-    return u.fp32;
-}
 // FIXME: remove
 void bf16_to_f32_(const Tensor<ck::bhalf_t>& src, Tensor<float>& dst)
 {
     for(int i = 0; i < src.mData.size(); ++i)
-        dst.mData[i] = bf16_to_f32_(src.mData[i]);
+        dst.mData[i] = ck::type_convert<float>(src.mData[i]);
 }
 #endif
...@@ -4,6 +4,8 @@ ...@@ -4,6 +4,8 @@
#include <cstdlib> #include <cstdlib>
#include <stdlib.h> #include <stdlib.h>
#include <half.hpp> #include <half.hpp>
#include "check_err.hpp"
#include "config.hpp" #include "config.hpp"
#include "debug.hpp" #include "debug.hpp"
#include "print.hpp" #include "print.hpp"
...@@ -401,7 +403,7 @@ int main(int argc, char* argv[]) ...@@ -401,7 +403,7 @@ int main(int argc, char* argv[])
make_tuple(in_right_pad_h, in_right_pad_w), make_tuple(in_right_pad_h, in_right_pad_w),
activ_type); activ_type);
check_error(add_host, add_device); ck::utils::check_err(add_device.mData, add_host.mData);
if(do_log) if(do_log)
{ {
......
...@@ -4,6 +4,8 @@ ...@@ -4,6 +4,8 @@
#include <cstdlib> #include <cstdlib>
#include <stdlib.h> #include <stdlib.h>
#include <half.hpp> #include <half.hpp>
#include "check_err.hpp"
#include "config.hpp" #include "config.hpp"
#include "debug.hpp" #include "debug.hpp"
#include "print.hpp" #include "print.hpp"
...@@ -473,7 +475,7 @@ int main(int argc, char* argv[]) ...@@ -473,7 +475,7 @@ int main(int argc, char* argv[])
make_tuple(in_right_pad_h, in_right_pad_w), make_tuple(in_right_pad_h, in_right_pad_w),
layout); layout);
check_error(in_host, in_device); ck::utils::check_err(in_device.mData, in_host.mData);
if(do_log) if(do_log)
{ {
......
...@@ -4,6 +4,8 @@ ...@@ -4,6 +4,8 @@
#include <cstdlib> #include <cstdlib>
#include <stdlib.h> #include <stdlib.h>
#include <half.hpp> #include <half.hpp>
#include "check_err.hpp"
#include "config.hpp" #include "config.hpp"
#include "debug.hpp" #include "debug.hpp"
#include "print.hpp" #include "print.hpp"
...@@ -534,7 +536,7 @@ int main(int argc, char* argv[]) ...@@ -534,7 +536,7 @@ int main(int argc, char* argv[])
make_tuple(in_right_pad_h, in_right_pad_w), make_tuple(in_right_pad_h, in_right_pad_w),
layout); layout);
check_error(out_host, out_device); ck::utils::check_err(out_device.mData, out_host.mData);
if(do_log) if(do_log)
{ {
......
...@@ -4,6 +4,8 @@ ...@@ -4,6 +4,8 @@
#include <cstdlib> #include <cstdlib>
#include <stdlib.h> #include <stdlib.h>
#include <half.hpp> #include <half.hpp>
#include "check_err.hpp"
#include "config.hpp" #include "config.hpp"
#include "debug.hpp" #include "debug.hpp"
#include "print.hpp" #include "print.hpp"
...@@ -377,7 +379,7 @@ int main(int argc, char* argv[]) ...@@ -377,7 +379,7 @@ int main(int argc, char* argv[])
make_tuple(in_right_pad_h, in_right_pad_w), make_tuple(in_right_pad_h, in_right_pad_w),
activ_type); activ_type);
check_error(out_host, out_device); ck::utils::check_err(out_device.mData, out_host.mData);
if(do_log) if(do_log)
{ {
......
...@@ -4,6 +4,8 @@ ...@@ -4,6 +4,8 @@
#include <cstdlib> #include <cstdlib>
#include <stdlib.h> #include <stdlib.h>
#include <half.hpp> #include <half.hpp>
#include "check_err.hpp"
#include "config.hpp" #include "config.hpp"
#include "debug.hpp" #include "debug.hpp"
#include "print.hpp" #include "print.hpp"
...@@ -397,8 +399,8 @@ int main(int argc, char* argv[]) ...@@ -397,8 +399,8 @@ int main(int argc, char* argv[])
make_tuple(in_right_pad_h, in_right_pad_w), make_tuple(in_right_pad_h, in_right_pad_w),
activ_type); activ_type);
check_error(out_host, out_device); ck::utils::check_err(out_device.mData, out_host.mData);
check_error(max_host, max_device); ck::utils::check_err(max_device.mData, max_host.mData);
if(do_log) if(do_log)
{ {
......
...@@ -4,6 +4,8 @@ ...@@ -4,6 +4,8 @@
#include <cstdlib> #include <cstdlib>
#include <stdlib.h> #include <stdlib.h>
#include <half.hpp> #include <half.hpp>
#include "check_err.hpp"
#include "config.hpp" #include "config.hpp"
#include "debug.hpp" #include "debug.hpp"
#include "print.hpp" #include "print.hpp"
...@@ -517,7 +519,7 @@ int main(int argc, char* argv[]) ...@@ -517,7 +519,7 @@ int main(int argc, char* argv[])
make_tuple(in_right_pad_h, in_right_pad_w), make_tuple(in_right_pad_h, in_right_pad_w),
layout); layout);
check_error(wei_host, wei_device); ck::utils::check_err(wei_device.mData, wei_host.mData);
if(do_log) if(do_log)
{ {
......
...@@ -4,6 +4,8 @@ ...@@ -4,6 +4,8 @@
#include <cstdlib> #include <cstdlib>
#include <stdlib.h> #include <stdlib.h>
#include <half.hpp> #include <half.hpp>
#include "check_err.hpp"
#include "config.hpp" #include "config.hpp"
#include "debug.hpp" #include "debug.hpp"
#include "print.hpp" #include "print.hpp"
...@@ -441,7 +443,7 @@ int main(int argc, char* argv[]) ...@@ -441,7 +443,7 @@ int main(int argc, char* argv[])
{ {
host_gemm(a, b, c_host, layout); host_gemm(a, b, c_host, layout);
check_error(c_host, c_device); ck::utils::check_err(c_device.mData, c_host.mData);
if(do_log) if(do_log)
{ {
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment