Reduction in Composable Kernel (#82)

* Initial adding of generic reduction
* Initial adding of generic reduction ...
* Updates to make compiling done
* clang-format all files
* clang-format some files again
* Renaming in profiler/include/profile_reduce.hpp
* Updates and make BlockWise cases passed
* Updates and make ThreadWise and MultiBlockTwoCall cases passed
* Remove the support for MUL and NORM1 reduceOp from the profiler and the device instances
* Change to replace the dim0_max_vector_size/dim1_max_vector_size template argument in the device reduce classes
* format
* adding pooling
* added max and average pooling
* comment out cout and kernel timing
* Tiny simplification in profiler/reduce_profiler.cpp
* Add example for reduce_blockwise
* Tiny updates
* Change to pass the ElementWiseOp from device layer to kernel
* Fix the vectorDim and vectorSize in Device layer
* Enable vector load on both dim0 and dim1 for Threadwise method
* Tiny updates...

Unverified commit e17c0d80, authored Mar 06, 2022 by Qianfeng; committed by GitHub Mar 05, 2022
16 changed files
--- a/example/12_pool2d_fwd/pool2d_fwd.cpp
+++ b/example/12_pool2d_fwd/pool2d_fwd.cpp
--- a/example/13_reduce_blockwise/reduce_blockwise.cpp
+++ b/example/13_reduce_blockwise/reduce_blockwise.cpp
--- a/example/CMakeLists.txt
+++ b/example/CMakeLists.txt
--- a/host/host_tensor/include/device.hpp
+++ b/host/host_tensor/include/device.hpp
--- a/host/host_tensor/include/host_conv.hpp
+++ b/host/host_tensor/include/host_conv.hpp
--- a/host/host_tensor/include/host_generic_reduction.hpp
+++ b/host/host_tensor/include/host_generic_reduction.hpp
--- a/host/host_tensor/include/host_reduce_util.hpp
+++ b/host/host_tensor/include/host_reduce_util.hpp
--- a/host/host_tensor/include/host_tensor.hpp
+++ b/host/host_tensor/include/host_tensor.hpp
--- a/host/host_tensor/include/host_tensor_generator.hpp
+++ b/host/host_tensor/include/host_tensor_generator.hpp
--- a/profiler/CMakeLists.txt
+++ b/profiler/CMakeLists.txt
--- a/profiler/include/profile_reduce_impl.hpp
+++ b/profiler/include/profile_reduce_impl.hpp
--- a/profiler/src/profile_gemm_bias_relu_add.cpp
+++ b/profiler/src/profile_gemm_bias_relu_add.cpp
--- a/profiler/src/profile_reduce.cpp
+++ b/profiler/src/profile_reduce.cpp
--- a/profiler/src/profiler.cpp
+++ b/profiler/src/profiler.cpp
--- a/script/profile_reduce_no_index.sh
+++ b/script/profile_reduce_no_index.sh
--- a/script/profile_reduce_with_index.sh
+++ b/script/profile_reduce_with_index.sh