1. 31 May, 2023 1 commit
  2. 28 Oct, 2022 1 commit
    • Batchnorm-forward implemented using welford method to calculate variance (#403) · 7fa892e6
      Qianfeng authored
      * Update to the batchnorm-forward API and base class
      
      * Fix leaked header include in gridwise_set_buffer_value.hpp
      
      * Add kernels and device file for batchnorm-forward welford supporting both blockwise and multi-block reduction
      
      * Update to the batchnorm-forward example to use the new batchnorm-forward device interface
      
      * Change the batchnorm-forward reference to use sequential welford method
      
      * Change to assign the workspace into four buffers in the host layer
      
      * Use GetReduceCountPerThread functor to replace the initial count for Blockwise and Multiblock welford
      
      * Tiny correction and remove unused file under example/34_batchnorm
      
      * Renaming in the kernel arguments
      
      * Explicitly use ck::math::sqrt in batchnorm-forward kernels
      
      * Add some comments to some kernels
      
      * Tiny fix
      
      * Generalize the data types in reference_batchnorm_forward_nhwc_c
      
      * Use ck::ignore to mark unused parameters
      
      * Move Ge...
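For readers unfamiliar with the Welford method named in this commit, the following is a minimal host-side sketch of the idea, not the kernels added by the commit. It shows the sequential update (as a reference implementation might use) and the merge of two partial results (the step that blockwise and multi-block reductions rely on). All names here (`WelfordState`, `welford_update`, `welford_merge`, `welford_variance`, `welford_inv_std`, `welford_reduce`) are hypothetical and are not taken from Composable Kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Running statistics: element count, mean, and the sum of squared
// deviations from the mean (commonly called M2). Hypothetical type.
struct WelfordState
{
    std::size_t count = 0;
    double mean       = 0.0;
    double m2         = 0.0;
};

// Sequential Welford update: fold one value into the running state.
inline void welford_update(WelfordState& s, double x)
{
    s.count += 1;
    const double delta = x - s.mean;
    s.mean += delta / static_cast<double>(s.count);
    s.m2 += delta * (x - s.mean); // uses the already-updated mean
}

// Merge two partial states, as needed when several threads or blocks
// each reduce a slice of the data and the results are combined later.
inline WelfordState welford_merge(const WelfordState& a, const WelfordState& b)
{
    if(a.count == 0)
        return b;
    if(b.count == 0)
        return a;

    WelfordState r;
    r.count            = a.count + b.count;
    const double na    = static_cast<double>(a.count);
    const double nb    = static_cast<double>(b.count);
    const double n     = static_cast<double>(r.count);
    const double delta = b.mean - a.mean;
    r.mean             = a.mean + delta * nb / n;
    r.m2               = a.m2 + b.m2 + delta * delta * na * nb / n;
    return r;
}

// Biased (population) variance: M2 divided by the element count.
inline double welford_variance(const WelfordState& s)
{
    assert(s.count > 0);
    return s.m2 / static_cast<double>(s.count);
}

// Inverse standard deviation with an epsilon for numerical safety.
inline double welford_inv_std(const WelfordState& s, double epsilon)
{
    return 1.0 / std::sqrt(welford_variance(s) + epsilon);
}

// Convenience helper: reduce a whole vector sequentially.
inline WelfordState welford_reduce(const std::vector<double>& values)
{
    WelfordState s;
    for(double v : values)
        welford_update(s, v);
    return s;
}
```

Reducing `{1, 2}` and `{3, 4}` separately and merging the two states gives the same mean and M2 as reducing `{1, 2, 3, 4}` in one pass, which is the property a multi-block reduction depends on.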
  3. 15 Aug, 2022 1 commit
    • Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320) · 53ea4713
      Qianfeng authored
      * Implement multiple-reduction in one kernel (kernels, device ops, examples)
      
      * Add generic elementwise kernel and device interface
      
      * Add generator for normal-distributed data initialization
      
      * Add host reference implementation of batchnorm-forward and batchnorm-infer
      
      * Add examples for implementing batchnorm-forward and batchnorm-infer using generic kernels
      
      * Remove unneeded include in batchnorm example
      
      * Renaming generic_elementwise to elementwise in kernel and device classes/functions
      
      * Change in gemm_layernorm examples to use DeviceElementwise instead of Device5AryElementwise
      
      * Change in example 19_binary_elementwise to use DeviceElementwise instead of DeviceBinaryElementwise
      
      * Change in device_cgemm_4gemm_xdl_cshuffle.hpp to use kernel_elementwise instead of kernel_binary_elementwise
      
      * Add DeviceElementwiseBase and use it in device_normalize_instance.cpp
      
      * Removing and renaming files
      
      * Update to synchronize gemm_layernorm client example to the generic element-wise device op API
      
      * Update to synchronize with the latest headers directory and HostTensorDescriptor interface renaming
      
      * Merge two static member functions in device_elementwise.hpp
      
      * Remove unary_elementwise_1d kernel and device
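As a rough illustration of how batchnorm-forward can be composed from generic pieces, the sketch below pairs a multiple-reduction pass (two per-channel reductions computed together) with a generic elementwise pass. It is a plain host-side sketch under stated assumptions, not the device ops from this commit: it assumes a flattened NHWC layout where the channel index is `i % channels`, it assumes the two reductions are sum(x) and sum(x*x), and the names `ChannelStats`, `reduce_mean_and_variance`, and `normalize_elementwise` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-channel statistics produced by the reduction pass. Hypothetical type.
struct ChannelStats
{
    std::vector<double> mean;
    std::vector<double> variance;
};

// One pass over the data that accumulates two reductions per channel,
// sum(x) and sum(x*x), then derives mean and biased variance from them.
inline ChannelStats reduce_mean_and_variance(const std::vector<double>& x,
                                             std::size_t channels)
{
    assert(channels > 0 && !x.empty() && x.size() % channels == 0);
    const double rows = static_cast<double>(x.size() / channels);

    std::vector<double> sum(channels, 0.0);
    std::vector<double> sum_sq(channels, 0.0);
    for(std::size_t i = 0; i < x.size(); ++i)
    {
        const std::size_t c = i % channels; // channel is the fastest dimension
        sum[c] += x[i];
        sum_sq[c] += x[i] * x[i];
    }

    ChannelStats stats;
    stats.mean.resize(channels);
    stats.variance.resize(channels);
    for(std::size_t c = 0; c < channels; ++c)
    {
        stats.mean[c]     = sum[c] / rows;
        stats.variance[c] = sum_sq[c] / rows - stats.mean[c] * stats.mean[c];
    }
    return stats;
}

// Elementwise pass: y = scale * (x - mean) / sqrt(variance + epsilon) + bias.
// With precomputed running statistics this same pass covers batchnorm-infer.
inline std::vector<double> normalize_elementwise(const std::vector<double>& x,
                                                 const ChannelStats& stats,
                                                 const std::vector<double>& scale,
                                                 const std::vector<double>& bias,
                                                 double epsilon)
{
    const std::size_t channels = stats.mean.size();
    assert(channels > 0 && scale.size() == channels && bias.size() == channels);

    std::vector<double> y(x.size());
    for(std::size_t i = 0; i < x.size(); ++i)
    {
        const std::size_t c  = i % channels;
        const double inv_std = 1.0 / std::sqrt(stats.variance[c] + epsilon);
        y[i]                 = scale[c] * (x[i] - stats.mean[c]) * inv_std + bias[c];
    }
    return y;
}
```

The sum-of-squares form is simple to express as two reductions but can lose precision when the mean is large relative to the spread of the data, which is one common motivation for switching to a Welford-style computation.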