  • Batchnorm-forward implemented using welford method to calculate variance (#403) · 7fa892e6
    Qianfeng authored
    
    
    * Update to the batchnorm-forward API and base class
    
    * Fix leaked header include in gridwise_set_buffer_value.hpp
    
    * Add kernels and device file for batchnorm-forward welford supporting both blockwise and multi-block reduction
    
    * Update to the batchnorm-forward example to use the new batchnorm-forward device interface
    
    * Change the batchnorm-forward reference to use sequential welford method
    
    * Change to assign the workspace into four buffers in the host layer
    
    * Use GetReduceCountPerThread functor to replace the initial count for Blockwise and Multiblock welford
    
    * Tiny correction and remove unused file under example/34_batchnorm
    
    * Renaming in the kernel arguments
    
    * Explicitly use ck::math::sqrt in batchnorm-forward kernels
    
    * Add some comments to some kernels
    
    * Tiny fix
    
    * Generalize the data types in reference_batchnorm_forward_nhwc_c
    
    * Use ck::ignore to mark unused parameters
    
    * Move GetReduceCountPerThread functor codes from kernel to device
    
    * Remove some unused code in device_batchnorm_forward_impl.hpp
    
    * Tiny fix in batchnorm_forward example
    
    * Move GetReduceCountPerThread() to welford_helper.hpp
    
    * Use separate data types for Scale and Bias
    
    * Renaming in device Op
    
    * Tiny fix in forward example
    
    * Update to batchnorm-infer (type splitting, renaming)
    
    * Add time and bandwidth measurement to the batchnorm-forward example
    
    * Add support of elementwise operation for batchnorm forward output
    
    * Reduce object copying by passing object as reference type
    
    * Tiny change for performance
    
    * Updates for performance again
    
    * Some renamings
    
    * Add GetActualVariance template parameter for ThreadwiseWelfordMerge
    
    * Tiny update in reference batchnorm forward nhwc/c
    
    * Move batchnorm multiblock kernel files to grid/batchnorm_multiblock sub-directory
    
    * Fuse mean and bias in the normalization calculation

    Co-authored-by: root <root@dc-smc-18.amd.com>
    Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>
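The commit message above refers to a sequential Welford update (used by the reference implementation) and to blockwise and multi-block reductions that combine partial results. As a rough host-side illustration of those two ideas, here is a minimal C++ sketch. All names in it (`WelfordState`, `welford_update`, `welford_merge`, `welford_variance`, `welford_reduce`) are hypothetical and are not taken from the library. It also assumes the biased (divide by N) variance and `double` accumulation; the actual kernels may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical running statistics for one channel: element count, mean,
// and the sum of squared deviations from the mean (commonly called M2).
struct WelfordState
{
    std::size_t count = 0;
    double mean       = 0.0;
    double m2         = 0.0;
};

// Sequential Welford step: fold one new value into the running state.
inline void welford_update(WelfordState& s, double x)
{
    s.count += 1;
    const double delta = x - s.mean;
    s.mean += delta / static_cast<double>(s.count);
    s.m2 += delta * (x - s.mean); // uses the already-updated mean
}

// Merge two partial states with the standard pairwise-combination formula.
// This is the kind of step a blockwise or multi-block reduction needs when
// each thread or block has reduced only part of the data.
inline WelfordState welford_merge(const WelfordState& a, const WelfordState& b)
{
    if(a.count == 0)
        return b;
    if(b.count == 0)
        return a;

    WelfordState r;
    r.count = a.count + b.count;

    const double na    = static_cast<double>(a.count);
    const double nb    = static_cast<double>(b.count);
    const double n     = na + nb;
    const double delta = b.mean - a.mean;

    r.mean = a.mean + delta * nb / n;
    r.m2   = a.m2 + b.m2 + delta * delta * na * nb / n;
    return r;
}

// Biased (population) variance, assumed here to be what normalization uses.
inline double welford_variance(const WelfordState& s)
{
    assert(s.count > 0);
    return s.m2 / static_cast<double>(s.count);
}

// Reduce the half-open range [begin, end) of v sequentially.
inline WelfordState welford_reduce(const std::vector<double>& v,
                                   std::size_t begin,
                                   std::size_t end)
{
    WelfordState s;
    for(std::size_t i = begin; i < end; ++i)
        welford_update(s, v[i]);
    return s;
}
```

Reducing a range in one pass and merging the results of two sub-ranges should give the same mean and variance up to rounding, which is why partial per-thread or per-block results can be combined in any grouping.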
batchnorm_common.hpp 1.89 KB