  1. 28 Oct, 2022 1 commit
    • Batchnorm-forward implemented using welford method to calculate variance (#403) · 7fa892e6
      Qianfeng authored
      
      
      * Update to the batchnorm-forward API and base class
      
      * Fix leaked header including in gridwise_set_buffer_value.hpp
      
      * Add kernels and device file for batchnorm-forward welford supporting both blockwise and multi-block reduction
      
      * Update to the batchnorm-forward example to use the new batchnorm-forward device interface
      
      * Change the batchnorm-forward reference to use sequential welford method
      
      * Change to assign the workspace into four buffers in the host layer
      
      * Use GetReduceCountPerThread functor to replace the initial count for Blockwise and Multiblock welford
      
      * Tiny correction and remove un-used file under example/34_batchnorm
      
      * Renaming in the kernel arguments
      
      * Explicitly use ck::math::sqrt in batchnorm-forward kernels
      
      * Add some comments to some kernels
      
      * Tiny fix
      
      * Generalize the data types in reference_batchnorm_forward_nhwc_c
      
      * Use ck::ignore to mark un-used parameters
      
      * Move GetReduceCountPerThread functor codes from kernel to device
      
      * Remove some un-used codes in device_batchnorm_forward_impl.hpp
      
      * Tiny fix in batchnorm_forward example
      
      * Move GetReduceCountPerThread() to welford_helper.hpp
      
      * Use separate data type for Scale and Bias
      
      * Renaming in device Op
      
      * Tiny fix in forward example
      
      * Update to batchnorm-infer (type splitting, renaming)
      
      * Add time and bandwidth measurement to the batchnorm-forward example
      
      * Add support of elementwise operation for batchnorm forward output
      
      * Reduce object copying by passing object as reference type
      
      * Tiny change for performance
      
      * Updates for performance again
      
      * Some Renamings
      
      * Add GetActualVariance template parameter for ThreadwiseWelfordMerge
      
      * Tiny update in reference batchnorm forward nhwc/c
      
      * Move batchnorm multiblock kernel files to grid/batchnorm_multiblock sub-directory
      
      * Fuse mean and bias in the normalization calculation
      Co-authored-by: root <root@dc-smc-18.amd.com>
      Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>
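For readers unfamiliar with the technique named in the commit above, here is a minimal host-side sketch of Welford's method: a sequential update that folds in one value at a time, and a merge step that combines two partial results, which is the kind of operation a blockwise or multi-block reduction needs. This is not the code from the commit. `WelfordState`, `welford_update`, `welford_merge`, and `welford_variance` are hypothetical names chosen for illustration, and the variance returned is the biased (divide by count) form that batch normalization conventionally uses.

```cpp
#include <cassert>
#include <cstddef>

// Running statistics for Welford's method: the sample count, the running
// mean, and M2 (the sum of squared deviations from the running mean).
// Hypothetical type, not part of the library discussed in the commit.
struct WelfordState
{
    std::size_t count = 0;
    double mean       = 0.0;
    double m2         = 0.0;
};

// Sequential update: fold one new value into the running state.
inline void welford_update(WelfordState& s, double x)
{
    s.count += 1;
    const double delta = x - s.mean;
    s.mean += delta / static_cast<double>(s.count);
    s.m2 += delta * (x - s.mean); // uses the already-updated mean
}

// Merge two partial states computed over disjoint sets of values
// (the pairwise formula of Chan et al.). The result equals the state
// obtained by feeding all values through welford_update, up to rounding.
inline WelfordState welford_merge(const WelfordState& a, const WelfordState& b)
{
    if(a.count == 0)
        return b;
    if(b.count == 0)
        return a;

    WelfordState r;
    r.count            = a.count + b.count;
    const double n     = static_cast<double>(r.count);
    const double na    = static_cast<double>(a.count);
    const double nb    = static_cast<double>(b.count);
    const double delta = b.mean - a.mean;

    r.mean = a.mean + delta * nb / n;
    r.m2   = a.m2 + b.m2 + delta * delta * na * nb / n;
    return r;
}

// Biased (population) variance: M2 divided by the count.
inline double welford_variance(const WelfordState& s)
{
    assert(s.count > 0);
    return s.m2 / static_cast<double>(s.count);
}
```

In a parallel setting, each thread or block would build its own `WelfordState` over its slice of the data and the partial states would then be merged pairwise. The merge needs the per-partition counts, which is presumably why the commit tracks a reduce count per thread.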
  2. 13 Oct, 2022 1 commit
    • Refactor device op implementations into `impl` subdirectory. (#420) · 30480288
      Adam Osewski authored
      
      
      * Move kernel implementation files under impl directory.
      
      * Update examples paths.
      
      * Update device kernel impl include paths.
      
      * Update tensor operation instances include paths.
      
      * Update profiler and tests include paths.
      
      * Clang-format
      
      * Update include paths for batched gemm reduce
      
      * Refactor UnitTest ConvNDBwdWeight.
      
      * Refactor fwd and bwd data convND UT.
      
      * Fix used test macro.
      
      * Fix include path.
      
      * Fix include paths.
      
      * Fix include paths in profiler and tests.
      
      * Fix include paths.
      Co-authored-by: Adam Osewski <aosewski@amd.com>
  3. 15 Aug, 2022 1 commit
    • Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320) · 53ea4713
      Qianfeng authored
      * Implement multiple-reduction in one kernel (kernels, device ops, examples)
      
      * Add generic elementwise kernel and device interface
      
      * Add generator for normal-distributed data initialization
      
      * Add host reference implementation of batchnorm-forward and batchnorm-infer
      
      * Add examples for implementing batchnorm-forward and batchnorm-infer using generic kernels
      
      * Remove un-needed including in batchnorm example
      
      * Renaming generic_elementwise to elementwise in kernel and device classes/functions
      
      * Change in gemm_layernorm examples to use DeviceElementwise instead of Device5AryElementwise
      
      * Change in example 19_binary_elementwise to use DeviceElementwise instead of DeviceBinaryElementwise
      
      * Change in device_cgemm_4gemm_xdl_cshuffle.hpp to use kernel_elementwise instead of kernel_binary_elementwise
      
      * Add DeviceElementwiseBase and use it in device_normalize_instance.cpp
      
      * Removing and renaming files
      
      * Update to synchronize gemm_layernorm client example to the generic element-wise device op API
      
      * Update to synchronize with the latest headers directory and HostTensorDescriptor interface renaming
      
      * Merge two static member functions in device_elementwise.hpp
      
      * Remove unary_elementwise_1d kernel and device
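As context for the batchnorm-forward work in the commit above, the following is a rough host-side sketch of what a forward pass over an NHWC tensor computes: for each channel, the mean and biased variance are reduced over the N, H, and W dimensions, and every element is then normalized, scaled, and shifted. It is an illustration under stated assumptions, not the reference implementation added in the commit. The function name `batchnorm_forward_nhwc`, its signature, and the flattened-vector layout are all hypothetical, and it uses a plain two-pass variance for readability.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side batchnorm forward for a tensor stored in NHWC order
// and flattened into one vector, so element (row, c) lives at
// x[row * channels + c], where row enumerates all N*H*W positions.
inline std::vector<float> batchnorm_forward_nhwc(const std::vector<float>& x,
                                                 std::size_t channels,
                                                 const std::vector<float>& scale,
                                                 const std::vector<float>& bias,
                                                 float epsilon)
{
    assert(channels > 0 && x.size() % channels == 0);
    assert(scale.size() == channels && bias.size() == channels);
    const std::size_t rows = x.size() / channels;
    assert(rows > 0);

    std::vector<float> y(x.size());
    for(std::size_t c = 0; c < channels; ++c)
    {
        // Pass 1: per-channel mean over all N*H*W positions.
        double mean = 0.0;
        for(std::size_t r = 0; r < rows; ++r)
            mean += x[r * channels + c];
        mean /= static_cast<double>(rows);

        // Pass 2: biased variance (divide by the count, not count - 1).
        double var = 0.0;
        for(std::size_t r = 0; r < rows; ++r)
        {
            const double d = x[r * channels + c] - mean;
            var += d * d;
        }
        var /= static_cast<double>(rows);

        // Normalize, then apply the per-channel scale and bias.
        const double inv_std = 1.0 / std::sqrt(var + epsilon);
        for(std::size_t r = 0; r < rows; ++r)
        {
            const double norm = (x[r * channels + c] - mean) * inv_std;
            y[r * channels + c] = static_cast<float>(scale[c] * norm + bias[c]);
        }
    }
    return y;
}
```

Batchnorm-infer differs only in that the mean and variance are supplied as inputs (typically running estimates) rather than computed from the batch, so the two reduction passes drop out and only the final elementwise step remains.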