1. 06 Jul, 2023 1 commit
    • Qianfeng's avatar
      Batchnorm splitk single kernel (#771) · 8f5cafaf
      Qianfeng authored
      * Use dim 0 as faster dim for writing mean/var/count workspace in batchnorm multiblock method [performance]
      
      * Add CountDataType as template parameter in blockwise_welford
      
      * Add utility/get_shift.hpp
      
      * Add BatchNorm multiblock single-kernel implementation
      
      * Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a
      
      * Renaming in device_batchnorm_forward_impl.hpp
      
      * Tiny fix in the batchnorm_fwd profiler
      
      * Revert "Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a"
      
      This reverts commit d16d00919c43f10759e7b4e4d112125221ed9064.
      
      * Use the old two-kernel batchnorm multiblock method for gfx1030
      
      * Use the old two-kernel batchnorm multiblock method for gfx908
      
      * use the single-kernel batchnorm multiblock method only for gfx90a
      
      * Remove get_wave_id() from utility/get_id.hpp since it is not used
      
      * Set true for testing running mean/variance and saving mean/invvariance in the examples
      
      * Fix to copy-right words
      
      * Remove un-needed including in utility/get_id.hpp
      
      * Add comments to workgroup_synchronization.hpp
      
      * Remove un-used codes in gridwise_multiblock_batchnorm_forward.hpp
      
      * Renaming in the kernels
      
      * Remove un-used kernel file
      8f5cafaf
  2. 31 May, 2023 1 commit
  3. 01 Dec, 2022 1 commit
    • Po Yen Chen's avatar
      Modularize ckProfiler operations (#514) · 8784a72e
      Po Yen Chen authored
      
      
      * Re-structure ckProfiler source files
      
      * Rename profiler.cpp to main.cpp
      
      * Modularize ckProfiler operations
      
      * Add description for profiler operations
      
      * Use longer name to avoid name collision
      
      * Use macro to delay expansion
      
      * Use std::move() to avoid object copying
      
      * Prohibit users from calling dtor
      
      * Use macro to eliminate redundant code
      
      * Make friend function hidden
      
      * Add missing include directive <iostream>
      
      * Fix wrong include directives
      
      * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
      Co-authored-by: default avatarQianfeng Zhang <Qianfeng.Zhang@amd.com>
      8784a72e
  4. 28 Nov, 2022 1 commit
  5. 25 Nov, 2022 1 commit
    • Qianfeng's avatar
      BatchNorm forward instance/external api/profiler/tests/client example (#511) · 4e6a5575
      Qianfeng authored
      
      
      * Update to device_batchnorm_forward base class to include all template parameters for problem description
      
      * Add batchnorm forward instances and external api
      
      * Add batchnorm forward profiler module which uses the external api
      
      * Add some comments in batchnorm_forward example to explain the dimensions in lengths[]
      
      * Replace the reference_batchnorm_forward_nhwc_c by generic reference_batchnorm_forward
      
      * Improvement to the batchnorm infer base API
      
      * Add batchnorm forward client example which shows using the batchnorm forward external API
      
      * Add test for batchnorm forward
      
      * Tuning the batchnorm profiler initialized values and error threshold
      
      * Add support for bhalf_t in instances/external api/tests
      
      * Add support for int8_t in instances/external api/tests
      
      * Add support for double in instances/external api/tests
      
      * Let ScaleDataType and BiasDataType be same as XDataType and YDataType when creating instances
      
      * Checking before running best instance in batchnorm_fwd_nhwc client example
      
      * Add checking for YElementwiseOp in batchnorm_forward external API
      
      * Add more types in batchnorm forward profiler
      
      * Add more test lengths
      Co-authored-by: default avatarrocking5566 <ChunYu.Lai@amd.com>
      4e6a5575