"include/ck/ck.hpp" did not exist on "f95267f166927bee1d806cefbdc142b2e35f640f"
  1. 09 Mar, 2023 1 commit
  2. 22 Feb, 2023 1 commit
    • Add Grouped Conv Backward Weight on Navi21 for ResNet50. (#505) · 246ceee4
      Rostyslav Geyyer authored
      
      
      * Add DeviceOp and examples
      
      * Format DeviceOp template arguments
      
      * Remove bf16 example
      
      * Format
      
      * Format
      
      * Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N
      
      * Refactor argument preparation
      
      * Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl
      
      * Rename device op file
      
      * Update include directive in the example file
      
      * Update descriptor preparation for grouped op
      
      * Update the argument
      
      * Update batch handling
      
      * Add gridwise gemm supporting batched input
      
      * Update blockwise indexing, working version
      
      * Update copyright year
      
      * Update check if argument is supported
      
      * Refactor and make consistent with xdl examples
      
      * Update check if argument is supported
      
      * Add changelog entry
      
      * Added comments on Dl op split_k>1 support
      
      ---------
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  3. 15 Feb, 2023 1 commit
    • Improve normalization (#580) · 6a6163a3
      rocking5566 authored
      * Sync the order of type string with template parameter
      
      * Add more instances
      
      * Check the vector size and remove redundant var
      
      * Extract var to static, prepare to separate sweep once kernel
      
      * Separate sweeponce flow and optimize the flow
      
      * 1. Rename AccDatatype in normalization to computeData
      2. Rename AccElementwiseOperation to YElementwiseOperation in normalization
      
      * Remove useless code
      
      * Update naive variance kernel
      
      * Refine string
      
      * Fix typo
      
      * Support naive variance for device_normalization
      
      * Check the blocksize
      
      * Share the VGPR of x and y
      
      * Share the VGPR of gamma and beta
      
      * Add more instances
      
      * Support fp16 sqrt for experiment
      
      * Add CHANGELOG
      
      * Fix typo
      
      * clang-format
  4. 08 Feb, 2023 1 commit