1. 15 Dec, 2022 1 commit
    • Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances to enable arbitrary problem size (#535) · 9a1f2475
      Rostyslav Geyyer authored
      
      * Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances
      
      * Add padding device_gemm_add_fastgelu_xdl_c_shuffle instances
      
      * Add gemm_add_fastgelu profiler impl
      
      * Add padding device_gemm_fastgelu_xdl_c_shuffle instances
      
      * Add gemm_fastgelu profiler impl
2. 01 Dec, 2022 1 commit
    • Modularize ckProfiler operations (#514) · 8784a72e
      Po Yen Chen authored
      
      
      * Re-structure ckProfiler source files
      
      * Rename profiler.cpp to main.cpp
      
      * Modularize ckProfiler operations
      
      * Add description for profiler operations
      
      * Use longer name to avoid name collision
      
      * Use macro to delay expansion
      
      * Use std::move() to avoid object copying
      
      * Prohibit users from calling dtor
      
      * Use macro to eliminate redundant code
      
      * Make friend function hidden
      
      * Add missing include directive <iostream>
      
      * Fix wrong include directives
      
      * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
      Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>