1. 01 Dec, 2022 1 commit
    • Modularize ckProfiler operations (#514) · 8784a72e
      Po Yen Chen authored
      
      
      * Re-structure ckProfiler source files
      
      * Rename profiler.cpp to main.cpp
      
      * Modularize ckProfiler operations
      
      * Add description for profiler operations
      
      * Use longer name to avoid name collision
      
      * Use macro to delay expansion
      
      * Use std::move() to avoid object copying
      
      * Prohibit users from calling dtor
      
      * Use macro to eliminate redundant code
      
      * Make friend function hidden
      
      * Add missing include directive <iostream>
      
      * Fix wrong include directives
      
      * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
      Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
  2. 14 Sep, 2022 1 commit
    • batched_gemm + multiple_d + gemm + multiple_d (#394) · 370efa6c
      ltqin authored
      
      
      * refactor
      
      * start
      
      * add device gemm file
      
      * add BatchStrideD0
      
      * add StrideD0
      
      * add gridwise file
      
      * add d0 parameters to gridwise gemm
      
      * add c layout transformer
      
      * add d0 threadwise copy
      
      * init kernel
      
      * init kernel
      
      * regular code
      
      * nm desc put to out
      
      * kernel parameters cannot be references
      
      * host add bias+gelu
      
      * bias+gelu now runs correctly
      
      * change AddFastGelu into another file
      
      * interface add d1 bias parameters
      
      * add d1 parameter to argument
      
      * add d1 parameter to gridwise
      
      * first complete code, not verified
      
      * change gelu to relu and fix GetElementSpaceSize bug
      
      * add instance
      
      * start add to ckprofiler
      
      * ckprofiler finish code
      
      * change input parameter for ckProfiler
      
      * fix host bias+gelu bug
      
      * show help for ckProfiler
      
      * fix bug where kernel launch ignores parameters
      
      * add pad and fix related bug
      
      * multiple d0
      
      * add dynamic d0_element_op
      
      * change profiler and instance to multiple d0
      
      * example has 2 d0
      
      * remove some unused comments
      
      * change the 2 d0 to have their own parameters
      
      * change d element_op name
      
      * change class name(multiple_d)
      
      * fix bug
      
      * fix bug where file is not found
      
      * update profiler
      
      * refactor
      
      * update profiler
      
      * clean
      
      * revert example change
      
      * add gon layout
      
      * optimize parameter for gno
      
      * add gon to gemm+gemm
      
      * change helping input parameters
      
      * change to GemmPadder_v2
      
      * using ForEach
      
      * fix gb_per_sec
      Co-authored-by: Chao Liu <lc.roy86@gmail.com>
      Co-authored-by: ltqin <letaoqin@amd.com>