1. 14 Sep, 2022 1 commit
    • ltqin's avatar
      batched_gemm + multiple_d + gemm + multiple_d (#394) · 370efa6c
      ltqin authored
      
      
      * refactor
      
      * start
      
      * add device gemm file
      
      * add BatchStrideD0
      
      * add stridd0
      
      * add gridwise file
      
      * add d0 parameters to gridwise gemm
      
      * add c layout transformer
      
      * add d0 threadwise copy
      
      * init kernel
      
      * init kernel
      
      * regular code
      
      * nm desc put to out
      
      * kernel parameter can not use reference
      
      * host add bias+gelu
      
      * run right for bias+gelu
      
      * change AddFastGelu into another file
      
      * interface add d1 bias parameters
      
      * add d1 parameter to argument
      
      * add d1 parameter to gridwise
      
      * first all code,not verify
      
      * gelu change to relu and GetElementSpaceSize bug
      
      * add instance
      
      * start add to ckprofiler
      
      * ckprofiler finish code
      
      * change input parameter for ckProfiler
      
      * fix host bias+gelu bug
      
      * show help for ckProfiler
      
      * fix bug for lunch kernel ignore parametes
      
      * add pad and fix about bug
      
      * mutiple d0
      
      * add dynamic d0_element_op
      
      * change profiler and  instance to mutiple d0
      
      * example have 2 d0
      
      * remove some comments not using
      
      * change 2 d0 have self  parameters
      
      * change d element_op name
      
      * change class name(multiple_d)
      
      * fix bug
      
      * fix bug that don't find file
      
      * update profiler
      
      * refactor
      
      * update profiler
      
      * clean
      
      * revert example change
      
      * add gon layout
      
      * optimize parameter for gno
      
      * add gon to gemm+gemm
      
      * change helping input parameters
      
      * change to GemmPadder_v2
      
      * using ForEach
      
      * fix gb_per_sec
      Co-authored-by: default avatarChao Liu <lc.roy86@gmail.com>
      Co-authored-by: default avatarltqin <letaoqin@amd.com>
      370efa6c