• batched_gemm + multiple_d + gemm + multiple_d (#394) · 370efa6c
    ltqin authored
    
    
    * refactor
    
    * start
    
    * add device gemm file
    
    * add BatchStrideD0
    
    * add StrideD0
    
    * add gridwise file
    
    * add d0 parameters to gridwise gemm
    
    * add c layout transformer
    
    * add d0 threadwise copy
    
    * init kernel
    
    * init kernel
    
    * regular code
    
    * nm desc put to out
    
    * kernel parameters cannot be references
    
    * host add bias+gelu
    
    * bias+gelu runs correctly
    
    * change AddFastGelu into another file
    
    * interface add d1 bias parameters
    
    * add d1 parameter to argument
    
    * add d1 parameter to gridwise
    
    * first complete code, not verified
    
    * gelu change to relu and GetElementSpaceSize bug
    
    * add instance
    
    * start add to ckprofiler
    
    * ckprofiler finish code
    
    * change input parameter for ckProfiler
    
    * fix host bias+gelu bug
    
    * show help for ckProfiler
    
    * fix bug where kernel launch ignored parameters
    
    * add pad and fix about bug
    
    * multiple d0
    
    * add dynamic d0_element_op
    
    * change profiler and instance to multiple d0
    
    * example have 2 d0
    
    * remove some unused comments
    
    * change the 2 d0 to have their own parameters
    
    * change d element_op name
    
    * change class name(multiple_d)
    
    * fix bug
    
    * fix bug where file is not found
    
    * update profiler
    
    * refactor
    
    * update profiler
    
    * clean
    
    * revert example change
    
    * add gon layout
    
    * optimize parameter for gno
    
    * add gon to gemm+gemm
    
    * change helping input parameters
    
    * change to GemmPadder_v2
    
    * using ForEach
    
    * fix gb_per_sec
    Co-authored-by: Chao Liu <lc.roy86@gmail.com>
    Co-authored-by: ltqin <letaoqin@amd.com>
CMakeLists.txt 2.14 KB