  1. 24 Mar, 2022 1 commit
      Gemm+Reduce Fusion (#128) · f95267f1
      Chao Liu authored
      * add gridwise gemm v4r1
      
      * rename
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * use sfc in shuffling
      
      * remove hardcode
      
      * remove hardcode
      
      * refactor
      
      * fix build
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * adding gemm+reduce
      
      * format
      
      * clean
      
      * adding gemm+reduce
      
      * adding profiler for gemm+reduce
      
      * adding gemm+reduce profiler
      
      * fix build
      
      * clean up
      
      * gemm+reduce
      
      * fix build
      
      * update DeviceGemm_Xdl_CShuffle; update enum to enum class
      
      * clean up
      
      * add test for gemm+reduce
      
      * clean up
      
      * refactor
      
      * fix build
      
      * fix build
  2. 22 Mar, 2022 1 commit
  3. 21 Mar, 2022 1 commit
  4. 11 Feb, 2022 1 commit
      Batched GEMM for fp16 (#79) · b53e9d08
      zjing14 authored
      * prepare host for batched_gemm
      
      * init commit of batched kernels
      
      * fixed
      
      * refine transform with freeze
      
      * m/n padding
      
      * fixed a bug; clean
      
      * add small tiles
      
      * clean
      
      * clean code
      
      * clean code
      
      * add nt, tn, tt layout
      
      * add missing file
      
      * use StaticBufferTupleOfVector instead
      
      * add reference_batched_gemm
      
      * fixed a macro