1. 12 Jun, 2022 1 commit
  2. 11 Jun, 2022 4 commits
  3. 10 Jun, 2022 7 commits
  4. 09 Jun, 2022 4 commits
  5. 08 Jun, 2022 4 commits
  6. 01 Jun, 2022 2 commits
  7. 31 May, 2022 2 commits
  8. 30 May, 2022 4 commits
    • gemm + layernorm (#261) · d32a67a9
      rocking5566 authored
      * Implement reduction mean and reduction square mean
      
      * Refine file name
      
      * Add reduce mean and square mean
      
      * Fix parameter name
      
      * Add normalize device op (invoker::run() not yet implemented)
      
      * Remove epsilon
      
      * Refine deviceop
      
      * Add 5ary elementwise for normalization
      
      * Add layernorm example
      
      * LayerNorm verification
      
      * Fix compiler error due to merge from develop
      
      * Fix typo
      
      * Fix compile error
      
      * Refine naming
      
      * [What] Support non-pointer for invoker and argument
      [Why] Sync coding style with gemm
      
      * Refine folder name
      
      * Refine class name
      
      * Evaluate perf of the kernel
      
      * Fix compile error
      
      * [What] Refine perf evaluation in example of gemm + reduction
      [Why] Perf evaluation of gemm + reduction may cause verification to fail, because the evaluation does not initialize global memory
      
      * clang-format
    • fix bug after merge develop · b571256f
      ltqin authored
    • Merge branch 'develop' into bmatrix_skip_lds · f9c478e2
      ltqin authored
    • change file name · 7d85d04a
      ltqin authored
  9. 29 May, 2022 2 commits
  10. 28 May, 2022 4 commits
  11. 27 May, 2022 2 commits
  12. 26 May, 2022 3 commits
    • Add FP64 XDL GEMM built-in function (#199) · 3e6c2610
      ltqin authored
      
      
      * add intrin_mfma_f64_16x16x4f64
      
      * add example
      
      * gemm reference add double data type
      
      * change init data
      
      * fix M N PerXdlops
      
      * fix ifdef
      
      * add comparison config
      
      * add conv fwd example
      
      * format log out
      
      * change rc matrix register layout
      
      * reorganize example
      
      * reorganize example 2
      
      * format, because of merging develop
      
      * fix call impl adding acc data type
      
      * lost ;
      
      * add compiler warning
      
      * change example tuning parameters
      
      * add test for fp64
      
      * add instance
      
      * add test/gemm/gemm_fp64.cpp
      
      * fix get name issue
      
      * remove some tuning parameters
      
      * fix conflict
      
      * format
      
      * use integer value for GEMM test
      
      * add acc data type
      
      * remove typeid because fp16
      
      * fix streamconfig etc bug from merging develop
      
      * format
      
      * remove test_gemm_xdl_fp64
      
      * add AccDataType
      
      * AccDataType problem
      
      Co-authored-by: qinletao <letaoqin@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
    • Add pooling example (#257) · 97c4d486
      Qianfeng authored
      * Add example for computing LayerNorm mean and meansquare
      
      * Refactor the pool2d_fwd example and add example for float type testing
      
      * Revert "Add example for computing LayerNorm mean and meansquare"
      
      This reverts commit df52e6f9d897b00c981baa48f291450bcd60925d.
      
      * Tiny fix in pool2d_fwd_common.hpp
    • fp16 tag · c5c32b4d
      ltqin authored
  13. 25 May, 2022 1 commit
    • Hotfix binary elementwise (for broadcast on fastest axis) (#254) · 82d7d993
      rocking5566 authored
      
      
      * Support different length of ScalarPerVector
      
      * Add example of broadcast on fastest axis
      
      * Typo
      
      * Refine fastest example
      
      * Add dimension check
      
      * Modify fastest broadcast example to 3d
      
      * Require users to give scalarPerVector explicitly
      
      * 1. Add CScalarPerVector
      2. Setting scalarPerVector to 1 is needed not only for broadcast on the fastest axis
      
      * Rename var
      
      * Move IsScalarPerVectorValid() inside IsSupportedArgument()
      
      * Separate GridDesc_M0 into A, B and C
      
      * rename var
      
      * Rename var of length
      
      Co-authored-by: rocking <chunylai@amd.com>
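
Note on the gemm + layernorm commit (#261) listed above: its message mentions a "reduction mean", a "reduction square mean", and a 5-ary elementwise normalization step. The sketch below illustrates how layer normalization can be computed from those two reductions, using the identity var = E[x^2] - (E[x])^2. It is not the repository's code. The function name `layernorm_rows`, the `eps` term, and the assumption that the five elementwise inputs are x, mean, mean square, gamma, and beta are all hypothetical and used only for illustration.

```python
import math


def layernorm_rows(x, gamma, beta, eps=1e-5):
    """Normalize each row of x using only mean and mean-of-squares reductions.

    x     : list of rows (each row a list of floats), e.g. a GEMM output
    gamma : per-column scale, same length as a row
    beta  : per-column shift, same length as a row
    eps   : small stabilizer (assumption; not taken from the commit)
    """
    out = []
    for row in x:
        n = len(row)
        # Two reductions over the row: E[x] and E[x^2].
        mean = sum(row) / n
        mean_sq = sum(v * v for v in row) / n
        # Variance from the two reductions; clamp tiny negative rounding error.
        var = max(mean_sq - mean * mean, 0.0)
        inv_std = 1.0 / math.sqrt(var + eps)
        # Elementwise step combining x, mean, mean_sq (via inv_std), gamma, beta.
        out.append([(v - mean) * inv_std * g + b
                    for v, g, b in zip(row, gamma, beta)])
    return out


if __name__ == "__main__":
    y = layernorm_rows([[1.0, 2.0, 3.0, 4.0]], [1.0] * 4, [0.0] * 4)
    print(y[0])
```

Computing the variance from E[x] and E[x^2] lets both statistics be gathered in a single pass over each row. The trade-off is possible cancellation error when the mean is large relative to the spread, which is why the sketch clamps the variance at zero.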