1. 21 Jul, 2022 3 commits
  2. 20 Jul, 2022 5 commits
  3. 19 Jul, 2022 2 commits
  4. 18 Jul, 2022 13 commits
  5. 17 Jul, 2022 4 commits
  6. 15 Jul, 2022 1 commit
  7. 14 Jul, 2022 5 commits
  8. 13 Jul, 2022 4 commits
    • Standalone layernorm (#315) · 7f216620
      rocking5566 authored
      
      * Implement layernorm kernel and deviceOp
      
      * verify gpu kernel with host code
      
      * 1. Separate gamma and beta from affine
      2. Check if argument is valid
      
      * clean
      
      * Sync the naming
      
      * Support sweep-once mode when the K-dimension data fits inside one block
      
      * [What] Get length from upper length.
      [Why] If we get the length directly, we may get the length after padding.
      
      * We only use one block in K dimension.
      Hence, we can simplify the indexing of global R/W.
      
      * Use 1d descriptor for gamma and beta
      
      * Add accElementwiseOp
      
      * Extract layernorm host code
      
      * Support different YVectorDim in GridwiseLayernorm
      
      * Rename XSrcVectorDim to XYSrcVectorDim, because the deviceOp uses the same parameter for X and Y
      
      * Gamma and beta can share the VGPR.
      
      * Add test for fp32 and fp16
      
      * Fix a concurrency bug and add a test case that could fail before the fix
      
      * Propagate NaN for layernorm
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
    • minor fix in gemm client example (#328) · c5620ed0
      Daming Feng authored
    • Add switch between compilers, make 9110 compiler default, add full QA scripts. (#322) · 39acaea3
      Illia Silin authored
      * adding scripts for full perf test suite
      
      * uncomment the sql queries
      
      * fix typo and chmod a+x for scripts
      
      * dos2unix for all new scripts
      
      * disable verification in full performance test
      
      * fix reduction scripts, add grouped_gemm hotfix
      
      * fix the grouped_gemm hotfix and only run reduction for fp16
      
      * change compiler flag syntax
      
      * fix syntax
      
      * add predefinition of dockerArgs
      
      * avoid redefinitions of dockerArgs
      
      * add blank space at the end of dockerArgs
      
      * try to build with release compiler
      
      * adding spaces inside if condition
      
      * limit the number of threads for building 9110 compiler
      
      * change the way HIP_CLANG_PATH is set
      
      * remove the export command
      
      * change the conditional ENV syntax
      
      * set HIP_CLANG_PATH at docker run time
      
      * update scripts for full qa
      
      * enable the sql write query
      
      * fix typo
      
      * remove a comment from a script
    • update reference conv · 0cb8ba92
      Chao Liu authored
  9. 12 Jul, 2022 3 commits
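The Standalone layernorm commit (#315, 7f216620) mentions verifying the GPU kernel against extracted host code. It also mentions separating gamma and beta into 1D tensors and normalizing over a K dimension. The sketch below shows roughly what such a host reference computes. It assumes a row-major M x K input normalized over K, with gamma and beta of length K. The function name `layernorm_host` and the two-pass mean/variance scheme are hypothetical illustrations, not the composable_kernel host reference itself.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side layernorm reference (illustrative only, not CK's code).
// Each row of the row-major M x K matrix x is normalized over K:
//   y[m][k] = (x[m][k] - mean_m) / sqrt(var_m + epsilon) * gamma[k] + beta[k]
// gamma and beta are 1D tensors of length K shared by every row.
std::vector<float> layernorm_host(const std::vector<float>& x,
                                  const std::vector<float>& gamma,
                                  const std::vector<float>& beta,
                                  std::size_t M, std::size_t K,
                                  float epsilon)
{
    std::vector<float> y(M * K);
    for (std::size_t m = 0; m < M; ++m)
    {
        const float* row = &x[m * K];

        // First pass: mean over K, accumulated in double for accuracy.
        double mean = 0.0;
        for (std::size_t k = 0; k < K; ++k)
            mean += row[k];
        mean /= static_cast<double>(K);

        // Second pass: biased (population) variance over K.
        double var = 0.0;
        for (std::size_t k = 0; k < K; ++k)
        {
            const double d = row[k] - mean;
            var += d * d;
        }
        var /= static_cast<double>(K);

        // Normalize, then apply the affine gamma/beta per column.
        const double inv_std = 1.0 / std::sqrt(var + epsilon);
        for (std::size_t k = 0; k < K; ++k)
            y[m * K + k] = static_cast<float>((row[k] - mean) * inv_std * gamma[k] + beta[k]);
    }
    return y;
}
```

In this plain host sketch, a NaN anywhere in a row makes that row's mean and variance NaN. Every output in the row is therefore NaN through ordinary IEEE 754 arithmetic, without special handling. A GPU kernel built on custom reduction operations may need an explicit flag to get the same behavior.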