  1. 26 Jul, 2022 3 commits
  2. 25 Jul, 2022 6 commits
  3. 24 Jul, 2022 1 commit
  4. 22 Jul, 2022 2 commits
  5. 21 Jul, 2022 3 commits
    • refactor · c71e140d
      Chao Liu authored
    • Grouped Gemm device with multiD grid (#319) · 7959dad5
      zjing14 authored
      
      
      * replace gridwise_v2r3 with multiD
      
      * adjust parameters
      
      * add instances
      
      * fixed test_grouped_gemm
      
      * fix standalone softmax race condition around blockwise reduction
      
      * fixed ci
      
      * fixed comment: remove redundant workspace
      
      * use instanceFactory
      
      * add test layout
      
      * add empty Ds
      
      * add bias example
      
      * use array
      
      * separate examples
      Co-authored-by: Anthony Chang <ac.chang@outlook.com>
    • refactor · 12585e57
      Chao Liu authored
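Commit #319 moves grouped GEMM onto a "multiD" gridwise kernel; as the naming and the log entries ("add empty Ds", "add bias example") suggest, each group computes E from A·B combined elementwise with zero or more extra D tensors. The sketch below is a minimal host-side illustration of that semantics only, assuming row-major fp32 tensors, an additive epilogue, and each D stored as a full M×N tensor (a row-broadcast bias would be expanded to that shape). `GemmProblem` and `grouped_gemm_multi_d` are hypothetical names, not composable_kernel's API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical description of one GEMM problem in the group (illustrative, not CK's API).
struct GemmProblem {
    std::size_t M, N, K;
    std::vector<float> A;               // M x K, row-major
    std::vector<float> B;               // K x N, row-major
    std::vector<std::vector<float>> Ds; // each M x N, row-major; may be empty ("empty Ds")
};

// For every group independently: E = A * B + sum of all Ds (additive epilogue is an assumption).
std::vector<std::vector<float>> grouped_gemm_multi_d(const std::vector<GemmProblem>& groups)
{
    std::vector<std::vector<float>> Es;
    Es.reserve(groups.size());
    for (const auto& g : groups) {
        std::vector<float> E(g.M * g.N, 0.f);
        for (std::size_t m = 0; m < g.M; ++m)
            for (std::size_t n = 0; n < g.N; ++n) {
                float acc = 0.f;
                for (std::size_t k = 0; k < g.K; ++k)
                    acc += g.A[m * g.K + k] * g.B[k * g.N + n];
                // Elementwise epilogue: add each D tensor (e.g. a bias) at the same (m, n).
                for (const auto& D : g.Ds)
                    acc += D[m * g.N + n];
                E[m * g.N + n] = acc;
            }
        Es.push_back(std::move(E));
    }
    return Es;
}
```

Groups may have different M, N, K, which is the point of a grouped launch; an empty `Ds` list reduces the epilogue to a plain GEMM.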
  6. 20 Jul, 2022 4 commits
  7. 19 Jul, 2022 2 commits
  8. 18 Jul, 2022 8 commits
  9. 17 Jul, 2022 3 commits
  10. 14 Jul, 2022 3 commits
  11. 13 Jul, 2022 2 commits
    • Standalone layernorm (#315) · 7f216620
      rocking5566 authored
      
      
      * Implement layernorm kernel and deviceOp
      
      * verify gpu kernel with host code
      
      * 1. Separate gamma and beta from affine
      2. Check if argument is valid
      
      * clean
      
      * Sync the naming
      
      * Support sweep-once mode if we can put the K-dimension data inside one block
      
      * [What] Get length from upper length.
      [Why] If we get the length directly, we may get the length after padding.
      
      * We only use one block in K dimension.
      Hence, we can simplify the indexing of global R/W.
      
      * Use 1d descriptor for gamma and beta
      
      * Add accElementwiseOp
      
      * Extract layernorm host code
      
      * Support different YVectorDim in GridwiseLayernorm
      
      * Rename XSrcVectorDim to XYSrcVectorDim because we use the same parameter in deviceOp
      
      * Gamma and beta can share the VGPR.
      
      * Add test for fp32 and fp16
      
      * Fix concurrency bug and add a test case that could fail originally
      
      * Propagate NaN for layernorm
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
    • update reference conv · 0cb8ba92
      Chao Liu authored
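Commit #315 describes a standalone layernorm with gamma and beta applied separately, verification against host code, and NaN propagation. The sketch below is a minimal host reference under stated assumptions, not the repository's extracted host code: row-major fp32 input of shape M×K normalized over K, 1-D gamma and beta of length K, a two-pass mean/variance accumulated in double, and a caller-supplied `eps`. `layernorm_host` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host reference:
// y[m][k] = gamma[k] * (x[m][k] - mean_m) / sqrt(var_m + eps) + beta[k]
std::vector<float> layernorm_host(const std::vector<float>& x,
                                  const std::vector<float>& gamma,
                                  const std::vector<float>& beta,
                                  std::size_t M, std::size_t K, float eps)
{
    std::vector<float> y(M * K);
    for (std::size_t m = 0; m < M; ++m) {
        const float* row = &x[m * K];

        // Pass 1: mean of the row (a NaN anywhere makes the mean NaN).
        double sum = 0.0;
        for (std::size_t k = 0; k < K; ++k) sum += row[k];
        const double mean = sum / static_cast<double>(K);

        // Pass 2: population variance around the mean (avoids E[x^2] - E[x]^2 cancellation).
        double sq = 0.0;
        for (std::size_t k = 0; k < K; ++k) {
            const double d = row[k] - mean;
            sq += d * d;
        }
        const double inv_std = 1.0 / std::sqrt(sq / static_cast<double>(K) + eps);

        // Normalize, then apply the 1-D gamma (scale) and beta (shift).
        for (std::size_t k = 0; k < K; ++k)
            y[m * K + k] = static_cast<float>(gamma[k] * (row[k] - mean) * inv_std + beta[k]);
    }
    return y;
}
```

With no special-casing, a NaN in a row flows through the mean into every output of that row, which is one simple way to get the "propagate NaN" behaviour the commit mentions.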
  12. 12 Jul, 2022 3 commits