1. 10 May, 2023 1 commit
  2. 27 Apr, 2023 2 commits
  3. 22 Apr, 2023 1 commit
  4. 21 Apr, 2023 1 commit
    • Haocong WANG's avatar
      fix layernorm, reduction Ops (#4) · 394dbf83
      Haocong WANG authored
      
      
      * [Navi3x] Fix Gridwise_multiple_d operation (#649)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * fix bug
      
      * standardize docs (#655)
      
      * Separate bibtex requirement from rocm-docs-core (#656)
      
      * separate bibtex requirement from rocm-docs-core
      
      * point requirements to source rocm-docs-core repo
      
      * Add CMake Option "USE_OPT_NAVI3X" (#647)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * remove navi3x opt compile option from cmake script
      
      * Conv + quantization + tanh  (#645)
      
      * Rename file. Prepare to support another activation
      
      * Add comment for quantization
      
      * Extract out_elementop
      
      * Add tanh example
      
      * Add conv + bias + tanh quantization instance
      
      * Add missing parameter
      
      * Refine cmake
      
      * Add external api and client example
      
      * Extract variable in example
      
      * Fix the comment
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      
      * Add a denorm test fix (#603)
      
      * Add type_convert implementations for bf16
      
      * Add the fix for conv_fwd
      
      * Add the fix for conv_bwd_data
      
      * Add the fix for conv_bwd_weight
      
      * Format
      
      * Format
      
      * Another format
      
      * Add a macro to use workaround on MI200 only
      
      * Format
      
      ---------
      Co-authored-by: default avatarRosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      
      * simplify karg in device/grid of split-k op (#644)
      
      * simplify karg in device/grid split-k op
      
      * fix mk_kn_mn instances
      
      * add more instances
      
      * use name from tensor layout
      
      * fix 3rd dword of buffer source descriptor (#659)
      
      * add fp64 instances (#658)
      Co-authored-by: default avatarroot <root@ctr-ubbsmc15.amd.com>
      
      * Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)
      
      This reverts commit bb5530af
      
      .
      
      * Groupnorm + swish external api (#668)
      
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Ractor instance generation, split into multiple cpp file
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
      
      * add a marco to turn on/off denorm fix (off by default) (#673)
      
      * add a marco to turn off denorm fix by default
      
      * expose the marco
      
      ---------
      Co-authored-by: default avatarroot <root@ctr-ubbsmc15.amd.com>
      
      * fixed quant example (#672)
      Co-authored-by: default avatarroot <root@ctr-ubbsmc15.amd.com>
      
      * Add dependabot config and pin rocm-docs-core (#663)
      
      * [gtest] suppress unsafe buffer warn (#670)
      
      ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
      
      
      
      * Add memory index guard in wmma device ops (#667)
      
      * Add more macros to turn on/off denorm fix (#678)
      Co-authored-by: default avatarRosty Geyyer <rosty.geyyer@amd.com>
      
      * Fix a typo (#676)
      
      * Add (#677)
      
      * Allow using ROCm release candidate compilers. (#679)
      
      * enable use of rocm5.5 release candidate 4
      
      * upgrade to ROCM5.5 RC5
      
      * try fix the PUB_KEY error, remove the cmake-data package
      
      * upgrade to latest cmake version
      
      * use private dockerhub repo for rocm5.5 rc5
      
      * add missing bracket
      
      * Disable SkipLDS & Align AIT api
      
      * Update dependabot config (#682)
      Co-authored-by: default avatarsamjwu <samjwu@users.noreply.github.com>
      
      * update attn api
      
      * solve type_convert bug + enable
      
      ---------
      Co-authored-by: default avatarSam Wu <sjwu@ualberta.ca>
      Co-authored-by: default avatarSam Wu <sam.wu2@amd.com>
      Co-authored-by: default avatarrocking5566 <ChunYu.Lai@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      Co-authored-by: default avatarRostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
      Co-authored-by: default avatarRosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: default avatarcarlushuang <carlus.huang@amd.com>
      Co-authored-by: default avatarroot <root@ctr-ubbsmc15.amd.com>
      Co-authored-by: default avatarJun Liu <Liu.Jun@amd.com>
      Co-authored-by: default avatarIllia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: default avatarsamjwu <samjwu@users.noreply.github.com>
      Co-authored-by: default avatarhaocwang <Haocong.WANG@amd.com>
      394dbf83
  5. 20 Apr, 2023 1 commit
  6. 19 Apr, 2023 1 commit
    • Haocong WANG's avatar
      Merge origin dev (#2) · cad3212d
      Haocong WANG authored
      
      
      * [Navi3x] Fix Gridwise_multiple_d operation (#649)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * fix bug
      
      * standardize docs (#655)
      
      * Separate bibtex requirement from rocm-docs-core (#656)
      
      * separate bibtex requirement from rocm-docs-core
      
      * point requirements to source rocm-docs-core repo
      
      * Add CMake Option "USE_OPT_NAVI3X" (#647)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * remove navi3x opt compile option from cmake script
      
      * Conv + quantization + tanh  (#645)
      
      * Rename file. Prepare to support another activation
      
      * Add comment for quantization
      
      * Extract out_elementop
      
      * Add tanh example
      
      * Add conv + bias + tanh quantization instance
      
      * Add missing parameter
      
      * Refine cmake
      
      * Add external api and client example
      
      * Extract variable in example
      
      * Fix the comment
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      
      * Add a denorm test fix (#603)
      
      * Add type_convert implementations for bf16
      
      * Add the fix for conv_fwd
      
      * Add the fix for conv_bwd_data
      
      * Add the fix for conv_bwd_weight
      
      * Format
      
      * Format
      
      * Another format
      
      * Add a macro to use workaround on MI200 only
      
      * Format
      
      ---------
      Co-authored-by: default avatarRosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      
      * simplify karg in device/grid of split-k op (#644)
      
      * simplify karg in device/grid split-k op
      
      * fix mk_kn_mn instances
      
      * add more instances
      
      * use name from tensor layout
      
      * fix 3rd dword of buffer source descriptor (#659)
      
      * add fp64 instances (#658)
      Co-authored-by: default avatarroot <root@ctr-ubbsmc15.amd.com>
      
      * Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)
      
      This reverts commit bb5530af
      
      .
      
      * Groupnorm + swish external api (#668)
      
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Ractor instance generation, split into multiple cpp file
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
      
      * add a marco to turn on/off denorm fix (off by default) (#673)
      
      * add a marco to turn off denorm fix by default
      
      * expose the marco
      
      ---------
      Co-authored-by: default avatarroot <root@ctr-ubbsmc15.amd.com>
      
      * fixed quant example (#672)
      Co-authored-by: default avatarroot <root@ctr-ubbsmc15.amd.com>
      
      * Add dependabot config and pin rocm-docs-core (#663)
      
      * [gtest] suppress unsafe buffer warn (#670)
      
      ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
      
      
      
      * Add memory index guard in wmma device ops (#667)
      
      * Add more macros to turn on/off denorm fix (#678)
      Co-authored-by: default avatarRosty Geyyer <rosty.geyyer@amd.com>
      
      * Fix a typo (#676)
      
      * Add (#677)
      
      * Allow using ROCm release candidate compilers. (#679)
      
      * enable use of rocm5.5 release candidate 4
      
      * upgrade to ROCM5.5 RC5
      
      * try fix the PUB_KEY error, remove the cmake-data package
      
      * upgrade to latest cmake version
      
      * use private dockerhub repo for rocm5.5 rc5
      
      * add missing bracket
      
      * add vector load check
      
      * solve conflicts
      
      ---------
      Co-authored-by: default avatarSam Wu <sjwu@ualberta.ca>
      Co-authored-by: default avatarSam Wu <sam.wu2@amd.com>
      Co-authored-by: default avatarrocking5566 <ChunYu.Lai@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      Co-authored-by: default avatarRostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
      Co-authored-by: default avatarRosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: default avatarcarlushuang <carlus.huang@amd.com>
      Co-authored-by: default avatarroot <root@ctr-ubbsmc15.amd.com>
      Co-authored-by: default avatarJun Liu <Liu.Jun@amd.com>
      Co-authored-by: default avatarIllia Silin <98187287+illsilin@users.noreply.github.com>
      cad3212d
  7. 07 Apr, 2023 1 commit
  8. 29 Mar, 2023 1 commit
  9. 27 Mar, 2023 1 commit
  10. 23 Mar, 2023 2 commits
  11. 15 Mar, 2023 2 commits
    • rocking5566's avatar
      gemm/Conv xdlops + dlops quantization (#625) · 16dc18e0
      rocking5566 authored
      
      
      * Add conv perlayer quantization
      
      * Add gemm_dlops quantization
      
      * Support int8 for innerproduct
      
      * Refine gemm dlops int8 kernel parameter
      
      * Support gfx908(MI100) and gfx90a(MI200)
      
      * clang-format
      
      * Rename example number
      
      * Support different layout for d tensor
      
      * Add conv dlops perchannel quantization example
      
      * Move to example 40
      
      * Extract the common code for different platform (dlops and xdlops)
      
      * Move ot subfolder. Prepare to add other op of quantization
      
      * Refine the quantization instance library
      
      * Add conv dl instances and client example
      
      * Remove unnecessary type
      
      * Add gemm quantization instance
      
      * Add external api and client example
      
      * Refine num_bytes
      
      * Separete different layout to different cpp
      
      * Add more xdl instances
      
      * Revert "Remove unnecessary type"
      
      This reverts commit 820869182f6a8f62b2c9004101ba6bf76b96be14.
      
      * Remove CShuffleDataType in dlops
      Let acc and CShuffleDataType be the same in xdlops
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      16dc18e0
    • Adam Osewski's avatar
      Device Op GroupedGemmMultipleD + example fp16 (#633) · a2d5ca8e
      Adam Osewski authored
      
      
      * Pass shared mem pointer as pointer to void.
      
      * Device Op GroupedGEMM Multiple D
      
      * Example for grouped gemm multiple d.
      
      * Add MI200 to supported archs.
      
      ---------
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      a2d5ca8e
  12. 09 Mar, 2023 1 commit
  13. 06 Mar, 2023 8 commits
  14. 01 Mar, 2023 1 commit
  15. 28 Feb, 2023 2 commits
  16. 27 Feb, 2023 2 commits
  17. 24 Feb, 2023 1 commit
  18. 22 Feb, 2023 1 commit
    • Rostyslav Geyyer's avatar
      Add Grouped Conv Backward Weight on Navi21 for ResNet50. (#505) · 246ceee4
      Rostyslav Geyyer authored
      
      
      * Add DeviceOp and examples
      
      * Format DeviceOp template arguments
      
      * Remove bf16 example
      
      * Format
      
      * Format
      
      * Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N
      
      * Refactor argument preparation
      
      * Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl
      
      * Rename device op file
      
      * Update include directive in the example file
      
      * Update descriptor preparation for grouped op
      
      * Update the argument
      
      * Update batch handling
      
      * Add gridwise gemm supporting batched input
      
      * Update blockwise indexing, working version
      
      * Update copyright year
      
      * Update check if argument is supported
      
      * Refactor and make consistent with xdl examples
      
      * Update check if argument is supported
      
      * Add changelog entry
      
      * Added comments on Dl op split_k>1 support
      
      ---------
      Co-authored-by: default avatarRosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      246ceee4
  19. 16 Feb, 2023 4 commits
  20. 15 Feb, 2023 4 commits
    • zjing14's avatar
      Add contraction_fp64 example (#570) · 24c9ee1d
      zjing14 authored
      
      
      * add contraction_bilinear
      
      * add contraction_scale_xdl_fp64
      
      * reduce tile size to avoid register spill
      
      ---------
      Co-authored-by: default avatarroot <root@ctr-ubbsmc16.amd.com>
      24c9ee1d
    • rocking5566's avatar
      Improve normalization (#580) · 6a6163a3
      rocking5566 authored
      * Sync the order of type string with template parameter
      
      * Add more instances
      
      * Check the vector size and remove redundant var
      
      * Extract var to static, prepare to separate sweep once kernel
      
      * Separate sweeponce flow and optimize the flow
      
      * 1. Rename AccDatatype in normalization to computeData
      2. Rename AccElementwiseOperation to YElementwiseOperation in normalization
      
      * Remove useless code
      
      * Update naive variance kernel
      
      * Refine string
      
      * Fix typo
      
      * Support naive variance for device_normalization
      
      * Check the blocksize
      
      * Share the VGPR of x and y
      
      * Share the VGPR of gamma and beta
      
      * Add more instances
      
      * Support fp16 sqrt for experiment
      
      * Add CHANGELOG
      
      * Fix typo
      
      * clang-format
      6a6163a3
    • Haocong WANG's avatar
      [Navi3x] Add Device Operations (#567) · 0cfda84d
      Haocong WANG authored
      * wmma_op + unit test
      
      * add arch limitation to wmma test
      
      * change arch limitation
      
      * Refactor + Add all type unit test(int4 compile failed)
      
      * Add f32_16x16x16_bf16 unit test
      
      * tempsave
      
      * tempsave
      
      * tempsave
      
      * runtime bug, cannot find symbol
      
      * workaround for incorrect HIP warpSize return value
      
      * debugging
      
      * tempsave
      
      * Correctness OK, waiting for optimization
      
      * Tidy up + format
      
      * temp save
      
      * temp save, reproduce the v_bfi_b32 issue
      
      * add inline asm for wmmaop test
      
      * tidy up
      
      * clean some debug purpose code
      
      * discard some codes
      
      * clang format
      
      * clang format
      
      * compiler issue fixed + increase tile size
      
      * navi3x_multipleD+example
      
      * temp save
      
      * workable
      
      * batchedgemm[OK], groupconv[debug]
      
      * groupconv: Sanity check[OK], Performance[Bad]
      
      * navi3x_groupconv_need_optimization
      
      * format
      
      * Add arch limitation to all wmma examples
      
      * fix bug: example30 input conv args
      0cfda84d
    • Adam Osewski's avatar
      Conv3D FWD BWD WRW fp16 fp32 client examples (#559) · e9fd1228
      Adam Osewski authored
      
      
      * Conv3d bwd weight client example.
      
      * Update year in license
      
      * Convolution bwd data 3D fp16/fp32 client example.
      
      * Client example for convnd fwd fp16 fp32
      
      * clang-format
      
      * Review remarks.
      
      * Fix compiler err.
      
      * Update data layout to standard one.
      
      * Add conv 3d fwd NDHWGC instances
      
      * clang-format
      
      * Conv3d fwd NDHWGC instances.
      
      ---------
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      e9fd1228
  21. 14 Feb, 2023 1 commit
  22. 11 Feb, 2023 1 commit