  1. 28 Jul, 2023 1 commit
  2. 25 Jul, 2023 1 commit
  3. 20 Jul, 2023 1 commit
  4. 19 Jun, 2023 2 commits
  5. 15 Jun, 2023 1 commit
  6. 13 Jun, 2023 2 commits
    • deprecate inline asm wmma · d44f6660
      aska-0096 authored
    • AIT Attention API refactor (#8) · efee4541
      Haocong WANG authored
      * sanity pass
      
      * sanity pass 2
      
      * confirm significant performance regression.
      
      * turn on all instances
      
      * turn off instance format
      
      * Fix bug & tuning & format
      
      * DML meta, self_attn+cross_attn
      
      * sanity pass
      
      * remove useless flag
      
      * update tile and problem size used in AIT attention
      
      * bug fix in grouped conv supporting check
  7. 12 Jun, 2023 1 commit
  8. 01 Jun, 2023 1 commit
    • Simplify kernel argument of device operator Device(Batched)GemmXdl<> (#723) · 9eae73df
      Po Yen Chen authored
      
      
      * Remove M/N/KPad local variables
      
      * Use M/N/KPad to name padded lengths
      
      * Replace duplicated local variable by parameters
      
      * Rename variables M/N/KRaw to M/N/K
      
      * Move AK0/BK0 compute logic into GridwiseGemm
      
      * Use macro to shorten code
      
      * Move CalculateGridSize() logic into GridwiseGemm
      
      * Add comment to credit the implementation source
      
      * Reuse the existing implementation
      
      * Remove no-longer used data members
      
      * Remove elementwise-op objects from interfaces
      
      * Reserve kernel arg as whole object in interfaces
      
      * Remove redundant data member
      
      * Make 3rd type parameter optional
      
      * Remove unnecessary type parameters
      
      * Remove no-longer used descriptor-creation methods
      
      * Move kernel arg type definition into GridwiseGemm
      
      * Add macro to switch between code sections
      
      * Move argument field computing logic into device op side
      
      * Make utility method 'static'
      
      * Declare special methods
      
      * Unify MakeArgument() usage
      
      * Adapt the new GridwiseGemm interface
      
      * Push-down class 'GridwiseGemm::Argument' fields
      
      * Remove no-longer used methods
      
      * Add unused parameters
      
      * Force copying parameters in 'Embed' ctor
      
      * Remove no-longer used descriptors
      
      * Fallback change on BaseArgument
      
      * Remove macro 'INTEGER_DIVIDE_CEIL'
      
      * Make variable naming more consistent
      
      * Make sure methods are only invoked in the right place
      
      * Remove trailing underscore in public attribute name
      
      * Remove unnecessary methods
      
      * Hide computing logic of derived attributes
      
      * Make new 'Embed' ctor only available for device code
      
      * Make sure 'Embed' type args are not references
      
      * Move check for karg.K into CheckValidity()
      
      * Remove more integer division logic from device code
      
      * Undo changes on Embed
      
      * Separate 'Problem' concept out from 'Argument'
      
      * Add overloaded version of __builtin_amdgcn_readfirstlane()
      
      * Remove 'static' specifiers
      
      * Remove more 'static' specifier
      
      * Replace unsigned char by std::byte
      
      * Add 'const' specifier to never changing variable
      
      * Add 'inline' specifier to function definition
      
      * Share same name for kernel interfaces
      
      * Fix wrong boundary calculation logic
      
      * Leave the third template arg for compatibility
      
      * Remove unnecessary parameters
      
      * Fix wrong error message (for type name)
      
      * Create descriptor on device side
      
      * Fix wrong debug message
      
      * Remove no-longer used data members
      
      * Rename type trait
      
      * Remove std:: qualifier from standard types
      
      * Replace 'size_t' by 'unsigned'
      
      * Use type alias to hint usage
      
      * Replace static_for<> by ordinary 'for' loop
      
      * Reject unsupported argument
      
      * Rename readfirstlane() to amd_wave_read_first_lane()
      
      * Rename file readfirstlance.hpp as amd_wave_read_first_lane.hpp
      
      * Update function calls
      
      * Reorder statements
      
      * Re-format files
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  9. 31 May, 2023 1 commit
  10. 30 May, 2023 1 commit
  11. 24 May, 2023 1 commit
    • Pool3d fwd (#697) · 76ec0089
      rocking authored
      * Expand the base class of pool2d, prepare to share base class with pool3d
      
      * Add pool3d device op
      
      * Add pool3d f16 example
      
      * Refactor the base class. implement generic pooling in the future
      
      * clang format
      
      * get original index in max pooling
      
      * Add outputindex to base class
      
      * Fix dimension
      
      * Add pooling instance
      
      * Use indexType instead
      
      * Remove useless header
      
      * Extract IndexDataType to template
      
      * Extract pooling reference code
      
      * clang format
      
      * clang format
      
      * Fix typo
      
      * Add tensor stride
      
      * Add missing header
      
      * Add index stride and output stride
      
      * Refine naming
      
      * Add type to base class
      
      * Rename file
      
      * Use proper size
      
      * Fix typo
      
      * Refine naming
      
      * Modify the argument into vector.
      
      * Add max pool profiler
      
      * Refine naming
      
      * Support f32 pool
      
      * Fix typo
      
      * Add avg pool2d fwd in profiler
      
      * clang format
      
      * Rename AccDatatype to ComputeDatatype
      
      * Fix init
      
      * test pool
      
      * Extract variable
      
      * Add client example
      
      * Check the pooling dim
      
      * clang format
      
      * Connect argv and arg_parser
      
      * Add found check
      
      * Remove useless header
      
      * Refine naming
      
      * Adjust the order of device_pool_fwd
  12. 23 May, 2023 1 commit
    • Enable gemm_dl and other kernels on Navi3x. (#714) · d821d1e5
      Illia Silin authored
      * enable dl kernels on navi3
      
      * do not build xdl tests and examples on Navi
      
      * run tests before building everything on jenkins
      
      * disable gemm_bilinear on gfx1030
      
      * add gpu targets to installer on Navi
      
      * put tests in the same order as before
      
      * reduce the number of navi targets in CI
      
      * build CI installed for gfx940 as well
      
      * only build for MI300 during QA runs
  13. 19 May, 2023 5 commits
  14. 18 May, 2023 1 commit
  15. 15 May, 2023 1 commit
    • Add contraction profiler and tests (#701) · 642d5e91
      Bartłomiej Kocot authored
      * Add contraction profiler and tests
      
      * Build and style fixes
      
      * Allow to use any elementwise operator for ref_contraction
      
      * Introduce profile_contraction_scale and profile_contraction_bilinear
      
      * Make ref_contraction generic and extend interface tests
      
      * Stylistic minor fixes
      
      * Extend test_contraction_interface
  16. 11 May, 2023 1 commit
  17. 10 May, 2023 1 commit
  18. 03 May, 2023 1 commit
  19. 28 Apr, 2023 1 commit
  20. 27 Apr, 2023 2 commits
  21. 24 Apr, 2023 2 commits
    • Grouped Gemm + SplitK + simplified Kernel Args (#669) · 8bb2bb4a
      Adam Osewski authored
      
      
      * simplify karg in device/grid split-k op
      
      * fix mk_kn_mn instances
      
      * add more instances
      
      * B2C with 3D grid for KSplit
      
      * Remove unused code.
      
      * Use default B2C (3D grid) in grid gemm v2r4r2.
      
      * Device gemm splitk use B2C map.
      
      * Device GroupedGemmXdlSplitKCShuffle
      
      * Example for GroupedGemm Xdl SplitK
      
      * Introduce Device GroupedGemmSplitK
      
      * Fix updating kbatch size.
      
      * Add instance mk-nk-mn
      
      * Enable set kbatch in profiler.
      
      * Add GGemmSplitK mk-kn-mn instances
      
      * Add more instances & split into multiple files.
      
      * minor fix
      
      * tuning
      
      * clean
      
      * disabled failed instances
      
      * use pipe v2
      
      * Ignore arg on not supported arch.
      
      * fix warning
      
      ---------
      Co-authored-by: carlushuang <carlus.huang@amd.com>
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Jing Zhang <jizhan@amd.com>
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
    • Revise layout of group convolution (#675) · 3eecbfb6
      rocking authored
      * [What] Remove pure conv int8 instance
      [Why] We will never use pure int8 conv in AI, use int8 quantization instead
      
      * Change layout
      
      * Share the kernel parameter
      
      * Support more types of NHWGC for group conv
      
      * Revise client example of conv 2d, use NHWGC layout
      
      * Add instance to cmake
      
      * Revise layout of group conv quantization instance
      
      * Revise layout of external api of group conv quantization
      
      * Revise layout of group conv quantization client example
      
      * Fix clang format
      
      * Add comment to describe meaning of each parameter
  22. 22 Apr, 2023 1 commit
  23. 21 Apr, 2023 1 commit
    • fix layernorm, reduction Ops (#4) · 394dbf83
      Haocong WANG authored
      
      
      * [Navi3x] Fix Gridwise_multiple_d operation (#649)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * fix bug
      
      * standardize docs (#655)
      
      * Separate bibtex requirement from rocm-docs-core (#656)
      
      * separate bibtex requirement from rocm-docs-core
      
      * point requirements to source rocm-docs-core repo
      
      * Add CMake Option "USE_OPT_NAVI3X" (#647)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * remove navi3x opt compile option from cmake script
      
      * Conv + quantization + tanh  (#645)
      
      * Rename file. Prepare to support another activation
      
      * Add comment for quantization
      
      * Extract out_elementop
      
      * Add tanh example
      
      * Add conv + bias + tanh quantization instance
      
      * Add missing parameter
      
      * Refine cmake
      
      * Add external api and client example
      
      * Extract variable in example
      
      * Fix the comment
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      
      * Add a denorm test fix (#603)
      
      * Add type_convert implementations for bf16
      
      * Add the fix for conv_fwd
      
      * Add the fix for conv_bwd_data
      
      * Add the fix for conv_bwd_weight
      
      * Format
      
      * Format
      
      * Another format
      
      * Add a macro to use workaround on MI200 only
      
      * Format
      
      ---------
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      
      * simplify karg in device/grid of split-k op (#644)
      
      * simplify karg in device/grid split-k op
      
      * fix mk_kn_mn instances
      
      * add more instances
      
      * use name from tensor layout
      
      * fix 3rd dword of buffer source descriptor (#659)
      
      * add fp64 instances (#658)
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)
      
      This reverts commit bb5530af.
      
      * Groupnorm + swish external api (#668)
      
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Refactor instance generation, split into multiple cpp files
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
      
      * add a macro to turn on/off denorm fix (off by default) (#673)
      
      * add a macro to turn off denorm fix by default
      
      * expose the macro
      
      ---------
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * fixed quant example (#672)
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * Add dependabot config and pin rocm-docs-core (#663)
      
      * [gtest] suppress unsafe buffer warn (#670)
      
      ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
      
      
      
      * Add memory index guard in wmma device ops (#667)
      
      * Add more macros to turn on/off denorm fix (#678)
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      
      * Fix a typo (#676)
      
      * Add (#677)
      
      * Allow using ROCm release candidate compilers. (#679)
      
      * enable use of rocm5.5 release candidate 4
      
      * upgrade to ROCM5.5 RC5
      
      * try fix the PUB_KEY error, remove the cmake-data package
      
      * upgrade to latest cmake version
      
      * use private dockerhub repo for rocm5.5 rc5
      
      * add missing bracket
      
      * Disable SkipLDS & Align AIT api
      
      * Update dependabot config (#682)
      Co-authored-by: samjwu <samjwu@users.noreply.github.com>
      
      * update attn api
      
      * solve type_convert bug + enable
      
      ---------
      Co-authored-by: Sam Wu <sjwu@ualberta.ca>
      Co-authored-by: Sam Wu <sam.wu2@amd.com>
      Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: carlushuang <carlus.huang@amd.com>
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      Co-authored-by: Jun Liu <Liu.Jun@amd.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: samjwu <samjwu@users.noreply.github.com>
      Co-authored-by: haocwang <Haocong.WANG@amd.com>
  24. 20 Apr, 2023 1 commit
  25. 19 Apr, 2023 1 commit
    • Merge origin dev (#2) · cad3212d
      Haocong WANG authored
      
      
      * [Navi3x] Fix Gridwise_multiple_d operation (#649)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * fix bug
      
      * standardize docs (#655)
      
      * Separate bibtex requirement from rocm-docs-core (#656)
      
      * separate bibtex requirement from rocm-docs-core
      
      * point requirements to source rocm-docs-core repo
      
      * Add CMake Option "USE_OPT_NAVI3X" (#647)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * remove navi3x opt compile option from cmake script
      
      * Conv + quantization + tanh  (#645)
      
      * Rename file. Prepare to support another activation
      
      * Add comment for quantization
      
      * Extract out_elementop
      
      * Add tanh example
      
      * Add conv + bias + tanh quantization instance
      
      * Add missing parameter
      
      * Refine cmake
      
      * Add external api and client example
      
      * Extract variable in example
      
      * Fix the comment
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      
      * Add a denorm test fix (#603)
      
      * Add type_convert implementations for bf16
      
      * Add the fix for conv_fwd
      
      * Add the fix for conv_bwd_data
      
      * Add the fix for conv_bwd_weight
      
      * Format
      
      * Format
      
      * Another format
      
      * Add a macro to use workaround on MI200 only
      
      * Format
      
      ---------
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      
      * simplify karg in device/grid of split-k op (#644)
      
      * simplify karg in device/grid split-k op
      
      * fix mk_kn_mn instances
      
      * add more instances
      
      * use name from tensor layout
      
      * fix 3rd dword of buffer source descriptor (#659)
      
      * add fp64 instances (#658)
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)
      
      This reverts commit bb5530af.
      
      * Groupnorm + swish external api (#668)
      
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Refactor instance generation, split into multiple cpp files
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
      
      * add a macro to turn on/off denorm fix (off by default) (#673)
      
      * add a macro to turn off denorm fix by default
      
      * expose the macro
      
      ---------
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * fixed quant example (#672)
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * Add dependabot config and pin rocm-docs-core (#663)
      
      * [gtest] suppress unsafe buffer warn (#670)
      
      ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
      
      
      
      * Add memory index guard in wmma device ops (#667)
      
      * Add more macros to turn on/off denorm fix (#678)
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      
      * Fix a typo (#676)
      
      * Add (#677)
      
      * Allow using ROCm release candidate compilers. (#679)
      
      * enable use of rocm5.5 release candidate 4
      
      * upgrade to ROCM5.5 RC5
      
      * try fix the PUB_KEY error, remove the cmake-data package
      
      * upgrade to latest cmake version
      
      * use private dockerhub repo for rocm5.5 rc5
      
      * add missing bracket
      
      * add vector load check
      
      * solve conflicts
      
      ---------
      Co-authored-by: Sam Wu <sjwu@ualberta.ca>
      Co-authored-by: Sam Wu <sam.wu2@amd.com>
      Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: carlushuang <carlus.huang@amd.com>
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      Co-authored-by: Jun Liu <Liu.Jun@amd.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
  26. 10 Apr, 2023 1 commit
    • Groupnorm + swish external api (#668) · ed3a2e52
      rocking5566 authored
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Refactor instance generation, split into multiple cpp files
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
  27. 07 Apr, 2023 1 commit
  28. 29 Mar, 2023 3 commits
  29. 27 Mar, 2023 1 commit
  30. 23 Mar, 2023 1 commit