1. 14 Jul, 2023 1 commit
  2. 13 Jul, 2023 1 commit
  3. 12 Jul, 2023 1 commit
  4. 06 Jul, 2023 2 commits
    • Qianfeng's avatar
      Batchnorm splitk single kernel (#771) · 8f5cafaf
      Qianfeng authored
      * Use dim 0 as faster dim for writing mean/var/count workspace in batchnorm multiblock method [performance]
      
      * Add CountDataType as template parameter in blockwise_welford
      
      * Add utility/get_shift.hpp
      
      * Add BatchNorm multiblock single-kernel implementation
      
      * Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a
      
      * Renaming in device_batchnorm_forward_impl.hpp
      
      * Tiny fix in the batchnorm_fwd profiler
      
      * Revert "Add smem inline assembly based implementation of gms_init/gms_barrier/gms_reset for gfx90a"
      
      This reverts commit d16d00919c43f10759e7b4e4d112125221ed9064.
      
      * Use the old two-kernel batchnorm multiblock method for gfx1030
      
      * Use the old two-kernel batchnorm multiblock method for gfx908
      
      * use the single-kernel batchnorm multiblock method only for gfx90a
      
      * Remove get_wave_id() from utility/get_id.hpp since it is not used
      
      * Set true for testing running mean/variance and saving mean/invvariance in the examples
      
      * Fix to copy-right words
      
      * Remove un-needed including in utility/get_id.hpp
      
      * Add comments to workgroup_synchronization.hpp
      
      * Remove un-used codes in gridwise_multiblock_batchnorm_forward.hpp
      
      * Renaming in the kernels
      
      * Remove un-used kernel file
      8f5cafaf
    • Adam Osewski's avatar
      f4dfc060
  5. 05 Jul, 2023 1 commit
  6. 30 Jun, 2023 1 commit
  7. 25 Jun, 2023 1 commit
  8. 19 Jun, 2023 4 commits
    • Illia Silin's avatar
      do not build gemm-gemm and conv-conv examples for gfx94* (#761) · 645eb2f2
      Illia Silin authored
      * do not build gemm-gemm and conv-conv examples for gfx94*
      
      * do not build gemm-gemm and conv-conv examples on navi
      645eb2f2
    • rocking's avatar
      Maxpool bwd (#750) · 341ad956
      rocking authored
      * Add maxpool f32 kernel and example
      
      * Revise copyright
      
      * Add device pool bwd device op
      
      * Support f16 and bf16
      
      * Add compute datatype for reference code.
      Prevent error in bf16
      
      * Fix type error
      
      * Remove layout
      
      * Fix bf16 error
      
      * Add f16 and bf16 example
      
      * Add more operations
      
      * Implement IsSupportedArgument
      
      * Add changelog
      
      * Add comment
      
      * Add comment
      
      * Remove useless header
      
      * Move initialize of workspace to the run
      
      * Move set din zero to the device operator
      
      * Save din_length_raw
      
      * Remove useless header
      
      * Calculate gridsize according to the number of CU
      
      * Calculate gridSize according to the number of CU.
      Remove useless header
      
      * Add put example
      
      * Remove useless header
      
      * Fix CI fail
      341ad956
    • danyao12's avatar
      rename all fwd related files · 6d63c311
      danyao12 authored
      6d63c311
    • danyao12's avatar
      rename all bwd related files · 656ebe9a
      danyao12 authored
      656ebe9a
  9. 16 Jun, 2023 6 commits
  10. 15 Jun, 2023 3 commits
  11. 14 Jun, 2023 2 commits
  12. 13 Jun, 2023 1 commit
  13. 12 Jun, 2023 3 commits
  14. 01 Jun, 2023 1 commit
    • Po Yen Chen's avatar
      Simplify kernel argument of device operator Device(Batched)GemmXdl<> (#723) · 9eae73df
      Po Yen Chen authored
      
      
      * Remove M/N/KPad local variables
      
      * Use M/N/KPad to name padded lengths
      
      * Replace duplicated local variable by parameters
      
      * Rename variables M/N/KRaw to M/N/K
      
      * Move AK0/BK0 compute logic into GridwiseGemm
      
      * Use macro to shorten code
      
      * Move CalculateGridSize() logic into GridwiseGemm
      
      * Add comment to credit the implementation source
      
      * Reuse the existing implementation
      
      * Remove no-longer used data members
      
      * Remove elementwise-op objects from interfaces
      
      * Reserve kernel arg as whole object in interfaces
      
      * Remove redundant data member
      
      * Make 3rd type parameter optional
      
      * Remove unnesscary type parameters
      
      * Remove no-longer used descriptor-creation methods
      
      * Move kernel arg type definition into GridwiseGemm
      
      * Add macro to switch between code sections
      
      * Move argument field computing logic into device op side
      
      * Make utility method 'static'
      
      * Declare special methods
      
      * Unify MakeArgument() usage
      
      * Adapt the new GridwiseGemm interface
      
      * Push-down class 'GridwiseGemm::Argument' fields
      
      * Remove no-longer used methods
      
      * Add unused parameters
      
      * Force copying parameters in 'Embed' ctor
      
      * Remove no-longer used descriptors
      
      * Fallback change on BaseArgument
      
      * Remove macro 'INTEGER_DIVIDE_CEIL'
      
      * Make variable naming more consistent
      
      * Make sure methods are only invoked on right place
      
      * Remove tailing underscore in public attribute name
      
      * Remove necessary methods
      
      * Hide computing logic of derived attributes
      
      * Make new 'Embed' ctor only available for device code
      
      * Make sure 'Embed' type args are not references
      
      * Move check for karg.K into CheckValidity()
      
      * Remove more integer division logic form device code
      
      * Undo changes on Embed
      
      * Separate 'Problem' concept out from 'Argument'
      
      * Add overloaded version of __builtin_amdgcn_readfirstlane()
      
      * Remove 'static' specifiers
      
      * Remove more 'static' specifier
      
      * Replace unsigne char by std::byte
      
      * Add 'const' specifier to never changing variable
      
      * Add 'inline' specifier to funcion definition
      
      * Share same name for kernel interfaces
      
      * Fix wrong boundar calculation logic
      
      * Leave the third template arg for compatibility
      
      * Remove unnecessary parameters
      
      * Fix wrong error message (for type name)
      
      * Create descriptor on device side
      
      * Fix wrong debug message
      
      * Remove no-longer used data members
      
      * Rename type trait
      
      * Remove std:: qualifier from standard types
      
      * Replace 'size_t' by 'unsigned'
      
      * Use type alias to hint usage
      
      * Replace static_for<> by ordinary 'for' loop
      
      * Reject unsupported argument
      
      * Rename readfirstlane() to amd_wave_read_first_lane()
      
      * Rename file readfirstlance.hpp as amd_wave_read_first_lane.hpp
      
      * Update function calls
      
      * Reorder statements
      
      * Re-format files
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      9eae73df
  15. 31 May, 2023 3 commits
  16. 30 May, 2023 1 commit
  17. 29 May, 2023 1 commit
  18. 24 May, 2023 1 commit
    • rocking's avatar
      Pool3d fwd (#697) · 76ec0089
      rocking authored
      * Expand the base class of pool2d, prepare to share base class with pool3d
      
      * Add pool3d device op
      
      * Add pool3d f16 example
      
      * Refactor the base class. implement generic pooling in the future
      
      * clang format
      
      * get original index in max pooling
      
      * Add outputindex to base class
      
      * Fix dimension
      
      * Add pooling instance
      
      * Use indexType instead
      
      * Remove useless header
      
      * Extract IndexDataType to template
      
      * Extract pooling reference code
      
      * clang format
      
      * clang format
      
      * Fix typo
      
      * Add tensor stride
      
      * Add missing header
      
      * Add index stride and output stride
      
      * Refine naming
      
      * Add type to base class
      
      * Rename file
      
      * Use proper size
      
      * Fix typo
      
      * Refine naming
      
      * Modify the argument into vector.
      
      * Add max pool profiler
      
      * Refine naming
      
      * Support f32 pool
      
      * Fix typo
      
      * Add avg pool2d fwd in profiler
      
      * clang format
      
      * Rename AccDatatype to ComputeDatatype
      
      * Fix init
      
      * test pool
      
      * Extract variable
      
      * Add client example
      
      * Check the pooling dim
      
      * clang format
      
      * Connect argv and arg_parser
      
      * Add found check
      
      * Remove useless header
      
      * Refine naming
      
      * Adjust the order of device_pool_fwd
      76ec0089
  19. 23 May, 2023 1 commit
    • Illia Silin's avatar
      Enable gemm_dl and other kernels on Navi3x. (#714) · d821d1e5
      Illia Silin authored
      * enable dl kernels on navi3
      
      * do not build xdl tests and examples on Navi
      
      * run tests before building everything on jenkins
      
      * disable gemm_bilinear on gfx1030
      
      * add gpu targets to installer on Navi
      
      * put tests in the same order as before
      
      * reduce the number of navi targets in CI
      
      * build CI installed for gfx940 as well
      
      * only build for MI300 during QA runs
      d821d1e5
  20. 18 May, 2023 2 commits
  21. 17 May, 2023 2 commits
  22. 15 May, 2023 1 commit
    • Bartłomiej Kocot's avatar
      Add contraction profiler and tests (#701) · 642d5e91
      Bartłomiej Kocot authored
      * Add contraction profiler and tests
      
      * Build and style fixes
      
      * Allow to use any elementwise operator for ref_contraction
      
      * Introduce profile_contraction_scale and profile_contraction_bilinear
      
      * Make ref_contraction generic and extend interface tests
      
      * Stylistic minor fixes
      
      * Extend test_contraction_interface
      642d5e91