1. 08 Jun, 2023 1 commit
  2. 02 Jun, 2023 1 commit
  3. 01 Jun, 2023 2 commits
    • who who who's avatar
      e2ebc8e7
    • Simplify kernel argument of device operator Device(Batched)GemmXdl<> (#723) · 9eae73df
      Po Yen Chen authored
      
      
      * Remove M/N/KPad local variables
      
      * Use M/N/KPad to name padded lengths
      
      * Replace duplicated local variable by parameters
      
      * Rename variables M/N/KRaw to M/N/K
      
      * Move AK0/BK0 compute logic into GridwiseGemm
      
      * Use macro to shorten code
      
      * Move CalculateGridSize() logic into GridwiseGemm
      
      * Add comment to credit the implementation source
      
      * Reuse the existing implementation
      
      * Remove no-longer used data members
      
      * Remove elementwise-op objects from interfaces
      
      * Reserve kernel arg as whole object in interfaces
      
      * Remove redundant data member
      
      * Make 3rd type parameter optional
      
      * Remove unnecessary type parameters
      
      * Remove no-longer used descriptor-creation methods
      
      * Move kernel arg type definition into GridwiseGemm
      
      * Add macro to switch between code sections
      
      * Move argument field computing logic into device op side
      
      * Make utility method 'static'
      
      * Declare special methods
      
      * Unify MakeArgument() usage
      
      * Adapt the new GridwiseGemm interface
      
      * Push-down class 'GridwiseGemm::Argument' fields
      
      * Remove no-longer used methods
      
      * Add unused parameters
      
      * Force copying parameters in 'Embed' ctor
      
      * Remove no-longer used descriptors
      
      * Fallback change on BaseArgument
      
      * Remove macro 'INTEGER_DIVIDE_CEIL'
      
      * Make variable naming more consistent
      
      * Make sure methods are only invoked on right place
      
      * Remove trailing underscore in public attribute name
      
      * Remove unnecessary methods
      
      * Hide computing logic of derived attributes
      
      * Make new 'Embed' ctor only available for device code
      
      * Make sure 'Embed' type args are not references
      
      * Move check for karg.K into CheckValidity()
      
      * Remove more integer division logic from device code
      
      * Undo changes on Embed
      
      * Separate 'Problem' concept out from 'Argument'
      
      * Add overloaded version of __builtin_amdgcn_readfirstlane()
      
      * Remove 'static' specifiers
      
      * Remove more 'static' specifier
      
      * Replace unsigned char by std::byte
      
      * Add 'const' specifier to never changing variable
      
      * Add 'inline' specifier to function definition
      
      * Share same name for kernel interfaces
      
      * Fix wrong boundary calculation logic
      
      * Leave the third template arg for compatibility
      
      * Remove unnecessary parameters
      
      * Fix wrong error message (for type name)
      
      * Create descriptor on device side
      
      * Fix wrong debug message
      
      * Remove no-longer used data members
      
      * Rename type trait
      
      * Remove std:: qualifier from standard types
      
      * Replace 'size_t' by 'unsigned'
      
      * Use type alias to hint usage
      
      * Replace static_for<> by ordinary 'for' loop
      
      * Reject unsupported argument
      
      * Rename readfirstlane() to amd_wave_read_first_lane()
      
      * Rename file readfirstlance.hpp as amd_wave_read_first_lane.hpp
      
      * Update function calls
      
      * Reorder statements
      
      * Re-format files
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  4. 31 May, 2023 2 commits
    • Update copyright headers (#726) · b94fd0b2
      Illia Silin authored
    • Add class type support for __builtin_amdgcn_readfirstlane() (#711) · 582e31e8
      Po Yen Chen authored
      * Add overloaded version of __builtin_amdgcn_readfirstlane()
      
      * Remove 'static' specifiers
      
      * Remove more 'static' specifier
      
      * Replace unsigned char by std::byte
      
      * Add 'const' specifier to never changing variable
      
      * Add 'inline' specifier to function definition
      
      * Fix wrong boundary calculation logic
      
      * Rename type trait
      
      * Remove std:: qualifier from standard types
      
      * Replace 'size_t' by 'unsigned'
      
      * Use type alias to hint usage
      
      * Replace static_for<> by ordinary 'for' loop
      
      * Rename readfirstlane() to amd_wave_read_first_lane()
      
      * Rename file readfirstlance.hpp as amd_wave_read_first_lane.hpp
      
      * Reorder statements
  5. 30 May, 2023 3 commits
    • Haocong WANG's avatar
      6eef0755
    • Simplify kernel argument of device operator DeviceGemm_Xdl_CShuffle<> (#696) · 1344a0f2
      Po Yen Chen authored
      
      
      * Remove M/N/KPad local variables
      
      * Use M/N/KPad to name padded lengths
      
      * Replace duplicated local variable by parameters
      
      * Rename variables M/N/KRaw to M/N/K
      
      * Move AK0/BK0 compute logic into GridwiseGemm
      
      * Use macro to shorten code
      
      * Move CalculateGridSize() logic into GridwiseGemm
      
      * Add comment to credit the implementation source
      
      * Reuse the existing implementation
      
      * Remove no-longer used data members
      
      * Remove elementwise-op objects from interfaces
      
      * Reserve kernel arg as whole object in interfaces
      
      * Remove redundant data member
      
      * Make 3rd type parameter optional
      
      * Remove unnecessary type parameters
      
      * Remove no-longer used descriptor-creation methods
      
      * Move kernel arg type definition into GridwiseGemm
      
      * Add macro to switch between code sections
      
      * Move argument field computing logic into device op side
      
      * Make utility method 'static'
      
      * Declare special methods
      
      * Unify MakeArgument() usage
      
      * Adapt the new GridwiseGemm interface
      
      * Push-down class 'GridwiseGemm::Argument' fields
      
      * Remove no-longer used methods
      
      * Add unused parameters
      
      * Force copying parameters in 'Embed' ctor
      
      * Remove no-longer used descriptors
      
      * Fallback change on BaseArgument
      
      * Remove macro 'INTEGER_DIVIDE_CEIL'
      
      * Make variable naming more consistent
      
      * Make sure methods are only invoked on right place
      
      * Remove trailing underscore in public attribute name
      
      * Remove unnecessary methods
      
      * Hide computing logic of derived attributes
      
      * Make new 'Embed' ctor only available for device code
      
      * Make sure 'Embed' type args are not references
      
      * Move check for karg.K into CheckValidity()
      
      * Remove more integer division logic from device code
      
      * Undo changes on Embed
      
      * Separate 'Problem' concept out from 'Argument'
      
      * Share same name for kernel interfaces
      
      * Reject unsupported argument
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
    • Multiple fixes to GroupedGemm+SplitK (#707) · 70e4eb56
      Adam Osewski authored
      
      
      * Add license header.
      
      * Reduce the amount of logged output. Add constant initialization.
      
      * Add functional tests for grouped_gemm with different kbatch value.
      
      * Add debug log information + remove unused code.
      
      * Don't pass kbatch to CalculateKPadded.
      
      * Turn on logging in grouped gemm and gemm splitk profiler
      
      * Debug: limit number of test cases to run.
      
      * Log more information and initialize with constant value.
      
      * Turn on DEBUG_LOG
      
      * Add more debug log information.
      
      * Limit the number of instances to compile.
      
      * Use GridwiseGemmPipeline
      
      * Use KBatch to calculate K0
      
      * Multiple DebugLog messages.
      
      * Unit tests for multiple KBatch values.
      
      * Refactoring
      
      * Disable logging
      * Extract KBatch update out of the if statement.
      
      * Uncomment instances.
      
      * Disable DebugLog.
      
      * Use KBatch when calculating KPadded.
      
      * Fix CGridDesc padding.
      
      * Use available helper functions.
      
      * Uncomment code commented for debugging.
      
      * Remove unnecessary debug log messages.
      
      * Uncomment previously commented code for debug purposes.
      
      * Add KBatch info to profiler output summary log.
      
      * Add gtests for gemm splitk using ckProfiler API.
      
      * Add more test-cases for different data layout.
      
      * Add more test cases for gemm splitk
      
      * Remove old test.
      
      * Unit tests for MKNK ggemm interface.
      
      * Fix and add more unit-tests.
      
      * Constexpr everything!
      
      * Increase error threshold for fp16 and splitk.
      
      Since we're using fp16 atomic add for splitk, there's a known precision loss.
      
      ---------
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  6. 24 May, 2023 2 commits
    • Clean-up the headers (#713) · ac9e01e2
      Illia Silin authored
      
      
      * fix headers for gpu instances
      
      * remove unused headers
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
    • Pool3d fwd (#697) · 76ec0089
      rocking authored
      * Expand the base class of pool2d, prepare to share base class with pool3d
      
      * Add pool3d device op
      
      * Add pool3d f16 example
      
      * Refactor the base class. implement generic pooling in the future
      
      * clang format
      
      * get original index in max pooling
      
      * Add outputindex to base class
      
      * Fix dimension
      
      * Add pooling instance
      
      * Use indexType instead
      
      * Remove useless header
      
      * Extract IndexDataType to template
      
      * Extract pooling reference code
      
      * clang format
      
      * clang format
      
      * Fix typo
      
      * Add tensor stride
      
      * Add missing header
      
      * Add index stride and output stride
      
      * Refine naming
      
      * Add type to base class
      
      * Rename file
      
      * Use proper size
      
      * Fix typo
      
      * Refine naming
      
      * Modify the argument into vector.
      
      * Add max pool profiler
      
      * Refine naming
      
      * Support f32 pool
      
      * Fix typo
      
      * Add avg pool2d fwd in profiler
      
      * clang format
      
      * Rename AccDatatype to ComputeDatatype
      
      * Fix init
      
      * test pool
      
      * Extract variable
      
      * Add client example
      
      * Check the pooling dim
      
      * clang format
      
      * Connect argv and arg_parser
      
      * Add found check
      
      * Remove useless header
      
      * Refine naming
      
      * Adjust the order of device_pool_fwd
  7. 23 May, 2023 1 commit
    • Enable gemm_dl and other kernels on Navi3x. (#714) · d821d1e5
      Illia Silin authored
      * enable dl kernels on navi3
      
      * do not build xdl tests and examples on Navi
      
      * run tests before building everything on jenkins
      
      * disable gemm_bilinear on gfx1030
      
      * add gpu targets to installer on Navi
      
      * put tests in the same order as before
      
      * reduce the number of navi targets in CI
      
      * build CI installed for gfx940 as well
      
      * only build for MI300 during QA runs
  8. 11 May, 2023 1 commit
  9. 04 May, 2023 1 commit
    • Optimize bf16 conversion (#664) · b076a02a
      Rostyslav Geyyer authored
      * Add TypeConvert class and start refactoring
      
      * Refactor TypeConvert as a struct
      
      * Get back to template functions type_convert
      
      * Add a type_convert_bf16_rtn, set rtz as default
      
      * Clean up
      
      * Add UnaryConvertPrecision struct for high-precision workloads
      
      * Format
      
      * Update type_convert to UnaryConvert on threadwise level
      
      * Update UnaryConvertPrecision
      
      * Format
      
      * Fix chmod
      
      * Add a flag to pick conversion method
      
      * Format
      
      * Remove the added flag
      
      * Merge elementwise op with type conversion
      
      * Move type_convert to elemwise op, update the op
      
      * Update type_convert_precision -> bf16_convert_rtn
      
      * Clean up
      
      * Update comments
      
      * Update the CK_WORKAROUND_DENORM_FIX flag handling
      
      * Update the unneeded op to work but warn user
      
      * Remove the message
      
      * Use a PassThrough instead of ConvertBF16RTN to calculate reference
      
      * Format
      
      * Add missing include
  10. 03 May, 2023 2 commits
  11. 28 Apr, 2023 1 commit
  12. 26 Apr, 2023 1 commit
  13. 24 Apr, 2023 1 commit
  14. 22 Apr, 2023 1 commit
  15. 16 Apr, 2023 2 commits
  16. 11 Apr, 2023 2 commits
  17. 10 Apr, 2023 1 commit
    • Groupnorm + swish external api (#668) · ed3a2e52
      rocking5566 authored
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Refactor instance generation, split into multiple cpp files
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
  18. 07 Apr, 2023 1 commit
  19. 30 Mar, 2023 2 commits
  20. 29 Mar, 2023 2 commits
  21. 23 Mar, 2023 1 commit
  22. 22 Mar, 2023 1 commit
  23. 20 Mar, 2023 2 commits
  24. 15 Mar, 2023 5 commits
  25. 10 Mar, 2023 1 commit