  1. 16 Jun, 2023 1 commit
  2. 15 Jun, 2023 2 commits
  3. 12 Jun, 2023 1 commit
      Fix incomplete support for object sizes of 4n + 3 bytes in amd_wave_read_first_lane() (#738) · 7c24654c
      Po Yen Chen authored
      * Fix wrong pointer type
      
      * Rename type trait get_unsigned_int<> to get_carrier<>
      
      * Add 3-bytes carrier type
      
      * Add missing __device__ specifier
      
      * Rename template non-type parameter
      
      * Leave the remaining byte uninitialized
      
      * Avoid invoking (host) STL algorithms
      
      * Remove unnecessary 'inline' specifier
      
      * Extract common logic out as helper method
      
      * Hide dummy member function
      
      * Add missing __device__ specifier
  4. 09 Jun, 2023 5 commits
  5. 08 Jun, 2023 1 commit
  6. 31 May, 2023 2 commits
      update copyright headers (#726) · b94fd0b2
      Illia Silin authored
      Add class type support for __builtin_amdgcn_readfirstlane() (#711) · 582e31e8
      Po Yen Chen authored
      * Add an overloaded version of __builtin_amdgcn_readfirstlane()
      
      * Remove 'static' specifiers
      
      * Remove more 'static' specifiers
      
      * Replace unsigned char with std::byte
      
      * Add 'const' specifier to a never-changing variable
      
      * Add 'inline' specifier to function definition
      
      * Fix wrong boundary calculation logic
      
      * Rename type trait
      
      * Remove std:: qualifier from standard types
      
      * Replace 'size_t' with 'unsigned'
      
      * Use type alias to hint usage
      
      * Replace static_for<> with an ordinary 'for' loop
      
      * Rename readfirstlane() to amd_wave_read_first_lane()
      
      * Rename file readfirstlance.hpp to amd_wave_read_first_lane.hpp
      
      * Reorder statements
  7. 24 May, 2023 3 commits
  8. 23 May, 2023 7 commits
  9. 18 May, 2023 1 commit
  10. 16 May, 2023 1 commit
  11. 15 May, 2023 2 commits
  12. 12 May, 2023 5 commits
  13. 11 May, 2023 3 commits
  14. 08 May, 2023 4 commits
  15. 04 May, 2023 1 commit
      Optimize bf16 conversion (#664) · b076a02a
      Rostyslav Geyyer authored
      * Add TypeConvert class and start refactoring
      
      * Refactor TypeConvert as a struct
      
      * Get back to template functions type_convert
      
      * Add type_convert_bf16_rtn; set RTZ as default
      
      * Clean up
      
      * Add UnaryConvertPrecision struct for high-precision workloads
      
      * Format
      
      * Update type_convert to UnaryConvert on threadwise level
      
      * Update UnaryConvertPrecision
      
      * Format
      
      * Fix chmod
      
      * Add a flag to pick conversion method
      
      * Format
      
      * Remove the added flag
      
      * Merge elementwise op with type conversion
      
      * Move type_convert to elemwise op, update the op
      
      * Update type_convert_precision -> bf16_convert_rtn
      
      * Clean up
      
      * Update comments
      
      * Update the CK_WORKAROUND_DENORM_FIX flag handling
      
      * Update the unneeded op to still work but warn the user
      
      * Remove the message
      
      * Use a PassThrough instead of ConvertBF16RTN to calculate reference
      
      * Format
      
      * Add missing include
  16. 28 Apr, 2023 1 commit
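Commit 7c24654c fixes amd_wave_read_first_lane() for objects whose size is 4n + 3 bytes by adding a 3-byte "carrier" type alongside a trait renamed to get_carrier<>. CK's actual definitions are not reproduced here.

The sketch below is a host-side C++17 illustration under stated assumptions. The trait `carrier_for<>`, the struct `carrier3` and the helper `tail_bytes()` are all hypothetical names. The sketch shows the general idea: map the leftover byte count (1, 2 or 3) to a trivially copyable type of exactly that size, and copy the object's trailing bytes into it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical 3-byte carrier: a plain byte array, so sizeof == 3 and
// alignment == 1 on common ABIs (there is no standard 24-bit integer type).
struct carrier3 {
    unsigned char bytes[3];
};

// Hypothetical trait (not CK's get_carrier<>): select a trivially copyable
// type whose size is exactly SizeInBytes.
template <unsigned SizeInBytes>
struct carrier_for;

template <> struct carrier_for<1> { using type = std::uint8_t; };
template <> struct carrier_for<2> { using type = std::uint16_t; };
template <> struct carrier_for<3> { using type = carrier3; };
template <> struct carrier_for<4> { using type = std::uint32_t; };

template <unsigned SizeInBytes>
using carrier_for_t = typename carrier_for<SizeInBytes>::type;

// Copy the trailing sizeof(T) % 4 bytes of `obj` into a carrier of exactly
// that size. Only instantiable when the remainder is non-zero, because
// carrier_for<0> is intentionally left undefined.
template <typename T>
carrier_for_t<sizeof(T) % 4> tail_bytes(const T& obj) {
    static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");
    constexpr unsigned rest = sizeof(T) % 4;
    carrier_for_t<rest> tail;
    const auto* src = reinterpret_cast<const unsigned char*>(&obj);
    std::memcpy(&tail, src + (sizeof(T) - rest), rest);
    return tail;
}
```

For a 7-byte object (n = 1), the first 4 bytes would travel as one 32-bit word, and `tail_bytes()` returns the last 3 bytes in a `carrier3`.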
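Commit 582e31e8 adds class-type support to __builtin_amdgcn_readfirstlane() through an overloaded wrapper, later renamed amd_wave_read_first_lane(). The builtin itself works on 32-bit values. One plausible way to extend it to arbitrary trivially copyable objects is to split the object into 32-bit words, apply the builtin per word, and reassemble the result. That approach is assumed here rather than taken from the CK source.

The host-side sketch below shows that decomposition, including the rounded-up word count. The names `num_dwords`, `to_dwords`, `from_dwords` and `apply_per_dword` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Number of 32-bit words needed to hold an object of type T (rounded up).
template <typename T>
constexpr std::size_t num_dwords =
    (sizeof(T) + sizeof(std::uint32_t) - 1) / sizeof(std::uint32_t);

// Copy the bytes of `obj` into an array of 32-bit words. Bytes past
// sizeof(T) in the last word are zero-filled in this sketch.
template <typename T>
std::array<std::uint32_t, num_dwords<T>> to_dwords(const T& obj) {
    static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");
    std::array<std::uint32_t, num_dwords<T>> words{};
    std::memcpy(words.data(), &obj, sizeof(T));
    return words;
}

// Rebuild a T from its word representation (assumes T is default-constructible).
template <typename T>
T from_dwords(const std::array<std::uint32_t, num_dwords<T>>& words) {
    T obj;
    std::memcpy(&obj, words.data(), sizeof(T));
    return obj;
}

// Apply a per-word operation, then reassemble the object. On an AMD GPU
// this per-word step is where a readfirstlane call would go.
template <typename T, typename WordOp>
T apply_per_dword(const T& obj, WordOp op) {
    auto words = to_dwords(obj);
    for (auto& w : words) {
        w = op(w);
    }
    return from_dwords<T>(words);
}
```

With an identity operation, `apply_per_dword()` round-trips any trivially copyable object. In device code, the lambda would wrap the readfirstlane builtin instead, so every lane receives lane 0's value word by word.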
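Commit b076a02a distinguishes two float-to-bf16 rounding modes. The first is RTZ (round toward zero, i.e. truncation), which the commit message says is kept as the default. The second is RTN (round to nearest), added as type_convert_bf16_rtn and later renamed bf16_convert_rtn.

The sketch below shows the standard bit-level formulation of both modes, reading RTN as round-to-nearest-even. It is not CK's implementation, and the function names are hypothetical. Note that plain truncation can turn a signaling NaN into infinity; the RTN path below guards against that explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret float bits without undefined behavior.
inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float float_from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Round toward zero: keep the upper 16 bits (sign, exponent, 7 mantissa bits).
inline std::uint16_t float_to_bf16_rtz(float f) {
    return static_cast<std::uint16_t>(float_bits(f) >> 16);
}

// Round to nearest, ties to even.
inline std::uint16_t float_to_bf16_rtn(float f) {
    std::uint32_t u = float_bits(f);
    if ((u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0) {
        // NaN: truncate, but set the top mantissa bit so it stays a (quiet) NaN.
        return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
    }
    // Adding 0x7FFF plus the lowest kept bit carries into bit 16 when the
    // dropped half is > 0x8000, or == 0x8000 with an odd kept value.
    std::uint32_t lsb = (u >> 16) & 1u;
    u += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(u >> 16);
}
```

For example, the float with bits 0x3F80C000 truncates to 0x3F80 under RTZ. Under RTN it rounds up to 0x3F81, because the dropped half (0xC000) exceeds 0x8000.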