1. 28 Jul, 2023 1 commit
  2. 25 Jul, 2023 1 commit
  3. 20 Jul, 2023 1 commit
  4. 31 May, 2023 1 commit
  5. 10 May, 2023 1 commit
  6. 06 Mar, 2023 1 commit
  7. 24 Feb, 2023 1 commit
  8. 15 Dec, 2022 2 commits
  9. 09 Dec, 2022 1 commit
  10. 02 Dec, 2022 1 commit
  11. 11 Nov, 2022 1 commit
    • Po Yen Chen's avatar
      Rangify constructor of HostTensorDescriptor & Tensor<> (#445) · 4a2a56c2
      Po Yen Chen authored
      * Rangify STL algorithms
      
      This commit adapts rangified std::copy(), std::fill() & std::transform()
      
      * Rangify check_err()
      
      By rangifying check_err(), we can not only compare values between
      std::vector<>s, but also compare any ranges which have same value
      type.
      
      * Allow constructing Tensor<> like a HostTensorDescriptor
      
      * Simplify Tensor<> object construction logics
      
      * Remove more unnecessary 'HostTensorDescriptor' objects
      
      * Re-format example code
      
      * Re-write more HostTensorDescriptor ctor call
      4a2a56c2
  12. 10 Nov, 2022 1 commit
  13. 01 Sep, 2022 1 commit
  14. 23 Aug, 2022 1 commit
    • Po Yen Chen's avatar
      Add examples of Gemm (data type: int4) (#367) · fa2d894b
      Po Yen Chen authored
      * Add GEMM examples for int4
      
      Currently the source files are just copied from int8 examples
      
      * Re-use pre-defined alias in int4 exmples
      
      * Distinguish user-side type from kernel-side type
      
      * Add int4_t support for check_err()
      
      * Allow conversion between Tensor<> specializations
      
      * Re-format source files
      
      * Use different type for host tensors
      
      * Re-use CopyAsType<>() to implement copy ctor
      
      * Re-use element-wise operation type alias
      
      * Fix typo in alias names
      
      * Complete the int4 examples
      
      * Add constraint to Tensor<> templated methods
      
      * Add type traits 'is_signed_integral<>'
      
      * Add type constraints for integer version check_err<>()
      
      * Allow comparing different-sized integral types in check_err()
      
      * Check converted Tensor<int4_t> with golden Tensor<int8_t>
      
      * Remove constraint of Tensor<>::CopyAsType()
      
      * Avoid compilation error while disabling ck::int4_t support
      
      * Remove debug messages
      
      * Add #error directive to prevent compile sources with wrong setting
      
      * Simplify tensor usages in examples
      
      * Add constraint to check_err() input reference type
      
      * Align design with other PR
      
      * Use ""_uz to simplify example code
      
      * Avoid too much generalizing check_err()
      
      * Re-format GEMM instance template arguments
      
      * Extract int4 example common codes
      
      * Sort include directives
      
      * Move #include directives into new header
      
      * Move common codes together
      
      * Re-format template argument in example code
      
      * Reuse same implementation code for most of GEMM examples
      
      * Re-format common.hpp
      
      * Unify structured comment in examples
      
      * Use reinterpret_cast<>() for cross-type pointer conversion
      
      * Revert "Add type traits 'is_signed_integral<>'"
      
      This reverts commit f2c148efaedf42c8ee66032dac6d13a1003b0f3a.
      
      * Allow unsigned integer arguments for check_err()
      
      * Fix compilation error in check_err()
      
      * Remove unnecessary copy ctor for Tensor<>
      
      * Mark Tensor<> special member functions as 'default'
      
      * Use more strict condition to add code in examples
      
      * Fix wrong program return value of GEMM examples
      
      * Handle the case while user specify all the strides
      
      * Fix never-ran examples
      
      * Exit successfully if GEMM instance does not support given problem
      
      * Add missing 'else' keyword
      
      * Re-format CMakeLists.txt
      
      * Add wrapper function to hide value conversion while copying memory
      
      * Add new DeviceMem API to copy memory
      
      * Use new DeviceMem API to implement examples
      
      * Revert "Add new DeviceMem API to copy memory"
      
      This reverts commit 3f190b0779ceedf7aaf0b380712fda0518de72c1.
      
      * Add conversion ctor for Tensor<>
      
      * Write Tensor<> conversion logics explicitly in example code
      
      * Convert Tensor<> values after transfer data to host
      fa2d894b