Commits · 66e610768de48c56d6bfc5e082c5e1b5a4e4530d · gaoqiong / composable_kernel

28 Jul, 2023 1 commit
- Sanity pass. · 66e61076
  aska-0096 authored Jul 28, 2023
25 Jul, 2023 1 commit
- fpAintB kernel compile pass · 0c51a35e
  aska-0096 authored Jul 25, 2023
20 Jul, 2023 1 commit
- Temp save · febd76e4
  aska-0096 authored Jul 20, 2023
31 May, 2023 1 commit
- update copyright headers (#726) · b94fd0b2
  Illia Silin authored May 31, 2023
10 May, 2023 1 commit
- 1. Enable 2-stage global prefetch (may cause VGPR spilling) · 0bb08f4b
  aska-0096 authored May 10, 2023
```
2. Enable FP16 accumulator blockwise_gemm
```
06 Mar, 2023 1 commit
- tempsave · 579f84c6
  aska-0096 authored Mar 06, 2023
24 Feb, 2023 1 commit
- Mat-A LDS Bypass sanity pass · d4adc71a
  aska-0096 authored Feb 24, 2023
15 Dec, 2022 2 commits
- Discard some code · 3941bd1f
  aska-0096 authored Dec 15, 2022
- Clean up some debug-purpose code · 2a0e5439
  aska-0096 authored Dec 15, 2022
09 Dec, 2022 1 commit
- Correctness OK, waiting for optimization · 9bd44685
  aska-0096 authored Dec 09, 2022
02 Dec, 2022 1 commit
- debugging · 43a20997
  aska-0096 authored Dec 02, 2022
11 Nov, 2022 1 commit
- Rangify constructor of HostTensorDescriptor & Tensor<> (#445) · 4a2a56c2
  Po Yen Chen authored Nov 12, 2022

* Rangify STL algorithms

This commit adopts rangified versions of std::copy(), std::fill() & std::transform()

* Rangify check_err()

By rangifying check_err(), we can compare not only std::vector<>s but any
ranges that share the same value type.

* Allow constructing Tensor<> like a HostTensorDescriptor

* Simplify Tensor<> object construction logic

* Remove more unnecessary 'HostTensorDescriptor' objects

* Re-format example code

* Re-write more HostTensorDescriptor ctor calls
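The rangified helpers described in this commit can be illustrated with a small C++17 sketch. The namespace `sketch` and the functions `copy`, `fill`, `transform` and `check_err` are hypothetical stand-ins, not the signatures added in #445 (composable_kernel's own `check_err()` also takes a message and tolerances); the sketch only shows the idea of passing whole ranges instead of iterator pairs, and of comparing any two ranges whose value types match.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <type_traits>
#include <utility>
// Containers and assert used by the usage example:
#include <array>
#include <cassert>
#include <list>
#include <vector>

namespace sketch { // hypothetical namespace, not composable_kernel's

// Rangified std::copy: takes a whole input range instead of an iterator pair.
template <typename InRange, typename OutIter>
OutIter copy(const InRange& in, OutIter out)
{
    return std::copy(std::begin(in), std::end(in), out);
}

// Rangified std::fill.
template <typename Range, typename T>
void fill(Range& r, const T& value)
{
    std::fill(std::begin(r), std::end(r), value);
}

// Rangified std::transform (unary form).
template <typename InRange, typename OutIter, typename UnaryOp>
OutIter transform(const InRange& in, OutIter out, UnaryOp op)
{
    return std::transform(std::begin(in), std::end(in), out, op);
}

// Element type of a range with reference and cv-qualifiers stripped.
template <typename Range>
using range_value_t =
    std::remove_cv_t<std::remove_reference_t<decltype(*std::begin(std::declval<Range&>()))>>;

// Compare any two ranges with the same value type within an absolute tolerance.
template <typename OutRange, typename RefRange>
bool check_err(const OutRange& out, const RefRange& ref, double atol = 1e-5)
{
    static_assert(std::is_same_v<range_value_t<const OutRange>, range_value_t<const RefRange>>,
                  "this check_err() sketch requires ranges with the same value type");
    auto o = std::begin(out);
    auto r = std::begin(ref);
    for(; o != std::end(out) && r != std::end(ref); ++o, ++r)
    {
        if(std::abs(static_cast<double>(*o) - static_cast<double>(*r)) > atol)
            return false;
    }
    // The ranges must also have the same length.
    return o == std::end(out) && r == std::end(ref);
}

} // namespace sketch
```

With this shape, a `std::vector<float>` can be checked directly against a `std::array<float, N>` or a `std::list<float>` without first copying either into a vector.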


10 Nov, 2022 1 commit
- Rangify FillUniformDistributionIntegerValue<> (#443) · 6f0564f0
  Po Yen Chen authored Nov 11, 2022
```
Allow passing a forward range to its call operator
```
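A forward-range call operator can simply forward to an existing iterator-pair overload. The functor below is a hypothetical sketch in the spirit of FillUniformDistributionIntegerValue<>: its name (`FillUniformIntegerSketch`), its default bounds and its seed are assumptions, not the definition from #443. It draws uniform floats, rounds them to integers, and accepts either an iterator pair or a whole range.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <random>
// Used by the usage example:
#include <cassert>
#include <vector>

// Hypothetical fill functor; bounds, seed and name are assumptions.
template <typename T>
struct FillUniformIntegerSketch
{
    float a_ = -5.f; // assumed lower bound
    float b_ = 5.f;  // assumed upper bound

    // Iterator-pair form: fill [first, last) with rounded uniform samples.
    template <typename FwdIter>
    void operator()(FwdIter first, FwdIter last) const
    {
        std::mt19937 gen(11939); // fixed seed, so every call yields the same sequence
        std::uniform_real_distribution<float> dis(a_, b_);
        std::generate(first, last, [&] { return static_cast<T>(std::round(dis(gen))); });
    }

    // Forward-range form: delegates to the iterator-pair overload.
    template <typename ForwardRange>
    void operator()(ForwardRange&& range) const
    {
        (*this)(std::begin(range), std::end(range));
    }
};
```

Because samples are rounded after being drawn from `[a_, b_)`, the filled values land in the closed interval [-5, 5] with these assumed defaults; the range overload adds no logic of its own, so both call forms produce identical output.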
01 Sep, 2022 1 commit
- Add more data types to gemm+gemm and conv+conv examples (#397) · 204ef976
  Chao Liu authored Sep 01, 2022

* refactor

* refactor

* adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm

* adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm

* clean


23 Aug, 2022 1 commit
- Add examples of Gemm (data type: int4) (#367) · fa2d894b
  Po Yen Chen authored Aug 24, 2022

* Add GEMM examples for int4

For now, the source files are simply copies of the int8 examples

* Re-use pre-defined alias in int4 examples

* Distinguish user-side type from kernel-side type

* Add int4_t support for check_err()

* Allow conversion between Tensor<> specializations

* Re-format source files

* Use different type for host tensors

* Re-use CopyAsType<>() to implement copy ctor

* Re-use element-wise operation type alias

* Fix typo in alias names

* Complete the int4 examples

* Add constraint to Tensor<> templated methods

* Add type traits 'is_signed_integral<>'

* Add type constraints for integer version check_err<>()

* Allow comparing different-sized integral types in check_err()

* Check converted Tensor<int4_t> with golden Tensor<int8_t>

* Remove constraint of Tensor<>::CopyAsType()

* Avoid compilation error when ck::int4_t support is disabled

* Remove debug messages

* Add #error directive to prevent compiling sources with the wrong setting

* Simplify tensor usages in examples

* Add constraint to check_err() input reference type

* Align design with other PR

* Use ""_uz to simplify example code

* Avoid over-generalizing check_err()

* Re-format GEMM instance template arguments

* Extract common code from int4 examples

* Sort include directives

* Move #include directives into new header

* Move common code together

* Re-format template argument in example code

* Reuse same implementation code for most of GEMM examples

* Re-format common.hpp

* Unify structured comment in examples

* Use reinterpret_cast<>() for cross-type pointer conversion

* Revert "Add type traits 'is_signed_integral<>'"

This reverts commit f2c148efaedf42c8ee66032dac6d13a1003b0f3a.

* Allow unsigned integer arguments for check_err()

* Fix compilation error in check_err()

* Remove unnecessary copy ctor for Tensor<>

* Mark Tensor<> special member functions as 'default'

* Use stricter conditions when adding code to examples

* Fix wrong program return value of GEMM examples

* Handle the case where the user specifies all the strides

* Fix never-run examples

* Exit successfully if GEMM instance does not support given problem

* Add missing 'else' keyword

* Re-format CMakeLists.txt

* Add wrapper function to hide value conversion while copying memory

* Add new DeviceMem API to copy memory

* Use new DeviceMem API to implement examples

* Revert "Add new DeviceMem API to copy memory"

This reverts commit 3f190b0779ceedf7aaf0b380712fda0518de72c1.

* Add conversion ctor for Tensor<>

* Write Tensor<> conversion logic explicitly in example code

* Convert Tensor<> values after transferring data to host
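Two ideas from this PR, a `""_uz` literal for `std::size_t` and a check that compares integral ranges of different sizes or signedness (for example int4 results checked against a golden `int8_t` tensor), can be sketched in C++17 as follows. All names here (`sketch::operator""_uz`, `int_equal`, `check_err_integral`) are hypothetical; the PR's actual implementation is not reproduced.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>
// Used by the usage example:
#include <cassert>
#include <vector>

namespace sketch { // hypothetical namespace

// Hypothetical user-defined literal: 4_uz yields a std::size_t.
constexpr std::size_t operator""_uz(unsigned long long value)
{
    return static_cast<std::size_t>(value);
}

// Value-correct equality for integers of possibly different size/signedness.
template <typename A, typename B>
constexpr bool int_equal(A a, B b)
{
    static_assert(std::is_integral<A>::value && std::is_integral<B>::value,
                  "int_equal() only accepts integral types");
    if constexpr(std::is_signed<A>::value == std::is_signed<B>::value)
    {
        // Same signedness: the usual arithmetic conversions preserve both values.
        return a == b;
    }
    else if constexpr(std::is_signed<A>::value)
    {
        // A signed, B unsigned: a negative A can never equal B.
        return a >= 0 && static_cast<std::make_unsigned_t<A>>(a) == b;
    }
    else
    {
        // A unsigned, B signed: mirror of the branch above.
        return b >= 0 && a == static_cast<std::make_unsigned_t<B>>(b);
    }
}

// Element-wise comparison of two integral ranges with possibly different element types.
template <typename OutRange, typename RefRange>
bool check_err_integral(const OutRange& out, const RefRange& ref)
{
    auto o = std::begin(out);
    auto r = std::begin(ref);
    for(; o != std::end(out) && r != std::end(ref); ++o, ++r)
    {
        if(!int_equal(*o, *r))
            return false;
    }
    // The ranges must also have the same length.
    return o == std::end(out) && r == std::end(ref);
}

} // namespace sketch
```

Checking the sign before comparing avoids the usual-arithmetic-conversion trap in which `std::uint32_t{4294967295u} == std::int8_t{-1}` typically evaluates to true. Since C++23 the language also provides a standard `uz` suffix, so a custom `_uz` is only needed for older standards.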
