- 25 Aug, 2023 1 commit
  - aska-0096 authored
- 22 Aug, 2023 1 commit
  - aska-0096 authored
- 16 Aug, 2023 2 commits
  - Haocong WANG authored
    New implementation fpAintB
  - aska-0096 authored
- 15 Aug, 2023 1 commit
  - aska-0096 authored
- 09 Aug, 2023 1 commit
  - aska-0096 authored
- 08 Aug, 2023 2 commits
  - Haocong WANG authored
    GQA-4 example
  - aska-0096 authored
- 07 Aug, 2023 2 commits
  - Haocong WANG authored
    MQA implementation
  - aska-0096 authored
- 03 Aug, 2023 3 commits
  - Haocong WANG authored
    fpAintB_gemm implementation
  - aska-0096 authored
  - aska-0096 authored
- 01 Aug, 2023 1 commit
  - aska-0096 authored
- 28 Jul, 2023 1 commit
  - aska-0096 authored
- 25 Jul, 2023 1 commit
  - aska-0096 authored
- 20 Jul, 2023 1 commit
  - aska-0096 authored
- 07 Jul, 2023 1 commit
  - aska-0096 authored
- 26 Jun, 2023 1 commit
  - aska-0096 authored
- 25 Jun, 2023 1 commit
  - aska-0096 authored
- 20 Jun, 2023 2 commits
- 19 Jun, 2023 3 commits
- 15 Jun, 2023 1 commit
  - aska-0096 authored
- 13 Jun, 2023 5 commits
  - aska-0096 authored
  - aska-0096 authored
  - aska-0096 authored
    Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/composable_kernel into e2e_kernellib
  - Haocong WANG authored
    * sanity pass
    * sanity pass 2
    * confirm significant performance regression.
    * turn on all instances
    * turn off instance format
    * Fix bug & tuning & format
    * DML meta, self_attn+cross_attn
    * sanity pass
    * remove useless flag
    * update tile and problem size used in AIT attention
    * bug fix in grouped conv supporting check
- 12 Jun, 2023 4 commits
  - Rostyslav Geyyer authored
  - Bartłomiej Kocot authored
    * Add DeviceBatchedGemmMultipleD_Dl
    * Fix batched_gemm tests
    * Fix comments
    * test_batched_gemm_multi_d fixes
    * Fix args for isSupported batchedGemmMultipleDDl
    * Disable tests for gfx90a
  - Po Yen Chen authored
    * Fix wrong pointer type
    * Rename type trait get_unsigned_int<> to get_carrier<>
    * Add 3-byte carrier type
    * Add missing __device__ specifier
    * Rename template non-type parameter
    * Leave the remaining byte uninitialized
    * Avoid invoking (host) STL algorithms
    * Remove unnecessary 'inline' specifier
    * Extract common logic out as helper method
    * Hide dummy member function
    * Add missing __device__ specifier
  - ltqin authored
    * add input parameter check
    * add instance for vector load = 1
    * move general instance to first position
    * fix read bias code
    * regular code for bias load
    ---------
    Co-authored-by: zjing14 <zhangjing14@gmail.com>
- 08 Jun, 2023 1 commit
  - carlushuang authored
- 07 Jun, 2023 1 commit
  - Illia Silin authored
    * update dockerfile to build rocm5.6 rc3
    * fix a couple of docker issues
- 02 Jun, 2023 1 commit
  - Illia Silin authored
- 01 Jun, 2023 2 commits
  - who who who authored
  - Po Yen Chen authored
    * Remove M/N/KPad local variables
    * Use M/N/KPad to name padded lengths
    * Replace duplicated local variable with parameters
    * Rename variables M/N/KRaw to M/N/K
    * Move AK0/BK0 compute logic into GridwiseGemm
    * Use macro to shorten code
    * Move CalculateGridSize() logic into GridwiseGemm
    * Add comment to credit the implementation source
    * Reuse the existing implementation
    * Remove no-longer used data members
    * Remove elementwise-op objects from interfaces
    * Reserve kernel arg as whole object in interfaces
    * Remove redundant data member
    * Make 3rd type parameter optional
    * Remove unnecessary type parameters
    * Remove no-longer used descriptor-creation methods
    * Move kernel arg type definition into GridwiseGemm
    * Add macro to switch between code sections
    * Move argument field computing logic into device op side
    * Make utility method 'static'
    * Declare special methods
    * Unify MakeArgument() usage
    * Adapt the new GridwiseGemm interface
    * Push-down class 'GridwiseGemm::Argument' fields
    * Remove no-longer used methods
    * Add unused parameters
    * Force copying parameters in 'Embed' ctor
    * Remove no-longer used descriptors
    * Fallback change on BaseArgument
    * Remove macro 'INTEGER_DIVIDE_CEIL'
    * Make variable naming more consistent
    * Make sure methods are only invoked in the right place
    * Remove trailing underscore in public attribute name
    * Remove necessary methods
    * Hide computing logic of derived attributes
    * Make new 'Embed' ctor only available for device code
    * Make sure 'Embed' type args are not references
    * Move check for karg.K into CheckValidity()
    * Remove more integer division logic from device code
    * Undo changes on Embed
    * Separate 'Problem' concept out from 'Argument'
    * Add overloaded version of __builtin_amdgcn_readfirstlane()
    * Remove 'static' specifiers
    * Remove more 'static' specifiers
    * Replace unsigned char with std::byte
    * Add 'const' specifier to never-changing variable
    * Add 'inline' specifier to function definition
    * Share same name for kernel interfaces
    * Fix wrong boundary calculation logic
    * Leave the third template arg for compatibility
    * Remove unnecessary parameters
    * Fix wrong error message (for type name)
    * Create descriptor on device side
    * Fix wrong debug message
    * Remove no-longer used data members
    * Rename type trait
    * Remove std:: qualifier from standard types
    * Replace 'size_t' with 'unsigned'
    * Use type alias to hint usage
    * Replace static_for<> with ordinary 'for' loop
    * Reject unsupported argument
    * Rename readfirstlane() to amd_wave_read_first_lane()
    * Rename file readfirstlance.hpp to amd_wave_read_first_lane.hpp
    * Update function calls
    * Reorder statements
    * Re-format files
    ---------
    Co-authored-by: zjing14 <zhangjing14@gmail.com>