- 21 Oct, 2024 16 commits
-
-
rocking authored
-
rocking authored
-
rocking authored
-
carlushuang authored
-
rocking authored
-
rocking authored
-
rocking authored
-
letaoqin authored
-
rocking authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
Po Yen Chen authored
* Use smaller width for lse_accum dist tensor
* Update pipeline comment
* Fix wrong distribution for lse_accum
* Remove duplicate dim in lse_accum dist encoding
* Decide fmha splitkv combine kernel kBlockSize by kM0
* Remove assumption of MPerThread=1
* Add log<4> & log<8> specialization
* Enlarge occupancy array
* Fix vector size for small tile
* Add support for kMaxSplits=8
* Re-format gemm.hpp
* Use 16x16x16 warp gemm for fwd_splitkv
* Centralize policy code changes
* Leave fp8/bf8 tile settings unchanged
-
carlushuang authored
-
- 20 Oct, 2024 5 commits
-
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
carlushuang authored
-
- 18 Oct, 2024 2 commits
-
-
Haocong WANG authored
-
Illia Silin authored
-
- 17 Oct, 2024 4 commits
- 16 Oct, 2024 13 commits