Performance Update (2025.04.22) (#71)
* Fix benchmark script
* Performance optimization for compute-bound cases
* Add new testcase (s_k = 16384)
* Update README.md
* Update comment
* Update README.md
* Add the deep-dive blog
* Add background color for MLA Kernel Sched.drawio.svg
* Use relative path for the schedule image
* Move flash_mla.h to kernels/params.h
csrc/kernels/config.h
0 → 100644
csrc/kernels/mla_combine.cu
0 → 100644
csrc/kernels/mla_combine.h
0 → 100644
csrc/kernels/splitkv_mla.cu
0 → 100644
csrc/kernels/splitkv_mla.h
0 → 100644
csrc/kernels/traits.h
0 → 100644
csrc/named_barrier.h
deleted
100644 → 0
csrc/softmax.h
deleted
100644 → 0
csrc/utils.h
deleted
100644 → 0