    Performance Update (2025.04.22) (#71) · c2067be3
    Shengyu Liu authored
    * Fix benchmark script
    
    * Performance optimization for compute-bound cases
    
    * Add new testcase (s_k = 16384)
    
    * Update README.md
    
    * Update comment
    
    * Update README.md
    
    * Add the deep-dive blog
    
    * Add background color for MLA Kernel Sched.drawio.svg
    
    * Use relative path for the schedule image
    
    * Move flash_mla.h to kernels/params.h
20250422-new-kernel-deep-dive.md 8.14 KB