- 09 May, 2025 1 commit
Cunxiao Ni authored
* [CI] Add elementwise and gemv examples to CI.
* fix lint
* test
* fix gemv lint
* fix lint
- 31 Mar, 2025 1 commit
Lei Wang authored
* [Enhancement] Add support for CUDA architecture 8.9 in GEMM template
  - Introduced conditional inclusion of "gemm_sm89.h" for CUDA architectures 8.9 and above, enhancing compatibility with newer hardware.
  - This change ensures that the GEMM template can leverage optimizations specific to the 8.9 architecture, improving performance for users with compatible GPUs.
* lintfix
* [Refactor] Clean up includes in gemm_sm89.h
  - Removed duplicate inclusion of "common.h" and added "cuda_fp8.h" for improved clarity and organization.
  - This change enhances the maintainability of the code by ensuring that header files are included only once and in a logical order.
* [Enhancement] Improve KernelCache with in-memory caching and detailed docstrings
  - Added an in-memory cache to the KernelCache class to enhance performance by reducing disk access.
  - Updated the __new__ method to initialize the memory cache and added logic to check the cache before loading from disk.
  - Enhanced docstrings across multiple methods to provide clearer explanations of parameters and return values, improving code readability and maintainability.
  - Implemented a clear_cache method to clear both in-memory and disk caches, ensuring efficient cache management.
* lint fix
* typofix
* [Refactor] Update matmul and flashattn function calls to return structured results
  - Modified the matmul and flashattn function calls to return a single object containing latency, configuration, and reference latency, improving code clarity and reducing the number of returned variables.
  - Updated all relevant instances in benchmark and example scripts to accommodate the new return structure, ensuring consistent usage across the codebase.
* lint fix
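
The KernelCache change in this commit (initialize a memory cache in `__new__`, check it before loading from disk, and clear both layers in `clear_cache`) can be illustrated with a minimal sketch. This is not the project's actual KernelCache: the class name `TwoLevelCache`, the `get` method, the `build` callback, the `cache_dir` parameter, and the pickle-based on-disk format are all hypothetical and chosen only to show the lookup order.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class TwoLevelCache:
    """Hypothetical stand-in for a kernel cache: memory first, then disk."""

    _instance = None

    def __new__(cls, cache_dir=None):
        # Singleton: the instance and its in-memory dict are created once.
        if cls._instance is None:
            inst = super().__new__(cls)
            inst._memory = {}
            inst._dir = Path(cache_dir or tempfile.mkdtemp(prefix="kcache_"))
            inst._dir.mkdir(parents=True, exist_ok=True)
            cls._instance = inst
        return cls._instance

    def _path(self, key):
        # Hash the key so that any string maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._dir / f"{digest}.pkl"

    def get(self, key, build):
        """Return the cached value for key, calling build() only on a full miss."""
        if key in self._memory:  # 1. memory hit: no disk access
            return self._memory[key]
        path = self._path(key)
        if path.exists():  # 2. disk hit: load the stored value
            value = pickle.loads(path.read_bytes())
        else:  # 3. full miss: build and persist
            value = build()
            path.write_bytes(pickle.dumps(value))
        self._memory[key] = value  # promote to memory for later lookups
        return value

    def clear_cache(self):
        """Drop both the in-memory entries and the files on disk."""
        self._memory.clear()
        for f in self._dir.glob("*.pkl"):
            f.unlink()
```

In this sketch, repeated `get` calls with the same key are served from the dictionary without touching the disk, and a fresh process would fall back to the pickled file before rebuilding. A real kernel cache would store compiled artifacts rather than pickled Python objects.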
- 28 Mar, 2025 1 commit
botbw authored
* [doc/example] init gemv doc and examples
* [example] add vectorized read
* [example] use local register instead of smem
* [example] add bench
* [doc] update doc
* [doc] refine doc
* [lint] format code
* [doc] add tips
* [doc/example] fix typo
* [example] use tmv_all_reduce
* [doc] update doc accordingly
* [doc] add benchmark table
* [lint] format code