1. 31 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Bugfix] Updated autotune usage in the examples to align with the latest changes (#309) · 66c7f6a1
      Lei Wang authored
      * [Enhancement] Add support for CUDA architecture 8.9 in GEMM template
      
      - Introduced conditional inclusion of "gemm_sm89.h" for CUDA architectures 8.9 and above, enhancing compatibility with newer hardware.
      - This change ensures that the GEMM template can leverage optimizations specific to the 8.9 architecture, improving performance for users with compatible GPUs.
      
      * lintfix
      
      * [Refactor] Clean up includes in gemm_sm89.h
      
      - Removed duplicate inclusion of "common.h" and added "cuda_fp8.h" for improved clarity and organization.
      - This change enhances the maintainability of the code by ensuring that header files are included only once and in a logical order.
      
      * [Enhancement] Improve KernelCache with in-memory caching and detailed docstrings
      
      - Added an in-memory cache to the KernelCache class to enhance performance by reducing disk access.
      - Updated the __new__ method to initialize the memory cache and added logic to check the cache before loading from disk.
      - Enhanced docstrings across multiple methods to provide clearer explanations of parameters and return values, improving code readability and maintainability.
      - Implemented a clear_cache method to clear both in-memory and disk caches, ensuring efficient cache management.
      
      * lint fix
      
      * typofix
      
      * [Refactor] Update matmul and flashattn function calls to return structured results
      
      - Modified the matmul and flashattn function calls to return a single object containing latency, configuration, and reference latency, improving code clarity and reducing the number of returned variables.
      - Updated all relevant instances in benchmark and example scripts to accommodate the new return structure, ensuring consistent usage across the codebase.
      
      * lint fix
      66c7f6a1
  2. 28 Mar, 2025 1 commit
    • botbw's avatar
      [doc/example] add gemv doc and examples (#293) · ff3cfa59
      botbw authored
      * [doc/example] init gemv doc and examples
      
      * [example] add vectorized read
      
      * [example] use local register instead of smem
      
      * [example] add bench
      
      * [doc] update doc
      
      * [doc] refine doc
      
      * [lint] format code
      
      * [doc] add tips
      
      * [doc/example] fix typo
      
      * [example] use tmv_all_reduce
      
      * [doc] update doc accordingly
      
      * [doc] add benchmark table
      
      * [lint] format code
      ff3cfa59