  1. 11 Oct, 2025 1 commit
    • [Refactor] Refactor Pass `InjectFenceProxy` and expose some warp group primitives in frontend (#977) · ddfaac36
      Lei Wang authored
      
      * InjectFenceProxy docs and tests
      
        - annotate proxy fence injector with context comments for async/generic detection
        - add compiler internals doc covering the pass mechanics and link it in docs index
        - repair fence proxy test by fixing descriptor init usage and fence counter logic
      
      * do not consider call_extern as async.
      
      * doc update.
      
      * reduce test size for sparse mla
  2. 24 Aug, 2025 1 commit
    • [Bugfix] Add missing FP8 header include (#752) · cf7be057
      Lei Wang authored
      
      
      * [Enhancement] Add DispatchInstruction specialization for fp8 types in gemm_sm90.h
      
      - Added `DispatchInstruction` template specializations for the `fp8_e4_t` and `fp8_e5_t` types, extending CUDA GEMM support to these data formats.
      - Each specialization defines the corresponding `MMA` and `MMA_Group` types, optimizing performance for specific configurations.
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
      
      * [Enhancement] Include cuda_fp8.h in gemm_sm90.h
      
      - Included the "cuda_fp8.h" header in gemm_sm90.h to support the fp8 types used in CUDA GEMM operations.
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
      
      * lint fix
      
      * [Refactor] Remove unused tl_shuffle_elect and related functions from common.h
      
      - Deleted the `tl_shuffle_elect` function and its associated comments to streamline the codebase.
      - Included "intrin.h" for intrinsic support in CUDA operations.
      - Cleaned up the file by removing unnecessary template parameters and functions, enhancing clarity and maintainability.
      
      * lint fix
      
      * [Refactor] Update header inclusions in common.h and gemm_sm90.h
      
      - Removed the "intrin.h" include from common.h to streamline dependencies.
      - Included "intrin.h" in gemm_sm90.h instead, so that intrinsic support for CUDA operations remains available there.
      
      * bug fix