1. 29 Oct, 2025 1 commit
    • Cunxiao Ni's avatar
      [BugFix] Correct direct copy from bf16 to fp8 (#1090) · e1b12bd0
      Cunxiao Ni authored
      
      
      * [BugFix] Correct direct copy from bf16 to fp8
      
      * fix lint
      
      * implement overloaded cast codegen for type conversion
      
      * fix lint
      
      * remove test
      
      * fix lint
      
      * trigger CI
      
      * Overload fp8 for implicit conversion
      
      * format
      
      * new format
      
      * fix: Reinterpret types to cute types in GEMM
      
      * new format
      
      * fix lint
      
      * new format
      
      * fix lint
      
      * format
      
      * trigger ci
      
      ---------
      Co-authored-by: default avatarnicunxiao <nicunxiao@bytedance.com>
      e1b12bd0
  2. 02 Oct, 2025 1 commit
    • Zhiwen Mo's avatar
      [Bugfix] Fix tensor memory copy layout (#933) · 5ccac4fa
      Zhiwen Mo authored
      * Implements tcgen05.ld instruction support for copying from shared.tmem
        to local.fragment on SM100/Blackwell architecture. Adds layout inference
        and lowering logic for tensor memory operations with proper physical
        coordinate range analysis and warpgroup alignment checks.
      
        Changes:
        - Add kTMemLoad and kTMemStore to CopyInst enumeration
        - Implement CheckTMemLoad() and CheckTMemStore() validation functions
        - Add LowerTmemCopy() to generate tcgen05.ld/st/cp PTX intrinsics
        - Add tmem layout inference in InferLayout() using expandTcgen05Layout
        - Support multiple instruction variants (32dp32b/64b/128b/256b)
        - Add physical layout bounds analysis for tmem coordinates
        - Change clear_accum from bool to PrimExpr in GEMM operations
        - Fix std::optional access checks in layout_inference.cc
        - Add tmem_allocate/deallocate PTX intrinsic support
        - Fix cooperative_groups grid.sync() code generation
      
      * fix
      
      * pipeline fix
      
      * bug fix
      
      * bool fix
      5ccac4fa
  3. 28 Sep, 2025 1 commit
    • Zhiwen Mo's avatar
      [SM100] Add sm100 GEMM layouts and tcgen05 support (#887) · f58bcd43
      Zhiwen Mo authored
      * update sm100 related utcmma, tmem, ld/st256 in src
      * update sm100 related utcmma, tmem, ld/st256 in tilelang
      * Remove deprecated GEMM examples and related README documentation for SM100 architecture support
      * Update GEMM implementation to replace UTCMMA with TCGEN5MMA across relevant files
      * Remove gemm_umma.py example and update README to reflect TCGEN5MMA terminology changes
      * Update README.md for gemm_sm100 example by removing outdated API sections and streamlining documentation
      * Update README and source files to reflect TCGEN5.MMA terminology changes
      * Refactor CUDA GEMM header for improved readability
      f58bcd43