1. 18 Dec, 2025 1 commit
    • feat(cutedsl): add CuTeDSL backend (#1421) · 7248a810
      Gabriel Wu authored
      
      
      * feat: CuTeDSL backend
      
      * fix: clang-tidy
      
      * fix: clang-format
      
      * fix: ci
      
      * fix: revert example gemm fp8
      
      * fix: remove duplicate code
      
      * fix: switch-case
      
      * fix: fp16 silence
      
      * fix: TVM IR print
      
      * fix: useless tir
      
      * fix: clang-format
      
      * fix: remove tilelang/contrib/cutedsl/.gitignore
      
      * fix: use hexfloat
      
      * fix: gsym guard
      
      * fix: unknown storage sync type
      
      * fix: string literal
      
      * fix: add args guard
      
      * fix: name hint dedup
      
      * fix: better find_kernel_by_pattern
      
      * fix: set libpath for from_database path
      
      * fix: guard buffer.strides
      
      * fix: from guard
      
      * fix: eviction guard
      
      * fix: use thread local tma descs
      
      * fix: ruff
      
      * fix: drop tma_init_cpp
      
      * fix: exc_info
      
      * fix: negative unmatch early return
      
      * fix: rename postproc func and add test
      
      * fix: handle fast math according to pass config
      
      * fix: dyn_sym parse
      
      * fix: wrap_forward
      
      * fix: use tvm_ffi.libinfo instead of cli
      
      * fix: keep signature
      
      * fix: C++ string safety
      
      * fix: mark tma_store_add as unsupported
      
      * fix: tvm version
      
      * resolve ldsm and cpasync issues.
      
      * fix: minor fixes
      
      * fix: parse signature using ast
      
      * fix: guard global_addr
      
      * fix: create tempfile only when necessary
      
      * fix: use logger.exception for exceptions
      
      * fix: guard lib_path and host_func
      
      * fix: remove tma_cpp_init and add timeout for cpp compile
      
      * add timeout for mbarrier_wait.
      
      * fix: _load_kernel_from_disk signature
      
      * resolve codegen issues.
      
      * fix: logger.exception
      
      * add comment for div_by=1
      
      * merge
      
      * fix: reserve cutlass,cute,tl
      
      * fix: guard tma_store
      
      * fix: allow int64 offset in make_tensor_at_offset
      
      * fix: guard barrier
      
      * fix: add comments for div_by=16
      
      * fix: div_by=1 issue
      
      * delete div_by when offset is 0
      
      * use tl.make_tensor when offset is 0
      
      * fix: explicitly check cutedsl target
      
      * fix: use param.torch_dtype()
      
      ---------
      Co-authored-by: yuxic <yuxic@nvidia.com>
      Co-authored-by: Yong <yong@local>
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>