1. 19 Mar, 2025 1 commit
    • Wenhao Xie's avatar
      [BugFix] Fix bug of mismatching dtype in testing (#245) · 71537ba5
      Wenhao Xie authored
      * [Typo] Fix formatting in installation instructions in README.md
      
      * [Enhancement] Improve CUDA path detection and update configuration handling
      
      * fix typo
      
      * remove IS_WINDOWS constant
      
      * lint fix
      
      * Improve error messages for CUDA detection failure
      
      * lint fix
      
      * lint fix
      
      * Fix .gitignore to correctly include venv directory
      
      * [Doc] Add instructions for installing nightly version of TileLang
      
      * update installation instructions
      
      * update install instruction
      
      * fix bug of mismatching dtype in testing and set the default value of check_dtype in torch_assert_close to true
      
      * lint fix
      
      * fix bug
      
      * use map_torch_type
      71537ba5
  2. 16 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Introduce KernelParam integration across modules (#223) · 3de9f13c
      Lei Wang authored
      * [Refactor] Update KernelParam integration across modules
      
      - Replaced instances of TensorType with KernelParam in various modules to standardize parameter handling.
      - Updated JITKernel, BaseKernelAdapter, and CythonKernelAdapter to utilize KernelParam for improved type consistency.
      - Enhanced Profiler class to include KernelParam in its parameters, ensuring better integration with the new parameter structure.
      - Adjusted tensor handling in utility functions to accommodate the new KernelParam type, improving overall code clarity and maintainability.
      - Updated copyright headers to reflect the correct organization.
      
      * [Refactor] Clean up whitespace in kernel, profiler, and tensor modules
      
      - Added blank lines for improved readability in kernel.py, __init__.py, and tensor.py.
      - Enhanced code clarity by ensuring consistent formatting across these modules.
      
      * [Enhancement] Add detailed docstrings to KernelParam and Profiler classes
      
      - Enhanced KernelParam class with comprehensive docstrings for better understanding of its purpose and methods.
      - Updated Profiler class to include detailed docstrings for its attributes and methods, improving code documentation and usability.
      - Removed unused do_bench function to streamline the profiler module and improve clarity.
      
      * [Refactor] Update type hints in do_bench function and clean up whitespace in profiler module
      
      - Changed type hints for grad_to_none and quantiles parameters in do_bench function to use Optional for better clarity.
      - Added a blank line in __init__.py for improved readability and consistency in the profiler module.
      
      * [Refactor] Update type hint in do_bench function for consistency
      
      - Changed the return type hint in the do_bench function from a union type to a more explicit List type for better clarity and consistency in type annotations.
      
      * [Refactor] Update return type hint in do_bench function for clarity
      
      - Changed the return type hint in the do_bench function from a union type to Union[float, List[float]] for improved clarity and consistency in type annotations.
      
      * [Enhancement] Add func property to Profiler class for adapter access
      
      - Introduced a new property `func` in the Profiler class to provide access to the adapter, ensuring that the adapter is set before retrieval. This enhancement improves the usability of the Profiler class by allowing easier access to the adapter functionality.
      
      * [Refactor] Update kernel compilation and profiling in tests
      
      - Replaced instances of `TL.lower` and `TL.Profiler` with `tilelang.compile` and the new profiler interface across multiple test files.
      - Enhanced the kernel compilation process to utilize the updated API, improving consistency and maintainability in the testing framework.
      - Updated assertions to use the new profiler methods for better clarity and functionality in performance testing.
      
      * [Refactor] Simplify kernel invocation and remove unused parameters in tests
      
      - Updated the kernel invocation in `test_tilelang_dynamic_symbolic.py` to directly assign the result to `C`, improving clarity.
      - Removed the `execution_backend` parameter from `tilelang.compile` calls in `test_tilelang_jit_callback.py` and `test_tilelang_jit_gemm.py` for consistency with the updated API.
      - Commented out the call to `tilelang.testing.main()` in `test_tilelang_jit_callback.py` and replaced it with a direct call to `test_gemm_jit_kernel()` to streamline test execution.
      - Adjusted the dtype mapping in `TorchDLPackKernelAdapter` to use the parameter's dtype directly, enhancing code simplicity.
      
      * [Refactor] Remove unused imports in test files for cleaner code
      
      - Eliminated unnecessary imports of `tilelang` as `TL` in various test files to enhance code clarity and maintainability.
      - Updated multiple test files to streamline the codebase and reduce potential confusion from unused references.
      
      * [Refactor] Simplify kernel invocation in tilelang kernel test
      
      - Updated the kernel invocation in `test_tilelang_kernel_bf16_gemm_mma.py` to directly assign the result to `C`, enhancing code clarity and consistency with recent changes in the API.
      
      * [Refactor] Simplify kernel invocation in tilelang kernel tests
      
      - Updated kernel invocations in multiple test files to directly assign the result to `C`, improving code clarity and consistency with the updated API.
      - Removed unnecessary initialization of `C` as a zero tensor, streamlining the code further.
      
      * [Refactor] Update kernel invocation in tilelang transform tests
      
      - Replaced the use of `TL.Profiler` with `tilelang.compile` in `test_tilelang_transform_simplify.py`, enhancing code clarity and consistency with the updated API.
      - Streamlined the kernel invocation process by directly assigning the result to `C`, improving readability and maintainability of the test code.
      3de9f13c
  3. 14 Mar, 2025 1 commit
  4. 13 Mar, 2025 1 commit
    • zqh-wz's avatar
      [Feature] Upgrade cutlass version and support fp8 T.gemm (#202) · 2cccf1f5
      zqh-wz authored
      
      
      * upgrade cutlass to upstream v3.8.0
      
      * Implement fp8 gemm and add example script
      
      * Fix dtype retrieval with map_torch_type for fp8 inputs
      
      * Disable vectorization of fp8 values
      
      * Make MMA declaration compatible with cutlass 3.4.0+
      
      * Add test for fp8 T.gemm
      
      * fix indent
      
      * fix indent
      
      * Add copyright and license header
      
      * Add copyright and license header
      
      * lint fix
      
      * Refactor matmul_nt and assert_matmul_correctness functions for improved readability by consolidating parameter definitions and adjusting formatting.
      
      * clang format lint
      
      ---------
      Co-authored-by: default avatarLei Wang <34334180+LeiWang1999@users.noreply.github.com>
      Co-authored-by: default avatarLeiWang1999 <leiwang1999@outlook.com>
      2cccf1f5
  5. 09 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Feat] Introduce new caching mechanism for compiled kernels (#176) · 7bde63d5
      Lei Wang authored
      * Add kernel caching mechanism to TileLang
      
      - Implement a new `cached` function in `tilelang/cache/__init__.py` to cache and reuse compiled kernels
      - Expose the `cached` function in the main `tilelang/__init__.py`
      - Add a test case for cached matrix multiplication in `testing/python/cache/test_tilelang_cache_matmul.py`
      - Provide a `clear_cache()` function to reset the kernel cache when needed
      
      * Refactor kernel caching test and implementation
      
      - Simplify the `cached` function in `tilelang/cache/__init__.py`
      - Update test script `test_tilelang_cache_matmul.py` to use `tilelang.testing.main()`
      - Remove unnecessary whitespace and improve code formatting
      
      * Update import for `cached` function in MHA examples
      
      - Modify import statement in `example_mha_bwd.py` and `test_tilelang_kernel_mha_bwd.py`
      - Change import from `tilelang.profiler import cached` to `tilelang import cached`
      - Align with recent refactoring of kernel caching mechanism
      
      * Refactor `cached` function signature in kernel caching
      
      - Update function signature to use keyword-only arguments for `target` and `target_host`
      - Improve parameter order and readability of the `cached` decorator
      - Maintain existing functionality while enhancing function definition
      7bde63d5
  6. 07 Mar, 2025 2 commits
    • Lei Wang's avatar
      [Example] Implement tilelang native sparse attention varlen example (#170) · 8e1845d2
      Lei Wang authored
      * [Refactor] Update BitBLAS Benchmark with TileLang Carver Imports and Roller Hints Generation
      
      - Replace BitBLAS imports with TileLang Carver imports in benchmark_matmul.py
      - Modify roller hints generation using new TileLang Carver template and utility functions
      - Update get_roller_hints_from_func to handle None cases and improve return logic
      - Adjust DefaultPolicy to handle different codegen dictionary formats
      
      * [Refactor] Update Thread Binding and Import Statements in TileLang Kernels
      
      - Replace T.thread_binding() with T.get_thread_binding() across multiple kernel test files
      - Update import statements for MMA layout and macro generator in dequantize GEMM and FP8 examples
      - Move map_torch_type utility function to tilelang.utils.tensor
      - Remove unnecessary imports and improve code organization
      
      * Refactor Native Sparse Attention Example with Enhanced Triton Kernel
      
      - Update parallel_nsa_fwd_kernel to support more flexible sparse attention computation
      - Add support for block counts and offsets in the Triton kernel
      - Modify kernel grid and computation logic for improved performance
      - Update example script to use naive_nsa_simple reference implementation
      - Improve type hints and kernel configuration
      
      * Add Native Sparse Attention Examples with Tilelang and Triton Implementations
      
      - Introduce new example scripts for native sparse attention:
        * example_tilelang_nsa_fwd.py: Forward pass implementation using TileLang
        * example_tilelang_nsa_decode.py: Decoding-specific sparse attention implementation
        * example_triton_nsa_fwd.py: Triton-based sparse attention forward pass
      - Update reference.py with naive implementations for sparse attention
      - Support different sparse attention scenarios including forward pass and inference
      - Add comprehensive testing and validation against reference implementations
      
      * lint fix
      
      * Add Variable-Length Native Sparse Attention Examples for TileLang and Triton
      
      - Introduce new example scripts for variable-length native sparse attention:
        * example_tilelang_nsa_fwd_varlen.py: TileLang implementation with variable sequence lengths
        * example_triton_nsa_fwd_varlen.py: Triton implementation with variable sequence lengths
      - Update reference.py to support variable-length sparse attention scenarios
      - Enhance existing sparse attention implementations to handle variable-length inputs
      - Add comprehensive testing and validation for variable-length sparse attention
      
      * Refactor Native Sparse Attention Examples: Code Style and Formatting Improvements
      
      - Standardize function and parameter formatting across NSA example files
      - Improve code readability by adjusting indentation and line breaks
      - Enhance type hints and parameter alignment
      - Remove unnecessary whitespaces and optimize imports
      - Maintain consistent code style across TileLang and Triton implementations
      8e1845d2
    • Lei Wang's avatar
      [Refactor] Replace `T.thread_binding` with `T.get_thread_binding` in examples and test cases (#163) · de1ba1e4
      Lei Wang authored
      * [Refactor] Update BitBLAS Benchmark with TileLang Carver Imports and Roller Hints Generation
      
      - Replace BitBLAS imports with TileLang Carver imports in benchmark_matmul.py
      - Modify roller hints generation using new TileLang Carver template and utility functions
      - Update get_roller_hints_from_func to handle None cases and improve return logic
      - Adjust DefaultPolicy to handle different codegen dictionary formats
      
      * [Refactor] Update Thread Binding and Import Statements in TileLang Kernels
      
      - Replace T.thread_binding() with T.get_thread_binding() across multiple kernel test files
      - Update import statements for MMA layout and macro generator in dequantize GEMM and FP8 examples
      - Move map_torch_type utility function to tilelang.utils.tensor
      - Remove unnecessary imports and improve code organization
      de1ba1e4
  7. 05 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Enhancement] Enable runtime tensor data type validation (#146) · d0434c3e
      Lei Wang authored
      * Fix debug print buffer template for unsigned char type
      
      - Update debug_print_buffer_value template specialization for unsigned char
      - Modify test_tilelang_debug_print.py to include additional dtype tests
      - Add test case for uint8 dtype in debug print buffer function
      
      * Refactor debug print buffer template formatting for unsigned char
      
      - Improve code formatting for debug_print_buffer_value template specialization
      - Adjust line breaks and indentation for better readability
      - Maintain consistent code style with other template specializations
      
      * Extract map_torch_type utility function to tilelang.utils.tensor
      
      - Move map_torch_type function from multiple test files to a centralized location
      - Import map_torch_type from tilelang.utils.tensor in kernel test files
      - Improve code reusability by creating a shared utility function for type mapping
      
      * Add buffer dtype mapping for Cython kernel adapter
      
      - Introduce buffer_dtype_map in CythonKernelAdapter to track buffer variable dtypes
      - Add _process_buffer_dtype method to extract dtype information from TIR function
      - Update CythonKernelWrapper to support setting and validating buffer dtypes
      - Enhance type checking during kernel execution with dtype verification
      - Improve logging message for Cython JIT adapter compilation
      
      * Add static shape mapping for Cython kernel adapter
      
      - Introduce static_shape_map in CythonKernelAdapter to track buffer variable static shapes
      - Add _process_static_shape method to extract static shape information from TIR function
      - Update CythonKernelWrapper to support setting and validating static shapes
      - Enhance type checking during kernel execution with static shape verification
      
      * Add Multi-Head Attention (MHA) Backward Pass Test for TileLang Kernel
      
      - Implement comprehensive test for Multi-Head Attention backward pass
      - Support both causal and non-causal attention scenarios
      - Add reference implementation for comparing kernel outputs
      - Test different batch sizes, head counts, sequence lengths, and head dimensions
      - Verify forward and backward pass correctness using torch.testing.assert_close
      
      * Set random seed for MHA backward pass test
      
      - Add random seed initialization for consistent test reproducibility
      - Use tilelang.testing.set_random_seed(42) to ensure deterministic test results
      d0434c3e
  8. 04 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Bugfix] Add missing definition for AtomicAdd (#138) · 3960d3d0
      Lei Wang authored
      * Change default log level from WARNING to INFO in TileLang initialization
      
      * Refactor Flash Attention Variable-Length MHA Example with Cython Backend Support
      
      - Update `example_mha_fwd_varlen.py` to use Cython backend for kernel compilation
      - Remove unused imports and simplify function signature
      - Modify `flashattn` function to handle max sequence length as a separate argument
      - Update kernel call to include max sequence length parameter
      - Improve code readability and remove commented-out code
      - Add print statement to confirm successful assertion
      
      * Refactor code formatting in TileLang lowering and example files
      
      - Improve line breaks and code formatting in `lower.py`, `wrapper.py`, and `tensor.py`
      - Simplify line breaks and reduce unnecessary whitespace
      - Enhance code readability by adjusting indentation and line breaks
      - Update example MHA forward pass script with cleaner tensor initialization
      
      * Update TileLang kernel test with import path changes for MMA layout and macro generator
      
      - Modify import statements in test_tilelang_kernel_dequantize_gemm.py
      - Replace bitblas imports with tilelang.intrinsics imports for MMA-related utilities
      - Update main function to use tilelang.testing.main()
      
      * Add Block Sparse Attention Examples for TileLang and Triton
      
      - Implement block sparse attention kernels for both TileLang and Triton
      - Add utility functions for generating sparse attention masks using top-k and threshold methods
      - Support causal and variable-length attention scenarios
      - Include test cases for different sequence length configurations
      - Demonstrate block-level sparse attention with configurable parameters
      
      * Refactor Block Sparse Attention Examples with Code Style Improvements
      
      - Improve code formatting in block_sparse_attn_tilelang.py and block_sparse_attn_triton.py
      - Enhance readability by adjusting line breaks and indentation
      - Simplify kernel and function calls with better formatting
      - Add whitespace and line break improvements for better code clarity
      
      * Enhance Layout Plotting with Multi-Replication and Dynamic Visualization
      
      - Update plot_layout function to support multiple replications in thread and value mapping
      - Improve thread and value mapping to handle replicated layouts
      - Dynamically adjust figure size and legend positioning
      - Add print statements for saved plot file paths
      - Modify example fragment_mma_load_a.py to uncomment and enable warp and block layout plotting
      
      * Refactor AtomicAdd functions in CUDA common header
      
      - Implement a generic template for AtomicAdd function
      - Specialize templates for half_t, bfloat16_t, and pointer types
      - Reorganize and clean up existing AtomicAdd implementations
      - Improve type handling and conversion in atomic operations
      
      * Remove unused import in MHA backward test file
      
      - Remove unnecessary argparse import from test_tilelang_kenrel_mha_bwd.py
      - Add blank line for improved code formatting
      - Minor code cleanup in test file
      3960d3d0
  9. 02 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Kernel] Implement different SEQ Q/KV examples with block sparse (#133) · 159af5df
      Lei Wang authored
      * Change default log level from WARNING to INFO in TileLang initialization
      
      * Refactor Flash Attention Variable-Length MHA Example with Cython Backend Support
      
      - Update `example_mha_fwd_varlen.py` to use Cython backend for kernel compilation
      - Remove unused imports and simplify function signature
      - Modify `flashattn` function to handle max sequence length as a separate argument
      - Update kernel call to include max sequence length parameter
      - Improve code readability and remove commented-out code
      - Add print statement to confirm successful assertion
      
      * Refactor code formatting in TileLang lowering and example files
      
      - Improve line breaks and code formatting in `lower.py`, `wrapper.py`, and `tensor.py`
      - Simplify line breaks and reduce unnecessary whitespace
      - Enhance code readability by adjusting indentation and line breaks
      - Update example MHA forward pass script with cleaner tensor initialization
      
      * Update TileLang kernel test with import path changes for MMA layout and macro generator
      
      - Modify import statements in test_tilelang_kernel_dequantize_gemm.py
      - Replace bitblas imports with tilelang.intrinsics imports for MMA-related utilities
      - Update main function to use tilelang.testing.main()
      
      * Add Block Sparse Attention Examples for TileLang and Triton
      
      - Implement block sparse attention kernels for both TileLang and Triton
      - Add utility functions for generating sparse attention masks using top-k and threshold methods
      - Support causal and variable-length attention scenarios
      - Include test cases for different sequence length configurations
      - Demonstrate block-level sparse attention with configurable parameters
      
      * Refactor Block Sparse Attention Examples with Code Style Improvements
      
      - Improve code formatting in block_sparse_attn_tilelang.py and block_sparse_attn_triton.py
      - Enhance readability by adjusting line breaks and indentation
      - Simplify kernel and function calls with better formatting
      - Add whitespace and line break improvements for better code clarity
      159af5df
  10. 27 Feb, 2025 1 commit
    • Lei Wang's avatar
      [JIT] Enhance cython/ctypes wrapper for tma descriptor (#126) · 7b74bb01
      Lei Wang authored
      
      
      * refactor code
      
      * enhance tutorial
      
      * Enhance error handling and code generation in CUDA and TileLang components
      
      This commit introduces several improvements across multiple files:
      - Added more informative error messages in GEMM layout checks
      - Updated CUDA codegen to support more flexible function signature generation
      - Improved TMA descriptor initialization and kernel dispatch logic
      - Refined library generation and source code parsing utilities
      - Enhanced error handling in various adapter and wrapper classes
      
      * Add thread tag validation for warp specialization
      
      Introduce a ThreadTagChecker to validate that a PrimFunc only uses threadIdx.x before applying warp specialization. This prevents unintended transformations on kernels with complex thread binding and provides a clear warning to users about potential issues with warp specialization.
      
      * Update TileLang Profiling and Compilation in Flash Decoding Examples
      
      Refactor the profiling and compilation workflow in two flash decoding example scripts:
      - Replace `tilelang.lower()` and `tilelang.Profiler()` with `tilelang.compile()`
      - Simplify profiler initialization using `get_profiler()`
      - Update method calls to use the new profiler and compiled kernel objects
      - Maintain existing performance benchmarking and validation logic
      
      * Refactor and clean up code formatting in TileLang testing and adapter modules
      
      This commit includes several code style and formatting improvements:
      - Adjust whitespace and line breaks in test files
      - Improve code formatting in CUDA source wrapper and adapter utilities
      - Enhance readability of function calls and argument handling
      - Remove unnecessary whitespace and standardize indentation
      - Simplify function signatures and argument parsing
      
      * Refactor CUDA codegen and improve code formatting
      
      This commit includes several improvements to CUDA code generation and formatting:
      - Enhance function signature generation in CodeGenTileLangCUDA
      - Improve code formatting and readability in CUDA-related files
      - Simplify parameter handling and type annotations
      - Clean up whitespace and line breaks in codegen and layout files
      
      ---------
      Co-authored-by: default avatarUbuntu <dlisuser@h100testl730RPS.xu5snccwrbtejcqqalluoku5hb.xx.internal.cloudapp.net>
      7b74bb01
  11. 20 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Feature] Add CTypes JIT kernel support (#100) · 7c817d51
      Lei Wang authored
      * [Feature] Add CTypes JIT kernel support for dynamic shapes and multi-stream execution
      
      - Enhance CtypesKernelAdapter to handle dynamic symbolic shapes
      - Add support for multi-stream kernel execution in CTypes backend
      - Implement dynamic shape handling in test_tilelang_jit_gemm_ctypes.py
      - Add symbolic shape utility function in tilelang.language
      - Update profiler to improve flexibility in benchmark selection
      
      * Remove redundant thread binding in GEMM kernel implementations
      
      - Remove unnecessary `thread_binding` line in GEMM kernel functions
      - Clean up code in `examples/gemm/README.md` and `testing/python/kernel/test_tilelang_kernel_int4_gemm_mma.py`
      - Enhance code readability by removing redundant thread binding annotation
      
      * Fix indentation in int4 GEMM kernel test file
      
      - Correct indentation for function calls in `test_tilelang_kernel_int4_gemm_mma.py`
      - Remove extra indentation in `mma_emitter.ldmatrix_a()` and `mma_emitter.ldmatrix_b()` calls
      - Improve code formatting for better readability
      7c817d51
  12. 11 Feb, 2025 1 commit
  13. 06 Feb, 2025 2 commits
    • Lei Wang's avatar
      [Dev] Add test case for bfloat16 and int4 gemm with mma (#65) · 0d19e268
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      
      * Add BF16 support to matrix multiplication and introduce corresponding test cases
      
      * Add a blank line for improved readability in BF16 GEMM test
      
      * Update acknowledgements in README to include supervision by Zhi Yang at Peking University
      0d19e268
    • Lei Wang's avatar
      [Dev] Support FP8 Codegen for cuda backend (#64) · 61de5288
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      61de5288
  14. 26 Jan, 2025 1 commit
    • Lei Wang's avatar
      [Doc] Addd debug relevant testing and documentations (#58) · 5e259239
      Lei Wang authored
      * implement jit test case
      
      * [Dev] implement auto tune test case for matrix multiplication
      
      * Implement test for legalize memory access and vectorized loop
      
      * lint fix
      
      * introduce run_once
      
      * Refactor callback function names for consistency and improve code readability
      
      * enhance documentations
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * fix formatting issues in rt_mod_hip.cc
      
      * add random seed initialization for deterministic testing
      5e259239
  15. 25 Jan, 2025 4 commits
    • Cunxiao Ni's avatar
      [CI][Test] Add test cases for element_add (#47) · f944b79e
      Cunxiao Ni authored
      * [CI][Test] Add test cases for element_add
      
      * [Doc] fix typo
      
      * Parallelization
      
      * format
      
      * remove useless condition
      
      * format
      f944b79e
    • Yu Cheng's avatar
      [CI][Test] Add test cases for tilelang kernel FlashAttention (#54) · bedab1a0
      Yu Cheng authored
      * [Dev] Add FlashDecoding example
      
      * [CI][Test] Add test cases for tilelang kernel convolution
      
      * [CI][Test] Add test cases for tilelang kernel FlashAttention
      
      * Reduce the number of stages to ensure the shared memory allocation is valid
      
      * Temporarily remove the dim128 case
      
      * lint
      
      * update einops in requirements-dev.txt
      
      * update einops in requirements-test.txt
      
      * remove einops in requirements-dev.txt
      bedab1a0
    • Yu Cheng's avatar
      [CI][Test] Add test cases for tilelang kernel convolution (#51) · 34de04a6
      Yu Cheng authored
      * [CI][Test] Add test cases for tilelang kernel convolution
      34de04a6
    • Lei Wang's avatar
      [Doc] Remove unnecessary layout annotation (#49) · 47ecc791
      Lei Wang authored
      * [Doc] Update documentation structure and content: add overview section, revise project name, and change theme to Furo
      
      * [Feature] Add device-side debug printing functions and integrate into kernel interface
      
      * lint fix
      
      * remove debug print
      
      * implement test for debug
      
      * lint fix
      
      * add some comments
      
      * Enhance fragment design and assert fragment print
      
      * enhance debug print
      
      * add test for msg
      
      * lint fix
      
      * format
      
      * add flash decoding exmaples
      
      * remove comment
      
      * test simplified
      47ecc791
  16. 23 Jan, 2025 2 commits
    • Lei Wang's avatar
      [Refactor] Simplify interface via replacing argument thread binding of... · 362b3520
      Lei Wang authored
      [Refactor] Simplify interface via replacing argument thread binding of intrinsics with `KernelFrame.Current` (#34)
      
      * installation script fix
      
      * readme typo fix
      
      * doc fix for dequantize gemm
      
      * [Doc] remove CODE_OF_CONDUCT.md and SECURITY.md; update references in CONTRIBUTING.md
      
      * [Doc] add unit tests for AnnotateDeviceRegions transform; remove SUPPORT.md
      
      * update license
      
      * [Enhancement] add tensor supply handling for unsigned integers; improve error message for execution backend assertion
      
      * [Refactor] improve code readability by reformatting function signatures and assertions
      
      * [Refactor] replace torch.manual_seed with tilelang.testing.set_random_seed for consistency in random seed handling
      
      * [Refactor] unify thread binding variable naming across kernel and example files
      
      * [Refactor] remove unused thread binding parameter from matrix multiplication functions
      
      * [Refactor] remove unused thread binding parameter from matrix multiplication functions
      
      * [Refactor] enable main testing function in tilelang kernel gemm test
      
      * bug fix
      362b3520
    • Lei Wang's avatar
      [CI] Comprehensive Test cases Implementation of Matmul Dequantize (#32) · 7959d786
      Lei Wang authored
      * installation script fix
      
      * readme typo fix
      
      * doc fix for dequantize gemm
      
      * [Doc] remove CODE_OF_CONDUCT.md and SECURITY.md; update references in CONTRIBUTING.md
      
      * [Doc] add unit tests for AnnotateDeviceRegions transform; remove SUPPORT.md
      
      * update license
      
      * [Enhancement] add tensor supply handling for unsigned integers; improve error message for execution backend assertion
      
      * [Refactor] improve code readability by reformatting function signatures and assertions
      
      * [Refactor] replace torch.manual_seed with tilelang.testing.set_random_seed for consistency in random seed handling
      7959d786
  17. 12 Jan, 2025 2 commits
  18. 11 Jan, 2025 2 commits
    • Lei Wang's avatar
      [Lint] Overall Typo and Linting Fixes (#13) · fa511857
      Lei Wang authored
      * README.md fixed
      
      * update test ci
      
      * Lint and Typo Fix
      
      * Clang Format Lint Fix
      fa511857
    • Lei Wang's avatar
      [Initialization] Migration of Codebase from Dev Branch into Main (#10) · 57ab687c
      Lei Wang authored
      
      
      * Add format.sh script for code formatting and linting
      
      * docs update
      
      * center align the title
      
      * lint fix
      
      * add ignore
      
      * Add .gitignore for 3rdparty directory
      
      * Add requirements-dev.txt, requirements-test.txt, and requirements.txt
      
      * 3rdparty
      
      * Add gemm.h, CMakeLists.txt, _ffi_api.py, __init__.py, runtime.h, reduce.h, loop_partition.h, utils.h, and loop_vectorize.h
      
      * Refactor CMakeLists.txt and include statements
      
      - Update CMakeLists.txt to use a newer version of CMake and add project name
      - Remove unnecessary include directories
      
      Fix include paths in layout.cc, codegen.cc, codegen.h, rt_mod.cc, frontend_legalize.cc, inject_pipeline.cc, layout_inference.cc, loop_vectorize.cc, and lower_tile_op.cc
      
      - Update include paths to use relative paths instead of absolute paths
      
      * Update submodule for 3rdparty/tvm
      
      * update
      
      * load dll first
      
      * Refactor CMakeLists.txt and include statements
      
      * Refactor CMakeLists.txt and include statements
      
      * git keep update
      
      * Refactor CMakeLists.txt and include statements
      
      * Refactor CMakeLists.txt and include statements
      
      * refactor code structure
      
      * Update Readme
      
      * CMakeLists Customized
      
      * update readme
      
      * update README
      
      * update readme
      
      * update usage
      
      * with TVM_IMPORT_PYTHON_PATH to handle own tvm build python import
      
      * annotate lower transform global func with `transform` prefix
      
      * Migrate Simplify Pass from tilelang tvm branch
      
      * enhance system environment handling with __init__ and CMake
      
      * Initial commit
      
      * CODE_OF_CONDUCT.md committed
      
      * LICENSE committed
      
      * README.md committed
      
      * SECURITY.md committed
      
      * SUPPORT.md committed
      
      * CODE_OF_CONDUCT Commit
      
      * LICENSE Commit
      
      * SECURITY Commit
      
      * SUPPORT Commit
      
      * Modify Support
      
      * Update README.md
      
      * security ci update
      
      * remove examples
      
      * Update and implement clang-format
      
      * add composable kernel components
      
      * Migrate from latest update
      
      * submodule update
      
      * Test update
      
      * Update License
      
      * Spell check
      
      * lint fix
      
      * add clang-tidy to apply static analysis for c source
      
      * update tilelang examples
      
      * Update Install Docs
      
      * Refactor filetree
      
      * Enhance Install
      
      * conflict resloved
      
      * annotate_version
      
      * Initial Update
      
      * test fix
      
      * install
      
      * Implement setup.py
      
      * lint fix
      
      * Separate Init
      
      * Separate test
      
      * docker file commit
      
      * add logo
      
      * Update Readme and Examples
      
      * update readme
      
      * update logo
      
      * Implement AMD Installation
      
      * Add License
      
      * Update AMD MI300x Benchmark
      
      * update README
      
      * update mi300 benchmark scripts
      
      * update ignore
      
      * enhance build scirpt
      
      * update image
      
      * enhance setup.py to remove duplicated libraries
      
      * remove debug files
      
      * update readme
      
      * update image
      
      * update gemm examples
      
      * update flashattention README
      
      * readme update
      
      * add cmake into requirements
      
      * libinfo fix
      
      * auto update submodule
      
      * lint fix
      
      * Fix AMD Build and Test
      
      * Update check for transpose attribute for CDNA Arch
      
      * typo fix for amd
      
      * Implement Matmul Benchmark
      
      * Refactor Code
      
      * [TypoFix] Fix GEMM Example
      
      * [Docs] Init Linear Attention README
      
      * [TYPO] Typo fix
      
      * [Lint] Lint Fix
      
      * enhance example with intrinsics
      
      * [Enhancement] Improve Buffer Collection during IR Parser
      
      * [Dev] Introduce Current classmethod to get current frame
      
      * submodule update
      
      * fake test pass update
      
      * support thread_extent_api
      
      * code optimize
      
      * Add GEMM function implementation for matrix multiplication
      
      * Update logging format to reflect TileLang in logger messages
      
      * Refactor CMakeLists.txt for improved readability and set default build type to Release
      
      * Support Gemm SS Primitives Implementation
      
      * [README] Upload Tile Language Logo (#5)
      
      * update logo
      
      * Update README.md to enhance formatting and center the title
      
      ---------
      Co-authored-by: default avatarmicrosoft-github-operations[bot] <55726097+microsoft-github-operations[bot]@users.noreply.github.com>
      Co-authored-by: default avatarMicrosoft Open Source <microsoftopensource@users.noreply.github.com>
      Co-authored-by: default avatarYu Cheng <yu.cheng@pku.edu.cn>
      57ab687c