1. 20 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Phaseout LLVM Dependency by Making it Optional (#247) · f2e99180
      Lei Wang authored
      * remove llvm build
      
      * [Refactor] Update kernel compilation and profiling in examples
      
      - Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation.
      - Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency.
      - Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations.
      
      * lint fix
      
      * License Update
      
      * [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files
      
      - Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields.
      - Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability.
      
      * [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files
      
      - Improved comment alignment and readability in `cuda.h`.
      - Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability.
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * fix
      
      * License update
      
      * [Enhancement] Update JITKernel to use artifact for kernel source
      
      - Assigned the generated artifact to `self.artifact` for better management.
      - Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling.
      
      * lint fix
      
      * Add @tilelang.testing.requires_llvm decorator to vectorization tests
      
      * Enhance setup.py and env.py for library management
      
      - Added functionality to remove original files after copying in CMakeBuild.
      - Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration.
      
      * Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py
      
      * Refactor CMakeBuild file handling in setup.py
      
      - Added a check to ensure the target library directory exists before copying .so files.
      - Improved the logic for creating the target directory and copying files to enhance robustness.
      
      * bugfix
      
      * Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update registration. Add @tilelang.testing.requires_llvm decorator to multiple tests for LLVM requirement.
      
      * lint fix
      
      * Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility.
      
      * lint fix
      
      * Add support for C target in device code generation
      
      - Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function.
      
      * [Enhancement] Implement auto-clear cache feature based on environment variable
      
      * Added TILELANG_CLEAR_CACHE environment variable to control cache clearing.
      * Updated CI workflow to set TILELANG_CLEAR_CACHE during testing.
      * Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true.
      
      * [Refactor] Update kernel invocation and import paths in tests and cache
      
      * Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result.
      * Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`.
      * Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability.
      
      * [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py
      
      * Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class.
      * Enhanced overall code formatting to align with project standards.
      
      * [Enhancement] Add bfloat16 test case and improve kernel caching logic
      
      * Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`.
      * Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading.
      * Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management.
      * Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database.
      * Improved code formatting and readability across several files.
      
      * lint fix
      
      * Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
      f2e99180
  2. 19 Mar, 2025 1 commit
    • Wenhao Xie's avatar
      [BugFix] Fix bug of mismatching dtype in testing (#245) · 71537ba5
      Wenhao Xie authored
      * [Typo] Fix formatting in installation instructions in README.md
      
      * [Enhancement] Improve CUDA path detection and update configuration handling
      
      * fix typo
      
      * remove IS_WINDOWS constant
      
      * lint fix
      
      * Improve error messages for CUDA detection failure
      
      * lint fix
      
      * lint fix
      
      * Fix .gitignore to correctly include venv directory
      
      * [Doc] Add instructions for installing nightly version of TileLang
      
      * update installation instructions
      
      * update install instruction
      
      * fix bug of mismatching dtype in testing and set the default value of check_dtype in torch_assert_close to true
      
      * lint fix
      
      * fix bug
      
      * use map_torch_type
      71537ba5
  3. 07 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Replace `T.thread_binding` with `T.get_thread_binding` in examples and test cases (#163) · de1ba1e4
      Lei Wang authored
      * [Refactor] Update BitBLAS Benchmark with TileLang Carver Imports and Roller Hints Generation
      
      - Replace BitBLAS imports with TileLang Carver imports in benchmark_matmul.py
      - Modify roller hints generation using new TileLang Carver template and utility functions
      - Update get_roller_hints_from_func to handle None cases and improve return logic
      - Adjust DefaultPolicy to handle different codegen dictionary formats
      
      * [Refactor] Update Thread Binding and Import Statements in TileLang Kernels
      
      - Replace T.thread_binding() with T.get_thread_binding() across multiple kernel test files
      - Update import statements for MMA layout and macro generator in dequantize GEMM and FP8 examples
      - Move map_torch_type utility function to tilelang.utils.tensor
      - Remove unnecessary imports and improve code organization
      de1ba1e4
  4. 05 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Enhancement] Enable runtime tensor data type validation (#146) · d0434c3e
      Lei Wang authored
      * Fix debug print buffer template for unsigned char type
      
      - Update debug_print_buffer_value template specialization for unsigned char
      - Modify test_tilelang_debug_print.py to include additional dtype tests
      - Add test case for uint8 dtype in debug print buffer function
      
      * Refactor debug print buffer template formatting for unsigned char
      
      - Improve code formatting for debug_print_buffer_value template specialization
      - Adjust line breaks and indentation for better readability
      - Maintain consistent code style with other template specializations
      
      * Extract map_torch_type utility function to tilelang.utils.tensor
      
      - Move map_torch_type function from multiple test files to a centralized location
      - Import map_torch_type from tilelang.utils.tensor in kernel test files
      - Improve code reusability by creating a shared utility function for type mapping
      
      * Add buffer dtype mapping for Cython kernel adapter
      
      - Introduce buffer_dtype_map in CythonKernelAdapter to track buffer variable dtypes
      - Add _process_buffer_dtype method to extract dtype information from TIR function
      - Update CythonKernelWrapper to support setting and validating buffer dtypes
      - Enhance type checking during kernel execution with dtype verification
      - Improve logging message for Cython JIT adapter compilation
      
      * Add static shape mapping for Cython kernel adapter
      
      - Introduce static_shape_map in CythonKernelAdapter to track buffer variable static shapes
      - Add _process_static_shape method to extract static shape information from TIR function
      - Update CythonKernelWrapper to support setting and validating static shapes
      - Enhance type checking during kernel execution with static shape verification
      
      * Add Multi-Head Attention (MHA) Backward Pass Test for TileLang Kernel
      
      - Implement comprehensive test for Multi-Head Attention backward pass
      - Support both causal and non-causal attention scenarios
      - Add reference implementation for comparing kernel outputs
      - Test different batch sizes, head counts, sequence lengths, and head dimensions
      - Verify forward and backward pass correctness using torch.testing.assert_close
      
      * Set random seed for MHA backward pass test
      
      - Add random seed initialization for consistent test reproducibility
      - Use tilelang.testing.set_random_seed(42) to ensure deterministic test results
      d0434c3e
  5. 06 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Dev] Support FP8 Codegen for cuda backend (#64) · 61de5288
      Lei Wang authored
      * [Enhancement] Add VectorizeLoop function and update imports for compatibility
      
      * [CI][Test] Improve test cases for vectorization and fix typos in parser comments
      
      * lint fix
      
      * Fix incorrect module reference for VectorizeLoop transformation
      
      * Refactor vectorize_loop transformation by removing unused extent mutation logic
      
      * [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
      
      * Fix formatting in CUDA FP8 header file for consistency
      
      * Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
      
      * Update submodule 'tvm' to latest commit for improved functionality
      
      * Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
      
      * Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
      
      * Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
      
      * Add CUDA requirements to FP8 test cases and update references for clarity
      
      * Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
      
      * Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Add CUDA requirements and FP8 test cases for matmul and gemv simulations
      
      * Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
      
      * Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
      61de5288