"src/include/gridwise_direct_convolution_1.hip.hpp" did not exist on "8732ea04fba4672a2ab4098289935f0a1bec7fc7"
  1. 20 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Phaseout LLVM Dependency by Making it Optional (#247) · f2e99180
      Lei Wang authored
      * remove llvm build
      
      * [Refactor] Update kernel compilation and profiling in examples
      
      - Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation.
      - Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency.
      - Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations.
      
      * lint fix
      
      * License Update
      
      * [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files
      
      - Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields.
      - Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability.
      
      * [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files
      
      - Improved comment alignment and readability in `cuda.h`.
      - Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability.
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * fix
      
      * License update
      
      * [Enhancement] Update JITKernel to use artifact for kernel source
      
      - Assigned the generated artifact to `self.artifact` for better management.
      - Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling.
      
      * lint fix
      
      * Add @tilelang.testing.requires_llvm decorator to vectorization tests
      
      * Enhance setup.py and env.py for library management
      
      - Added functionality to remove original files after copying in CMakeBuild.
      - Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration.
      
      * Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py
      
      * Refactor CMakeBuild file handling in setup.py
      
      - Added a check to ensure the target library directory exists before copying .so files.
      - Improved the logic for creating the target directory and copying files to enhance robustness.
      
      * bugfix
      
      * Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update registration. Add @tilelang.testing.requires_llvm decorator to multiple tests for LLVM requirement.
      
      * lint fix
      
      * Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility.
      
      * lint fix
      
      * Add support for C target in device code generation
      
      - Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function.
      
      * [Enhancement] Implement auto-clear cache feature based on environment variable
      
      * Added TILELANG_CLEAR_CACHE environment variable to control cache clearing.
      * Updated CI workflow to set TILELANG_CLEAR_CACHE during testing.
      * Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true.
      
      * [Refactor] Update kernel invocation and import paths in tests and cache
      
      * Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result.
      * Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`.
      * Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability.
      
      * [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py
      
      * Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class.
      * Enhanced overall code formatting to align with project standards.
      
      * [Enhancement] Add bfloat16 test case and improve kernel caching logic
      
      * Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`.
      * Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading.
      * Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management.
      * Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database.
      * Improved code formatting and readability across several files.
      
      * lint fix
      
      * Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
      f2e99180
  2. 17 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Bugfix] Disable force inline for ldmatrix (#227) · a1da26f2
      Lei Wang authored
      * Refactor GEMM and Bulk Copy operations to enhance layout handling and support for Hopper architecture
      
      - Update `ComputeWarpPartition` to include a new parameter for Hopper WGMMA support.
      - Modify layout checks in `LowerBulkCopy` to accommodate new GEMM layout types.
      - Enhance layout inference logic in `InferLayout` for better compatibility with Hopper architecture.
      - Include necessary header files for built-in operations and layout inference improvements.
      
      * Refactor parameter formatting in CUDA matrix load functions for consistency
      
      - Adjusted parameter alignment in `ptx_ldmatrix_x1`, `ptx_ldmatrix_x2`, `ptx_ldmatrix_x4`, and their transposed counterparts for improved readability.
      - Added a blank line in `get_tensor_supply` function in `tensor.py` to enhance code clarity.
      
      * Enhance tensor supply generation in `get_tensor_supply` function
      
      - Introduced handling for unsigned integer and float8 tensor types, allowing for specific random tensor generation based on data type.
      - Updated logic to return appropriate random tensors for different data types, improving flexibility and functionality of tensor supply generation.
      - Refactored existing conditions for clarity and maintainability.
      
      * Fix tensor supply generation logic in `get_tensor_supply` function
      
      - Updated the variable reference from `tensor` to `param` to ensure correct handling of tensor data types.
      - Improved the accuracy of unsigned integer and float8 checks for tensor supply generation, enhancing functionality and reliability.
      
      * Enhance tensor supply checks in `get_tensor_supply` function
      
      - Updated the logic for identifying unsigned integers and float8 types by using `removeprefix` on the dtype string, improving accuracy in tensor supply generation.
      - Ensured better handling of tensor data types for more reliable random tensor generation based on the updated checks.
      
      * Enhance KernelParam functionality and improve tensor supply checks
      
      - Added methods `is_unsigned` and `is_float8` to the `KernelParam` class for better type identification of parameters.
      - Updated the `get_tensor_supply` function to utilize the new methods, improving clarity and accuracy in tensor supply generation based on parameter types.
      a1da26f2
  3. 16 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Introduce KernelParam integration across modules (#223) · 3de9f13c
      Lei Wang authored
      * [Refactor] Update KernelParam integration across modules
      
      - Replaced instances of TensorType with KernelParam in various modules to standardize parameter handling.
      - Updated JITKernel, BaseKernelAdapter, and CythonKernelAdapter to utilize KernelParam for improved type consistency.
      - Enhanced Profiler class to include KernelParam in its parameters, ensuring better integration with the new parameter structure.
      - Adjusted tensor handling in utility functions to accommodate the new KernelParam type, improving overall code clarity and maintainability.
      - Updated copyright headers to reflect the correct organization.
      
      * [Refactor] Clean up whitespace in kernel, profiler, and tensor modules
      
      - Added blank lines for improved readability in kernel.py, __init__.py, and tensor.py.
      - Enhanced code clarity by ensuring consistent formatting across these modules.
      
      * [Enhancement] Add detailed docstrings to KernelParam and Profiler classes
      
      - Enhanced KernelParam class with comprehensive docstrings for better understanding of its purpose and methods.
      - Updated Profiler class to include detailed docstrings for its attributes and methods, improving code documentation and usability.
      - Removed unused do_bench function to streamline the profiler module and improve clarity.
      
      * [Refactor] Update type hints in do_bench function and clean up whitespace in profiler module
      
      - Changed type hints for grad_to_none and quantiles parameters in do_bench function to use Optional for better clarity.
      - Added a blank line in __init__.py for improved readability and consistency in the profiler module.
      
      * [Refactor] Update type hint in do_bench function for consistency
      
      - Changed the return type hint in the do_bench function from a union type to a more explicit List type for better clarity and consistency in type annotations.
      
      * [Refactor] Update return type hint in do_bench function for clarity
      
      - Changed the return type hint in the do_bench function from a union type to Union[float, List[float]] for improved clarity and consistency in type annotations.
      
      * [Enhancement] Add func property to Profiler class for adapter access
      
      - Introduced a new property `func` in the Profiler class to provide access to the adapter, ensuring that the adapter is set before retrieval. This enhancement improves the usability of the Profiler class by allowing easier access to the adapter functionality.
      
      * [Refactor] Update kernel compilation and profiling in tests
      
      - Replaced instances of `TL.lower` and `TL.Profiler` with `tilelang.compile` and the new profiler interface across multiple test files.
      - Enhanced the kernel compilation process to utilize the updated API, improving consistency and maintainability in the testing framework.
      - Updated assertions to use the new profiler methods for better clarity and functionality in performance testing.
      
      * [Refactor] Simplify kernel invocation and remove unused parameters in tests
      
      - Updated the kernel invocation in `test_tilelang_dynamic_symbolic.py` to directly assign the result to `C`, improving clarity.
      - Removed the `execution_backend` parameter from `tilelang.compile` calls in `test_tilelang_jit_callback.py` and `test_tilelang_jit_gemm.py` for consistency with the updated API.
      - Commented out the call to `tilelang.testing.main()` in `test_tilelang_jit_callback.py` and replaced it with a direct call to `test_gemm_jit_kernel()` to streamline test execution.
      - Adjusted the dtype mapping in `TorchDLPackKernelAdapter` to use the parameter's dtype directly, enhancing code simplicity.
      
      * [Refactor] Remove unused imports in test files for cleaner code
      
      - Eliminated unnecessary imports of `tilelang` as `TL` in various test files to enhance code clarity and maintainability.
      - Updated multiple test files to streamline the codebase and reduce potential confusion from unused references.
      
      * [Refactor] Simplify kernel invocation in tilelang kernel test
      
      - Updated the kernel invocation in `test_tilelang_kernel_bf16_gemm_mma.py` to directly assign the result to `C`, enhancing code clarity and consistency with recent changes in the API.
      
      * [Refactor] Simplify kernel invocation in tilelang kernel tests
      
      - Updated kernel invocations in multiple test files to directly assign the result to `C`, improving code clarity and consistency with the updated API.
      - Removed unnecessary initialization of `C` as a zero tensor, streamlining the code further.
      
      * [Refactor] Update kernel invocation in tilelang transform tests
      
      - Replaced the use of `TL.Profiler` with `tilelang.compile` in `test_tilelang_transform_simplify.py`, enhancing code clarity and consistency with the updated API.
      - Streamlined the kernel invocation process by directly assigning the result to `C`, improving readability and maintainability of the test code.
      3de9f13c