- 12 Mar, 2025 1 commit
-
-
Lei Wang authored
* Optimize CMake build process with dynamic job count calculation - Modify build_csrc function to use 90% of available CPU cores - Ensure at least one job is used during compilation - Improve build performance by dynamically adjusting parallel job count * Optimize build_csrc function with multiprocessing module - Replace os.cpu_count() with multiprocessing.cpu_count() - Maintain existing 90% CPU utilization logic - Improve CPU core count calculation for build process * Add dynamic shape support with out_idx in Cython JIT kernel compilation - Implement `run_cython_dynamic_shape_with_out_idx` function in test_tilelang_jit_gemm_cython.py - Update Cython wrapper to handle dynamic symbolic shapes during tensor allocation - Add support for resolving dynamic shape dimensions using input tensor references - Enhance flexibility of JIT kernel compilation with symbolic shape handling * Enhance error reporting for dynamic symbolic shape resolution in Cython JIT kernel - Add detailed error message when a dynamic symbolic dimension is not found in dynamic_symbolic_map - Improve debugging by providing context about missing symbolic dimensions - Maintain existing dynamic shape resolution logic * Fix Copy operation handling for scalar and multi-dimensional tensors - Add special handling for scalar tensor copy operations - Enhance error reporting in MakeIndices method with more detailed diagnostic information - Improve SIMT loop generation to support zero-dimensional tensors - Add explicit check and handling for scalar tensor scenarios * Refactor Copy operation code formatting and improve readability - Improve code formatting in MakeIndices and MakeSIMTLoop methods - Add line breaks to enhance readability of complex ICHECK statements - Simplify code structure in scalar tensor handling - Remove unnecessary whitespace and improve code alignment * Simplify GEMM example with direct kernel compilation - Update copyright header to Tile-AI Corporation - Remove Profiler import and usage - Replace tilelang.lower() with tilelang.compile() - Simplify kernel execution workflow - Update kernel source retrieval method * Enhance block sparse attention implementation - Update `blocksparse_flashattn` to use 2 stages for improved performance. - Change `block_mask_dtype` from `int8` to `bool` for better memory efficiency. - Modify condition checks in the kernel to utilize boolean values. - Introduce a new example for top-k sparse attention and a benchmark for native sparse attention. - Add support for asynchronous copy in PTX and improve pipeline planning with condition handling. * Refactor and clean up code formatting across multiple files - Added whitespace for improved readability in `example_blocksparse_gemm.py`, `example_tilelang_nsa_fwd.py`, and `benchmark_nsa_fwd.py`. - Enhanced code structure and alignment in `inject_ptx_async_copy.cc` and `pipeline_planning.cc`. - Updated comments and documentation for clarity in `__init__.py` and `phase.py`. - Ensured consistent formatting and style across the codebase.
-
- 10 Mar, 2025 1 commit
-
-
Lei Wang authored
* Update native sparse attention example with scale parameter handling - Add scale parameter processing in native_sparse_attention function - Modify example script to include custom scale value - Update function calls to pass scale parameter - Enhance flexibility of sparse attention implementation * Refactor Triton Native Sparse Attention Example - Improve code formatting and readability in example_triton_nsa_bwd.py - Standardize function and parameter alignment - Remove unnecessary whitespaces and optimize imports - Enhance code style consistency with previous commits
-
- 07 Mar, 2025 2 commits
-
-
Lei Wang authored
* [Refactor] Update BitBLAS Benchmark with TileLang Carver Imports and Roller Hints Generation - Replace BitBLAS imports with TileLang Carver imports in benchmark_matmul.py - Modify roller hints generation using new TileLang Carver template and utility functions - Update get_roller_hints_from_func to handle None cases and improve return logic - Adjust DefaultPolicy to handle different codegen dictionary formats * [Refactor] Update Thread Binding and Import Statements in TileLang Kernels - Replace T.thread_binding() with T.get_thread_binding() across multiple kernel test files - Update import statements for MMA layout and macro generator in dequantize GEMM and FP8 examples - Move map_torch_type utility function to tilelang.utils.tensor - Remove unnecessary imports and improve code organization * Refactor Native Sparse Attention Example with Enhanced Triton Kernel - Update parallel_nsa_fwd_kernel to support more flexible sparse attention computation - Add support for block counts and offsets in the Triton kernel - Modify kernel grid and computation logic for improved performance - Update example script to use naive_nsa_simple reference implementation - Improve type hints and kernel configuration * Add Native Sparse Attention Examples with Tilelang and Triton Implementations - Introduce new example scripts for native sparse attention: * example_tilelang_nsa_fwd.py: Forward pass implementation using TileLang * example_tilelang_nsa_decode.py: Decoding-specific sparse attention implementation * example_triton_nsa_fwd.py: Triton-based sparse attention forward pass - Update reference.py with naive implementations for sparse attention - Support different sparse attention scenarios including forward pass and inference - Add comprehensive testing and validation against reference implementations * lint fix * Add Variable-Length Native Sparse Attention Examples for TileLang and Triton - Introduce new example scripts for variable-length native sparse attention: * example_tilelang_nsa_fwd_varlen.py: TileLang implementation with variable sequence lengths * example_triton_nsa_fwd_varlen.py: Triton implementation with variable sequence lengths - Update reference.py to support variable-length sparse attention scenarios - Enhance existing sparse attention implementations to handle variable-length inputs - Add comprehensive testing and validation for variable-length sparse attention * Refactor Native Sparse Attention Examples: Code Style and Formatting Improvements - Standardize function and parameter formatting across NSA example files - Improve code readability by adjusting indentation and line breaks - Enhance type hints and parameter alignment - Remove unnecessary whitespaces and optimize imports - Maintain consistent code style across TileLang and Triton implementations
-
Lei Wang authored
* [Refactor] Update BitBLAS Benchmark with TileLang Carver Imports and Roller Hints Generation - Replace BitBLAS imports with TileLang Carver imports in benchmark_matmul.py - Modify roller hints generation using new TileLang Carver template and utility functions - Update get_roller_hints_from_func to handle None cases and improve return logic - Adjust DefaultPolicy to handle different codegen dictionary formats * [Refactor] Update Thread Binding and Import Statements in TileLang Kernels - Replace T.thread_binding() with T.get_thread_binding() across multiple kernel test files - Update import statements for MMA layout and macro generator in dequantize GEMM and FP8 examples - Move map_torch_type utility function to tilelang.utils.tensor - Remove unnecessary imports and improve code organization * Refactor Native Sparse Attention Example with Enhanced Triton Kernel - Update parallel_nsa_fwd_kernel to support more flexible sparse attention computation - Add support for block counts and offsets in the Triton kernel - Modify kernel grid and computation logic for improved performance - Update example script to use naive_nsa_simple reference implementation - Improve type hints and kernel configuration * Add Native Sparse Attention Examples with Tilelang and Triton Implementations - Introduce new example scripts for native sparse attention: * example_tilelang_nsa_fwd.py: Forward pass implementation using TileLang * example_tilelang_nsa_decode.py: Decoding-specific sparse attention implementation * example_triton_nsa_fwd.py: Triton-based sparse attention forward pass - Update reference.py with naive implementations for sparse attention - Support different sparse attention scenarios including forward pass and inference - Add comprehensive testing and validation against reference implementations * lint fix
-