    [Bugfix] Fix input tensor compatibility checks in AutoTuner (#588) · cce6aed8
    Lei Wang authored
    
    
    * [Refactor] Remove cache existence check in kernel saving logic
    
    - Eliminated redundant checks for existing cache paths in `AutotuneResult` and `AutoTunerCache` classes, simplifying the kernel saving process.
    - Ensured that the cache directory is always created before saving kernel source code, improving reliability in kernel storage.
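A minimal sketch of the "always create the directory, then write" pattern described above. This is not the actual tilelang code: the function name `save_kernel_source` and the default file name `kernel.cu` are hypothetical and used only for illustration.

```python
import os


def save_kernel_source(cache_dir: str, source: str, filename: str = "kernel.cu") -> str:
    """Write kernel source into cache_dir, creating the directory if needed.

    Hypothetical helper: the name and the default file name are assumptions.
    """
    # No "does the path already exist?" check: exist_ok=True makes this call
    # safe whether or not the directory is already present.
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, filename)
    # Opening with "w" overwrites any previously saved source.
    with open(path, "w") as f:
        f.write(source)
    return path
```

Dropping the existence check means a later save simply overwrites the earlier file instead of being skipped.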
    
    * [Enhancement] Improve input tensor compatibility checks in AutoTuner
    
    - Enhanced the input tensor caching logic in the AutoTuner class to ensure compatibility between cached tensors and newly generated tensors during configuration trials.
    - Added detailed logging to warn users about potential mismatches in tensor properties, including shape and dtype, when caching is enabled.
    - Implemented a mechanism to regenerate input tensors if compatibility issues are detected, improving the robustness of the autotuning process.
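The compatibility check described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the AutoTuner implementation: `FakeTensor`, `is_compatible`, and `inputs_for_trial` are hypothetical names, and `FakeTensor` stands in for any tensor object exposing `shape` and `dtype`.

```python
import logging
from typing import Callable, List, NamedTuple, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)


class FakeTensor(NamedTuple):
    """Stand-in for a real tensor; only shape and dtype matter here."""
    shape: Tuple[int, ...]
    dtype: str


def is_compatible(cached: Sequence[FakeTensor], fresh: Sequence[FakeTensor]) -> bool:
    """Return True when both lists agree in length, shapes, and dtypes."""
    if len(cached) != len(fresh):
        return False
    return all(
        tuple(old.shape) == tuple(new.shape) and old.dtype == new.dtype
        for old, new in zip(cached, fresh)
    )


def inputs_for_trial(
    cached: Optional[List[FakeTensor]],
    supply: Callable[[], List[FakeTensor]],
) -> List[FakeTensor]:
    """Reuse cached inputs when they match what this config would generate."""
    fresh = supply()
    if cached is not None and is_compatible(cached, fresh):
        return cached
    if cached is not None:
        # Warn about the mismatch, then fall back to the regenerated tensors.
        logger.warning(
            "Cached inputs do not match this config (shape or dtype differs); "
            "regenerating input tensors."
        )
    return fresh
```

With this shape of helper, a config trial receives the cached list only when it matches; otherwise it gets the freshly supplied tensors and a warning is logged.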
    
    * [Refactor] Update L2 persistent map initialization in CUDA wrapper
    
    - Adjusted the L2 persistent map initialization function to use a consistent size parameter for cache limits and byte counts, improving clarity and reducing potential errors in memory management.
    - Simplified the formatting of the initialization function to enhance readability and maintainability of the code.
    
    * Update tilelang/autotuner/__init__.py
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
    
    ---------
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
__init__.py 28.2 KB