    [Bugfix] Resolve autotuner bugs for blocksparse GEMM example (#300) · 92e8d5f4
    Haodong Tian authored
    * [Bugfix] Configure autotuner specific logger for correct level handling
    - Previously, logging relied on basicConfig, which configured the root logger. This caused the named autotuner logger to ignore DEBUG messages.
    - This commit sets up a dedicated logger for the autotuner, routing DEBUG messages to 'autotuner.log' and INFO+ messages to the console (a minimal sketch follows).
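    A minimal sketch of the dedicated-logger setup, assuming standard `logging` usage; the logger name `autotuner` and the file name `autotuner.log` follow the commit message, everything else is illustrative:

    ```python
    import logging

    # Dedicated logger with its own level and handlers, independent of the
    # root logger that logging.basicConfig would configure elsewhere.
    logger = logging.getLogger("autotuner")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records from also flowing to the root logger

    # DEBUG and above go to the log file.
    file_handler = logging.FileHandler("autotuner.log")
    file_handler.setLevel(logging.DEBUG)
    logger.addHandler(file_handler)

    # Only INFO and above reach the console.
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)
    logger.addHandler(console_handler)
    ```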
    
    * [Bugfix] Fix tensor_supply for boolean type
    - Previously `get_tensor_supply` used `torch.randint(-2, 3)` as a fallback, which raised an error when the dtype was `torch.bool`.
    - This commit adds an `is_boolean` check in `KernelParam` and updates `get_tensor_supply` to use `torch.randint(0, 2)` for boolean dtypes, as sketched below.
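    A hedged sketch of the dtype-aware fallback; the `(-2, 3)` and `(0, 2)` ranges come from the commit message, while the helper name and signature are hypothetical:

    ```python
    import torch

    def random_fill(shape, dtype, device="cuda"):
        # Hypothetical stand-in for the get_tensor_supply fallback path.
        if dtype == torch.bool:
            # randint over {-2, ..., 2} is invalid for bool; sample {0, 1}
            # and cast instead.
            return torch.randint(0, 2, shape, device=device).bool()
        return torch.randint(-2, 3, shape, device=device).to(dtype)
    ```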
    
    * [Bugfix] Always regenerate JIT inputs during tuning
    - Removes the caching of `self.jit_input_tensors` within `AutoTuner`. Since different autotuning configurations can alter the required input tensor shapes or other properties, reusing cached inputs from a previous configuration could lead to errors or incorrect assessments.
    - This change ensures that `profiler._get_inputs()` is called unconditionally for each configuration evaluation (see the sketch below). Since `_get_inputs` is assumed to be relatively inexpensive, the potential overhead is considered acceptable.
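    An illustrative sketch of the per-configuration flow; `profiler._get_inputs()` is named in the commit message, while the loop structure and all other names are assumptions:

    ```python
    best_config, best_latency = None, float("inf")
    for config in configs:                    # hypothetical candidate list
        profiler = build_profiler(config)     # hypothetical helper
        inputs = profiler._get_inputs()       # regenerated for every trial,
                                              # no self.jit_input_tensors cache
        latency = profiler.run(*inputs)       # hypothetical benchmark call
        if latency < best_latency:
            best_config, best_latency = config, latency
    ```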
    
    * [Example] Update example_blocksparse_gemm for autotuner
    
    * Run code formatter
    
    * [Feature] Enable custom tensor supply and input caching control in Autotuner
    - Previously, tensor generation was tied to `supply_type`, and input caching behavior across configurations was implicit and not user-controllable.
    - This commit introduces a `supply_prog` parameter that accepts a custom function for generating input tensors, overriding the default mechanism.
    - Adds a `cache_input_tensors` flag (default True) to control input tensor caching:
        - If True, tensors are generated once per configuration and reused across repetitions, with a check for potential shape mismatches between configurations.
        - If False, tensors are regenerated for every configuration trial.
    - Refactors internal input tensor handling to use supplier functions for clarity.
    - Adds a `check_tensor_list_compatibility` utility for shape comparison. A usage sketch follows this list.
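    A hedged usage sketch; `supply_prog`, `cache_input_tensors`, and `check_tensor_list_compatibility` are from this commit, while the `AutoTuner` constructor signature and the parameter objects are assumptions:

    ```python
    import torch

    def my_supply(params):
        # Hypothetical custom supplier: one tensor per kernel parameter,
        # with shape taken from the (assumed) parameter objects.
        return [torch.randn(p.shape, device="cuda", dtype=torch.float16)
                for p in params]

    def check_tensor_list_compatibility(xs, ys):
        # Minimal sketch of the shape-comparison utility: two tensor lists
        # are compatible when they pair up with identical shapes.
        return len(xs) == len(ys) and all(
            x.shape == y.shape for x, y in zip(xs, ys))

    tuner = AutoTuner(
        kernel,                     # hypothetical kernel handle
        configs=configs,
        supply_prog=my_supply,      # overrides the default supply_type path
        cache_input_tensors=True,   # reuse tensors across repetitions
    )
    ```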
    
    * [Example] Update example_blocksparse_gemm for autotuner
    
    * Run code formatter
    
    * [Example] Small fix in example_blocksparse_gemm
    
    * [Fix] Raise error if autotuning yields no valid configuration
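    Illustratively, the guard might look like the following; `best_config` and the message text are assumptions, not the commit's actual code:

    ```python
    if best_config is None:
        # Every candidate failed to compile or run: fail loudly rather
        # than silently returning nothing.
        raise RuntimeError(
            "Autotuning produced no valid configuration; "
            "check the search space and kernel constraints.")
    ```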