    [Bugfix] Resolve autotuner bugs for blocksparse GEMM example (#300) · 92e8d5f4
    Haodong Tian authored
    * [Bugfix] Configure autotuner specific logger for correct level handling
    - Previously, logging relied on basicConfig, which configured the root logger. This caused the named autotuner logger to ignore DEBUG messages.
    - This commit sets up a dedicated logger for the autotuner, routing DEBUG messages to 'autotuner.log' and INFO+ messages to the console (a minimal sketch follows).
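    A minimal sketch of the dedicated-logger setup, assuming standard `logging` usage; the logger name `autotuner` and the file name `autotuner.log` follow the commit message, everything else is illustrative:

    ```python
    import logging

    # Dedicated logger with its own level and handlers, independent of the
    # root logger that logging.basicConfig would configure elsewhere.
    logger = logging.getLogger("autotuner")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records from also flowing to the root logger

    # DEBUG and above go to the log file.
    file_handler = logging.FileHandler("autotuner.log")
    file_handler.setLevel(logging.DEBUG)
    logger.addHandler(file_handler)

    # Only INFO and above reach the console.
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)
    logger.addHandler(console_handler)
    ```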
    
    * [Bugfix] Fix tensor_supply for boolean type
    - Previously `get_tensor_supply` used `torch.randint(-2, 3)` as a fallback, which raised an error when the dtype was `torch.bool`.
    - This commit adds an `is_boolean` check in `KernelParam` and updates `get_tensor_supply` to use `torch.randint(0, 2)` for boolean dtypes, as sketched below.
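    A hedged sketch of the dtype-aware fallback; the `(-2, 3)` and `(0, 2)` ranges come from the commit message, while the helper name and signature are hypothetical:

    ```python
    import torch

    def random_fill(shape, dtype, device="cuda"):
        # Hypothetical stand-in for the get_tensor_supply fallback path.
        if dtype == torch.bool:
            # randint over {-2, ..., 2} is invalid for bool; sample {0, 1}
            # and cast instead.
            return torch.randint(0, 2, shape, device=device).bool()
        return torch.randint(-2, 3, shape, device=device).to(dtype)
    ```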
    
    * [Bugfix] Always regenerate JIT inputs during tuning
    - Removes the caching of `self.jit_input_tensors` within `AutoTuner`. Since different autotuning configurations can alter the required input tensor shapes or other properties, reusing cached inputs from a previous configuration could lead to errors or incorrect assessments.
    - This change ensures that `profiler._get_inputs()` is called unconditionally for each configuration evaluation (see the sketch below). Since `_get_inputs` is assumed to be relatively inexpensive, the potential overhead is considered acceptable.
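    An illustrative sketch of the per-configuration flow; `profiler._get_inputs()` is named in the commit message, while the loop structure and all other names are assumptions:

    ```python
    best_config, best_latency = None, float("inf")
    for config in configs:                    # hypothetical candidate list
        profiler = build_profiler(config)     # hypothetical helper
        inputs = profiler._get_inputs()       # regenerated for every trial,
                                              # no self.jit_input_tensors cache
        latency = profiler.run(*inputs)       # hypothetical benchmark call
        if latency < best_latency:
            best_config, best_latency = config, latency
    ```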
    
    * [Example] Update example_blocksparse_gemm for autotuner
    
    * Run code formatter
    
    * [Feature] Enable custom tensor supply and input caching control in Autotuner
    - Previously, tensor generation was tied to `supply_type`, and input caching behavior across configurations was implicit and not user-controllable.
    - This commit introduces a `supply_prog` parameter that accepts a custom function for generating input tensors, overriding the default mechanism.
    - Adds a `cache_input_tensors` flag (default True) to control input tensor caching:
        - If True, tensors are generated once per configuration and reused across repetitions, with a check for potential shape mismatches between configurations.
        - If False, tensors are regenerated for every configuration trial.
    - Refactors internal input tensor handling to use supplier functions for clarity.
    - Adds a `check_tensor_list_compatibility` utility for shape comparison. A usage sketch follows this list.
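    A hedged usage sketch; `supply_prog`, `cache_input_tensors`, and `check_tensor_list_compatibility` are from this commit, while the `AutoTuner` constructor signature and the parameter objects are assumptions:

    ```python
    import torch

    def my_supply(params):
        # Hypothetical custom supplier: one tensor per kernel parameter,
        # with shape taken from the (assumed) parameter objects.
        return [torch.randn(p.shape, device="cuda", dtype=torch.float16)
                for p in params]

    def check_tensor_list_compatibility(xs, ys):
        # Minimal sketch of the shape-comparison utility: two tensor lists
        # are compatible when they pair up with identical shapes.
        return len(xs) == len(ys) and all(
            x.shape == y.shape for x, y in zip(xs, ys))

    tuner = AutoTuner(
        kernel,                     # hypothetical kernel handle
        configs=configs,
        supply_prog=my_supply,      # overrides the default supply_type path
        cache_input_tensors=True,   # reuse tensors across repetitions
    )
    ```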
    
    * [Example] Update example_blocksparse_gemm for autotuner
    
    * Run code formatter
    
    * [Example] Small fix in example_blocksparse_gemm
    
    * [Fix] Raise error if autotuning yields no valid configuration
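    Illustratively, the guard might look like the following; `best_config` and the message text are assumptions, not the commit's actual code:

    ```python
    if best_config is None:
        # Every candidate failed to compile or run: fail loudly rather
        # than silently returning nothing.
        raise RuntimeError(
            "Autotuning produced no valid configuration; "
            "check the search space and kernel constraints.")
    ```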