[AutoTune] Enable config-performance trace (#174)
* Improve Autotuner and CUDA Compatibility for Tensor Core Policies
  - Enhance autotuner with robust parallel compilation and error handling
  - Add logging for better debugging during configuration compilation
  - Support SM90 compute capabilities in TensorCore and matmul analysis policies
  - Improve future handling and result tracking in autotuner
  - Add more flexible SM version checks for pipeline and async copy stages
* Refactor Autotuner Parallel Compilation with Improved Error Handling
  - Enhance tqdm progress bar formatting for concurrent configuration compilation
  - Simplify exception handling in parallel compilation process
  - Remove unnecessary logging and improve code readability
  - Optimize thread pool shutdown and result processing
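
For readers unfamiliar with the pattern, the parallel compilation and per-configuration error handling described above might look roughly like the sketch below. It is not the code from this commit: `compile_config`, `compile_all`, the `block_M` key, and the logger name are hypothetical stand-ins, and the progress bar is omitted to keep the example dependency-free.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import logging

logger = logging.getLogger("autotune_sketch")  # hypothetical logger name


def compile_config(config):
    """Hypothetical stand-in for compiling one candidate configuration."""
    if config.get("block_M", 0) <= 0:
        raise ValueError(f"invalid block_M in {config}")
    return f"kernel<{config['block_M']}>"


def compile_all(configs, max_workers=4):
    """Compile configurations concurrently, keeping successes and failures apart."""
    results, failures = [], []
    # The context manager shuts the pool down once all futures have finished.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its configuration for result tracking.
        future_to_config = {pool.submit(compile_config, c): c for c in configs}
        for future in as_completed(future_to_config):
            config = future_to_config[future]
            try:
                results.append((config, future.result()))
            except Exception as exc:
                # One failing configuration should not abort the whole sweep.
                logger.warning("compilation failed for %s: %s", config, exc)
                failures.append((config, exc))
    return results, failures
```

Keeping a future-to-configuration map means a failed compilation can be reported against the exact configuration that caused it, while the remaining candidates still reach the benchmarking stage.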