- 15 Aug, 2025 1 commit
-
-
Gabriel Wu authored
* chore: fix typos * chore: fix ruff * chore: fix clang-format
-
- 23 Jul, 2025 1 commit
-
-
Zhang Jason authored
Co-authored-by:zhangnju <ningzhan@SMC-SC-DI08-33.dh144.dcgpu>
-
- 10 May, 2025 1 commit
-
-
yyttt6 authored
* yes * [Bugfix] fix the unexpected keyword error of autotune * format * test * [CI] Add Analyzer and blocksparse_attention examples to CI * format * try * try * try * try * t * format * d --------- Co-authored-by:Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-
- 03 Apr, 2025 2 commits
-
-
Lei Wang authored
* [Enhancement] Introduce CUDA driver module and refactor CUDA device handling - Added a new `cuda_driver` module to encapsulate CUDA device properties and functionalities. - Updated `CUDA` class in `cuda.py` to utilize the new driver for fetching device name and shared memory capabilities. - Introduced `get_device_name` and `get_shared_memory_per_block` functions in the `cuda_driver` for improved device property management. - This refactor enhances code organization and maintainability while improving the handling of CUDA device attributes. * [Refactor] Clean up whitespace in CUDA-related files - Removed unnecessary blank lines in `cuda.py`, `__init__.py`, and `cuda_driver.py` to improve code readability and maintainability. - This change enhances the overall organization of the codebase without altering functionality. * [Benchmark] Add FP8 Matrix Multiplication Benchmark Script - Introduced a new benchmark script for FP8 matrix multiplication in `benchmark/matmul_fp8/benchmark_matmul.py`. - The script includes functions for reference matrix multiplication, configuration generation for autotuning, and an autotuned kernel for performance measurement. - Added command-line argument parsing for matrix dimensions and the option to enable BitBLAS roller for search space exploration. - The benchmark computes and prints the best latency and performance metrics, enhancing the benchmarking capabilities for FP8 operations. * lint fix --------- Co-authored-by:LeiWang1999 <wyatuestc@gmail.com>
-
yyttt6 authored
* refactor autotune * refactor autotune * refactor autotune * refactor autotune * format init.py * add tutorial for autotune * merge * merge * format analyzer * add readme for analyzer * format * [Tools] Summarize TFLOPS Information from a tilelang program * Summarize TFLOPS Information from a tilelang program
-