1. 12 Dec, 2025 1 commit
  2. 15 Aug, 2025 2 commits
  3. 23 Jul, 2025 1 commit
  4. 10 May, 2025 1 commit
  5. 03 Apr, 2025 2 commits
    • [Feat] Enhance CUDA Property Handling (#322) · c0378aa9
      Lei Wang authored
      
      
      * [Enhancement] Introduce CUDA driver module and refactor CUDA device handling
      
      - Added a new `cuda_driver` module to encapsulate CUDA device properties and functionalities.
      - Updated `CUDA` class in `cuda.py` to utilize the new driver for fetching device name and shared memory capabilities.
      - Introduced `get_device_name` and `get_shared_memory_per_block` functions in the `cuda_driver` module for improved device property management.
      - This refactor enhances code organization and maintainability while improving the handling of CUDA device attributes.
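
The two helpers named above could look roughly like the following. This is a minimal sketch, not the code from this commit: it assumes the properties are read through the CUDA driver API via `ctypes`, and the signatures (a `device_id` argument, `None` when no driver or device is present) are illustrative assumptions.

```python
import ctypes
import sys
from typing import Optional

# Enum value of CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK in cuda.h.
CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK = 8


def _load_driver() -> Optional[ctypes.CDLL]:
    """Load and initialize the CUDA driver library, or return None."""
    name = "nvcuda.dll" if sys.platform == "win32" else "libcuda.so.1"
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None  # no CUDA driver installed on this machine
    if lib.cuInit(0) != 0:
        return None  # driver present but unusable (e.g. no device)
    return lib


def _get_device(lib: ctypes.CDLL, device_id: int) -> Optional[ctypes.c_int]:
    """Resolve a device ordinal to a CUdevice handle, or return None."""
    dev = ctypes.c_int()
    if lib.cuDeviceGet(ctypes.byref(dev), device_id) != 0:
        return None
    return dev


def get_device_name(device_id: int = 0) -> Optional[str]:
    """Return the device name, or None if it cannot be queried."""
    lib = _load_driver()
    if lib is None:
        return None
    dev = _get_device(lib, device_id)
    if dev is None:
        return None
    buf = ctypes.create_string_buffer(256)
    if lib.cuDeviceGetName(buf, len(buf), dev) != 0:
        return None
    return buf.value.decode()


def get_shared_memory_per_block(device_id: int = 0) -> Optional[int]:
    """Return the maximum shared memory per block in bytes, or None."""
    lib = _load_driver()
    if lib is None:
        return None
    dev = _get_device(lib, device_id)
    if dev is None:
        return None
    value = ctypes.c_int()
    status = lib.cuDeviceGetAttribute(
        ctypes.byref(value),
        CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK,
        dev,
    )
    return value.value if status == 0 else None
```

Returning `None` instead of raising keeps the sketch importable on machines without a GPU; the real module may well handle errors differently.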
      
      * [Refactor] Clean up whitespace in CUDA-related files
      
      - Removed unnecessary blank lines in `cuda.py`, `__init__.py`, and `cuda_driver.py` to improve code readability and maintainability.
      - This change enhances the overall organization of the codebase without altering functionality.
      
      * [Benchmark] Add FP8 Matrix Multiplication Benchmark Script
      
      - Introduced a new benchmark script for FP8 matrix multiplication in `benchmark/matmul_fp8/benchmark_matmul.py`.
      - The script includes functions for reference matrix multiplication, configuration generation for autotuning, and an autotuned kernel for performance measurement.
      - Added command-line argument parsing for matrix dimensions and the option to enable BitBLAS roller for search space exploration.
      - The benchmark computes and prints the best latency and performance metrics, enhancing the benchmarking capabilities for FP8 operations.
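
The pieces this message lists (configuration generation, argument parsing, and the performance metric) can be sketched as below. This is an illustration under assumptions, not the contents of `benchmark_matmul.py`: the tile-size candidates, the flag names `--m`, `--n`, `--k`, `--with_roller`, and the helper names are all hypothetical, and the kernel and autotuner themselves are omitted.

```python
import argparse
import itertools
from typing import Dict, List, Optional, Sequence


def get_configs(
    block_M: Sequence[int] = (64, 128),
    block_N: Sequence[int] = (64, 128),
    block_K: Sequence[int] = (32, 64),
    num_stages: Sequence[int] = (2, 3),
) -> List[Dict[str, int]]:
    """Enumerate the Cartesian product of candidate tuning parameters."""
    keys = ("block_M", "block_N", "block_K", "num_stages")
    grid = itertools.product(block_M, block_N, block_K, num_stages)
    return [dict(zip(keys, values)) for values in grid]


def tflops(m: int, n: int, k: int, latency_ms: float) -> float:
    """Convert a matmul latency to TFLOPS, counting 2*M*N*K operations."""
    total_flops = 2 * m * n * k
    return total_flops / (latency_ms * 1e-3) / 1e12


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse matrix dimensions and the (hypothetical) roller switch."""
    parser = argparse.ArgumentParser(description="FP8 matmul benchmark sketch")
    parser.add_argument("--m", type=int, default=16384, help="rows of A")
    parser.add_argument("--n", type=int, default=16384, help="columns of B")
    parser.add_argument("--k", type=int, default=16384, help="inner dimension")
    parser.add_argument(
        "--with_roller",
        action="store_true",
        help="use a roller-generated search space instead of the fixed grid",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    configs = get_configs()
    print(f"{len(configs)} candidate configs for {args.m}x{args.n}x{args.k}")
    # A real run would time each config; 1.0 ms here is a placeholder latency.
    print(f"TFLOPS at 1.0 ms: {tflops(args.m, args.n, args.k, 1.0):.3f}")
```

The metric counts one multiply and one add per inner-product term, which is the usual convention for reporting matmul throughput.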
      
      * lint fix
      
      ---------
      Co-authored-by: LeiWang1999 <wyatuestc@gmail.com>
    • [Tools] Summarize TFLOPS Information from a tilelang program (#321) · 853898a7
      yyttt6 authored
      * refactor autotune
      
      * refactor autotune
      
      * refactor autotune
      
      * refactor autotune
      
      * format init.py
      
      * add tutorial for autotune
      
      * merge
      
      * merge
      
      * format analyzer
      
      * add readme for analyzer
      
      * format
      
      * [Tools] Summarize TFLOPS Information from a tilelang program
      
      * Summarize TFLOPS Information from a tilelang program