    [Feat] Enhance CUDA Property Handling (#322) · c0378aa9
    Lei Wang authored
    
    
    * [Enhancement] Introduce CUDA driver module and refactor CUDA device handling
    
    - Added a new `cuda_driver` module that encapsulates queries for CUDA device properties.
    - Updated the `CUDA` class in `cuda.py` to use the new driver module when fetching the device name and shared memory capacity.
    - Introduced the `get_device_name` and `get_shared_memory_per_block` functions in `cuda_driver` for device property management.
    - This refactor improves code organization and maintainability by moving CUDA device attribute handling into a dedicated module.
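    The two helpers named above could be backed by the CUDA driver API. Below is a minimal sketch that uses Python's `ctypes` to call `cuInit`, `cuDeviceGet`, `cuDeviceGetName` and `cuDeviceGetAttribute` directly. It is not the actual `cuda_driver` implementation from this commit. It assumes a Linux driver library named `libcuda.so.1` or `libcuda.so`. It hard-codes attribute ID 8, which is `CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK`. Each helper returns `None` when no usable driver or device is present.

```python
import ctypes
from typing import Optional

# Values from the CUDA driver API (cuda.h); hard-coded here as an assumption of this sketch.
CUDA_SUCCESS = 0
CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK = 8


def _load_libcuda() -> Optional[ctypes.CDLL]:
    """Load and initialize the driver library; return None if it is unavailable."""
    for name in ("libcuda.so.1", "libcuda.so"):  # Linux names only (assumption)
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            continue
        # cuInit must succeed before any other driver API call.
        return lib if lib.cuInit(0) == CUDA_SUCCESS else None
    return None


def _get_device(lib: ctypes.CDLL, device_id: int) -> Optional[ctypes.c_int]:
    """Translate an ordinal into a CUdevice handle (an int in the driver API)."""
    dev = ctypes.c_int()
    if lib.cuDeviceGet(ctypes.byref(dev), device_id) != CUDA_SUCCESS:
        return None
    return dev


def get_device_name(device_id: int = 0) -> Optional[str]:
    """Return the device's marketing name, e.g. as reported by nvidia-smi."""
    lib = _load_libcuda()
    dev = _get_device(lib, device_id) if lib is not None else None
    if dev is None:
        return None
    buf = ctypes.create_string_buffer(256)
    if lib.cuDeviceGetName(buf, len(buf), dev) != CUDA_SUCCESS:
        return None
    return buf.value.decode()


def get_shared_memory_per_block(device_id: int = 0) -> Optional[int]:
    """Return the default maximum shared memory per block, in bytes."""
    lib = _load_libcuda()
    dev = _get_device(lib, device_id) if lib is not None else None
    if dev is None:
        return None
    value = ctypes.c_int()
    rc = lib.cuDeviceGetAttribute(
        ctypes.byref(value), CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK, dev
    )
    return value.value if rc == CUDA_SUCCESS else None
```

    Returning `None` rather than raising keeps the helpers usable on machines without a GPU. Attribute 8 reports the default per-block limit. Kernels can opt in to a larger limit, which is exposed as `CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN`.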
    
    * [Refactor] Clean up whitespace in CUDA-related files
    
    - Removed unnecessary blank lines in `cuda.py`, `__init__.py`, and `cuda_driver.py` to improve readability.
    - This is a whitespace-only change and does not alter functionality.
    
    * [Benchmark] Add FP8 Matrix Multiplication Benchmark Script
    
    - Introduced a new benchmark script for FP8 matrix multiplication in `benchmark/matmul_fp8/benchmark_matmul.py`.
    - The script includes functions for reference matrix multiplication, configuration generation for autotuning, and an autotuned kernel for performance measurement.
    - Added command-line arguments for the matrix dimensions and an option to enable the BitBLAS roller for search space exploration.
    - The benchmark prints the best latency found during autotuning together with the corresponding performance metrics for FP8 matrix multiplication.
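    The benchmark flow described above can be sketched roughly as follows. The sketch parses dimensions, enumerates a grid of tile configurations for the autotuner, and converts a measured latency into TFLOPS. The flag names (`--m`, `--n`, `--k`, `--with_roller`) and the config keys and values are hypothetical. They illustrate the shape of such a script and are not taken from `benchmark/matmul_fp8/benchmark_matmul.py`. The FLOP count uses the standard 2*M*N*K for a GEMM.

```python
import argparse
import itertools
from typing import Dict, List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Hypothetical flag names; the real script's options may differ.
    parser = argparse.ArgumentParser(description="FP8 GEMM benchmark (sketch)")
    parser.add_argument("--m", type=int, default=16384, help="rows of A and C")
    parser.add_argument("--n", type=int, default=16384, help="columns of B and C")
    parser.add_argument("--k", type=int, default=16384, help="reduction dimension")
    parser.add_argument("--with_roller", action="store_true",
                        help="use a roller-generated search space instead of the grid")
    return parser.parse_args(argv)


def get_configs() -> List[Dict[str, int]]:
    """Enumerate a hypothetical grid search space as a Cartesian product."""
    block_m = [64, 128]
    block_n = [64, 128]
    block_k = [64, 128]
    num_stages = [2, 3]
    threads = [128, 256]
    return [
        {"block_M": m, "block_N": n, "block_K": k, "num_stages": s, "threads": t}
        for m, n, k, s, t in itertools.product(block_m, block_n, block_k, num_stages, threads)
    ]


def gemm_tflops(m: int, n: int, k: int, latency_ms: float) -> float:
    """Each of the M*N outputs needs K multiply-adds, i.e. 2*M*N*K FLOPs in total."""
    return 2 * m * n * k / (latency_ms * 1e-3) / 1e12
```

    For example, a 16384 x 16384 x 16384 GEMM that finishes in 10 ms performs 2 * 2^42 FLOPs, which `gemm_tflops` reports as about 879.6 TFLOPS. Taking `argv` as a parameter keeps `parse_args` testable without touching `sys.argv`.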
    
    * Lint fix
    
    ---------
    Co-authored-by: LeiWang1999 <wyatuestc@gmail.com>
example_gemm.py 8.23 KB