    [Enhancement] Support index bit width configuration (#343) · 70546adc
    Lei Wang authored
    
    
    * [Refactor] Clean up whitespace in CUDA-related files
    
    - Removed unnecessary blank lines in `cuda.py`, `__init__.py`, and `cuda_driver.py` to improve code readability and maintainability.
    - This change enhances the overall organization of the codebase without altering functionality.
    
    * [Benchmark] Add FP8 Matrix Multiplication Benchmark Script
    
    - Introduced a new benchmark script for FP8 matrix multiplication in `benchmark/matmul_fp8/benchmark_matmul.py`.
    - The script includes functions for reference matrix multiplication, configuration generation for autotuning, and an autotuned kernel for performance measurement.
    - Added command-line argument parsing for matrix dimensions and the option to enable BitBLAS roller for search space exploration.
    - The benchmark computes and prints the best latency and performance metrics, enhancing the benchmarking capabilities for FP8 operations.
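As a rough illustration of the performance metric mentioned above: an M x N x K matrix multiplication performs 2·M·N·K floating-point operations, so throughput can be derived from the measured latency. The function name `matmul_tflops` and the millisecond latency unit are assumptions made for this sketch; they are not taken from `benchmark_matmul.py`.

```python
def matmul_tflops(m: int, n: int, k: int, latency_ms: float) -> float:
    """Return achieved TFLOPS for an M x N x K matmul given its latency in ms.

    Hypothetical helper: the name and the millisecond unit are assumptions.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    # Each of the M*N outputs needs K multiplies and K adds.
    total_flops = 2 * m * n * k
    latency_s = latency_ms * 1e-3
    return total_flops / latency_s / 1e12


if __name__ == "__main__":
    # Example shape and latency are made-up inputs, not measured results.
    print(f"{matmul_tflops(16384, 16384, 16384, 10.0):.2f} TFLOPS")
```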
    
    * lint fix
    
    * Enhance variable creation by associating data types in IR and layout files, and introduce ExpandIndexDataType transformation
    
    - Updated variable creation in `ir.cc`, `gemm_layouts.cc`, and `elem.cc` to include data types for better type safety.
    - Added a new transformation `ExpandIndexDataType` to promote integer types to int64 where necessary, improving compatibility and performance.
    - Integrated the new transformation into the optimization pipeline in `phase.py`.
    - Documented the new transformation in `__init__.py` for clarity.
    
    * lint fix
    
    * Add configuration option for index bitwidth and remove ExpandIndexDataType transformation
    
    - Introduced a new pass configuration option `kConfigIndexBitwidth` to allow customization of index bitwidth.
    - Updated the optimization pipeline in `phase.py` to utilize the new configuration option instead of the removed `ExpandIndexDataType` transformation.
    - Documented the new configuration option in the JIT compilation function's parameters for clarity.
    - Removed the `ExpandIndexDataType` transformation implementation from the codebase to streamline the transformation process.
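To see why an index bitwidth option can matter, consider flat offsets into a large buffer: once the element count exceeds 2^31, a signed 32-bit index can no longer address every element. The sketch below is plain Python and independent of the codebase; `min_index_bits` is a hypothetical helper, and restricting the answer to 32 or 64 bits is an assumption.

```python
from typing import Tuple

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def min_index_bits(shape: Tuple[int, ...]) -> int:
    """Return 32 if every flat offset into `shape` fits in int32, else 64.

    Hypothetical helper for illustration only.
    """
    num_elements = 1
    for extent in shape:
        num_elements *= extent
    largest_offset = num_elements - 1
    return 32 if largest_offset <= INT32_MAX else 64


if __name__ == "__main__":
    for shape in [(1024, 1024), (65536, 65536)]:
        print(shape, "->", min_index_bits(shape), "bit indices")
```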
    
    * lint fix
    
    * Refactor index bitwidth configuration handling
    
    - Updated the `ConfigIndexBitwidth` pass to only apply the bitwidth transformation if the configuration option is defined, preventing potential errors with undefined values.
    - Changed the default value of `tl.config_index_bitwidth` in the JIT compilation function's parameters from 32 to None, so the option stays unset unless the user specifies it.
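The "apply only if defined" behaviour described above can be sketched in plain Python. This is not the actual pass implementation: `resolve_index_bitwidth` is a hypothetical helper, the dictionary stands in for whatever pass configuration object is really used, and accepting only 32 or 64 is an assumption. Only the key name `tl.config_index_bitwidth` comes from the commit message.

```python
from typing import Optional

CONFIG_KEY = "tl.config_index_bitwidth"  # key name taken from the commit message


def resolve_index_bitwidth(pass_configs: Optional[dict]) -> Optional[int]:
    """Return the requested index bitwidth, or None when the option is unset.

    A None result means the bitwidth rewrite should be skipped entirely.
    """
    if not pass_configs:
        return None
    bits = pass_configs.get(CONFIG_KEY)
    if bits is None:
        return None
    if bits not in (32, 64):  # assumption: only these widths are meaningful
        raise ValueError(f"unsupported index bitwidth: {bits!r}")
    return bits


if __name__ == "__main__":
    print(resolve_index_bitwidth(None))              # option unset
    print(resolve_index_bitwidth({CONFIG_KEY: 64}))  # option set explicitly
```

Defaulting to None rather than 32 keeps "not configured" distinguishable from "explicitly configured as 32", which is what lets the pass be skipped when nothing was requested.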
    
    * lint fix
    
    * lint fix
    
    ---------
    Co-authored-by: LeiWang1999 <wyatuestc@gmail.com>