    [Bugfix] Disable force inline for ldmatrix (#227) · a1da26f2
    Lei Wang authored
    * Refactor GEMM and Bulk Copy operations to enhance layout handling and support for Hopper architecture
    
    - Update `ComputeWarpPartition` to include a new parameter for Hopper WGMMA support.
    - Modify layout checks in `LowerBulkCopy` to accommodate new GEMM layout types.
    - Enhance layout inference logic in `InferLayout` for better compatibility with the Hopper architecture.
    - Include necessary header files for built-in operations and layout inference improvements.
    
    * Refactor parameter formatting in CUDA matrix load functions for consistency
    
    - Adjusted parameter alignment in `ptx_ldmatrix_x1`, `ptx_ldmatrix_x2`, `ptx_ldmatrix_x4`, and their transposed counterparts for improved readability.
    - Added a blank line in `get_tensor_supply` function in `tensor.py` to enhance code clarity.
    
    * Enhance tensor supply generation in `get_tensor_supply` function
    
    - Introduced handling for unsigned integer and float8 tensor types, allowing for specific random tensor generation based on data type.
    - Updated the logic to return a random tensor suited to each data type.
    - Refactored existing conditions for clarity and maintainability.
    
    * Fix tensor supply generation logic in `get_tensor_supply` function
    
    - Updated the variable reference from `tensor` to `param` to ensure correct handling of tensor data types.
    - Corrected the unsigned integer and float8 checks used in tensor supply generation.
    
    * Enhance tensor supply checks in `get_tensor_supply` function
    
    - Updated the logic for identifying unsigned integers and float8 types by using `removeprefix` on the dtype string, improving accuracy in tensor supply generation.
    - Ensured better handling of tensor data types for more reliable random tensor generation based on the updated checks.
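The prefix-stripping check described above can be sketched as follows. This is a minimal illustration, not the code from the commit: it assumes dtype strings look like `torch.uint8` or `float8_e4m3fn`, and the helper names `normalized_dtype_name`, `looks_unsigned`, and `looks_float8` are hypothetical.

```python
def normalized_dtype_name(dtype) -> str:
    """Return the dtype name without a leading 'torch.' prefix (assumed format)."""
    # str.removeprefix needs Python 3.9+; it returns the string unchanged
    # when the prefix is absent, so bare names like "uint8" also work.
    return str(dtype).removeprefix("torch.")


def looks_unsigned(dtype) -> bool:
    """Hypothetical helper: True for names such as 'uint8' or 'torch.uint16'."""
    return normalized_dtype_name(dtype).startswith("uint")


def looks_float8(dtype) -> bool:
    """Hypothetical helper: True for names such as 'float8_e4m3fn'."""
    return normalized_dtype_name(dtype).startswith("float8")
```

Without the prefix removal, a plain `startswith("uint")` on `"torch.uint8"` would return `False`, which is presumably the kind of inaccuracy the change addresses.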
    
    * Enhance KernelParam functionality and improve tensor supply checks
    
    - Added methods `is_unsigned` and `is_float8` to the `KernelParam` class for better type identification of parameters.
    - Updated the `get_tensor_supply` function to utilize the new methods, improving clarity and accuracy in tensor supply generation based on parameter types.
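One way to picture moving the type checks onto the parameter object is the sketch below. `KernelParamSketch` and `pick_supply_strategy` are hypothetical stand-ins written for illustration; the real `KernelParam` class and `get_tensor_supply` function may have different fields, signatures, and return values. The strategy labels are placeholders, not names from the codebase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class KernelParamSketch:
    """Hypothetical stand-in for a kernel parameter: a dtype plus a shape."""
    dtype: str
    shape: Tuple[int, ...]

    def _dtype_name(self) -> str:
        # Assumes an optional 'torch.' prefix on the dtype string.
        return str(self.dtype).removeprefix("torch.")

    def is_unsigned(self) -> bool:
        return self._dtype_name().startswith("uint")

    def is_float8(self) -> bool:
        return self._dtype_name().startswith("float8")


def pick_supply_strategy(param: KernelParamSketch) -> str:
    """Choose a placeholder generation strategy from the parameter's type."""
    if param.is_unsigned():
        return "non_negative_integers"
    if param.is_float8():
        return "float8_safe_values"
    return "default"
```

Keeping the string parsing inside the parameter class means a caller such as the supply function only asks `param.is_unsigned()` and never handles dtype strings directly.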
common.h 5 KB