1. 04 Jun, 2025 1 commit
    • [AMD][Enhancement] Add support for Vectorized FP8 DataPacking (#542) · 319bc6b1
      Lei Wang authored
      * [Enhancement] Add support for new FP8 types in HIP code generation
      
      * Updated `PrintConst` function in `codegen_hip.cc` to handle `float8_e4m3fnuz` type.
      * Introduced new functions in `hip_fp8.h` for creating FP8 types, including `make_fp8_e4_4_t` and `make_fp8_e4_8_t`, enhancing type handling for FP8 data structures.
      * Improved overall compatibility and performance for FP8 data types in HIP.
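Note: as a rough illustration of what the `float8_e4m3fnuz` format and a 4-wide FP8 pack involve, here is a minimal Python sketch. The function names `decode_e4m3fnuz` and `pack_fp8x4` are hypothetical, and the byte order used for packing (lane 0 in the low byte) is an assumption, not something taken from `hip_fp8.h`.

```python
import math
import struct

E4M3FNUZ_BIAS = 8  # exponent bias of the fnuz variant


def decode_e4m3fnuz(byte: int) -> float:
    """Decode one float8_e4m3fnuz bit pattern (0..255) to a Python float."""
    if byte == 0x80:
        # The only NaN encoding; the format has no infinities and no negative zero.
        return math.nan
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF  # 4 exponent bits
    mantissa = byte & 0x7         # 3 mantissa bits
    if exponent == 0:
        # Subnormal values have no implicit leading one.
        return sign * (mantissa / 8.0) * 2.0 ** (1 - E4M3FNUZ_BIAS)
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - E4M3FNUZ_BIAS)


def pack_fp8x4(a: int, b: int, c: int, d: int) -> int:
    """Pack four FP8 bit patterns into one 32-bit word.

    Assumed layout: lane 0 (`a`) occupies the low byte.
    """
    return struct.unpack("<I", bytes([a, b, c, d]))[0]
```

Packing four 8-bit values into one 32-bit word lets a single load or store move all four lanes, which is the general idea behind vectorized FP8 types.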
      
      * workaround for competition
      
      * enhance autotune
      
      * autotune cache fix
      
      * Implement validation for unused keys in AutoTuner configuration
      
      * Added a check in the AutoTuner class that raises a ValueError when the configuration contains unused keys, so mistyped or stale keys fail early instead of being silently ignored.
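Note: the unused-key check described above could look roughly like the sketch below. It is not the AutoTuner implementation; `check_unused_keys` and `example_kernel` are hypothetical names, and the sketch assumes a key counts as "used" when it matches a parameter of the kernel function.

```python
import inspect


def check_unused_keys(config: dict, kernel) -> None:
    """Raise ValueError if `config` has keys that `kernel` does not accept."""
    params = inspect.signature(kernel).parameters
    # A kernel that takes **kwargs accepts any key, so nothing can be unused.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return
    unused = sorted(set(config) - set(params))
    if unused:
        raise ValueError(f"Unused keys in config: {unused}")


def example_kernel(block_M, block_N, num_stages=2):
    # Hypothetical tunable kernel, used only to demonstrate the check.
    return block_M * block_N * num_stages
```

With this sketch, `check_unused_keys({"block_M": 64, "threads": 128}, example_kernel)` raises a ValueError that names `threads`, while a config using only `block_M`, `block_N`, and `num_stages` passes.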
      
      * lint fix
      
      * revert changes of threads
      
      * Update pipelining in `example_mla_decode.py` to improve performance
      
      * Changed the number of stages in the pipelined loop from 0 to 2, enhancing the efficiency of the attention mechanism in the decoding process.
      
      * Enhance Cython kernel validation by adding tensor attribute checks
      
      * Updated the `CythonKernelWrapper` to include dedicated methods for validating tensor device, dtype, and static shape.
      * Modified the `forward` method to utilize these new validation methods, improving error handling and ensuring input integrity.
      * Updated the `lambda_forward` function in `CythonKernelAdapter` to reflect changes in validation parameters.
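Note: a minimal sketch of splitting input validation into dedicated device, dtype, and static-shape checks. It uses a stand-in `TensorSpec` class rather than real tensors, and all names here are hypothetical; the actual `CythonKernelWrapper` methods may differ.

```python
from dataclasses import dataclass


@dataclass
class TensorSpec:
    """Hypothetical stand-in for a tensor or for its expected signature."""
    device: str
    dtype: str
    shape: tuple


def validate_device(tensor, expected, name):
    if tensor.device != expected.device:
        raise ValueError(f"{name}: expected device {expected.device}, got {tensor.device}")


def validate_dtype(tensor, expected, name):
    if tensor.dtype != expected.dtype:
        raise ValueError(f"{name}: expected dtype {expected.dtype}, got {tensor.dtype}")


def validate_static_shape(tensor, expected, name):
    if len(tensor.shape) != len(expected.shape):
        raise ValueError(f"{name}: expected {len(expected.shape)} dims, got {len(tensor.shape)}")
    for axis, (got, want) in enumerate(zip(tensor.shape, expected.shape)):
        # None in the expected shape marks a dynamic dimension that is not checked.
        if want is not None and got != want:
            raise ValueError(f"{name}: expected size {want} on axis {axis}, got {got}")


def validate_inputs(tensors, expected_specs):
    """Run all three checks on each input before the kernel is launched."""
    for i, (tensor, spec) in enumerate(zip(tensors, expected_specs)):
        name = f"input {i}"
        validate_device(tensor, spec, name)
        validate_dtype(tensor, spec, name)
        validate_static_shape(tensor, spec, name)
```

Keeping each check in its own function means the error message can say exactly which attribute of which input was wrong.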
  2. 01 Jun, 2025 1 commit
    • [AMD] Support float8 matrix core (#537) · 5872e647
      Lei Wang authored
      
      
      * [Enhancement] Add support for FP8 types in CUDA and HIP code generation
      
      * Updated `GetFP8Type` function in `codegen_cuda.cc` and `codegen_hip.cc` to handle new FP8 types, including `kFloat8_e4m3fnuz`.
      * Introduced a new header file `hip_fp8.h` for FP8 type definitions in HIP.
      * Modified type mappings in `dlpack.py` and `mfma_macro_generator.py` to accommodate new FP8 types.
      * Enhanced type handling in `TLHIPSourceWrapper` and `tensor.py` for better integration with FP8 types.
      * Added necessary includes and logic to support FP8 in the code generation process, improving performance and compatibility with FP8 data types.
      
      * lint fix
      
      * Update src/target/codegen_hip.cc
      Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
      
      * Update tilelang/intrinsics/mfma_macro_generator.py
      Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
      
      * workaround
      
      * fix
      
      * Update submodule TVM to latest commit 587028ffebfff0ded520f8f90d62f0f6b165906c
      
      * bug fix
      
      * Refactor tilelang matrix multiplication to support transposition and packing options. Adjusted shared memory shapes and loading logic for A and B matrices. Updated test cases to validate new functionality.
      
      * Refactor assertion function for tilelang matrix multiplication to improve readability by formatting parameters and aligning code. Cleaned up whitespace in intrinsic layout functions for consistency.
      
      * Update bfloat16 type definitions in common.h and gemm.h for consistency. Changed __hip_bfloat16 to hip_bfloat16 and updated MfmaTraits specialization accordingly.
      
      * lint fix
      
      ---------
      Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>