  • [AMD] Implement Deepseek MLA for AMD (#363) · e3065f0b
    Lei Wang authored
    * [Bugfix] Correct dynamic shared memory size error handling in HIP wrapper
    
    - Updated the error handling logic in `PREDEF_ATTRIBUTE_SET_DYNAMIC_MEMORY_HIP` to check if the dynamic shared memory size exceeds the maximum limit of 65536.
    - Improved error message clarity by specifying the function name and the attempted size, ensuring better debugging information.
    - Ensured the function returns 0 upon successful setting of the dynamic shared memory size.
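
The check described in the bullets above can be modeled in a few lines. This is only a Python sketch of the logic, not the HIP wrapper itself: the function name `set_dynamic_shared_memory`, the `attributes` dictionary, the `-1` failure code, and the message wording are all hypothetical. Only the 65536 limit, the return value of 0 on success, and the fact that the message names the function and the attempted size come from the commit message.

```python
MAX_DYNAMIC_SHARED_MEMORY = 65536  # limit quoted in the commit message


def set_dynamic_shared_memory(func_name, requested_size, attributes):
    """Hypothetical model of the size check; returns (status, message)."""
    if requested_size > MAX_DYNAMIC_SHARED_MEMORY:
        # The failure code and wording are assumptions; the commit only says
        # the message includes the function name and the attempted size.
        message = (
            f"Failed to set dynamic shared memory for {func_name}: "
            f"requested {requested_size}, limit {MAX_DYNAMIC_SHARED_MEMORY}"
        )
        return -1, message
    # Record the accepted size and report success with 0.
    attributes[func_name] = requested_size
    return 0, None
```

A request of exactly 65536 passes in this sketch, because the commit message describes the failure condition as exceeding the limit.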
    
    * [Add] Implement example for MLA decoding with AMD support
    
    - Introduced a new example script `example_mla_decode_amd.py` demonstrating the use of the flash attention mechanism with AMD hardware.
    - Implemented functions for attention calculation, including support for split processing and combining outputs.
    - Added command-line argument parsing for customizable input parameters such as batch size, number of heads, and dimensions.
    - Included a reference implementation for validation against the Tile-AI output, ensuring correctness of the implementation.
    - Enhanced performance profiling and output comparison for debugging and optimization purposes.
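
The "split processing and combining outputs" step can be illustrated with a small pure-Python sketch. It assumes the usual log-sum-exp merge used in split decoding attention, which the commit message does not spell out, and every function name here (`attend`, `split_attend`, `combine`) is hypothetical rather than taken from `example_mla_decode_amd.py`. It handles one query vector and skips the MLA-specific details.

```python
import math


def attend(q, keys, values):
    """Softmax attention for one query; returns (output, log-sum-exp)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim_v = len(values[0])
    out = [
        sum(w * v[j] for w, v in zip(weights, values)) / denom
        for j in range(dim_v)
    ]
    return out, m + math.log(denom)


def split_attend(q, keys, values, num_split):
    """Run attention independently on contiguous chunks of the KV sequence."""
    chunk = math.ceil(len(keys) / num_split)
    return [
        attend(q, keys[i:i + chunk], values[i:i + chunk])
        for i in range(0, len(keys), chunk)
    ]


def combine(partials):
    """Merge partial outputs, weighting each by exp(lse - max lse)."""
    lse_max = max(lse for _, lse in partials)
    scales = [math.exp(lse - lse_max) for _, lse in partials]
    total = sum(scales)
    dim_v = len(partials[0][0])
    return [
        sum(s * out[j] for s, (out, _) in zip(scales, partials)) / total
        for j in range(dim_v)
    ]
```

Comparing `combine(split_attend(...))` against a single `attend(...)` call over the whole sequence mirrors the kind of reference check the commit describes, since both should agree up to floating-point error.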
    
    * lint fix
gemm_layouts.cc 19.8 KB