[MLA][AMD] Add amd mla benchmarking (#367)
* [Add] Introduce benchmark scripts for MLA decoding with AMD support
- Added three new benchmark scripts: `benchmark_mla_decode_amd_tilelang.py`, `benchmark_mla_decode_amd_torch.py`, and `benchmark_mla_decode_amd_triton.py` to evaluate the performance of the MLA decoding mechanism across different frameworks.
- Each script includes implementations for attention calculation, performance profiling, and output validation against reference implementations.
- Enhanced command-line argument parsing for customizable input parameters, including batch size, number of heads, and dimensions.
- Integrated performance comparison functionality to facilitate benchmarking between different implementations.
* lint fix
* lint fix
---------
Co-authored-by: Zhiwen Mo <zhiwen.mo25@ic.ac.uk>
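
For readers unfamiliar with this kind of script, the workflow described above (parse shape arguments, compute attention, validate against a reference, then compare timings) can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code added in this commit: the flag names, function names, and the pure-Python attention routines are all assumptions made for the example, and the real scripts run TileLang, PyTorch, and Triton kernels on GPU tensors.

```python
import argparse
import math
import random
import time


def parse_args(argv=None):
    # Flag names are illustrative only, not the actual options of the scripts.
    p = argparse.ArgumentParser(description="Toy decode-attention benchmark")
    p.add_argument("--batch", type=int, default=2)
    p.add_argument("--heads", type=int, default=4)
    p.add_argument("--kv_len", type=int, default=64)
    p.add_argument("--dim", type=int, default=16)
    return p.parse_args(argv)


def ref_decode(q, keys, values):
    """Reference: single-query softmax attention over all cached keys."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def online_decode(q, keys, values):
    """Candidate: same result in one pass using an online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    running_max, denom = float("-inf"), 0.0
    acc = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        s = scale * sum(a * b for a, b in zip(q, k))
        new_max = max(running_max, s)
        correction = math.exp(running_max - new_max)  # rescale earlier partial sums
        w = math.exp(s - new_max)
        denom = denom * correction + w
        acc = [a * correction + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


def make_inputs(args, seed=0):
    # One (query, keys, values) triple per (batch, head) pair.
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-1, 1) for _ in range(args.dim)]

    return [(vec(),
             [vec() for _ in range(args.kv_len)],
             [vec() for _ in range(args.kv_len)])
            for _ in range(args.batch * args.heads)]


def run(fn, inputs):
    return [fn(q, k, v) for q, k, v in inputs]


def bench_ms(fn, inputs, warmup=1, iters=3):
    # Warm up first, then report the mean wall-clock time per iteration.
    for _ in range(warmup):
        run(fn, inputs)
    start = time.perf_counter()
    for _ in range(iters):
        run(fn, inputs)
    return (time.perf_counter() - start) * 1e3 / iters


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def main(argv=None):
    args = parse_args(argv)
    inputs = make_inputs(args)
    # Validate the candidate against the reference before timing anything.
    err = max_abs_diff(run(ref_decode, inputs), run(online_decode, inputs))
    assert err < 1e-9, f"output mismatch: {err}"
    t_ref = bench_ms(ref_decode, inputs)
    t_new = bench_ms(online_decode, inputs)
    print(f"reference: {t_ref:.3f} ms  candidate: {t_new:.3f} ms  max|diff|={err:.2e}")
    return err
```

Calling `main(["--batch", "1", "--kv_len", "128"])` runs one validation pass followed by the timing comparison. Checking correctness before timing is a deliberate ordering: a fast kernel that produces wrong output should fail loudly rather than appear in the comparison.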