    Add Flash Attn example on amd mi300 series (#682) · adcba275
    alex_xiao authored
    
    
    * [Enhancement] Refactor buffer index handling for improved precision and clarity (#668)
    
    - Enhanced buffer index handling to address precision issues by removing redundant operations.
    - Streamlined the logic for determining buffer overlaps, ensuring more accurate conflict detection.
    - Updated related documentation to reflect changes in buffer management practices.
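
For readers unfamiliar with overlap-based conflict detection, the idea can be sketched in plain Python. This is a generic illustration and not the code changed in #668: the `(start, extent)` region representation and the helper names `ranges_overlap` and `regions_conflict` are assumptions made for the example.

```python
def ranges_overlap(a, b):
    """Return True if two half-open ranges, each given as (start, extent), intersect."""
    a_start, a_extent = a
    b_start, b_extent = b
    return a_start < b_start + b_extent and b_start < a_start + a_extent


def regions_conflict(region_a, region_b):
    """Two accesses to the same buffer conflict only if they overlap in every dimension.

    Each region is a list of (start, extent) pairs, one per dimension (hypothetical format).
    """
    return all(ranges_overlap(a, b) for a, b in zip(region_a, region_b))


# Rows [0, 4) and [2, 6) intersect, and columns [0, 8) and [4, 12) intersect.
overlapping = regions_conflict([(0, 4), (0, 8)], [(2, 4), (4, 8)])
# Rows [0, 4) and [4, 8) only touch at the boundary, so the regions are disjoint.
disjoint = regions_conflict([(0, 4), (0, 8)], [(4, 4), (0, 8)])
```

Using integer half-open ranges keeps the comparison exact, which is one way to avoid the kind of precision issue the commit mentions.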
    
    * Remove obsolete test script for AMD example, streamlining the examples directory.
    
    * Remove unused dtype_size variable in AMD example script to streamline code.
    
    * Add input configuration file and update AMD example script for enhanced flexibility
    
    - Introduced a new input.txt file for configurable parameters.
    - Modified the example_amd_flash_attn_fwd.py script to allow for a wider range of configurations, including additional options for num_stages, enable_rasterization, and k_pack.
    - Streamlined the main function for better clarity and organization.
    - Added a new test script to facilitate running the example with specified parameters.
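
A minimal sketch of how a tuning space over `num_stages`, `enable_rasterization`, and `k_pack` could be enumerated is shown here. The function name `get_configs_sketch` and all candidate values are illustrative assumptions; the actual search space lives in `example_amd_flash_attn_fwd.py` and may differ.

```python
import itertools


def get_configs_sketch():
    """Enumerate every combination of the tunable parameters (hypothetical values)."""
    # Assumed candidate values, chosen only to demonstrate the enumeration.
    space = {
        "num_stages": [0, 1],
        "enable_rasterization": [True, False],
        "k_pack": [1, 2],
    }
    keys = list(space)
    # The Cartesian product yields one dict per configuration.
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


configs = get_configs_sketch()
```

Each resulting dict can then be passed to an autotuner or unpacked as keyword arguments when building the kernel.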
    
    * Remove input configuration file and obsolete test script; enhance AMD example with swizzle layout annotations
    
    - Deleted input.txt and test.sh files as they are no longer needed.
    - Updated example_amd_flash_attn_fwd.py to include swizzle layout annotations for shared memory, improving bank conflict avoidance.
    - Reintroduced swizzle usage in the kernel for better performance.
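
Why a swizzled layout helps with bank conflicts can be shown with a small model. The sketch below uses a generic XOR swizzle and assumes a hypothetical shared memory with 32 banks of one element each; it is not the layout annotation used in the example script, and the real bank count and swizzle function on MI300 hardware may differ.

```python
NUM_BANKS = 32   # assumed bank count for this model
TILE_COLS = 32   # assumed tile width in elements


def bank_linear(row, col):
    """Bank hit by element (row, col) in a plain row-major layout."""
    return (row * TILE_COLS + col) % NUM_BANKS


def bank_swizzled(row, col):
    """Bank hit after XOR-ing the column index with the row index."""
    swizzled_col = col ^ (row % TILE_COLS)
    return (row * TILE_COLS + swizzled_col) % NUM_BANKS


def max_conflict(bank_fn, col=0, rows=32):
    """Largest number of rows that map to the same bank when reading one column."""
    counts = {}
    for row in range(rows):
        bank = bank_fn(row, col)
        counts[bank] = counts.get(bank, 0) + 1
    return max(counts.values())


linear_conflict = max_conflict(bank_linear)
swizzled_conflict = max_conflict(bank_swizzled)
```

In this model a column read in the row-major layout sends all 32 rows to one bank, while the swizzled layout spreads them across 32 distinct banks.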
    
    * Refactor AMD example script for FlashAttention-2
    
    - Updated function names for clarity, changing `get_v2_configs` to `get_configs` and `fast_flashattn_v2` to `fast_flashattn`.
    - Streamlined the main function by renaming `main_v2` to `main` and adjusting the corresponding calls.
    - Removed outdated comments and improved code organization for better readability.
    
    * Refactor formatting in AMD FlashAttention example script
    
    - Improved code readability by adjusting line breaks and indentation in the `fast_flashattn` function.
    - Streamlined the `main` function parameter formatting for consistency.
    - Removed unnecessary blank lines to enhance overall code organization.
    
    * Update example_amd_flash_attn_fwd.py
    
    ---------
    Co-authored-by: xinxyxiao <xinyxiao@amd.com>
    Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
example_amd_flash_attn_fwd.py 8.35 KB