• alex_xiao's avatar
    [CI] Fix AMD CI (#729) · a86223f4
    alex_xiao authored
    
    
    * [Enhancement] Refactor buffer index handling for improved precision and clarity (#668)
    
    - Enhanced buffer index handling to address precision issues by removing redundant operations.
    - Streamlined the logic for determining buffer overlaps, ensuring more accurate conflict detection.
    - Updated related documentation to reflect changes in buffer management practices.
    
    * Remove obsolete test script for AMD example, streamlining the examples directory.
    
    * Remove unused dtype_size variable in AMD example script to streamline code.
    
    * Add input configuration file and update AMD example script for enhanced flexibility
    
    - Introduced a new input.txt file for configurable parameters.
    - Modified the example_amd_flash_attn_fwd.py script to allow for a wider range of configurations, including additional options for num_stages, enable_rasterization, and k_pack.
    - Streamlined the main function for better clarity and organization.
    - Added a new test script to facilitate running the example with specified parameters.
    
    * Remove input configuration file and obsolete test script; enhance AMD example with swizzle layout annotations
    
    - Deleted input.txt and test.sh files as they are no longer needed.
    - Updated example_amd_flash_attn_fwd.py to include swizzle layout annotations for shared memory, improving bank conflict avoidance.
    - Reintroduced swizzle usage in the kernel for better performance.
    
    * Refactor AMD example script for FlashAttention-2
    
    - Updated function names for clarity, changing `get_v2_configs` to `get_configs` and `fast_flashattn_v2` to `fast_flashattn`.
    - Streamlined the main function by renaming `main_v2` to `main` and adjusting the corresponding calls.
    - Removed outdated comments and improved code organization for better readability.
    
    * Refactor formatting in AMD FlashAttention example script
    
    - Improved code readability by adjusting line breaks and indentation in the `fast_flashattn` function.
    - Streamlined the `main` function parameter formatting for consistency.
    - Removed unnecessary blank lines to enhance overall code organization.
    
    * Update example_amd_flash_attn_fwd.py
    
    * Enhance AMD example script and update CI workflows
    
    - Improved the `example_amd_flash_attn_fwd.py` script for better clarity and organization.
    - Added new CI workflows for AMD and documentation publishing.
    - Updated various requirements files to include necessary dependencies.
    - Introduced new test cases and examples for better coverage and functionality.
    - Refactored existing code for improved readability and maintainability.
    
    * Remove redundant tool cache cleanup step in AMD CI workflow
    
    * Remove `torch` dependency from `requirements-rocm.txt` to streamline requirements.
    
    ---------
    Co-authored-by: default avatarxinxyxiao <xinyxiao@amd.com>
    Co-authored-by: default avatarLei Wang <34334180+LeiWang1999@users.noreply.github.com>
    a86223f4
amd_ci.yml 4.16 KB