    [Refactor] Improve flash attention example and layout comparison logic (#270) · 5f5bf53c
    Lei Wang authored
    * [Refactor] Improve flash attention example and layout comparison logic
    
    - Removed unnecessary annotation for `lse_local_split` in the flash attention example to streamline the code.
    - Updated the handling of `lse_local_split` to use parallel processing for better performance.
    - Refactored kernel compilation and profiling logic to enhance clarity and maintainability in the flash attention example.
    - Added a condition in `FragmentNode::IsEqual` to handle broadcast cases, improving the robustness of layout comparisons.
    
    * lint fix
    
    * [Enhancement] Add support for shared memory scope in Fill operation
    
    - Introduced handling for `shared.dyn` and `shared` memory scopes in the Fill operation.
    - Implemented parallel operation and layout inference for improved performance in shared memory scenarios.
    - Updated thread loop partitioning and vectorization logic to accommodate new memory scope handling.
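
    To illustrate what thread loop partitioning with vectorization means for a fill, here is a minimal host-side sketch. It is not the TileLang implementation: `partitioned_fill` and its parameters are hypothetical names, and the threads are simulated with an ordinary loop. It assumes `num_threads` and `vec` are both non-zero.

    ```cpp
    #include <cstddef>
    #include <vector>

    // Hypothetical helper for illustration only (not TileLang code).
    // Simulates `num_threads` threads cooperatively filling a buffer of `n`
    // elements, where each thread writes `vec` contiguous elements per step.
    std::vector<float> partitioned_fill(std::size_t n, std::size_t num_threads,
                                        std::size_t vec, float value) {
      std::vector<float> buf(n, 0.0f);
      // Elements covered by all threads in one iteration.
      const std::size_t stride = num_threads * vec;
      // Ceiling division: iterations needed to cover the whole buffer.
      const std::size_t iters = (n + stride - 1) / stride;
      // On a GPU the outer loop would be the thread index, executed concurrently.
      for (std::size_t tid = 0; tid < num_threads; ++tid) {
        for (std::size_t it = 0; it < iters; ++it) {
          const std::size_t base = it * stride + tid * vec;
          // The innermost loop stands in for one vectorized store;
          // the bound check guards the ragged tail of the buffer.
          for (std::size_t v = 0; v < vec && base + v < n; ++v) {
            buf[base + v] = value;
          }
        }
      }
      return buf;
    }
    ```

    Calling `partitioned_fill(1000, 128, 4, 1.5f)` writes every one of the 1000 elements exactly once, even though 1000 is not a multiple of 128 * 4.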
layout.cc 16 KB