[Bugfix] Put thread_extent into reduce (#640)
* [Enhancement] Update AllReduce operation to include thread offset in kernel generation
  - Modified the `ReduceOp::Lower` method to incorporate the thread offset in the AllReduce kernel generation for the sm_90 architecture.
  - This change improves the accuracy of thread management during reduction operations, enhancing performance on specific GPU architectures.
* [Enhancement] Refactor thread offset handling in AllReduce kernel generation
  - Updated the `ReduceOp::Lower` method to streamline the handling of thread offset for AllReduce operations, ensuring consistent usage across different architectures.
  - This change enhances code clarity and maintains performance improvements for the sm_90 architecture by reducing redundancy in thread offset calculations.
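
To illustrate why a thread offset matters for an all-reduce over a sub-range of a thread block, here is a minimal CPU-side sketch in C++. It is not the code generated by `ReduceOp::Lower`. The function name `all_reduce_sum`, its parameters, and the butterfly exchange pattern are assumptions made for illustration only. It models each thread's value as one vector element and reduces only the threads in `[thread_offset, thread_offset + thread_extent)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of a sum all-reduce over part of a thread block.
// values[t] is the value held by thread t. Only threads in
// [thread_offset, thread_offset + thread_extent) take part; every other
// element is returned unchanged.
std::vector<int> all_reduce_sum(const std::vector<int>& values,
                                std::size_t thread_offset,
                                std::size_t thread_extent) {
  // The butterfly pattern below assumes a power-of-two group size.
  assert(thread_extent > 0 && (thread_extent & (thread_extent - 1)) == 0);
  assert(thread_offset + thread_extent <= values.size());

  std::vector<int> cur = values;
  for (std::size_t stride = thread_extent / 2; stride > 0; stride /= 2) {
    std::vector<int> next = cur;  // double buffer: read cur, write next
    for (std::size_t tid = thread_offset;
         tid < thread_offset + thread_extent; ++tid) {
      // Partner selection works on the group-local index, so the offset is
      // subtracted first and added back afterwards.
      std::size_t local = tid - thread_offset;
      std::size_t partner = thread_offset + (local ^ stride);
      next[tid] = cur[tid] + cur[partner];
    }
    cur = next;
  }
  return cur;
}
```

Calling `all_reduce_sum({1, 2, 3, 4, 5, 6, 7, 8}, 2, 4)` leaves threads 0, 1, 6, and 7 untouched and gives threads 2 through 5 the sum of their four values. If the offset were ignored and the partner computed as `tid ^ stride`, thread 2 would pair with thread 0, which is outside the group. That is the kind of indexing error an explicit offset avoids when the participating threads do not start at thread 0.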