"...resnet50_tensorflow.git" did not exist on "31a8e4664ba8ffca8d14b051f8c3a7ec3b5b91d1"
-
Lei Wang authored
[Layout] Introduce Flexible Parallel to Support T.serial and local buffers inside T.Parallel loop (#844) * Support T.serial and local buffers inside T.Parallel loop. * Fix reducer layout in T.Parallel nested inside other loops * Debug output with LOG(INFO) * Add disable option for WGMMA. * fix * Use DLOG; fix missing registration for new pass config * bug fix * lint fix * Enhance GEMM instruction set with UTCMMA and improve local buffer handling in casting example * Update format.sh shebang, improve logging in layout inference, and enhance buffer store wrapper with detailed comments * Enhance GEMM instantiation logic and improve layout inference for local buffer detection - Updated the GEMM instantiation logic to include a check for WGMMA compatibility, ensuring that the conditions for using WGMMA are more robust. - Refined the layout inference process to better identify when loops manipulate only local buffers, improving the accuracy of thread binding decisions in parallel loops. --------- Co-authored-by:Huanqi Cao <caohuanqi@deepseek.com>
c382dcbc