Guangguan authored
During dispatch/combine, neither the sender nor the receiver waits for RDMA channel head updates to complete, so head-update WQEs may still be in flight after the kernel has finished. If these in-flight WQEs arrive after the RDMA channel head buffer has been cleaned for the next round of dispatch/combine, the head buffer is rewritten to a non-zero value. With a wrong RDMA channel head, the RDMA sender can reuse the data buffer before the RDMA receivers have consumed it, causing data errors and kernel hangs.

Fix this issue by waiting for all previous in-flight WQEs to complete before cleaning the RDMA buffers in the next round of dispatch/combine. For performance, the wait is deferred to this point so that the in-flight WQEs' RTT can be overlapped.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
b65b22ed
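
The race described in the commit message can be illustrated with a toy host-side model. This is only a sketch under stated assumptions, not the kernel code changed by this commit: `ToyChannel`, `post_head_update`, `quiet`, `clean`, and `run_round_boundary` are hypothetical names, and a head update is modeled as a plain delayed write to a single integer slot.

```python
from collections import deque


class ToyChannel:
    """Toy model of one RDMA channel head slot with delayed remote writes."""

    def __init__(self):
        self.head = 0            # head value as seen in the head buffer
        self.inflight = deque()  # posted head updates that have not landed yet

    def post_head_update(self, value):
        # Models posting a WQE: the write is queued, not applied immediately.
        self.inflight.append(value)

    def quiet(self):
        # Models waiting for every previously posted WQE to complete.
        while self.inflight:
            self.head = self.inflight.popleft()

    def clean(self):
        # Models zeroing the head buffer for the next dispatch/combine round.
        self.head = 0


def run_round_boundary(wait_before_clean):
    """Return the head value the next round starts with."""
    ch = ToyChannel()
    ch.post_head_update(4)  # last head update of the previous round, still in flight
    if wait_before_clean:
        ch.quiet()          # the fix: drain in-flight WQEs before cleaning
    ch.clean()
    ch.quiet()              # a straggler, if any, lands after the clean
    return ch.head


stale = run_round_boundary(wait_before_clean=False)
fixed = run_round_boundary(wait_before_clean=True)
print(stale, fixed)
```

Without the wait, the late write lands after the clean and the next round starts from a stale non-zero head. With the wait, the head is still zero when the next round begins.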