Unverified commit bc2e5645 authored by DangKai, committed by GitHub

fix: force synchronization between TP workers when update_weights (#6626)


Co-authored-by: dangkai.dk <dangkai.dk@alibaba-inc.com>
parent 3abc3036
@@ -2235,6 +2235,7 @@ class Scheduler(
                 assert flash_cache_success, "Cache flush failed after updating weights"
         else:
             logger.error(message)
+        barrier(group=self.tp_cpu_group)
         return UpdateWeightsFromTensorReqOutput(success, message)
 
     def get_weights_by_name(self, recv_req: GetWeightsByNameReqInput):
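The one added line places a barrier on the TP CPU group just before the handler returns, so no TP worker returns from the weight update while a peer is still loading its weights. The sketch below illustrates that pattern with standard-library threads. It is a conceptual stand-in, not the Scheduler code: threads play the role of TP worker processes, `threading.Barrier` plays the role of `barrier(group=self.tp_cpu_group)`, and the names `update_then_serve` and `NUM_RANKS` are hypothetical.

```python
import threading
import time

NUM_RANKS = 4  # hypothetical TP size for the illustration


def update_then_serve(use_barrier: bool) -> list[bool]:
    """Simulate TP ranks updating weights, then starting the next step.

    Returns, per rank, whether that rank saw every peer already on the
    new weights when it moved on. Threads stand in for TP worker processes.
    """
    versions = [0] * NUM_RANKS          # weight version held by each rank
    consistent = [False] * NUM_RANKS    # per-rank observation after the update
    barrier = threading.Barrier(NUM_RANKS)  # stand-in for the TP CPU group barrier

    def worker(rank: int) -> None:
        # Ranks finish loading their weight shard at different times.
        time.sleep(0.01 * rank)
        versions[rank] = 1
        if use_barrier:
            # No rank proceeds until every rank has finished updating.
            barrier.wait()
        # The next step (e.g. a collective forward pass) mixes all ranks' shards.
        consistent[rank] = all(v == 1 for v in versions)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_RANKS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consistent


if __name__ == "__main__":
    print("without barrier:", update_then_serve(use_barrier=False))
    print("with barrier:   ", update_then_serve(use_barrier=True))
```

Without the barrier, the fastest rank typically moves on before its peers have updated and records `False`; with the barrier, every rank records `True`, because each rank sets its version before calling `wait()`, and `wait()` returns only after all ranks have arrived.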