Unverified commit 8692aa88 authored by Peng Wei, committed by GitHub



fixed the issue in the DPO trainer when using one node with multiple GPUs and setting device_map='auto' (#29695)

* fixed the issue in the DPO trainer when using one node with multiple GPUs

* add the assert before updating

* run the ruff formatter

* Update src/transformers/trainer.py

Thank you.
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* remember to do make style and make quality before commit

* Update src/transformers/trainer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
parent 243d0de9
@@ -2124,6 +2124,10 @@ class Trainer:
                     # if loss is nan or inf simply add the average of previous logged losses
                     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
                 else:
+                    if tr_loss.device != tr_loss_step.device:
+                        raise ValueError(
+                            f"Calculated loss must be on the original device: {tr_loss.device} but device in use is {tr_loss_step.device}"
+                        )
                     tr_loss += tr_loss_step
                 self.current_flos += float(self.floating_point_ops(inputs))
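The guard above fails fast when, with `device_map='auto'`, a per-step loss lands on a different GPU than the running total, instead of letting the in-place add produce a cryptic device-mismatch error. A minimal sketch of the same logic, using a hypothetical `FakeTensor` stand-in (not the real `torch.Tensor`) so it runs without a GPU:

```python
class FakeTensor:
    """Hypothetical stand-in for a torch.Tensor with a .device attribute."""

    def __init__(self, value, device):
        self.value = value
        self.device = device

    def __iadd__(self, other):
        self.value += other.value
        return self


def accumulate_loss(tr_loss, tr_loss_step):
    """Mirror of the patch: refuse to accumulate across devices."""
    if tr_loss.device != tr_loss_step.device:
        raise ValueError(
            f"Calculated loss must be on the original device: {tr_loss.device} "
            f"but device in use is {tr_loss_step.device}"
        )
    tr_loss += tr_loss_step
    return tr_loss


# Same device: accumulation succeeds.
total = accumulate_loss(FakeTensor(1.0, "cuda:0"), FakeTensor(0.5, "cuda:0"))
print(total.value)  # 1.5

# Mismatched devices: raises ValueError instead of a confusing runtime error.
try:
    accumulate_loss(FakeTensor(1.0, "cuda:0"), FakeTensor(0.5, "cuda:1"))
except ValueError as e:
    print("rejected:", e)
```

In the real trainer the caller is expected to move `tr_loss_step` to `tr_loss.device` (or configure device placement correctly) before accumulating; the check only makes the failure explicit.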