Fix gather when collecting 'num_input_tokens_seen' (#31974)

* Move token count to device before gathering
* Run 'make style; make quality'
Commit e3917064, authored Jul 16, 2024 by Alexander Wettig; committed by GitHub, Jul 16, 2024

Showing with 10 additions and 5 deletions

src/transformers/trainer.py +10 -5

--- a/src/transformers/trainer.py
+++ b/src/transformers/trainer.py
@@ -2245,12 +2245,17 @@ class Trainer:
                            "a `main_input_name` attribute to the model class you are using."
                        )
                    else:
-                        input_device = inputs[main_input_name].device
-                        self.state.num_input_tokens_seen += torch.sum(
-                            self.accelerator.gather(
-                                torch.tensor(inputs[main_input_name].numel(), device=input_device, dtype=torch.int64)
+                        self.state.num_input_tokens_seen += (
+                            torch.sum(
+                                self.accelerator.gather(
+                                    torch.tensor(
+                                        inputs[main_input_name].numel(), device=self.args.device, dtype=torch.int64
+                                    )
+                                )
                            )
-                        ).item()
+                            .cpu()
+                            .item()
+                        )
                if rng_to_sync:
                    self._load_rng_state(resume_from_checkpoint)
                    rng_to_sync = False
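
The change above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Trainer's or Accelerate's implementation: it uses no torch at all, and `FakeTensor`, `fake_gather`, and `count_tokens` are hypothetical names invented for illustration. It assumes the collective backend accepts only tensors that live on its communication device, and that the model inputs may still be on another device when the count is taken. The commit message itself does not spell out the failure mode.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a scalar tensor: a value plus a device tag."""
    value: int
    device: str


def fake_gather(per_rank, comm_device):
    """Toy all-gather: rejects any tensor that is not on the communication device."""
    for rank, t in enumerate(per_rank):
        if t.device != comm_device:
            raise RuntimeError(
                f"rank {rank}: tensor on {t.device!r}, expected {comm_device!r}"
            )
    return [t.value for t in per_rank]


def count_tokens(batch_sizes, input_devices, comm_device="cuda", count_device=None):
    """Sum per-rank token counts across all ranks.

    count_device=None mimics the old code (the count inherits the inputs' device).
    Passing a device mimics the fix (the count is created on the training device).
    """
    counts = [
        FakeTensor(n, dev if count_device is None else count_device)
        for n, dev in zip(batch_sizes, input_devices)
    ]
    return sum(fake_gather(counts, comm_device))


if __name__ == "__main__":
    sizes = [512, 384]          # tokens seen on each of two ranks
    devices = ["cpu", "cpu"]    # assumption: inputs not yet moved to the accelerator
    try:
        count_tokens(sizes, devices)
    except RuntimeError as err:
        print("old behaviour:", err)
    print("fixed:", count_tokens(sizes, devices, count_device="cuda"))
```

In the real diff, `device=self.args.device` plays the role of `count_device`, and the trailing `.cpu().item()` converts the gathered sum back into a plain Python integer for `self.state`.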