Hot-fix-mixstral-loss (#27948)

* fix loss computation
* compute on GPU if possible

Commit 680c610f, authored Dec 12, 2023 by Arthur, committed by GitHub Dec 12, 2023

Showing with 2 additions and 1 deletion

src/transformers/models/mixtral/modeling_mixtral.py +2 -1

--- a/src/transformers/models/mixtral/modeling_mixtral.py
+++ b/src/transformers/models/mixtral/modeling_mixtral.py
@@ -95,7 +95,8 @@ def load_balancing_loss_func(gate_logits: torch.Tensor, num_experts: torch.Tenso

    if isinstance(gate_logits, tuple):
        # cat along the layers?
-        gate_logits = torch.cat(gate_logits, dim=0)
+        compute_device = gate_logits[0].device
+        gate_logits = torch.cat([gate.to(compute_device) for gate in gate_logits], dim=0)

    routing_weights, selected_experts = torch.topk(gate_logits, top_k, dim=-1)
    routing_weights = routing_weights.softmax(dim=-1)
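
The change above matters when the model is sharded across devices: each layer's router logits then live on that layer's device, and `torch.cat` expects all of its inputs on the same one. Below is a minimal, standalone sketch of the same pattern, not the library's actual function. The helper name `gather_gate_logits` and the tensor shapes are hypothetical, chosen only for illustration.

```python
import torch


def gather_gate_logits(gate_logits):
    """Concatenate per-layer router logits on the first layer's device.

    Hypothetical helper mirroring the two added lines of the diff.
    """
    if isinstance(gate_logits, tuple):
        compute_device = gate_logits[0].device
        # .to() returns the tensor unchanged when it is already on compute_device
        gate_logits = torch.cat(
            [gate.to(compute_device) for gate in gate_logits], dim=0
        )
    return gate_logits


# Assumed shapes for illustration: 3 layers, 4 tokens each, 8 experts.
layers = tuple(torch.randn(4, 8) for _ in range(3))

# If several GPUs are present, spread the layers over two of them to mimic sharding.
if torch.cuda.device_count() > 1:
    layers = tuple(t.to(f"cuda:{i % 2}") for i, t in enumerate(layers))

merged = gather_gate_logits(layers)
print(merged.shape, merged.device)
```

On a single-device machine the `.to(compute_device)` calls change nothing, so the sketch behaves like the pre-fix `torch.cat(gate_logits, dim=0)`. With layers on different GPUs, the concatenation (and the loss computed from it) lands on the device of the first layer's logits.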