Unverified Commit 33196b45 authored by Fei Wang's avatar Fei Wang Committed by GitHub
Browse files

Fix LLaMa beam search when using parallelize (#24224)

* Fix LLaMa beam search when using parallelize

same issue as T5 #11717

* fix code format in modeling_llama.py

* fix format of _reorder_cache in modeling_llama.py
parent 7504be35
......@@ -762,7 +762,9 @@ class LlamaForCausalLM(LlamaPreTrainedModel):
def _reorder_cache(past_key_values, beam_idx):
reordered_past = ()
for layer_past in past_key_values:
reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
reordered_past += (
tuple(past_state.index_select(0, beam_idx.to(past_state.device)) for past_state in layer_past),
)
return reordered_past
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment