gaoqiong / flash-attention · Commits

Commit 7a3bd55f, authored Aug 26, 2023 by Tri Dao

[Gen] Fix decode function not using top_p during iterative decoding

Parent: 847abe65
Showing 1 changed file with 1 addition and 1 deletion.

flash_attn/utils/generation.py @ 7a3bd55f (+1, -1)
@@ -173,7 +173,7 @@ def decode(
             teacher_outputs is None
             or teacher_output_len <= inference_params.sequence_len_offset + 1
         ):
-            next_token = sample(logits, top_k=top_k, temperature=temperature)
+            next_token = sample(logits, top_k=top_k, top_p=top_p, temperature=temperature)
         else:
             next_token = teacher_outputs[:, inference_params.sequence_len_offset + 1]
         sequences.append(next_token)
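For readers unfamiliar with the call being fixed: per the commit message, the iterative decoding path was sampling with top_k and temperature only, so a top_p value passed to decode was silently dropped at each step; the one-line change simply forwards it. Below is a minimal, hypothetical sketch of what a combined top-k / top-p (nucleus) sampler looks like. It only mirrors the keyword signature of the `sample` call in the diff (logits, top_k, top_p, temperature) and is a stand-in for illustration, not the flash-attn implementation.

# Minimal sketch of combined top-k / top-p (nucleus) sampling, assuming a
# (batch, vocab) logits tensor. Hypothetical stand-in mirroring the keyword
# arguments of the `sample` call in the diff; NOT the flash-attn implementation.
import torch


def sample(logits, top_k=1, top_p=0.0, temperature=1.0):
    """Return one sampled token id per batch row."""
    if top_k == 1 and top_p == 0.0:
        return logits.argmax(dim=-1)  # greedy shortcut
    logits = logits / temperature
    if top_k > 0:
        # Keep only the k highest logits; mask the rest to -inf.
        kth_best = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))
    if top_p > 0.0:
        # Nucleus filtering: drop the tail once cumulative probability
        # exceeds top_p (always keeping at least the most likely token).
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        cum_probs = sorted_logits.softmax(dim=-1).cumsum(dim=-1)
        drop_sorted = cum_probs > top_p
        drop_sorted[..., 1:] = drop_sorted[..., :-1].clone()
        drop_sorted[..., 0] = False
        drop = drop_sorted.scatter(-1, sorted_idx, drop_sorted)
        logits = logits.masked_fill(drop, float("-inf"))
    return torch.multinomial(logits.softmax(dim=-1), num_samples=1).squeeze(-1)


# Example: with top_p=0.0 the low-probability tail stays eligible after the
# top-k cut; with top_p=0.9 it is truncated as well, which is the behavior
# the one-line fix restores during iterative decoding.
logits = torch.randn(2, 32000)
next_token = sample(logits, top_k=40, top_p=0.9, temperature=0.8)  # shape (2,)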