fix: Fix typical_p behaviour broken in recent change (#27165)

A recent PR https://github.com/huggingface/transformers/pull/26579 fixed an edge case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff. However after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently. Specifically the docs state that it should include the "smallest set of tokens that add up to P or higher" and so `last_ind` should actually be one more than the index of the last token satisfying (cumulative_probs < self.mass). We still need a max clamp in case that last token is the very last one in the tensor.

fix: Fix typical_p behaviour broken in recent change (#27165)
A recent PR https://github.com/huggingface/transformers/pull/26579 fixed an edge case out-of-bounds tensor indexing error in TypicalLogitsWarper, and a related behaviour change was made that we thought fixed a long-standing bug w.r.t. the token inclusion cutoff. However after looking more closely, I am pretty certain that the original logic was correct and that the OOB fix should have been made differently. Specifically the docs state that it should include the "smallest set of tokens that add up to P or higher" and so `last_ind` should actually be one more than the index of the last token satisfying (cumulative_probs < self.mass). We still need a max clamp in case that last token is the very last one in the tensor.
3cd3eaf9 · Nick Hill · GitHub · b5db8ca6 · 3cd3eaf9
Unverified Commit 3cd3eaf9 authored Oct 31, 2023 by Nick Hill Committed by GitHub Oct 31, 2023
Show whitespace changes
Inline Side-by-side

Showing with 2 additions and 2 deletions

src/transformers/generation/logits_process.py src/transformers/generation/logits_process.py +2 -2

No files found.
--- a/src/transformers/generation/logits_process.py
+++ b/src/transformers/generation/logits_process.py
@@ -492,8 +492,8 @@ class TypicalLogitsWarper(LogitsWarper):
        cumulative_probs = sorted_logits.softmax(dim=-1).cumsum(dim=-1)

        # Remove tokens with cumulative mass above the threshold
-        last_ind = (cumulative_probs < self.mass).sum(dim=1) - 1
-        last_ind.clamp_(min=0)
+        last_ind = (cumulative_probs < self.mass).sum(dim=1)
+        last_ind.clamp_(max=sorted_scores.shape[-1] - 1)
        sorted_indices_to_remove = sorted_scores > sorted_scores.gather(1, last_ind.view(-1, 1))
        sorted_indices_to_remove[..., : self.min_tokens_to_keep] = 0
        indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)