Generate: TF contrastive search must pop `use_cache` from `model_kwargs` (#21149)

Commit 7b5e943c (unverified), authored Jan 17, 2023 by Joao Gante, committed by GitHub Jan 17, 2023

Showing 1 changed file with 2 additions and 0 deletions

src/transformers/generation/tf_utils.py +2 -0

--- a/src/transformers/generation/tf_utils.py
+++ b/src/transformers/generation/tf_utils.py
@@ -2437,6 +2437,8 @@ class TFGenerationMixin:
            else self.generation_config.return_dict_in_generate
        )
        use_cache = True  # In contrastive search, we always use cache
+        model_kwargs.pop("use_cache", None)
+
        use_xla = not tf.executing_eagerly()
        # TODO (Joao): fix cache format or find programatic way to detect cache index
        # GPT2 and other models has a slightly different cache structure, with a different batch axis
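
The commit does not say what fails without the added `pop`. One plausible failure, and this is an assumption rather than something the diff confirms, is that `use_cache` gets passed both explicitly and through `**model_kwargs`, which Python rejects as a duplicate keyword argument. The sketch below reproduces that pattern in plain Python. `prepare_inputs`, `search_without_pop`, and `search_with_pop` are hypothetical stand-ins, not functions from `transformers`.

```python
def prepare_inputs(input_ids, use_cache=None, **kwargs):
    """Hypothetical stand-in for a method that accepts `use_cache` by keyword."""
    return {"input_ids": input_ids, "use_cache": use_cache, **kwargs}


def search_without_pop(input_ids, **model_kwargs):
    """Forces `use_cache=True` but leaves any caller-supplied copy in `model_kwargs`."""
    use_cache = True
    # If the caller passed `use_cache`, it now arrives twice and Python raises TypeError.
    return prepare_inputs(input_ids, use_cache=use_cache, **model_kwargs)


def search_with_pop(input_ids, **model_kwargs):
    """Same as above, but drops the caller-supplied `use_cache` first (as in the diff)."""
    use_cache = True
    # The `None` default makes the pop a no-op when the key is absent.
    model_kwargs.pop("use_cache", None)
    return prepare_inputs(input_ids, use_cache=use_cache, **model_kwargs)


if __name__ == "__main__":
    try:
        search_without_pop([1, 2, 3], use_cache=False)
    except TypeError as err:
        print("without pop:", err)

    print("with pop:", search_with_pop([1, 2, 3], use_cache=False))
```

Passing `None` as the second argument to `pop` matters: a bare `model_kwargs.pop("use_cache")` would raise `KeyError` whenever the caller did not supply the key.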