Unverified commit 1baeed5b authored by Euan Ong, committed by GitHub

Fix return_dict_in_generate bug in InstructBlip generate function (#25246)

Fix bug in InstructBlip generate function

Previously, the postprocessing applied to generated sequences in InstructBlip's generate function assumed these sequences were tensors (i.e., that `return_dict_in_generate == False`).

This commit checks whether the result of the call to the wrapped language model's `generate()` is a tensor and, if not, postprocesses the `sequences` attribute of the returned output object instead.
parent eec0d84e
@@ -1559,6 +1559,9 @@ class InstructBlipForConditionalGeneration(InstructBlipPreTrainedModel):
         # with the tokenizer's bos token being set to </s> which has ID=2,
         # whereas the model's text config has bos token id = 0
         if self.config.text_config.architectures[0] == "LLaMAForCausalLM":
-            outputs[outputs == 0] = 2
+            if isinstance(outputs, torch.Tensor):
+                outputs[outputs == 0] = 2
+            else:
+                outputs.sequences[outputs.sequences == 0] = 2
         return outputs
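The control flow of the patch can be sketched in pure Python, using nested lists in place of tensors and a hypothetical `GenerateOutputStub` class standing in for the `ModelOutput` (with a `sequences` field) that `generate()` returns when `return_dict_in_generate=True`:

```python
from dataclasses import dataclass

@dataclass
class GenerateOutputStub:
    """Hypothetical stand-in for a transformers ModelOutput with a .sequences field."""
    sequences: list

def remap_bos(outputs):
    # If generate() returned raw token ids (return_dict_in_generate=False),
    # remap the tokenizer's bos id directly; otherwise remap the
    # .sequences attribute of the returned output object.
    if isinstance(outputs, list):
        return [[2 if tok == 0 else tok for tok in row] for row in outputs]
    outputs.sequences = remap_bos(outputs.sequences)
    return outputs

print(remap_bos([[0, 5, 0]]))                             # [[2, 5, 2]]
print(remap_bos(GenerateOutputStub([[0, 7]])).sequences)  # [[2, 7]]
```

In the real patch the branch condition is `isinstance(outputs, torch.Tensor)` and the remap is done in place with boolean-mask indexing; the sketch above only mirrors the two-way dispatch.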