fixed multiplechoice tokenization (#12362)

* fixed multiplechoice tokenization

  The model would have seen two sequences:
  1. [CLS]prompt[SEP]prompt[SEP]
  2. [CLS]choice0[SEP]choice1[SEP]
  That is not correct, as we want a contextualized embedding of prompt and choice.

* removed outer brackets for proper sequence generation
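The difference between the two call shapes can be illustrated with a toy stand-in. This is only a sketch: `toy_pair_encode` is a hypothetical helper written for this illustration, not part of transformers, and it returns marker strings rather than token ids. It assumes the pairing behaviour described in the commit message: a single list of lists is read as a batch of (first, second) pairs, while two parallel lists are paired element by element.

```python
from typing import List, Optional, Sequence


def toy_pair_encode(
    text: Sequence, text_pair: Optional[Sequence[str]] = None
) -> List[str]:
    """Hypothetical stand-in for a sentence-pair tokenizer call.

    Returns one "[CLS]a[SEP]b[SEP]" string per encoded sequence.
    """
    if text_pair is None:
        # One argument: each inner list is taken as a single (first, second) pair.
        pairs = [tuple(item) for item in text]
    else:
        # Two arguments: the i-th text is paired with the i-th text_pair.
        pairs = list(zip(text, text_pair))
    return [f"[CLS]{first}[SEP]{second}[SEP]" for first, second in pairs]


prompt, choice0, choice1 = "prompt", "choice0", "choice1"

# Old call shape: the outer brackets pair prompt with prompt and choice0 with choice1.
before = toy_pair_encode([[prompt, prompt], [choice0, choice1]])

# New call shape: the prompt is paired with each choice.
after = toy_pair_encode([prompt, prompt], [choice0, choice1])

print(before)
print(after)
```

With the outer brackets removed, each sequence holds the prompt together with one candidate answer, which is what a multiple choice head needs to score the choices against the prompt.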

Commit f8664258, authored Jun 25, 2021 by cronoik, committed by GitHub Jun 25, 2021

Showing 1 changed file with 1 addition and 1 deletion

src/transformers/file_utils.py +1 -1

--- a/src/transformers/file_utils.py
+++ b/src/transformers/file_utils.py
@@ -816,7 +816,7 @@ PT_MULTIPLE_CHOICE_SAMPLE = r"""
        >>> choice1 = "It is eaten while held in the hand."
        >>> labels = torch.tensor(0).unsqueeze(0)  # choice0 is correct (according to Wikipedia ;)), batch size 1

-        >>> encoding = tokenizer([[prompt, prompt], [choice0, choice1]], return_tensors='pt', padding=True)
+        >>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors='pt', padding=True)
        >>> outputs = model(**{{k: v.unsqueeze(0) for k,v in encoding.items()}}, labels=labels)  # batch size is 1

        >>> # the linear classifier still needs to be trained