Fix word-level feature extraction for RoBERTa/XLM-R

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/933
Differential Revision: D18783780
fbshipit-source-id: fa0a27fab886a5fa5be8d5f49151d1d9dd9775f1

Commit d48895bd, authored Dec 03, 2019 by Naman Goyal; committed by Facebook Github Bot on Dec 03, 2019

Showing 1 changed file with 1 addition and 1 deletion

fairseq/models/roberta/alignment_utils.py +1 -1

--- a/fairseq/models/roberta/alignment_utils.py
+++ b/fairseq/models/roberta/alignment_utils.py
@@ -22,6 +22,7 @@ def align_bpe_to_words(roberta, bpe_tokens: torch.LongTensor, other_tokens: List
        List[str]: mapping from *other_tokens* to corresponding *bpe_tokens*.
    """
    assert bpe_tokens.dim() == 1
+    assert bpe_tokens[0] == 0
    def clean(text):
        return text.strip()
@@ -32,7 +33,6 @@ def align_bpe_to_words(roberta, bpe_tokens: torch.LongTensor, other_tokens: List
    other_tokens = [clean(str(o)) for o in other_tokens]
    # strip leading <s>
-    assert bpe_tokens[0] == '<s>'
    bpe_tokens = bpe_tokens[1:]
    assert ''.join(bpe_tokens) == ''.join(other_tokens)
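
One plausible reading of the change above: the beginning-of-sentence check is now made on the raw token indices, before they are decoded to strings, so it no longer depends on how a particular BPE or sentencepiece decoder renders `<s>`. The following is a minimal sketch of that ordering only, not the fairseq implementation. It uses plain Python lists instead of a `torch.LongTensor`, and `TOY_VOCAB`, `BOS_INDEX`, and `strip_bos_and_check` are hypothetical names introduced for illustration. It assumes, as the new assertion does, that index 0 denotes `<s>`.

```python
from typing import Dict, List

# Hypothetical toy vocabulary for illustration; index 0 plays the role of <s>.
TOY_VOCAB: Dict[int, str] = {0: "<s>", 1: "Hel", 2: "lo", 3: "world"}
BOS_INDEX = 0  # assumption: the dictionary maps <s> to index 0


def strip_bos_and_check(bpe_ids: List[int], other_tokens: List[str]) -> List[str]:
    """Validate and drop the leading <s>, then check both tokenizations match."""
    # Check the BOS marker on the raw indices, before any decoding to strings.
    assert bpe_ids[0] == BOS_INDEX, "expected the sequence to start with <s>"

    # Decode indices to text pieces and strip surrounding whitespace.
    pieces = [TOY_VOCAB[i].strip() for i in bpe_ids]
    words = [str(o).strip() for o in other_tokens]

    # Strip the leading <s> piece.
    pieces = pieces[1:]

    # Both tokenizations must spell out the same character sequence.
    assert "".join(pieces) == "".join(words)
    return pieces


pieces = strip_bos_and_check([0, 1, 2, 3], ["Hello", "world"])
print(pieces)
```

Checking the index rather than the decoded string keeps the guard independent of the decoder, which is presumably what matters when the same helper is shared between RoBERTa and XLM-R.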