Commit bac51fba (chenpangpang/transformers)

Fix token_type_ids for XLM-R

Authored by Maksym Del on Jan 27, 2020; committed by Lysandre Debut on Jan 27, 2020.
Parent: babd41e7
Changes: 1 changed file, with 2 additions and 5 deletions (+2 -5)

src/transformers/tokenization_xlm_roberta.py (+2 -5)
```diff
@@ -176,10 +176,7 @@ class XLMRobertaTokenizer(PreTrainedTokenizer):
     def create_token_type_ids_from_sequences(self, token_ids_0, token_ids_1=None):
         """
         Creates a mask from the two sequences passed to be used in a sequence-pair classification task.
-        A RoBERTa sequence pair mask has the following format:
-        0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
-        | first sequence | second sequence
-
+        RoBERTa does not make use of token type ids, therefore a list of zeros is returned.
         if token_ids_1 is None, only returns the first portion of the mask (0's).
         """
         sep = [self.sep_token_id]
@@ -187,7 +184,7 @@ class XLMRobertaTokenizer(PreTrainedTokenizer):
 
         if token_ids_1 is None:
             return len(cls + token_ids_0 + sep) * [0]
-        return len(cls + token_ids_0 + sep + sep) * [0] + len(token_ids_1 + sep) * [1]
+        return len(cls + token_ids_0 + sep + sep + token_ids_1 + sep) * [0]
 
     @property
     def vocab_size(self):
```
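The behavioral change can be illustrated with a standalone sketch that mirrors the removed and added return statements outside the tokenizer class. The function names and the special-token ids `CLS_ID` and `SEP_ID` are hypothetical stand-ins for `self.cls_token_id` and `self.sep_token_id`; only the list arithmetic is taken from the diff.

```python
from typing import List, Optional

# Hypothetical special-token ids, chosen only for illustration; the real
# tokenizer uses self.cls_token_id and self.sep_token_id instead.
CLS_ID = 0
SEP_ID = 2


def token_type_ids_before(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """Mirrors the removed logic: 0s for cls + A + sep + sep, 1s for B + sep."""
    sep, cls = [SEP_ID], [CLS_ID]
    if token_ids_1 is None:
        return len(cls + token_ids_0 + sep) * [0]
    return len(cls + token_ids_0 + sep + sep) * [0] + len(token_ids_1 + sep) * [1]


def token_type_ids_after(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """Mirrors the added logic: all zeros over the same total length."""
    sep, cls = [SEP_ID], [CLS_ID]
    if token_ids_1 is None:
        return len(cls + token_ids_0 + sep) * [0]
    return len(cls + token_ids_0 + sep + sep + token_ids_1 + sep) * [0]


if __name__ == "__main__":
    a = [10, 11, 12]  # hypothetical ids for the first sequence
    b = [20, 21]      # hypothetical ids for the second sequence
    print(token_type_ids_before(a, b))  # [0, 0, 0, 0, 0, 0, 1, 1, 1]
    print(token_type_ids_after(a, b))   # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Both versions return a mask of the same length, matching the `cls + token_ids_0 + sep + sep + token_ids_1 + sep` layout. Only the values for the second segment change from 1 to 0, and single-sequence inputs are unaffected.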