chenpangpang / transformers
Commit d8923270
Authored Aug 16, 2019 by Jason Phang
Committed by Lysandre Debut, Aug 16, 2019

Correct truncation for RoBERTa in 2-input GLUE

Parent: 7e7fc53d
Showing 1 changed file with 3 additions and 2 deletions

examples/utils_glue.py
@@ -422,8 +422,9 @@ def convert_examples_to_features(examples, label_list, max_seq_length,
             tokens_b = tokenizer.tokenize(example.text_b)
             # Modifies `tokens_a` and `tokens_b` in place so that the total
             # length is less than the specified length.
-            # Account for [CLS], [SEP], [SEP] with "- 3"
-            _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
+            # Account for [CLS], [SEP], [SEP] with "- 3". " -4" for RoBERTa.
+            special_tokens_count = 4 if sep_token_extra else 3
+            _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - special_tokens_count)
         else:
             # Account for [CLS] and [SEP] with "- 2" and with "- 3" for RoBERTa.
             special_tokens_count = 3 if sep_token_extra else 2
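
The effect of the change above can be illustrated with a minimal, self-contained sketch. It is not the code from `examples/utils_glue.py`: `truncate_seq_pair` is a simplified stand-in for `_truncate_seq_pair`, and `build_pair` is a hypothetical helper that assumes the extra separator is placed directly after the first segment (BERT-style pairs use three special tokens, RoBERTa-style pairs use four).

```python
from typing import List


def truncate_seq_pair(tokens_a: List[str], tokens_b: List[str], max_length: int) -> None:
    """Stand-in for _truncate_seq_pair: pop tokens from the longer list,
    in place, until the combined length is at most max_length."""
    while len(tokens_a) + len(tokens_b) > max_length:
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()


def build_pair(tokens_a: List[str], tokens_b: List[str], max_seq_length: int,
               sep_token_extra: bool = False,
               cls_token: str = "[CLS]", sep_token: str = "[SEP]") -> List[str]:
    """Hypothetical helper: truncate a sentence pair, then add special tokens."""
    a, b = list(tokens_a), list(tokens_b)  # copy so the caller's lists are untouched
    # Reserve room for the special tokens: 4 with the extra separator, else 3.
    special_tokens_count = 4 if sep_token_extra else 3
    truncate_seq_pair(a, b, max_seq_length - special_tokens_count)
    first = [cls_token] + a + [sep_token]
    if sep_token_extra:
        first.append(sep_token)  # assumed position of the extra separator
    return first + b + [sep_token]


tokens_a = ["a"] * 10
tokens_b = ["b"] * 10
bert_like = build_pair(tokens_a, tokens_b, max_seq_length=16)
roberta_like = build_pair(tokens_a, tokens_b, max_seq_length=16,
                          sep_token_extra=True, cls_token="<s>", sep_token="</s>")
print(len(bert_like), len(roberta_like))  # prints: 16 16
```

With the previous fixed `- 3`, the RoBERTa-style pair in this sketch would keep 13 content tokens and, after four special tokens are added, reach 17 tokens, one more than `max_seq_length`.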