chenpangpang / transformers · Commits

Commit a1ad16a4 (unverified)
Authored Jan 20, 2021 by Sylvain Gugger, committed by GitHub on Jan 20, 2021
Parent: 7e662e6a

Restrain tokenizer.model_max_length default (#9681)

* Restrain tokenizer.model_max_length default
* Fix indent
Showing 1 changed file with 6 additions and 0 deletions

examples/language-modeling/run_mlm.py (+6, -0)
examples/language-modeling/run_mlm.py @ a1ad16a4

@@ -338,6 +338,12 @@ def main():
     if data_args.max_seq_length is None:
         max_seq_length = tokenizer.model_max_length
+        if max_seq_length > 1024:
+            logger.warn(
+                f"The tokenizer picked seems to have a very large `model_max_length` ({tokenizer.model_max_length}). "
+                "Picking 1024 instead. You can change that default value by passing --max_seq_length xxx."
+            )
+            max_seq_length = 1024
     else:
         if data_args.max_seq_length > tokenizer.model_max_length:
             logger.warn(
...
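
For context, a minimal standalone sketch of the default this patch restrains, not the actual run_mlm.py code: the helper name pick_default_max_seq_length and the logging setup are illustrative assumptions, while the 1024 cap and the warning wording are taken from the diff above.

import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)


def pick_default_max_seq_length(model_max_length: int, cap: int = 1024) -> int:
    """When no --max_seq_length is passed, fall back to the tokenizer's
    model_max_length, but cap very large values at `cap` (1024 in the patch)."""
    if model_max_length > cap:
        logger.warning(
            "The tokenizer picked seems to have a very large `model_max_length` "
            f"({model_max_length}). Picking {cap} instead."
        )
        return cap
    return model_max_length


# Some tokenizers report an effectively unbounded model_max_length (a huge
# sentinel integer) when the model has no fixed limit; without the cap,
# grouping or padding texts to that length would be meaningless.
print(pick_default_max_seq_length(512))       # -> 512
print(pick_default_max_seq_length(10 ** 30))  # -> 1024, with a warning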