chenpangpang / transformers
Commit d9ece823 (Unverified)
Authored May 18, 2020 by Boris Dayma
Committed by GitHub, May 18, 2020

fix(run_language_modeling): use arg overwrite_cache (#4407)

Parent: d39bf0ac

Changes: 1 changed file with 4 additions and 1 deletion

examples/language-modeling/run_language_modeling.py (+4 -1)

@@ -120,7 +120,9 @@ def get_dataset(args: DataTrainingArguments, tokenizer: PreTrainedTokenizer, eva
     if args.line_by_line:
         return LineByLineTextDataset(tokenizer=tokenizer, file_path=file_path, block_size=args.block_size)
     else:
-        return TextDataset(tokenizer=tokenizer, file_path=file_path, block_size=args.block_size)
+        return TextDataset(
+            tokenizer=tokenizer, file_path=file_path, block_size=args.block_size, overwrite_cache=args.overwrite_cache
+        )
 
 
 def main():
@@ -216,6 +218,7 @@ def main():
         data_args.block_size = min(data_args.block_size, tokenizer.max_len)
 
     # Get datasets
     train_dataset = get_dataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
     eval_dataset = get_dataset(data_args, tokenizer=tokenizer, evaluate=True) if training_args.do_eval else None
     data_collator = DataCollatorForLanguageModeling(
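
The change above forwards `overwrite_cache` from the parsed arguments to the dataset constructor, so the flag is no longer silently ignored. As a rough illustration of why that matters, here is a minimal sketch of a caching loader. It is not the library's `TextDataset` code: the function name `load_or_build_features`, the cache file naming, and the whitespace "tokenizer" are all assumptions made for this example.

```python
import os
import pickle
import tempfile


def load_or_build_features(file_path, block_size, overwrite_cache=False):
    """Hypothetical stand-in for a caching dataset loader (not the library's code)."""
    directory, filename = os.path.split(file_path)
    # Assumed cache naming scheme, chosen only for this sketch.
    cache_path = os.path.join(directory, f"cached_{block_size}_{filename}")

    # Reuse the cache only when it exists and the caller did not ask to rebuild it.
    if os.path.exists(cache_path) and not overwrite_cache:
        with open(cache_path, "rb") as handle:
            return pickle.load(handle), True

    # Otherwise rebuild from the source file; whitespace splitting stands in for a tokenizer.
    with open(file_path, encoding="utf-8") as handle:
        tokens = handle.read().split()
    blocks = [tokens[i : i + block_size] for i in range(0, len(tokens) - block_size + 1, block_size)]
    with open(cache_path, "wb") as handle:
        pickle.dump(blocks, handle)
    return blocks, False


def demo():
    """Show that a stale cache is returned unless overwrite_cache is passed through."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("a b c d")
        first, hit_first = load_or_build_features(path, 2)

        # The source file changes, but the cache written above is still on disk.
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("w x y z")
        stale, hit_stale = load_or_build_features(path, 2)
        fresh, hit_fresh = load_or_build_features(path, 2, overwrite_cache=True)
        return first, stale, fresh, (hit_first, hit_stale, hit_fresh)
```

In this sketch, the second call returns the old blocks because the flag defaults to `False`. That mirrors the situation before the commit, where the user-supplied value never reached the dataset constructor.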