wuxk1 / Megatron-LM
Commit 3207c19a
Authored Apr 06, 2023 by Jared Casper
Missed some changes from next-best-lm branch.
Parent: 46ffb75c
Showing 3 changed files with 6 additions and 1 deletion (+6 -1)
megatron/global_vars.py (+1 -1)
tools/preprocess_data.py (+2 -0)
tools/preprocess_data_partitions.py (+3 -0)
megatron/global_vars.py
...
@@ -89,7 +89,7 @@ def set_global_variables(args):
     set_args(args)
     _build_num_microbatches_calculator(args)
-    if args.vocab_file:
+    if args.vocab_file or args.tokenizer_model:
         _ = _build_tokenizer(args)
     _set_tensorboard_writer(args)
     _set_adlr_autoresume(args)
...
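The effect of the changed condition can be sketched in isolation. This is a minimal illustration, not Megatron-LM code: `needs_tokenizer` is a hypothetical helper that only mirrors the updated `if` test, and the file names are made up.

```python
from argparse import Namespace


def needs_tokenizer(args):
    """Hypothetical helper: mirrors the updated condition in set_global_variables."""
    return bool(args.vocab_file or args.tokenizer_model)


# A run configured with a vocab file passed the old check as well.
vocab_args = Namespace(vocab_file="vocab.json", tokenizer_model=None)

# A run configured only with a tokenizer model file has no vocab_file,
# so the old `if args.vocab_file:` test would have skipped the tokenizer build.
model_args = Namespace(vocab_file=None, tokenizer_model="tokenizer.model")

# With neither option set, no tokenizer is built under either condition.
bare_args = Namespace(vocab_file=None, tokenizer_model=None)

for name, ns in [("vocab", vocab_args), ("model", model_args), ("bare", bare_args)]:
    print(name, needs_tokenizer(ns))
```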
tools/preprocess_data.py
...
@@ -104,6 +104,8 @@ def get_args():
                        help='Append an <eod> token to the end of a document.')
     group.add_argument('--lang', type=str, default='english',
                        help='Language to use for NLTK-powered sentence splitting.')
+    group.add_argument('--tokenizer-model', type=str, default=None,
+                       help='sentencepeice tokenizer model.')
     group = parser.add_argument_group(title='output data')
...
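As a rough sketch of how the new option behaves once parsed, the snippet below registers the same two arguments on a standalone parser. The group title `'example group'` and the path `spm.model` are assumptions for illustration; the hunk does not show which group the real arguments belong to.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # The group title is a placeholder; the real group is outside the hunk.
    group = parser.add_argument_group(title='example group')
    group.add_argument('--lang', type=str, default='english',
                       help='Language to use for NLTK-powered sentence splitting.')
    group.add_argument('--tokenizer-model', type=str, default=None,
                       help='SentencePiece tokenizer model.')
    return parser


# argparse maps the dashed flag to the attribute name tokenizer_model.
with_model = build_parser().parse_args(['--tokenizer-model', 'spm.model'])
without_model = build_parser().parse_args([])
print(with_model.tokenizer_model, without_model.tokenizer_model)
```

Because the default is `None`, code that tests `args.tokenizer_model` for truthiness sees the option as unset unless a path is passed.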
tools/preprocess_data_partitions.py
...
@@ -326,6 +326,9 @@ def main():
     for p in processes:
         p.join()
+    if args.partitions == 1:
+        return
     # encode partition files in parallel
     processes = []
...
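The control flow this hunk introduces can be sketched as an early return between two stages. The stage names below are placeholders, since the work done before the join loop is outside the hunk; only the `partitions == 1` guard and the "encode partition files" comment come from the diff.

```python
from argparse import Namespace


def main(args):
    completed = []
    # Placeholder for whatever the joined processes did before this point.
    completed.append("first stage")
    if args.partitions == 1:
        # Single partition: stop here and skip the per-partition stage below.
        return completed
    # Placeholder for the "encode partition files in parallel" stage.
    completed.append("encode partitions")
    return completed


single = main(Namespace(partitions=1))
multi = main(Namespace(partitions=4))
print(single, multi)
```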