OpenDAS / Megatron-LM
Commit 1c1a55da, authored Apr 09, 2020 by Mohammad
Addressed Jared's comments
Parent: 898fcb94

2 changed files with 5 additions and 3 deletions:
megatron/data/gpt2_dataset.py (+1, -1)
megatron/data/helpers.cpp (+4, -2)

megatron/data/gpt2_dataset.py @ 1c1a55da
@@ -253,7 +253,7 @@ def _build_sample_idx(sizes, doc_idx, seq_length,
                       num_epochs, tokens_per_epoch):
     """Sample index mapping is a 2D array with sizes
     [number-of-samples + 1, 2] where [..., 0] contains
-    the index into `doc_idx` and [..., 0] is the
+    the index into `doc_idx` and [..., 1] is the
     starting offset in that document."""
 
     # Total number of samples. For -1 see comments in `_num_epochs`.
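
The docstring fix says column 1 of the sample index holds a starting offset, not a second index into `doc_idx`. To make the layout concrete, here is a simplified Python sketch of how such a mapping could be built. The function name `build_sample_idx_sketch` is hypothetical, and it assumes (this is not shown in the diff) that each sample covers `seq_length + 1` tokens and that consecutive samples share one boundary token, which is one reading of the `-1` mentioned in the comment above. It is an illustration, not the code from this commit.

```python
import numpy as np


def build_sample_idx_sketch(sizes, doc_idx, seq_length, num_epochs,
                            tokens_per_epoch):
    """Hypothetical, simplified builder for illustration only.

    Row i is (index into doc_idx, starting offset in that document) for the
    start of sample i; row i + 1 marks where sample i ends.
    Assumption: each sample spans seq_length + 1 tokens and consecutive
    samples share one boundary token.
    """
    # Assumption: the "- 1" leaves room for the extra boundary token.
    num_samples = (num_epochs * tokens_per_epoch - 1) // seq_length
    sample_idx = np.zeros((num_samples + 1, 2), dtype=np.int64)

    doc_idx_index = 0  # current position in doc_idx
    doc_offset = 0     # token offset inside the current document
    sample_idx[0] = (doc_idx_index, doc_offset)
    for i in range(1, num_samples + 1):
        remaining = seq_length + 1
        while remaining != 0:
            # Tokens left in the current document from doc_offset onward.
            doc_length = sizes[doc_idx[doc_idx_index]] - doc_offset
            remaining -= doc_length
            if remaining <= 0:
                # Sample ends inside this document: store the offset of its
                # last token, which is also where the next sample starts.
                doc_offset += remaining + doc_length - 1
                remaining = 0
            else:
                # Document exhausted: continue in the next one in doc_idx.
                doc_idx_index += 1
                doc_offset = 0
        sample_idx[i] = (doc_idx_index, doc_offset)
    return sample_idx


# Three documents of 5, 3 and 4 tokens, one epoch, seq_length = 4.
print(build_sample_idx_sketch([5, 3, 4], [0, 1, 2], 4, 1, 12).tolist())
```

With these inputs the second row is `[0, 4]` (sample 0 ends at offset 4 of the first document) and the third is `[2, 0]` (sample 1 ends at offset 0 of the third document), so each row reads as a position, not as two indices.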

megatron/data/helpers.cpp @ 1c1a55da
@@ -38,8 +38,10 @@ py::array build_sample_idx(const py::array_t<int32_t>& sizes_,
                            const int32_t seq_length,
                            const int32_t num_epochs,
                            const int64_t tokens_per_epoch) {
-    /* Sample index mapping is a 2D array with sizes [number-of-samples + 1, 2]
-       where [..., 0] contains the index into `doc_idx` and [..., 0] is the
+    /* Sample index (sample_idx) is used for gpt2 like dataset for which
+       the documents are flattened and the samples are built based on this
+       1-D flatten array. It is a 2D array with sizes [number-of-samples + 1, 2]
+       where [..., 0] contains the index into `doc_idx` and [..., 1] is the
        starting offset in that document.*/
 
     // Consistency checks.
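
The new C++ comment frames the sample index as positions in the flattened 1-D token stream, and the extra `+ 1` row is what makes that work: sample `i` starts at row `i` and ends at row `i + 1`. A minimal Python sketch of reading one sample back follows. The helper `get_sample_tokens` is hypothetical, and it assumes the end offset is inclusive, so consecutive samples share a boundary token.

```python
import numpy as np


def get_sample_tokens(documents, doc_idx, sample_idx, i):
    """Hypothetical helper: return the tokens of sample i.

    documents: list of 1-D token arrays, indexed by document id.
    doc_idx: order in which documents are concatenated (the flattened view).
    sample_idx: [num_samples + 1, 2] array; row i is where sample i starts
    and row i + 1 is where it ends, each as (index into doc_idx, offset).
    Assumption: the end offset is inclusive.
    """
    start_pos, start_offset = sample_idx[i]
    end_pos, end_offset = sample_idx[i + 1]
    if start_pos == end_pos:
        # Sample lies entirely within one document.
        return documents[doc_idx[start_pos]][start_offset:end_offset + 1]
    # Sample spans several documents: tail of the first, whole middle ones,
    # head of the last.
    pieces = [documents[doc_idx[start_pos]][start_offset:]]
    for pos in range(start_pos + 1, end_pos):
        pieces.append(documents[doc_idx[pos]])
    pieces.append(documents[doc_idx[end_pos]][:end_offset + 1])
    return np.concatenate(pieces)


docs = [np.arange(0, 5), np.arange(5, 8), np.arange(8, 12)]
sidx = np.array([[0, 0], [0, 4], [2, 0]])
print(get_sample_tokens(docs, [0, 1, 2], sidx, 1).tolist())
```

Here sample 1 starts at offset 4 of the first document and ends at offset 0 of the third, so it crosses a document boundary and is stitched together as `[4, 5, 6, 7, 8]`. Storing only start positions, plus one trailing row, keeps the index at two integers per sample regardless of how many documents a sample spans.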