OpenDAS
Megatron-LM
Commits
7bd2a3c5903db4f7df764aa93083251f13478d90
megatron-lm/megatron/data/preprocess_data.py
05 Apr, 2020
1 commit
Hacks to build IndexedDataset and run pretrain
· b03af49e
Neel Kant
authored
Apr 05, 2020
09 Nov, 2019
1 commit
Skip any empty sentences during preprocessing.
· 3f4bc91b
Jared Casper
authored
Nov 08, 2019
08 Nov, 2019
1 commit
Add document index to index file. An empty sentence no longer separates documents.
· 87bbe9be
Jared Casper
authored
Nov 07, 2019
07 Nov, 2019
2 commits
Initial commit of multiprocess preprocess and extracted copy of fairseq's indexed_dataset.
· 1237533e
Jared Casper
authored
Nov 07, 2019
added bert tokenization
· 0ceeb3b4
Mohammad Shoeybi
authored
Nov 06, 2019