John Kamalu authored
tools/merge_datasets.py
- tool to merge multiple dataset files into a single dataset
- testing conducted and included in the megatron-testing repo https://gitlab-master.nvidia.com/ADLR/megatron-testing

tools/preprocess_data.py
- magic numbers changed to required command line arguments

megatron/data/indexed_dataset.py
- when merging, fix to properly update document index
- testing conducted and included in the megatron-testing repo (see above)
- fix follows this history https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/66
a2c5e6cd
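
The document-index fix noted for megatron/data/indexed_dataset.py can be illustrated with a minimal sketch. This is not the code from this commit. It assumes that a dataset stores one size per sequence and a document index of boundary positions that starts at 0 and ends at the number of sequences. The function name `merge_doc_indices` and its parameters are hypothetical, chosen for illustration only.

```python
def merge_doc_indices(sizes_a, doc_idx_a, sizes_b, doc_idx_b):
    """Concatenate two datasets' sequence sizes and document boundaries.

    Assumed layout (not confirmed by the commit): doc_idx holds sequence
    positions where documents begin, starting with 0 and ending with the
    total sequence count.
    """
    # Boundaries of the second dataset are relative to its own first
    # sequence, so shift them by the number of sequences already present.
    offset = len(sizes_a)

    merged_sizes = list(sizes_a) + list(sizes_b)

    # Skip the leading 0 of the second index: after shifting it would
    # duplicate the closing boundary of the first dataset.
    merged_doc_idx = list(doc_idx_a) + [offset + i for i in doc_idx_b[1:]]

    return merged_sizes, merged_doc_idx


# Dataset A: 3 sequences in 2 documents; dataset B: 2 sequences in 2 documents.
sizes, doc_idx = merge_doc_indices([5, 6, 7], [0, 2, 3], [8, 9], [0, 1, 2])
```

Without the offset, the boundaries from the second dataset would point back into the first dataset's sequences, which is the kind of error a merge has to avoid.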