Updates to preprocess_data.py and indexed_dataset.
preprocess_data:
- Adds the ability to not split sentences. This is used for gpt2 datasets.
- Adds the ability to create multiple datasets from different json keys; this is currently untested.

indexed_dataset:
- Adds a new "get" function to get a portion of an entry.
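
To illustrate what retrieving "a portion of an entry" might look like, here is a minimal in-memory sketch. It is not the code from this commit: the class name `ToyIndexedDataset`, the flat-array layout, and the `offset`/`length` parameters are assumptions made for the example.

```python
import numpy as np


class ToyIndexedDataset:
    """Hypothetical in-memory stand-in for an indexed dataset (illustration only)."""

    def __init__(self, entries):
        # Keep every token in one flat array and remember where each entry starts.
        self.sizes = np.array([len(e) for e in entries], dtype=np.int64)
        self.starts = np.concatenate(([0], np.cumsum(self.sizes)[:-1])).astype(np.int64)
        self.data = np.concatenate([np.asarray(e, dtype=np.int64) for e in entries])

    def __getitem__(self, idx):
        # Whole-entry access is just get() with the default offset and length.
        return self.get(idx)

    def get(self, idx, offset=0, length=None):
        """Return `length` tokens of entry `idx`, starting `offset` tokens in.

        The signature is an assumption for this sketch; when `length` is None
        the slice runs to the end of the entry.
        """
        size = int(self.sizes[idx])
        if length is None:
            length = size - offset
        if offset < 0 or length < 0 or offset + length > size:
            raise IndexError("requested slice falls outside the entry")
        start = int(self.starts[idx]) + offset
        return self.data[start:start + length]


ds = ToyIndexedDataset([[1, 2, 3, 4], [5, 6], [7, 8, 9]])
middle = ds.get(0, offset=1, length=2)  # tokens 2 and 3 of the first entry
tail = ds.get(2, offset=1)              # everything after the first token of the last entry
```

Reading a slice this way avoids materialising the whole entry, which is the main reason to offer partial access for long documents such as unsplit gpt2 samples.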
scripts/preprocess_data.py
0 → 100644