-`examples/preprocess_data.sh` : Run preprocessing (calls `get_preprocess_cmd.sh`, `main.py`).
-`main.py` : Entry point for processing.
-`examples/preprocess_data.sh` : Example preprocessing launch (calls `main.py`).
-`examples/pretrain_data.sh` : Example pretraining launch (calls `pretrain_retro.py`).
Use `--retro-tasks` to move through the preprocessing pipeline.
...
...
@@ -86,9 +84,8 @@ Multiple tasks can be specified by separating with commas (e.g., `--retro-tasks
Example scripts for setting arguments and launching Retro preprocessing. The key files here are:
-**`get_preprocess_cmd.sh`** : Sets up arguments and command for preprocessing. **Important note**: this script assumes a few environment variables are already set before it is called. Please see the `Environment vars.` section at the top of this file. Generally, environment variables must be set to determine the location of Retro workdirs, input datasets, and GPT and Bert model information.
-**`preprocess_data.sh`** : Calls `get_preprocess_cmd.sh` to get arguments, and then calls `main.py` to launch preprocessing.
-**`pretrain_model.sh`** : Example script for pretraining on Wikipedia data, after preprocessing is complete.
-**`preprocess_data.sh`** : Example launch script for preprocessing retro data.
-**`pretrain_model.sh`** : Example launch script for pretraining a retro model.