## Introduction

This directory contains our TF implementation of Transformer-XL. Note that our state-of-the-art results reported in the paper were obtained by training the model on a large-scale TPU cluster, and our GPU codebase currently does not support distributed training. Here we provide two sets of hyperparameters and scripts:

- `*large_tpu.sh` are for the SoTA setting on TPUs. These are exactly the commands we used to obtain our best results.
- `*base_gpu.sh` are for the base models which can be run on a few GPUs.

## Prerequisites

- Python 2.7
- TensorFlow [1.12.0](https://github.com/tensorflow/tensorflow/releases/tag/v1.12.0)

## Obtain and evaluate pretrained SoTA models

#### 1. Download preprocessed data (vocab) & pretrained models

(a) Set your own `DATA_ROOT` in `sota/download.sh` (defaults to `./`), which will be the root directory of the downloaded models and data.

(b) Then, download the models & data with `bash sota/download.sh`.

After downloading, the expected directory structure is as follows:

```markdown
pretrained_xl
  tf_enwik8/
    data/
      cache.pkl
      corpus-info.json
    model/
      checkpoint
      model.ckpt*
  tf_wt103/
    ...
  ...
```

**Note**: We include the preprocessed data in the download files to make sure the **same vocabulary** is used. Please see `tf/data_utils.py` to understand the data structure.

#### 2. Run evaluation scripts to replicate SoTA results on GPUs

- **enwik8**: modify the script `sota/enwik8.sh` accordingly (see below)
  - set `DATA_ROOT` to the same folder used in the download step (defaults to `./`)
  - set `TEST_NUM_CORE` (number of GPUs to use): we recommend 2 GPUs => about 60 mins
  - run the script: `bash sota/enwik8.sh`
- **lm1b**: modify the script `sota/lm1b.sh` accordingly (see below)
  - set `DATA_ROOT` to the same folder used in the download step (defaults to `./`)
  - set `TEST_NUM_CORE` (number of GPUs to use): we recommend 1 GPU => less than 5 mins
  - run the script: `bash sota/lm1b.sh`
- **wt103**: modify the script `sota/wt103.sh` accordingly (see below)
  - set `DATA_ROOT` to the same folder used in the download step (defaults to `./`)
  - set `TEST_NUM_CORE` (number of GPUs to use): we recommend 1 GPU => less than 5 mins
  - run the script: `bash sota/wt103.sh`
- **text8**: modify the script `sota/text8.sh` accordingly (see below)
  - set `DATA_ROOT` to the same folder used in the download step (defaults to `./`)
  - set `TEST_NUM_CORE` (number of GPUs to use): we recommend 2 GPUs => about 60 mins
  - run the script: `bash sota/text8.sh`

## Train "Transformer-XL" from scratch with GPUs or TPUs

### 1. Download raw data

`bash getdata.sh`

### 2. Preprocessing, training, and evaluation

For `dataset` in `[enwik8, lm1b, wt103, text8]`:

- check out `scripts/dataset_base_gpu.sh` for GPU training and evaluation
- check out `scripts/dataset_large_tpu.sh` for TPU training and evaluation

#### (1) Preprocess raw data and create tfrecords

**NOTE**: The preprocessing for GPU and TPU is different, so you have to run it separately for each.

GPU:

- create training and validation data: `bash scripts/dataset_base_gpu.sh train_data`
- create test data: `bash scripts/dataset_base_gpu.sh test_data`

TPU:

- Set the Google Cloud Storage URLs in `scripts/dataset_large_tpu.sh` (an example is sketched after this list):
  - `GSDATA`: data URL
  - `GSEXP`: experiment URL
- create training and validation data: `bash scripts/dataset_large_tpu.sh train_data`
- create test data: `bash scripts/dataset_large_tpu.sh test_data`
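For illustration only, the two URLs might look like the following; the bucket and directory names are hypothetical placeholders, not values shipped with the scripts.

```bash
# Hypothetical GCS locations -- substitute a bucket and directories you own.
GSDATA=gs://your-bucket/transformer-xl/data  # where the tfrecords are stored
GSEXP=gs://your-bucket/transformer-xl/exp    # where checkpoints and logs are stored
```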
#### (2) Run training

Base models on GPUs:

- Modify the configurations in `scripts/dataset_base_gpu.sh` according to your needs.
- `bash scripts/dataset_base_gpu.sh train`
- If enough resources are available, increase the model sizes (e.g., `N_LAYER`, `D_MODEL`, `D_EMBED`, `D_HEAD`, `D_INNER`) so that they are closer to the values defined in `scripts/dataset_large_tpu.sh`. Likewise, when resources are limited, decrease the model sizes. It is recommended to ensure that `D_MODEL == D_EMBED` and `D_MODEL == N_HEAD x D_HEAD` (for example, with `N_HEAD = 8` and `D_HEAD = 64`, set `D_MODEL = D_EMBED = 512`). When the model sizes increase, remember to increase `warmup_steps` accordingly to alleviate optimization difficulties.

Larger models on TPUs:

- Modify the configurations in `scripts/dataset_large_tpu.sh` according to your needs.
- `bash scripts/dataset_large_tpu.sh train`

#### (3) Run evaluation

Base models on GPUs:

- `bash scripts/dataset_base_gpu.sh eval --eval_ckpt_path PATH_TO_CKPT`

Larger models on TPUs:

- `bash scripts/dataset_large_tpu.sh eval --eval_ckpt_path PATH_TO_CKPT`
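As a concrete but hypothetical example, evaluating an `enwik8` base model trained on GPUs might look like the following; the checkpoint path is a placeholder and depends on where your training run saved its checkpoints.

```bash
# Hypothetical example: evaluate an enwik8 base model trained on GPUs.
# Replace the checkpoint path with one produced by your own training run.
bash scripts/enwik8_base_gpu.sh eval \
    --eval_ckpt_path /path/to/your/experiment/model.ckpt-400000
```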