This directory contains our PyTorch implementation of Transformer-XL. Note that the state-of-the-art results reported in the paper were obtained by training the model on a large-scale TPU cluster, and our PyTorch codebase currently does not support distributed training. Here we provide two sets of hyperparameters and scripts:
- `*large.sh` are for the SoTA setting with large models, which might not be directly runnable on a local GPU machine (see the usage sketch after this list).
...
...
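As a hedged sketch of how one of these scripts might be invoked (the script name `run_wt103_large.sh`, the `train` mode argument, and the `--work_dir` flag are assumptions based on the repository layout, not guaranteed by this README):

```bash
# Hypothetical invocation -- check the actual *large.sh scripts in this
# directory for the exact names and interface. The scripts typically take
# a mode (train or eval) and forward any extra flags to the trainer.
bash run_wt103_large.sh train --work_dir ./workdir-wt103-large
```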
In our preliminary experiments, the PyTorch implementation produces results similar to those of the TensorFlow codebase under the same settings.