This directory contains our PyTorch implementation of Transformer-XL. Note that the state-of-the-art results reported in the paper were obtained by training the model on a large-scale TPU cluster; our PyTorch codebase currently does not support distributed training. Here we provide two sets of hyperparameters and scripts:
- `*large.sh` are for the SoTA setting with large models, which might not be directly runnable on a local GPU machine.
...
In our preliminary experiments, the PyTorch implementation produces results similar to those of the TF codebase under the same settings.