This repository contains the code in both **PyTorch** and **TensorFlow** for our paper

>Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

>Preprint 2018
## TensorFlow

- The source code is in the `tf/` folder, supporting (1) single-node multi-GPU training and (2) multi-host TPU training.
- Besides the source code, we also provide pretrained TensorFlow models with the state-of-the-art (SoTA) performance reported in the paper.
- Please refer to `tf/README.md` for details.
## PyTorch

- The source code is in the `pytorch/` folder, supporting single-node multi-GPU training via the `nn.DataParallel` module (a minimal usage sketch follows this list).
- Please refer to `pytorch/README.md` for details.
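Below is a minimal sketch of how single-node multi-GPU training with `nn.DataParallel` is typically set up. It is illustrative only and not taken from this codebase; `ToyModel` and the tensor shapes are placeholders.

```python
import torch
import torch.nn as nn

# Illustrative sketch, not the repo's training code: wrapping a model in
# nn.DataParallel for single-node multi-GPU training. ToyModel and the
# tensor shapes are placeholders, not names from this codebase.
class ToyModel(nn.Module):
    def __init__(self, d_in=128, n_classes=10):
        super().__init__()
        self.net = nn.Linear(d_in, n_classes)

    def forward(self, x):
        return self.net(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ToyModel()
if torch.cuda.device_count() > 1:
    # Replicates the module on every visible GPU and splits each batch across them.
    model = nn.DataParallel(model)
model = model.to(device)

x = torch.randn(32, 128, device=device)
logits = model(x)  # inputs scattered across GPUs, outputs gathered on the default device
print(logits.shape)  # torch.Size([32, 10])
```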
## Results

Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. Transformer-XL is also the first to break through the 1.0 barrier (i.e., below 1.0 bits per character) on char-level language modeling. Below is a summary.

This directory contains our PyTorch implementation of Transformer-XL. Note that our state-of-the-art results reported in the paper were obtained by training the model on a large-scale TPU cluster, and our PyTorch codebase currently does not support distributed training. Here we provide two sets of hyperparameters and scripts:

- `*large.sh` are for the SoTA setting with large models, which might not be directly runnable on a local GPU machine.
The PyTorch implementation produces results similar to those of the TF codebase under the same settings in our preliminary experiments.
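Because the `*large.sh` setting may exceed the memory of a local GPU machine, it can help to check which GPUs are visible before choosing a configuration. The helper below is purely illustrative and not part of this codebase.

```python
import torch

# Illustrative helper, not part of this codebase: list the local GPUs and
# their memory to judge whether the large (SoTA) setting is worth attempting
# locally, or whether the smaller setting is the safer choice.
def summarize_local_gpus():
    if not torch.cuda.is_available():
        print("No CUDA device visible; only CPU runs are possible.")
        return
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        total_gib = props.total_memory / 1024 ** 3
        print(f"GPU {i}: {props.name}, {total_gib:.1f} GiB")

if __name__ == "__main__":
    summarize_local_gpus()
```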