In this example we implement the [Transformer](https://arxiv.org/pdf/1706.03762.pdf) and [Universal Transformer](https://arxiv.org/abs/1807.03819) with ACT in DGL.
In this example we implement the [Transformer](https://arxiv.org/pdf/1706.03762.pdf) with ACT in DGL.
The folder contains a training module and an inference module (beam decoder) for the Transformer, and a training module for the Universal Transformer.
The folder contains training module and inferencing module (beam decoder) for Transformer.
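For reference, ACT (dynamic halting) lets each position compute a halting probability at every recurrent step and stop being updated once its accumulated probability crosses a threshold. Below is a minimal PyTorch-style sketch of this idea; the class, parameter names, and defaults are illustrative and are not the modules used in this example.
```
import torch
import torch.nn as nn

class ACTHalting(nn.Module):
    """Illustrative ACT-style dynamic halting for a shared (recurrent) layer."""

    def __init__(self, dim, max_steps=8, threshold=0.99):
        super().__init__()
        self.halt = nn.Linear(dim, 1)  # per-position halting unit
        self.max_steps = max_steps
        self.threshold = threshold

    def forward(self, state, step_fn):
        # state: (batch, length, dim); step_fn applies the shared layer once.
        batch, length, _ = state.shape
        halt_prob = state.new_zeros(batch, length)   # accumulated halting prob.
        remainders = state.new_zeros(batch, length)
        out = torch.zeros_like(state)
        for _ in range(self.max_steps):
            p = torch.sigmoid(self.halt(state)).squeeze(-1)
            still_running = (halt_prob < self.threshold).float()
            # Positions that would cross the threshold halt now and
            # contribute their remainder instead of p.
            new_halted = ((halt_prob + p * still_running) > self.threshold).float() * still_running
            still_running = still_running - new_halted
            halt_prob = halt_prob + p * still_running
            remainders = remainders + new_halted * (1.0 - halt_prob)
            halt_prob = halt_prob + new_halted * remainders
            update = (p * still_running + new_halted * remainders).unsqueeze(-1)
            state = step_fn(state)
            out = update * state + (1.0 - update) * out
            if still_running.sum() == 0:
                break
        return out
```
The shared layer (`step_fn`) is applied repeatedly until every position has halted or `max_steps` is reached, which is what distinguishes the Universal Transformer's recurrent-in-depth computation from a fixed stack of layers.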
## Dependencies
...
...
@@ -18,6 +18,8 @@ The folder contains training module and inferencing module (beam decoder) for Tr
By specifying multiple GPU ids separated by commas, we employ multi-GPU training with multiprocessing.
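As a rough sketch of what multiprocessing-based multi-GPU training looks like (the worker and launcher below are hypothetical and are not the training script in this folder):
```
import torch
import torch.multiprocessing as mp

def _train_worker(rank, gpu_ids):
    """Hypothetical worker: each spawned process drives one GPU."""
    device = torch.device("cuda", gpu_ids[rank])
    # A real worker would build the model and optimizer on `device`,
    # take its shard of the training data, and run the training loop.
    print(f"process {rank} training on {device}")

def launch(gpu_id_str):
    # e.g. gpu_id_str = "0,1,2" as passed on the command line.
    gpu_ids = [int(i) for i in gpu_id_str.split(",")]
    mp.spawn(_train_worker, args=(gpu_ids,), nprocs=len(gpu_ids), join=True)

if __name__ == "__main__":
    launch("0,1")
```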
- For evaluating the BLEU score on the test set (enable `--print` to see the translated text):
```
...
...
@@ -28,19 +30,9 @@ Available datasets: `copy`, `sort`, `wmt14`, `multi30k`(default).
## Test Results
### Transformer
- Multi30k: we achieve a BLEU score of 35.41 with the default setting on the Multi30k dataset, without using pre-trained embeddings (if the number of layers is set to 2, the BLEU score can reach 36.45).
- WMT14: work in progress
### Universal Transformer
- work in progress
## Notes
- Currently we do not support multi-GPU training (this will be fixed soon); specify only one gpu\_id when running the training script.