"torchvision/vscode:/vscode.git/clone" did not exist on "9778d2675ed33627c30fe2537911f5a4128433bd"
README.md 1.43 KB
Newer Older
Zihao Ye's avatar
Zihao Ye committed
1
# Transformer in DGL

**This example is outdated; please refer to [BP-Transformer](http://github.com/yzh119/bpt) for an efficient (sparse) Transformer implementation in DGL.**

In this example, we implement the [Transformer](https://arxiv.org/pdf/1706.03762.pdf) with ACT (Adaptive Computation Time) in DGL.

The folder contains the training module and the inference module (beam decoder) for the Transformer.
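
The core idea is to express attention as message passing on a graph: queries, keys, and values live on nodes, attention scores live on edges, and each node aggregates the weighted values of its predecessors. Below is a minimal, simplified sketch of single-head scaled dot-product attention written against a recent DGL API; the actual modules in this folder (targeting PyTorch 0.4.1-era DGL) are organized differently.

```
import torch
import dgl
import dgl.function as fn
from dgl.nn.functional import edge_softmax

def attention_on_graph(g, q, k, v, d_k):
    """Single-head scaled dot-product attention over the edges of g."""
    with g.local_scope():
        g.ndata['q'], g.ndata['k'], g.ndata['v'] = q, k, v
        # score(u -> v) = <k_u, q_v> / sqrt(d_k) on every edge
        g.apply_edges(fn.u_dot_v('k', 'q', 'score'))
        g.edata['a'] = edge_softmax(g, g.edata['score'] / d_k ** 0.5)
        # each node sums the attention-weighted values of its incoming neighbors
        g.update_all(fn.u_mul_e('v', 'a', 'm'), fn.sum('m', 'h'))
        return g.ndata['h']

# Toy usage: 4 tokens, every token attends to every token (fully connected graph).
n, d = 4, 16
src = torch.arange(n).repeat_interleave(n)
dst = torch.arange(n).repeat(n)
g = dgl.graph((src, dst))
x = torch.randn(n, d)
out = attention_on_graph(g, x, x, x, d_k=d)  # shape (4, 16)
```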

## Dependencies

- PyTorch 0.4.1+
- networkx
- tqdm
- requests
- matplotlib

## Usage

- For training:

    ```
    python3 translation_train.py [--gpus id1,id2,...] [--N #layers] [--dataset DATASET] [--batch BATCHSIZE] [--universal]
    ```

Specifying multiple GPU ids separated by commas enables multi-GPU training with multiprocessing; see the example below.
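
For example, a multi-GPU training run on the default dataset might look like the following (the flag values are illustrative, not the defaults):

```
python3 translation_train.py --gpus 0,1,2,3 --N 6 --dataset multi30k --batch 128
```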

- For evaluating the BLEU score on the test set (enable `--print` to see the translated text):

    ```
    python3 translation_test.py [--gpu id] [--N #layers] [--dataset DATASET] [--batch BATCHSIZE] [--checkpoint CHECKPOINT] [--print] [--universal]
    ```

Available datasets: `copy`, `sort`, `wmt14`, `multi30k` (default).
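
For example, an evaluation run on the default dataset might look like the following (the flag values are illustrative; `CHECKPOINT` is a placeholder for the checkpoint you saved during training):

```
python3 translation_test.py --gpu 0 --N 6 --dataset multi30k --batch 64 --checkpoint CHECKPOINT --print
```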

## Test Results

- Multi30k: we achieve a BLEU score of 35.41 with the default setting on the Multi30k dataset, without using pre-trained embeddings (with the number of layers set to 2, the BLEU score reaches 36.45).
- WMT14: work in progress.

## Reference

- [The Annotated Transformer](http://nlp.seas.harvard.edu/2018/04/03/attention.html)
- [Tensor2Tensor](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/)