![No Maintenance Intended](https://img.shields.io/badge/No%20Maintenance%20Intended-%E2%9C%95-red.svg)
![TensorFlow Requirement: 1.x](https://img.shields.io/badge/TensorFlow%20Requirement-1.x-brightgreen)
![TensorFlow 2 Not Supported](https://img.shields.io/badge/TensorFlow%202%20Not%20Supported-%E2%9C%95-red.svg)

# NeuralGPU
Code for the Neural GPU model described in http://arxiv.org/abs/1511.08228.
The extended version was described in https://arxiv.org/abs/1610.08613.

Requirements:
* TensorFlow 1.x (see tensorflow.org for installation instructions; TensorFlow 2 is not supported)

The model can be trained on the following algorithmic tasks:

* `sort` - Sort a symbol list
* `kvsort` - Sort symbol keys in a dictionary
* `id` - Return the same symbol list
* `rev` - Reverse a symbol list
* `rev2` - Reverse a symbol dictionary by key
* `incr` - Add one to a symbol value
* `add` - Long decimal addition
* `left` - First symbol in list
* `right` - Last symbol in list
* `left-shift` - Left shift a symbol list
* `right-shift` - Right shift a symbol list
* `bmul` - Long binary multiplication
* `mul` - Long decimal multiplication
* `dup` - Duplicate a symbol list with padding
* `badd` - Long binary addition
* `qadd` - Long quaternary addition
* `search` - Search for a symbol key in a dictionary
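
To make the task definitions concrete, here is an illustrative sketch (not code from this repo) of what the long binary addition task `badd` computes, with numbers represented as digit lists:

```python
def binary_add(a, b):
    """Add two binary numbers given as digit lists, least-significant digit first."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        carry, digit = divmod(da + db + carry, 2)
        result.append(digit)
    if carry:
        result.append(carry)
    return result

# 6 ([0, 1, 1] least-significant first) + 3 ([1, 1]) = 9 ([1, 0, 0, 1])
print(binary_add([0, 1, 1], [1, 1]))  # [1, 0, 0, 1]
```

The model learns such input-to-output mappings purely from examples, without being given the algorithm.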

It can also be trained on the WMT English-French translation task:

* `wmt` - WMT English-French translation (data will be downloaded)

The value range for symbols is defined by the `vocab_size` flag;
symbol values lie between `1` and `vocab_size - 1`.
So if you set `--vocab_size=16` (the default), then `--problem=rev`
will reverse lists drawn from 15 possible symbols, and `--problem=id`
will be the identity on such lists.
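
For instance, under the default setting the `rev` task operates on sequences like the following (an illustrative sketch, not the repo's actual data pipeline):

```python
import random

vocab_size = 16  # the default --vocab_size
# Symbol values occupy 1 .. vocab_size - 1 (0 is typically reserved for padding).
symbols = [random.randint(1, vocab_size - 1) for _ in range(8)]
target = list(reversed(symbols))  # what --problem=rev should produce

assert all(1 <= s < vocab_size for s in symbols)
```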


To train the model on the binary multiplication task run:

```
python neural_gpu_trainer.py --problem=bmul
```

This trains the Extended Neural GPU. To train the original model, run:

```
python neural_gpu_trainer.py --problem=bmul --beam_size=0
```

While training, interim checkpoint parameters are
written to `/tmp/neural_gpu/`.

Once the error is as low as you need, hit `Ctrl-C` to stop
the training process. The latest model parameters will be in
`/tmp/neural_gpu/neural_gpu.ckpt-<step>` and will be used on
any subsequent run.

To evaluate how well a trained model decodes, run:

```
python neural_gpu_trainer.py --problem=bmul --mode=1
```

To interact with a trained model (experimental; see the code), run:

```
python neural_gpu_trainer.py --problem=bmul --mode=2
```

To train on WMT data, set a larger `--nmaps` and `--vocab_size`, and disable the curriculum:

```
python neural_gpu_trainer.py --problem=wmt --vocab_size=32768 --nmaps=256 \
  --vec_size=256 --curriculum_seq=1.0 --max_length=60 --data_dir ~/wmt
```

If you have less memory, try a lower batch size, e.g. `--batch_size=4`. If you
have more GPUs in your system, each GPU processes its own batch, so you can run
larger models. For example, `--batch_size=4 --num_gpus=4 --nmaps=512 --vec_size=512`
will run a large (512-unit) model on 4 GPUs, with an effective batch size of 4*4=16.
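
The effective batch size above is simply the per-GPU batch multiplied by the number of GPUs (a sketch of the arithmetic, assuming one batch per GPU as described):

```python
batch_size, num_gpus = 4, 4  # per-GPU batch and GPU count from the example flags
effective_batch = batch_size * num_gpus
print(effective_batch)  # 16
```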

Maintained by Lukasz Kaiser (lukaszkaiser)