# NeuralGPU
Code for the Neural GPU model described in http://arxiv.org/abs/1511.08228.
The extended version was described in https://arxiv.org/abs/1610.08613.

Requirements:
* TensorFlow (see tensorflow.org for how to install)

The model can be trained on the following algorithmic tasks:

* `sort` - Sort a symbol list
* `kvsort` - Sort symbol keys in dictionary
* `id` - Return the same symbol list
* `rev` - Reverse a symbol list
* `rev2` - Reverse a symbol dictionary by key
* `incr` - Add one to a symbol value
* `add` - Long decimal addition
* `left` - First symbol in list
* `right` - Last symbol in list
* `left-shift` - Left shift a symbol list
* `right-shift` - Right shift a symbol list
* `bmul` - Long binary multiplication
* `mul` - Long decimal multiplication
* `dup` - Duplicate a symbol list with padding
* `badd` - Long binary addition
* `qadd` - Long quaternary addition
* `search` - Search for symbol key in dictionary

It can also be trained on the WMT English-French translation task:

* `wmt` - WMT English-French translation (data will be downloaded)

The value range for symbols is defined by the `vocab_size` flag;
values run from 1 to `vocab_size - 1`. So if you set `--vocab_size=16`
(the default), then `--problem=rev` will reverse lists drawn from 15
symbols, and `--problem=id` will be the identity on lists of up to 15
distinct symbols.
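
As a minimal sketch of this convention (assuming symbols are the integers
1 to `vocab_size - 1`, matching the counts above; this is not the repo's
actual data pipeline), a `rev` training pair could be generated like this:

```
import random

vocab_size = 16  # the default --vocab_size
length = 8       # a toy list length

# Symbols run from 1 to vocab_size - 1; the `rev` target is simply
# the reversed input list.
inputs = [random.randint(1, vocab_size - 1) for _ in range(length)]
target = list(reversed(inputs))
print(inputs)  # e.g. [3, 14, 7, 1, 9, 2, 11, 5]
print(target)  # e.g. [5, 11, 2, 9, 1, 7, 14, 3]
```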


To train the model on the binary multiplication task, run:

```
python neural_gpu_trainer.py --problem=bmul
```

This trains the Extended Neural GPU; to train the original model, run:

```
python neural_gpu_trainer.py --problem=bmul --beam_size=0
```

While training, interim model checkpoints will be written
to `/tmp/neural_gpu/`.

Once the error is down to a level you're comfortable with, hit
`Ctrl-C` to stop training. The latest model parameters will be in
`/tmp/neural_gpu/neural_gpu.ckpt-<step>` and will be used on any
subsequent run.
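
As a small sketch (standard TensorFlow, not part of this repo; it assumes
the default checkpoint directory above), you can locate the most recent
checkpoint like this:

```
import tensorflow as tf

# Returns the path of the newest checkpoint in the directory,
# e.g. /tmp/neural_gpu/neural_gpu.ckpt-<step>, or None if there is none.
print(tf.train.latest_checkpoint("/tmp/neural_gpu/"))
```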

To evaluate how well a trained model decodes, run:

```
python neural_gpu_trainer.py --problem=bmul --mode=1
```

To interact with a model (experimental, see the code), run:

```
python neural_gpu_trainer.py --problem=bmul --mode=2
```

To train on WMT data, set larger `--nmaps` and `--vocab_size` values and disable the curriculum:

```
python neural_gpu_trainer.py --problem=wmt --vocab_size=32768 --nmaps=256 \
  --vec_size=256 --curriculum_seq=1.0 --max_length=60 --data_dir ~/wmt
```

With less memory, try a lower batch size, e.g. `--batch_size=4`. With more
GPUs in your system, each GPU will process its own batch, so you can run
larger models. For example, `--batch_size=4 --num_gpus=4 --nmaps=512
--vec_size=512` will run a large model (512-size) on 4 GPUs with an
effective batch of 4*4 = 16.
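
As a minimal sketch of that arithmetic (plain Python, not part of the
trainer; the function name is made up for illustration):

```
def effective_batch(batch_size, num_gpus):
    # Each GPU processes its own batch per step, so one training step
    # covers batch_size * num_gpus examples in total.
    return batch_size * num_gpus

print(effective_batch(4, 4))  # 16, matching the example above
```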

Maintained by Lukasz Kaiser (lukaszkaiser)