# Automatic Model Architecture Search for Reading Comprehension
This example shows how to use a genetic algorithm to find good model architectures for the reading comprehension task.

## Search Space
Attention and recurrent neural network (RNN) modules have been proven effective in reading comprehension, so we define the search space as follows:

1. IDENTITY (effectively means keep training).
2. INSERT-RNN-LAYER (inserts an LSTM; after comparing the performance of GRU and LSTM in our experiments, we decided to use LSTM here).
3. REMOVE-RNN-LAYER
4. INSERT-ATTENTION-LAYER (inserts an attention layer).
5. REMOVE-ATTENTION-LAYER
6. ADD-SKIP (identity between random layers).
7. REMOVE-SKIP (removes a random skip).
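
As a rough illustration of the list above, the seven operations can be written as an enumeration from which a tuner picks one per step. This is only a sketch: the names `Mutation` and `sample_mutation` are hypothetical, and uniform sampling is an assumption, not necessarily how this example's tuner chooses mutations.

```python
import random
from enum import Enum


class Mutation(Enum):
    """The seven operations of the search space (numbering follows the list above)."""
    IDENTITY = 1
    INSERT_RNN_LAYER = 2
    REMOVE_RNN_LAYER = 3
    INSERT_ATTENTION_LAYER = 4
    REMOVE_ATTENTION_LAYER = 5
    ADD_SKIP = 6
    REMOVE_SKIP = 7


def sample_mutation(rng: random.Random) -> Mutation:
    """Pick one operation uniformly at random (uniform choice is an assumption)."""
    return rng.choice(list(Mutation))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs pick the same operations
    for _ in range(3):
        print(sample_mutation(rng).name)
```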

![ga-squad-logo](./ga_squad.png)

## New version
We also have another version with lower time cost and better performance. We will release it soon.

# How to run this example?

## Use the download script to download data

Execute the following commands to download the needed files
using the download script:

```
chmod +x ./download.sh
./download.sh
```

## Download manually

1. Download "dev-v1.1.json" and "train-v1.1.json" from https://rajpurkar.github.io/SQuAD-explorer/

```
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
```

2. Download "glove.840B.300d.txt" from https://nlp.stanford.edu/projects/glove/

```
wget http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip glove.840B.300d.zip
```
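
After downloading, it can help to confirm that the SQuAD files are present and parse correctly. The following is a minimal sketch, assuming the files sit in the current directory and follow the SQuAD v1.1 layout (`data` → `paragraphs` → `qas`); the helper names `count_questions` and `summarize` are hypothetical and not part of this example's code.

```python
import json
from pathlib import Path


def count_questions(squad: dict) -> int:
    """Count question entries in a parsed SQuAD v1.1 dictionary."""
    return sum(
        len(paragraph["qas"])
        for article in squad["data"]
        for paragraph in article["paragraphs"]
    )


def summarize(path: str) -> None:
    """Print the number of questions in a SQuAD file, or report it as missing."""
    file_path = Path(path)
    if not file_path.exists():
        print(f"{path}: missing")
        return
    with file_path.open(encoding="utf-8") as handle:
        squad = json.load(handle)
    print(f"{path}: {count_questions(squad)} questions")


if __name__ == "__main__":
    # Assumes the files were downloaded into the current directory.
    for name in ("train-v1.1.json", "dev-v1.1.json"):
        summarize(name)
```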

# Submit this job
```
nnictl create --config ~/nni/examples/trials/ga_squad/config.yaml
```

# Technical details about the trial

## Model configuration format

Here is an example of the model configuration, which is passed from the tuner to the trial in the architecture search procedure.

```
{
    "max_layer_num": 50,
    "layers": [
        {
            "input_size": 0,
            "type": 3,
            "output_size": 1,
            "input": [],
            "size": "x",
            "output": [4, 5],
            "is_delete": false
        },
        {
            "input_size": 0,
            "type": 3,
            "output_size": 1,
            "input": [],
            "size": "y",
            "output": [4, 5],
            "is_delete": false
        },
        {
            "input_size": 1,
            "type": 4,
            "output_size": 0,
            "input": [6],
            "size": "x",
            "output": [],
            "is_delete": false
        },
        {
            "input_size": 1,
            "type": 4,
            "output_size": 0,
            "input": [5],
            "size": "y",
            "output": [],
            "is_delete": false
        },
        {"Comment": "More layers will be here for actual graphs."}
    ]
}
```

Every model configuration has a "layers" section, which is a JSON list of layer definitions. The definition of each layer is also a JSON object, where:

 * "type" is the type of the layer. 0, 1, 2, 3, 4 correspond to attention, self-attention, RNN, input and output layer, respectively.
 * "size" is the length of the output. "x" and "y" correspond to document length and question length, respectively.
 * "input_size" is the number of inputs the layer has.
 * "input" is the indices of the layers taken as input of this layer.
 * "output" is the indices of the layers that use this layer's output as their input.
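
To make the field descriptions concrete, here is a minimal sketch of how a trial might read such a configuration. The names `LAYER_TYPES` and `describe_layers` are hypothetical and not part of the trial code; the type codes are taken from the list above, and entries without a "type" key (such as the "Comment" placeholder in the example) are skipped.

```python
import json

# Type codes as described above.
LAYER_TYPES = {0: "attention", 1: "self-attention", 2: "rnn", 3: "input", 4: "output"}


def describe_layers(config: dict) -> list:
    """Return one human-readable line per layer definition in the configuration."""
    lines = []
    for index, layer in enumerate(config["layers"]):
        if "type" not in layer:
            # Skip placeholder entries that are not layer definitions.
            continue
        name = LAYER_TYPES.get(layer["type"], "unknown")
        lines.append(
            f"{index}: {name} size={layer['size']} "
            f"inputs={layer['input']} outputs={layer['output']}"
        )
    return lines


if __name__ == "__main__":
    # A shortened configuration with one input layer and one output layer.
    text = """
    {
        "max_layer_num": 50,
        "layers": [
            {"input_size": 0, "type": 3, "output_size": 1, "input": [],
             "size": "x", "output": [4, 5], "is_delete": false},
            {"input_size": 1, "type": 4, "output_size": 0, "input": [6],
             "size": "x", "output": [], "is_delete": false}
        ]
    }
    """
    for line in describe_layers(json.loads(text)):
        print(line)
```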