Automatic Model Architecture Search for Reading Comprehension
=============================================================

This example shows how to use a genetic algorithm to find good model architectures for reading comprehension.

1. Search Space
---------------

Since attention and RNNs have proven effective in reading comprehension, we define the search space as the following set of mutation operations:


#. IDENTITY (Effectively means keep training).
#. INSERT-RNN-LAYER (Inserts an LSTM. We chose LSTM after comparing the performance of GRU and LSTM in our experiments.)
#. REMOVE-RNN-LAYER
#. INSERT-ATTENTION-LAYER (Inserts an attention layer.)
#. REMOVE-ATTENTION-LAYER
#. ADD-SKIP (Identity between random layers).
#. REMOVE-SKIP (Removes random skip).
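
The seven operations above can be thought of as a small set of mutation operators from which one is drawn at each step. The following is a minimal sketch under that assumption. The ``Mutation`` enum and the ``pick_mutation`` helper are hypothetical names used only for illustration, and uniform sampling is an assumption rather than a description of how the example actually chooses an operation.

.. code-block:: python

   import random
   from enum import Enum


   class Mutation(Enum):
       """Hypothetical names for the seven operations listed above."""
       IDENTITY = "IDENTITY"
       INSERT_RNN_LAYER = "INSERT-RNN-LAYER"
       REMOVE_RNN_LAYER = "REMOVE-RNN-LAYER"
       INSERT_ATTENTION_LAYER = "INSERT-ATTENTION-LAYER"
       REMOVE_ATTENTION_LAYER = "REMOVE-ATTENTION-LAYER"
       ADD_SKIP = "ADD-SKIP"
       REMOVE_SKIP = "REMOVE-SKIP"


   def pick_mutation(rng=None):
       """Draw one operation uniformly at random (an assumption for illustration)."""
       if rng is None:
           rng = random.Random()
       return rng.choice(list(Mutation))


   if __name__ == "__main__":
       print(pick_mutation().value)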


.. image:: ../../../examples/trials/ga_squad/ga_squad.png
   :target: ../../../examples/trials/ga_squad/ga_squad.png
   :alt: 


New version
^^^^^^^^^^^

We also have another version with lower time cost and better performance. We will release it soon.

2. How to run this example locally?
------------------------------------

2.1 Use downloading script to download data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Run the download script to fetch the required files:

.. code-block:: bash

   chmod +x ./download.sh
   ./download.sh

Or download the files manually:


#. Download ``dev-v1.1.json`` and ``train-v1.1.json`` from `here <https://rajpurkar.github.io/SQuAD-explorer/>`__:

.. code-block:: bash

   wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
   wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json


#. Download ``glove.840B.300d.txt`` from `here <https://nlp.stanford.edu/projects/glove/>`__:

.. code-block:: bash

   wget http://nlp.stanford.edu/data/glove.840B.300d.zip
   unzip glove.840B.300d.zip
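
Before starting an experiment it can help to confirm that the three files named above are in place. The following is a minimal sketch, assuming they sit together in one directory. The ``missing_data_files`` helper and its ``data_dir`` argument are hypothetical, and the directory the trial actually reads from is not specified here.

.. code-block:: python

   from pathlib import Path

   # File names taken from the download steps above.
   REQUIRED_FILES = ("train-v1.1.json", "dev-v1.1.json", "glove.840B.300d.txt")


   def missing_data_files(data_dir="."):
       """Return the required file names that are not present in data_dir."""
       base = Path(data_dir)
       return [name for name in REQUIRED_FILES if not (base / name).is_file()]


   if __name__ == "__main__":
       missing = missing_data_files(".")
       print("missing files:", missing or "none")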

2.2 Update configuration
^^^^^^^^^^^^^^^^^^^^^^^^

Modify ``nni/examples/trials/ga_squad/config.yml``. Here is the default configuration:

.. code-block:: yaml

   authorName: default
   experimentName: example_ga_squad
   trialConcurrency: 1
   maxExecDuration: 1h
   maxTrialNum: 1
   #choice: local, remote
   trainingServicePlatform: local
   #choice: true, false
   useAnnotation: false
   tuner:
     codeDir: ~/nni/examples/tuners/ga_customer_tuner
     classFileName: customer_tuner.py
     className: CustomerTuner
     classArgs:
       optimize_mode: maximize
   trial:
     command: python3 trial.py
     codeDir: ~/nni/examples/trials/ga_squad
     gpuNum: 0

In the **trial** section, if you want to use a GPU for the architecture search, change ``gpuNum`` from ``0`` to ``1``. Increase ``maxTrialNum`` and ``maxExecDuration`` according to how long you are willing to wait for the search result.

2.3 Submit this job
^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   nnictl create --config ~/nni/examples/trials/ga_squad/config.yml

3. Run this example on OpenPAI
------------------------------

Because of the size limit on uploads, we upload only the source code and perform the data download and training on OpenPAI. This experiment requires at least 32 GB of memory (``memoryMB >= 32G``), and training may last several hours.

3.1 Update configuration
^^^^^^^^^^^^^^^^^^^^^^^^

Modify ``nni/examples/trials/ga_squad/config_pai.yml``. Here is the default configuration:

.. code-block:: yaml

   authorName: default
   experimentName: example_ga_squad
   trialConcurrency: 1
   maxExecDuration: 1h
   maxTrialNum: 10
   #choice: local, remote, pai
   trainingServicePlatform: pai
   #choice: true, false
   useAnnotation: false
   #Your nni_manager ip
   nniManagerIp: 10.10.10.10
   tuner:
     codeDir: https://github.com/Microsoft/nni/tree/v2.1/examples/tuners/ga_customer_tuner
     classFileName: customer_tuner.py
     className: CustomerTuner
     classArgs:
       optimize_mode: maximize
   trial:
     command: chmod +x ./download.sh && ./download.sh && python3 trial.py
     codeDir: .
     gpuNum: 0
     cpuNum: 1
     memoryMB: 32869
     #The docker image to run nni job on OpenPAI
     image: msranni/nni:latest
   paiConfig:
     #The username to login OpenPAI
     userName: username
     #The password to login OpenPAI
     passWord: password
     #The host of restful server of OpenPAI
     host: 10.10.10.10

Please replace the default values with your own account and machine information, including ``nniManagerIp``, ``userName``, ``passWord`` and ``host``.

In the "trial" section, if you want to use a GPU for the architecture search, change ``gpuNum`` from ``0`` to ``1``. Increase ``maxTrialNum`` and ``maxExecDuration`` according to how long you are willing to wait for the search result.

``trialConcurrency`` is the number of trials that run concurrently. If you set ``gpuNum`` to ``1``, it equals the number of GPUs you want to use.
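
The adjustments described above amount to changing a few keys. The following is a minimal sketch that models the configuration as a plain Python dictionary instead of reading the YAML file. The ``use_gpus`` helper and the example values passed to it are hypothetical.

.. code-block:: python

   import copy


   def use_gpus(config, num_gpus, max_trial_num, max_exec_duration):
       """Return a copy of config set up for a GPU search (illustrative only)."""
       updated = copy.deepcopy(config)
       updated["trial"]["gpuNum"] = 1            # one GPU per trial
       updated["trialConcurrency"] = num_gpus    # one concurrent trial per GPU
       updated["maxTrialNum"] = max_trial_num
       updated["maxExecDuration"] = max_exec_duration
       return updated


   # A reduced stand-in for the default configuration shown above.
   default = {
       "trialConcurrency": 1,
       "maxExecDuration": "1h",
       "maxTrialNum": 10,
       "trial": {"gpuNum": 0},
   }
   print(use_gpus(default, num_gpus=4, max_trial_num=200, max_exec_duration="24h"))

The helper returns a modified copy, so the default dictionary is left unchanged and can be reused.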

3.2 Submit this job
^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   nnictl create --config ~/nni/examples/trials/ga_squad/config_pai.yml

4. Technical details about the trial
------------------------------------

4.1 How does it works
^^^^^^^^^^^^^^^^^^^^^

Like any other example, the evolution-algorithm-based architecture search for question answering has two parts: the trial and the tuner.

4.2 The trial
^^^^^^^^^^^^^

The trial consists of many files, functions and classes. Here we give only a brief introduction to most of those files:


* ``attention.py`` contains an implementation of the attention mechanism in TensorFlow.
* ``data.py`` contains functions for data preprocessing.
* ``evaluate.py`` contains the evaluation script.
* ``graph.py`` contains the definition of the computation graph.
* ``rnn.py`` contains an implementation of GRU in TensorFlow.
* ``train_model.py`` is a wrapper for the whole question answering model.

Among those files, ``trial.py`` and ``graph_to_tf.py`` are special.

``graph_to_tf.py`` has a function named ``graph_to_network``. Here is its skeleton code:

.. code-block:: python

   def graph_to_network(input1,
                        input2,
                        input1_lengths,
                        input2_lengths,
                        graph,
                        dropout_rate,
                        is_training,
                        num_heads=1,
                        rnn_units=256):
       topology = graph.is_topology()
       layers = dict()
       layers_sequence_lengths = dict()
       num_units = input1.get_shape().as_list()[-1]
       layers[0] = input1*tf.sqrt(tf.cast(num_units, tf.float32)) + \
           positional_encoding(input1, scale=False, zero_pad=False)
       layers[1] = input2*tf.sqrt(tf.cast(num_units, tf.float32))
       layers[0] = dropout(layers[0], dropout_rate, is_training)
       layers[1] = dropout(layers[1], dropout_rate, is_training)
       layers_sequence_lengths[0] = input1_lengths
       layers_sequence_lengths[1] = input2_lengths
       for _, topo_i in enumerate(topology):
           if topo_i == '|':
               continue
           if graph.layers[topo_i].graph_type == LayerType.input.value:
               pass  # ......
           elif graph.layers[topo_i].graph_type == LayerType.attention.value:
               pass  # ......
           # More layers to handle

As we can see, this function is effectively a compiler: it converts the internal model DAG configuration ``graph`` (introduced in the ``Model configuration format`` section) into a TensorFlow computation graph.

.. code-block:: python

   topology = graph.is_topology()

performs topological sorting on the internal graph representation, and the code inside the loop:

.. code-block:: python

   for _, topo_i in enumerate(topology):

performs the actual conversion, mapping each layer to a part of the TensorFlow computation graph.
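
To make the idea of topological sorting concrete, here is a minimal sketch using Kahn's algorithm over layers described by their ``input`` and ``output`` index lists. It is not the implementation of ``graph.is_topology()``. The ``topological_order`` function and the toy graph are hypothetical.

.. code-block:: python

   from collections import deque


   def topological_order(layers):
       """Return layer indices so that every layer follows all of its inputs.

       ``layers`` is a list of dicts with ``input`` and ``output`` index lists.
       Raises ValueError if the graph contains a cycle.
       """
       # Number of inputs each layer is still waiting for.
       pending = [len(layer["input"]) for layer in layers]
       ready = deque(i for i, count in enumerate(pending) if count == 0)
       order = []
       while ready:
           current = ready.popleft()
           order.append(current)
           for nxt in layers[current]["output"]:
               pending[nxt] -= 1
               if pending[nxt] == 0:
                   ready.append(nxt)
       if len(order) != len(layers):
           raise ValueError("graph contains a cycle")
       return order


   # Toy graph: two inputs (0, 1) feed layer 2, which feeds the output layer 3.
   toy = [
       {"input": [], "output": [2]},
       {"input": [], "output": [2]},
       {"input": [0, 1], "output": [3]},
       {"input": [2], "output": []},
   ]
   print(topological_order(toy))

Processing layers in this order guarantees that the tensors a layer depends on already exist when the layer itself is built.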

4.3 The tuner
^^^^^^^^^^^^^

The tuner is much simpler than the trial. The two share the same ``graph.py``. In addition, the tuner has ``customer_tuner.py``, whose most important class is ``CustomerTuner``:

.. code-block:: python

   class CustomerTuner(Tuner):
       # ......

       def generate_parameters(self, parameter_id):
           """Returns a set of trial graph config, as a serializable object.
           parameter_id : int
           """
           if len(self.population) <= 0:
               logger.debug("the population is empty.")
               raise Exception('The population is empty')
           pos = -1
           for i in range(len(self.population)):
               if self.population[i].result is None:
                   pos = i
                   break
           if pos != -1:
               indiv = copy.deepcopy(self.population[pos])
               self.population.pop(pos)
               temp = json.loads(graph_dumps(indiv.config))
           else:
               random.shuffle(self.population)
               if self.population[0].result > self.population[1].result:
                   self.population[0] = self.population[1]
               indiv = copy.deepcopy(self.population[0])
               self.population.pop(1)
               indiv.mutation()
               graph = indiv.config
               temp = json.loads(graph_dumps(graph))

       # ......

As we can see, the overridden method ``generate_parameters`` implements a fairly naive mutation algorithm. The lines:

.. code-block:: python

               if self.population[0].result > self.population[1].result:
                   self.population[0] = self.population[1]
               indiv = copy.deepcopy(self.population[0])

control the mutation process. The tuner always takes two random individuals from the population, then keeps and mutates only the one with the better result.
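
The selection step can be illustrated in isolation. The following is a minimal sketch, assuming that higher ``result`` values are better (as with ``optimize_mode: maximize``). ``ToyIndividual`` and ``select_and_mutate`` are hypothetical stand-ins, not the classes used by ``CustomerTuner``, and the toy mutation merely appends a marker to the config.

.. code-block:: python

   import copy
   import random


   class ToyIndividual:
       """Hypothetical stand-in for a population member."""

       def __init__(self, config, result):
           self.config = config
           self.result = result

       def mutation(self):
           # Placeholder mutation: mark the config as changed and clear the score.
           self.config = self.config + ["mutated"]
           self.result = None


   def select_and_mutate(population, rng):
       """Pick two random individuals, drop the worse, return a mutated copy of the better."""
       first, second = rng.sample(range(len(population)), 2)
       if population[first].result >= population[second].result:
           winner, loser = first, second
       else:
           winner, loser = second, first
       child = copy.deepcopy(population[winner])
       population.pop(loser)
       child.mutation()
       return child


   population = [ToyIndividual(["a"], 0.2), ToyIndividual(["b"], 0.9)]
   child = select_and_mutate(population, random.Random(0))
   print(child.config, len(population))

The mutated copy has no result yet, so it would be evaluated by a new trial, while the unmodified winner stays in the population.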

4.4 Model configuration format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is an example of the model configuration, which is passed from the tuner to the trial in the architecture search procedure.

.. code-block:: json

   {
       "max_layer_num": 50,
       "layers": [
           {
               "input_size": 0,
               "type": 3,
               "output_size": 1,
               "input": [],
               "size": "x",
               "output": [4, 5],
               "is_delete": false
           },
           {
               "input_size": 0,
               "type": 3,
               "output_size": 1,
               "input": [],
               "size": "y",
               "output": [4, 5],
               "is_delete": false
           },
           {
               "input_size": 1,
               "type": 4,
               "output_size": 0,
               "input": [6],
               "size": "x",
               "output": [],
               "is_delete": false
           },
           {
               "input_size": 1,
               "type": 4,
               "output_size": 0,
               "input": [5],
               "size": "y",
               "output": [],
               "is_delete": false
           },
           {"Comment": "More layers will be here for actual graphs."}
       ]
   }

Every model configuration will have a "layers" section, which is a JSON list of layer definitions. The definition of each layer is also a JSON object, where:


* ``type`` is the type of the layer. 0, 1, 2, 3 and 4 correspond to attention, self-attention, RNN, input and output layers, respectively.
* ``size`` is the length of the output. "x" and "y" correspond to the document length and the question length, respectively.
* ``input_size`` is the number of inputs the layer has.
* ``input`` lists the indices of the layers taken as input by this layer.
* ``output`` lists the indices of the layers that use this layer's output as their input.
* ``is_delete`` marks whether the layer has been deleted and is therefore no longer available.
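
As an illustration of how these fields might be read, the sketch below parses a configuration string and summarises each layer. The ``LayerKind`` enum and the ``describe_layers`` function are hypothetical names. Only the numeric type codes and the field names come from the description above.

.. code-block:: python

   import json
   from enum import IntEnum


   class LayerKind(IntEnum):
       """Type codes as listed above (the enum itself is hypothetical)."""
       ATTENTION = 0
       SELF_ATTENTION = 1
       RNN = 2
       INPUT = 3
       OUTPUT = 4


   def describe_layers(config_text):
       """Return one (index, kind name, inputs, outputs) tuple per available layer."""
       config = json.loads(config_text)
       summary = []
       for index, layer in enumerate(config["layers"]):
           if "type" not in layer or layer.get("is_delete"):
               continue  # skip placeholder entries and deleted layers
           kind = LayerKind(layer["type"]).name
           summary.append((index, kind, layer["input"], layer["output"]))
       return summary


   example = '{"layers": [{"type": 3, "input": [], "output": [4, 5], "is_delete": false}]}'
   print(describe_layers(example))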