Add more detailed introduction about the evolution algorithm.

eeab424e · Sinan Tan · xuehui · 08a75265 · eeab424e
Commit eeab424e authored Sep 21, 2018 by Sinan Tan Committed by xuehui Sep 25, 2018

Showing with 110 additions and 0 deletions

examples/trials/ga_squad/README.md +110 -0

--- a/examples/trials/ga_squad/README.md
+++ b/examples/trials/ga_squad/README.md
@@ -85,6 +85,116 @@ nnictl create --config ~/nni/examples/trials/ga_squad/config.yaml
 # Technical details about the trial
+## How does it work
+Like the other examples, the evolution-algorithm-based architecture for question answering has two parts: the trial and the tuner.
+### The trial
+The trial consists of many files, functions, and classes. Here we briefly introduce most of those files:
+* `attention.py` contains an implementation of the attention mechanism in TensorFlow.
+* `data.py` contains functions for data preprocessing.
+* `evaluate.py` contains the evaluation script.
+* `graph.py` contains the definition of the computation graph.
+* `rnn.py` contains an implementation of GRU in TensorFlow.
+* `train_model.py` is a wrapper for the whole question answering model.
+Among those files, `trial.py` and `graph_to_tf.py` are special.
+`graph_to_tf.py` has a function named `graph_to_network`; here is its skeleton code:
+```
+def graph_to_network(input1,
+                     input2,
+                     input1_lengths,
+                     input2_lengths,
+                     graph,
+                     dropout_rate,
+                     is_training,
+                     num_heads=1,
+                     rnn_units=256):
+    topology = graph.is_topology()
+    layers = dict()
+    layers_sequence_lengths = dict()
+    num_units = input1.get_shape().as_list()[-1]
+    layers[0] = input1*tf.sqrt(tf.cast(num_units, tf.float32)) + \
+        positional_encoding(input1, scale=False, zero_pad=False)
+    layers[1] = input2*tf.sqrt(tf.cast(num_units, tf.float32))
+    layers[0] = dropout(layers[0], dropout_rate, is_training)
+    layers[1] = dropout(layers[1], dropout_rate, is_training)
+    layers_sequence_lengths[0] = input1_lengths
+    layers_sequence_lengths[1] = input2_lengths
+    for _, topo_i in enumerate(topology):
+        if topo_i == '|':
+            continue
+        if graph.layers[topo_i].graph_type == LayerType.input.value:
+            # ......
+        elif graph.layers[topo_i].graph_type == LayerType.attention.value:
+            # ......
+        # More layers to handle
+```
+As we can see, this function is effectively a compiler that converts the internal model DAG configuration `graph` (introduced in the `Model configuration format` section) into a TensorFlow computation graph. The line:
+```
+topology = graph.is_topology()
+```
+performs topological sorting on the internal graph representation, and the code inside the loop:
+```
+for _, topo_i in enumerate(topology):
+```
+performs the actual conversion, mapping each layer to a part of the TensorFlow computation graph.
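The two steps described above (sort the DAG topologically, then build each layer from its already-built inputs) can be sketched in isolation. This is a minimal illustration, not the code in `graph_to_tf.py`: the names `topological_order`, `compile_graph`, and `builders` are hypothetical, the graph is given as a plain edge list rather than the `graph` object, and the builders return strings instead of TensorFlow tensors.

```python
from collections import deque


def topological_order(edges, num_nodes):
    """Return node ids so that each node appears after all of its inputs (Kahn's algorithm)."""
    indegree = [0] * num_nodes
    outputs = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        outputs[src].append(dst)
        indegree[dst] += 1
    # Nodes without inputs can be built immediately.
    ready = deque(i for i in range(num_nodes) if indegree[i] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in outputs[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")
    return order


def compile_graph(layer_types, edges, builders):
    """Visit layers in topological order and build each one from its inputs."""
    inputs = {i: [] for i in range(len(layer_types))}
    for src, dst in edges:
        inputs[dst].append(src)
    built = {}
    for node in topological_order(edges, len(layer_types)):
        build = builders[layer_types[node]]  # dispatch on the layer type
        built[node] = build([built[i] for i in inputs[node]])
    return built


# Hypothetical builders: strings stand in for tensors.
builders = {
    "input": lambda ins: "x",
    "attention": lambda ins: "attn(" + ", ".join(ins) + ")",
    "output": lambda ins: "out(" + ", ".join(ins) + ")",
}
layer_types = ["input", "input", "attention", "output"]
edges = [(0, 2), (1, 2), (2, 3)]
compiled = compile_graph(layer_types, edges, builders)
print(compiled[3])
```

Because every layer is visited only after all of its inputs, each builder can assume its input values already exist, which is the property the loop over `topology` relies on.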
+### The tuner
+The tuner is much simpler than the trial. The two share the same `graph.py`. In addition, the tuner has `customer_tuner.py`, whose most important class is `CustomerTuner`:
+```
+class CustomerTuner(Tuner):
+    # ......
+    def generate_parameters(self, parameter_id):
+        """Returns a set of trial graph config, as a serializable object.
+        parameter_id : int
+        """
+        if len(self.population) <= 0:
+            logger.debug("the len of poplution lower than zero.")
+            raise Exception('The population is empty')
+        pos = -1
+        for i in range(len(self.population)):
+            if self.population[i].result == None:
+                pos = i
+                break
+        if pos != -1:
+            indiv = copy.deepcopy(self.population[pos])
+            self.population.pop(pos)
+            temp = json.loads(graph_dumps(indiv.config))
+        else:
+            random.shuffle(self.population)
+            if self.population[0].result > self.population[1].result:
+                self.population[0] = self.population[1]
+            indiv = copy.deepcopy(self.population[0])
+            self.population.pop(1)
+            indiv.mutation()
+            graph = indiv.config
+            temp =  json.loads(graph_dumps(graph))
+    # ......
+```
+As we can see, the overridden method `generate_parameters` implements a fairly naive mutation algorithm. The code lines:
+```
+            if self.population[0].result > self.population[1].result:
+                self.population[0] = self.population[1]
+            indiv = copy.deepcopy(self.population[0])
+```
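+control the mutation process. After the shuffle, the tuner always takes two random individuals from the population and keeps and mutates only the one with the better result (as written, the comparison keeps the individual whose `result` is not larger).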
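The selection step can be tried on its own with a toy population. The sketch below is an assumption-laden stand-in, not the tuner's code: `Individual` here holds an integer `config` with a made-up `mutation` that nudges it by one, and `next_candidate` is a hypothetical helper. The comparison is copied from the snippet above, so the survivor of each pair is the individual with the lower `result`; if a higher score is better for your metric, the comparison would need to be flipped.

```python
import copy
import random


class Individual:
    """Toy stand-in for the tuner's individual; config is an int, not a graph."""

    def __init__(self, config, result=None):
        self.config = config
        self.result = result

    def mutation(self, rng):
        # Hypothetical mutation: nudge the config by one and clear the stale result.
        self.config += rng.choice([-1, 1])
        self.result = None


def next_candidate(population, rng):
    """Pick two random individuals, drop one of them, return a mutated copy of the other."""
    if len(population) < 2:
        raise ValueError("need at least two evaluated individuals")
    rng.shuffle(population)
    # Same comparison as in the snippet above: the lower result survives.
    if population[0].result > population[1].result:
        population[0] = population[1]
    child = copy.deepcopy(population[0])
    population.pop(1)
    child.mutation(rng)
    return child


population = [Individual(10, 0.3), Individual(20, 0.7), Individual(30, 0.5)]
child = next_candidate(population, random.Random(0))
print(len(population), child.config, child.result)
```

Each call shrinks the population by one and hands back an unevaluated child; in this scheme the population size is restored only when the child's result is reported back and the child is added again.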
 ## Model configuration format
 Here is an example of the model configuration, which is passed from the tuner to the trial in the architecture search procedure.