Unverified commit b7c686f6 authored by xuehui, committed by GitHub

Update ga squad (#104)

* update readme in ga_squad
* update readme
* fix typo
* Update README.md
In the yaml configure file, you need to set *useAnnotation* to true to enable NNI annotation.
```
useAnnotation: true
```
## More Trial Example
* [Automatic Model Architecture Search for Reading Comprehension.](../examples/trials/ga_squad/README.md)
# Automatic Model Architecture Search for Reading Comprehension
This example shows how to use a Genetic Algorithm to find good model architectures for the Reading Comprehension task.
## Search Space
Since attention and recurrent neural network (RNN) modules have been proven effective in Reading Comprehension, we define the search space as follows:
1. IDENTITY (Effectively means keep training).
2. INSERT-RNN-LAYER (Inserts an LSTM. Comparing the performance of GRU and LSTM in our experiment, we decided to use LSTM here.)
3. REMOVE-RNN-LAYER
4. INSERT-ATTENTION-LAYER (Inserts an attention layer.)
5. REMOVE-ATTENTION-LAYER
6. ADD-SKIP (Identity between random layers).
7. REMOVE-SKIP (Removes a random skip).
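As a purely illustrative picture of how a genetic-algorithm tuner could apply these operations, the sketch below mutates a toy layer list. All names and logic here are hypothetical stand-ins, not the actual ga_squad implementation:

```python
import random

# The seven mutation operations from the list above.
MUTATIONS = [
    "IDENTITY",
    "INSERT-RNN-LAYER",
    "REMOVE-RNN-LAYER",
    "INSERT-ATTENTION-LAYER",
    "REMOVE-ATTENTION-LAYER",
    "ADD-SKIP",
    "REMOVE-SKIP",
]

def mutate(layers, rng=random):
    """Return a mutated copy of `layers`; the parent list is left unchanged."""
    op = rng.choice(MUTATIONS)
    child = list(layers)
    if op == "INSERT-RNN-LAYER":
        child.insert(rng.randrange(len(child) + 1), "LSTM")
    elif op == "INSERT-ATTENTION-LAYER":
        child.insert(rng.randrange(len(child) + 1), "ATTENTION")
    elif op == "REMOVE-RNN-LAYER" and "LSTM" in child:
        child.remove("LSTM")
    elif op == "REMOVE-ATTENTION-LAYER" and "ATTENTION" in child:
        child.remove("ATTENTION")
    # IDENTITY keeps training the same architecture; ADD-SKIP / REMOVE-SKIP
    # would edit skip connections, which this toy layer list does not model.
    return child
```

Repeatedly mutating survivors in this way is what drives the architecture search.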
![ga-squad-logo](./ga_squad.png)
## New version
We also have another version with lower time cost and better performance; it will be released soon.
# How to run this example?
## Use downloading script to download data
Execute the following command to download the needed files using the downloading script:
```
chmod +x ./download.sh
./download.sh
```
## Download manually
1. Download "dev-v1.1.json" and "train-v1.1.json" from https://rajpurkar.github.io/SQuAD-explorer/
```
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
```
2. Download "glove.840B.300d.txt" from https://nlp.stanford.edu/projects/glove/
```
wget http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip glove.840B.300d.zip
```
## Update configuration
Modify `nni/examples/trials/ga_squad/config.yaml`; here is the default configuration:
```
authorName: default
experimentName: example_ga_squad
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: false
tuner:
  codeDir: ~/nni/examples/tuners/ga_customer_tuner
  classFileName: customer_tuner.py
  className: CustomerTuner
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 trial.py
  codeDir: ~/nni/examples/trials/ga_squad
  gpuNum: 0
```
In the "trial" part, if you want to use a GPU to perform the architecture search, change `gpuNum` from `0` to `1`. You need to increase `maxTrialNum` and `maxExecDuration` according to how long you want to wait for the search result.
`trialConcurrency` is the number of trials running concurrently; if you set `gpuNum` to 1, it is also the number of GPUs you will use.
## Submit this job
## submit this job ```
nnictl create --config ~/nni/examples/trials/ga_squad/config.yaml
``` ```
nnictl create --config ~/nni/examples/trials/ga_squad/config.yaml
# Technical details about the trial
## How does it work
The evolution-algorithm based architecture search for question answering has two different parts, just like the other examples: the trial and the tuner.
### The trial
The trial has a lot of different files, functions and classes. Here we will only give most of those files a brief introduction:
* `attention.py` contains an implementation of the attention mechanism in Tensorflow.
* `data.py` contains functions for data preprocessing.
* `evaluate.py` contains the evaluation script.
* `graph.py` contains the definition of the computation graph.
* `rnn.py` contains an implementation of GRU in Tensorflow.
* `train_model.py` is a wrapper for the whole question answering model.
Among those files, `trial.py` and `graph_to_tf.py` are special.
`graph_to_tf.py` has a function named `graph_to_network`; here is its skeleton code:
```
def graph_to_network(input1,
                     input2,
                     input1_lengths,
                     input2_lengths,
                     graph,
                     dropout_rate,
                     is_training,
                     num_heads=1,
                     rnn_units=256):
    topology = graph.is_topology()
    layers = dict()
    layers_sequence_lengths = dict()
    num_units = input1.get_shape().as_list()[-1]
    layers[0] = input1 * tf.sqrt(tf.cast(num_units, tf.float32)) + \
        positional_encoding(input1, scale=False, zero_pad=False)
    layers[1] = input2 * tf.sqrt(tf.cast(num_units, tf.float32))
    layers[0] = dropout(layers[0], dropout_rate, is_training)
    layers[1] = dropout(layers[1], dropout_rate, is_training)
    layers_sequence_lengths[0] = input1_lengths
    layers_sequence_lengths[1] = input2_lengths
    for _, topo_i in enumerate(topology):
        if topo_i == '|':
            continue
        if graph.layers[topo_i].graph_type == LayerType.input.value:
            pass  # ......
        elif graph.layers[topo_i].graph_type == LayerType.attention.value:
            pass  # ......
        # More layers to handle
```
As we can see, this function is actually a compiler that converts the internal model DAG configuration `graph` (which will be introduced in the `Model configuration format` section) into a Tensorflow computation graph.
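One way to picture this "compiler" behavior is as a dispatch from layer type codes to graph-building functions. The toy sketch below uses the type codes from the `Model configuration format` section (3 for input, 0 for attention), with plain strings standing in for Tensorflow tensors; the builder functions are illustrative assumptions, not code from `graph_to_tf.py`:

```python
# Hypothetical dispatch table: each layer type code maps to a builder that
# emits that layer's part of the "graph" (strings stand in for tensors).
def build_input(layer, outputs):
    return "input(%s)" % layer["size"]

def build_attention(layer, outputs):
    # Consume the already-built outputs of this layer's input layers.
    srcs = ",".join(outputs[i] for i in layer["input"])
    return "attention(%s)" % srcs

BUILDERS = {3: build_input, 0: build_attention}  # type code -> builder

def compile_graph(layers, topology):
    outputs = {}
    for topo_i in topology:  # topology comes from a topological sort
        layer = layers[topo_i]
        outputs[topo_i] = BUILDERS[layer["type"]](layer, outputs)
    return outputs

layers = [
    {"type": 3, "size": "x", "input": []},
    {"type": 0, "size": "x", "input": [0]},
]
print(compile_graph(layers, [0, 1]))
# {0: 'input(x)', 1: 'attention(input(x))'}
```

Because layers are visited in topological order, every layer's inputs are already built when the layer itself is compiled.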
```
topology = graph.is_topology()
```
performs topological sorting on the internal graph representation, and the code inside the loop:
```
for _, topo_i in enumerate(topology):
```
performs the actual conversion that maps each layer to a part of the Tensorflow computation graph.
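For reference, a topological sort orders the DAG so that each layer appears only after all of its inputs. A minimal standalone version (Kahn's algorithm) is sketched below; it illustrates the idea and is not necessarily how `graph.is_topology()` is implemented:

```python
from collections import deque

# Kahn's algorithm: repeatedly emit nodes whose remaining in-degree is zero.
def topological_sort(num_nodes, edges):
    indegree = [0] * num_nodes
    children = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    ready = deque(i for i in range(num_nodes) if indegree[i] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")
    return order

# The two inputs (0 and 1) come before the layers that consume them:
print(topological_sort(4, [(0, 2), (1, 2), (2, 3)]))  # [0, 1, 2, 3]
```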
### The tuner
The tuner is much simpler than the trial. They actually share the same `graph.py`. Besides, the tuner has a `customer_tuner.py`, whose most important class is `CustomerTuner`:
```
class CustomerTuner(Tuner):
    # ......

    def generate_parameters(self, parameter_id):
        """Returns a set of trial graph config, as a serializable object.
        parameter_id : int
        """
        if len(self.population) <= 0:
            logger.debug("the size of the population is lower than or equal to zero.")
            raise Exception('The population is empty')
        pos = -1
        for i in range(len(self.population)):
            if self.population[i].result is None:
                pos = i
                break
        if pos != -1:
            indiv = copy.deepcopy(self.population[pos])
            self.population.pop(pos)
            temp = json.loads(graph_dumps(indiv.config))
        else:
            random.shuffle(self.population)
            if self.population[0].result > self.population[1].result:
                self.population[0] = self.population[1]
            indiv = copy.deepcopy(self.population[0])
            self.population.pop(1)
            indiv.mutation()
            graph = indiv.config
            temp = json.loads(graph_dumps(graph))

    # ......
```
As we can see, the overloaded method `generate_parameters` implements a pretty naive mutation algorithm. The code lines:
```
if self.population[0].result > self.population[1].result:
    self.population[0] = self.population[1]
indiv = copy.deepcopy(self.population[0])
```
control the mutation process: two random individuals are taken from the population, and only the one with the better result is kept and mutated.
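The same selection-and-mutation step can be sketched in isolation. Everything below is a toy stand-in (the `Individual` class and its placeholder `mutation` are hypothetical, not the tuner's real code), showing only the shape of the logic: shuffle, keep the better of two individuals, and mutate a copy of it:

```python
import copy
import random

class Individual:
    """Stand-in for a population entry: a config plus its evaluation result."""
    def __init__(self, config, result):
        self.config = config
        self.result = result

    def mutation(self):
        self.config = self.config + "*"  # placeholder for a real graph mutation

def select_and_mutate(population, rng=random):
    """Compare two random individuals; return a mutated copy of the better one."""
    rng.shuffle(population)
    better = max(population[:2], key=lambda ind: ind.result)
    child = copy.deepcopy(better)  # the parent stays in the population
    child.mutation()
    return child

pop = [Individual("conf-a", 0.70), Individual("conf-b", 0.75)]
child = select_and_mutate(pop, rng=random.Random(0))
print(child.config)  # conf-b*  (the higher-scoring config, mutated)
```

Deep-copying before mutating is what keeps the surviving parent intact while its offspring explores a new architecture.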
## Model configuration format
Here is an example of the model configuration, which is passed from the tuner to the trial during the architecture search procedure.
```
{
    "max_layer_num": 50,
    "layers": [
        {
            "input_size": 0,
            "type": 3,
            "output_size": 1,
            "input": [],
            "size": "x",
            "output": [4, 5],
            "is_delete": false
        },
        {
            "input_size": 0,
            "type": 3,
            "output_size": 1,
            "input": [],
            "size": "y",
            "output": [4, 5],
            "is_delete": false
        },
        {
            "input_size": 1,
            "type": 4,
            "output_size": 0,
            "input": [6],
            "size": "x",
            "output": [],
            "is_delete": false
        },
        {
            "input_size": 1,
            "type": 4,
            "output_size": 0,
            "input": [5],
            "size": "y",
            "output": [],
            "is_delete": false
        },
        {"Comment": "More layers will be here for actual graphs."}
    ]
}
```
Every model configuration has a "layers" section, which is a JSON list of layer definitions. The definition of each layer is also a JSON object, where:
* `type` is the type of the layer. 0, 1, 2, 3, 4 correspond to the attention, self-attention, RNN, input and output layers, respectively.
* `size` is the length of the output. "x" and "y" correspond to the document length and the question length, respectively.
* `input_size` is the number of inputs the layer has.
* `input` lists the indices of the layers taken as input by this layer.
* `output` lists the indices of the layers that use this layer's output as their input.
* `is_delete` marks whether the layer has been deleted and is no longer available.
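Assuming a trial receives JSON of exactly this shape, iterating over the active layers is straightforward. The type-code table below mirrors the list above; the rest of the snippet is an illustrative sketch, not the trial's actual parsing code:

```python
import json

# Type codes from the list above: 0-4 map to these layer kinds.
LAYER_TYPES = {0: "attention", 1: "self-attention", 2: "rnn", 3: "input", 4: "output"}

config = json.loads("""
{
  "max_layer_num": 50,
  "layers": [
    {"input_size": 0, "type": 3, "output_size": 1, "input": [],
     "size": "x", "output": [4, 5], "is_delete": false},
    {"input_size": 1, "type": 4, "output_size": 0, "input": [5],
     "size": "y", "output": [], "is_delete": true}
  ]
}
""")

active = [
    (index, LAYER_TYPES[layer["type"]], layer["size"])
    for index, layer in enumerate(config["layers"])
    if not layer["is_delete"]  # deleted layers stay in the list but are skipped
]
print(active)  # [(0, 'input', 'x')]
```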