"Makefile" did not exist on "781cea26c3e4f3da0b63bea8cfaba1ed96c0338d"
## What is a multi-phase experiment

Typically, each trial job gets a single configuration (e.g., hyperparameters) from the tuner, tries this configuration, reports the result, and then exits. Sometimes, however, a trial job may want to request multiple configurations from the tuner. This is a compelling feature in cases such as the following:

1. Job launch takes tens of seconds on some training platforms. If a configuration takes only around a minute to finish, running only one configuration per trial job is very inefficient. A more appealing approach is for a trial job to request a configuration and finish it, then request another configuration and run it. In the extreme case, a trial job can run an unbounded number of configurations. If you set the concurrency to, for example, 6, there would be 6 __long-running__ jobs that keep trying different configurations.

2. Some types of models have to be trained phase by phase, where the configuration of the next phase depends on the results of the previous phase(s). For example, to find the best quantization for a model, the training procedure is often as follows: the auto-quantization algorithm (i.e., the tuner in NNI) chooses a bit width (e.g., 16 bits); a trial job gets this configuration, trains the model for some epochs, and reports the result (e.g., accuracy). The algorithm receives this result and decides whether to change 16 bits to 8 bits or to change back to 32 bits. This process is repeated a configured number of times.

The above cases can be supported by the same feature, i.e., multi-phase execution. To support them, a trial job must be able to request multiple configurations from the tuner. The tuner is aware of whether two configuration requests come from the same trial job or from different ones. In multi-phase, a trial job can also report multiple final results.
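
To make the launch-overhead argument in case 1 concrete, the following sketch computes how much of a trial job's wall-clock time is spent on actual configurations. The function name `useful_fraction` and the numbers (30 s launch, 60 s per configuration) are assumptions chosen for illustration, not measurements from any training platform.

```python
def useful_fraction(launch_s: float, run_s: float, configs_per_job: int) -> float:
    """Fraction of a job's wall-clock time spent running configurations."""
    useful = run_s * configs_per_job
    return useful / (launch_s + useful)


# Assumed numbers for illustration only: 30 s to launch a job,
# 60 s to run one configuration.
for n in (1, 5, 50):
    share = useful_fraction(launch_s=30, run_s=60, configs_per_job=n)
    print(f"{n:>2} configuration(s) per job: {share:.1%} of the time is useful work")
```

Under these assumed numbers, a single-configuration job spends a third of its time on launch, while a long-running job that handles many configurations makes the launch cost negligible.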

Note that `nni.get_next_parameter()` and `nni.report_final_result()` should be called sequentially: __call the former, then call the latter, and repeat this pattern__. If `nni.get_next_parameter()` is called multiple times consecutively and then `nni.report_final_result()` is called once, the result is associated with the last configuration, i.e., the one retrieved by the last `get_next_parameter` call. No result is then associated with the earlier `get_next_parameter` calls, which may break some multi-phase algorithms.
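
The pairing rule above can be illustrated with a small simulation. `FakeTuner` is a hypothetical stand-in written for this sketch; it is not part of the `nni` package and only mimics the bookkeeping described in the note, namely that a reported result is attached to the most recently retrieved configuration.

```python
class FakeTuner:
    """Hypothetical stand-in for the bookkeeping described above (not an nni class)."""

    def __init__(self, configs):
        self._configs = list(configs)
        self._next_id = 0
        self._last_id = None
        self.results = {}  # configuration index -> reported result

    def get_next_parameter(self):
        self._last_id = self._next_id
        self._next_id += 1
        return self._configs[self._last_id]

    def report_final_result(self, value):
        # The result is attached only to the most recently retrieved configuration.
        self.results[self._last_id] = value


configs = [{"bits": 32}, {"bits": 16}, {"bits": 8}]

# Correct pattern: one report per retrieved configuration.
paired = FakeTuner(configs)
for _ in range(3):
    params = paired.get_next_parameter()
    paired.report_final_result(params["bits"] / 32)  # dummy metric

# Broken pattern: three retrievals followed by a single report.
skipped = FakeTuner(configs)
for _ in range(3):
    params = skipped.get_next_parameter()
skipped.report_final_result(params["bits"] / 32)

print(sorted(paired.results))   # [0, 1, 2]
print(sorted(skipped.results))  # [2]
```

In the broken pattern, configurations 0 and 1 never receive a result, which is exactly the situation a multi-phase algorithm cannot recover from.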

## Create a multi-phase experiment

### Write trial code that leverages multi-phase:

__1. Update trial code__

Using multi-phase in trial code is simple. An example is shown below:

```python
import nni

# ...
for i in range(5):
    # get the next configuration from the tuner
    tuner_param = nni.get_next_parameter()

    # consume the parameters, e.g., train and evaluate the model
    # ...
    # report the final result for the configuration retrieved above;
    # `final_result` is a placeholder for the metric computed in the elided code
    nni.report_final_result(final_result)
    # ...
# ...
```

__2. Experiment configuration__

To enable multi-phase, you should also add `multiPhase: true` to your experiment YAML configuration file. If this line is not added, `nni.get_next_parameter()` will always return the same configuration.

Multi-phase experiment configuration example:

```yaml
authorName: default
experimentName: multiphase experiment
trialConcurrency: 2
maxExecDuration: 1h
maxTrialNum: 8
trainingServicePlatform: local
searchSpacePath: search_space.json
multiPhase: true
useAnnotation: false
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 mytrial.py
  codeDir: .
  gpuNum: 0
```

### Write a tuner that leverages multi-phase:

Before writing a multi-phase tuner, we highly recommend going through [Customize Tuner](https://nni.readthedocs.io/en/latest/Customize_Tuner.html). As with a normal tuner, your tuner needs to inherit from the `Tuner` class. When you enable multi-phase through the configuration (set `multiPhase` to true), your tuner gets an additional parameter `trial_job_id` via the following tuner methods:
```
generate_parameters
generate_multiple_parameters
receive_trial_result
receive_customized_trial_result
trial_end
```
With this information, the tuner knows which trial is requesting a configuration and which trial is reporting results. This gives your tuner enough flexibility to deal with different trials and different phases. For example, you may want to use the `trial_job_id` parameter of the `generate_parameters` method to generate hyperparameters for a specific trial job.
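
As a rough sketch of how `trial_job_id` might be used, the class below keeps a separate bit width per trial job, loosely following the quantization scenario described earlier. It is deliberately standalone so it runs without NNI installed: a real tuner would inherit from NNI's `Tuner` class, and the exact method signatures (here `trial_job_id` is taken as a keyword argument) are an assumption that should be checked against the Customize Tuner guide. The class name `BitWidthTuner`, the `min_accuracy` threshold, and the halving policy are all hypothetical.

```python
class BitWidthTuner:
    """Standalone sketch; a real tuner would inherit from NNI's Tuner class."""

    def __init__(self, min_accuracy=0.9):
        self.min_accuracy = min_accuracy  # hypothetical acceptance threshold
        self.bits_by_trial = {}  # trial_job_id -> bit width for that trial's next phase

    def generate_parameters(self, parameter_id, trial_job_id=None):
        # The first request from a trial starts at 16 bits; later requests
        # use whatever receive_trial_result decided for that same trial.
        bits = self.bits_by_trial.setdefault(trial_job_id, 16)
        return {"bits": bits}

    def receive_trial_result(self, parameter_id, parameters, value, trial_job_id=None):
        # Illustrative policy: if accuracy is still acceptable, try half the
        # bits in the next phase; otherwise fall back to 32 bits.
        if value >= self.min_accuracy:
            self.bits_by_trial[trial_job_id] = max(parameters["bits"] // 2, 1)
        else:
            self.bits_by_trial[trial_job_id] = 32


tuner = BitWidthTuner()
first = tuner.generate_parameters(0, trial_job_id="trial-A")
tuner.receive_trial_result(0, first, 0.95, trial_job_id="trial-A")
second = tuner.generate_parameters(1, trial_job_id="trial-A")  # next phase of trial-A
other = tuner.generate_parameters(2, trial_job_id="trial-B")   # unrelated trial
print(first, second, other)
```

Because the state is keyed by `trial_job_id`, the second phase of `trial-A` depends on that trial's own previous result, while `trial-B` starts from the initial configuration.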

### Tuners that support multi-phase experiments:

[TPE](../Tuner/HyperoptTuner.md), [Random](../Tuner/HyperoptTuner.md), [Anneal](../Tuner/HyperoptTuner.md), [Evolution](../Tuner/EvolutionTuner.md), [SMAC](../Tuner/SmacTuner.md), [NetworkMorphism](../Tuner/NetworkmorphismTuner.md), [MetisTuner](../Tuner/MetisTuner.md), [BOHB](../Tuner/BohbAdvisor.md), [Hyperband](../Tuner/HyperbandAdvisor.md), [ENAS tuner](https://github.com/countif/enas_nni/blob/master/nni/examples/tuners/enas/nni_controller_ptb.py).

### Training services that support multi-phase experiments:
[Local Machine](../TrainingService/LocalMode.md), [Remote Servers](../TrainingService/RemoteMachineMode.md), [OpenPAI](../TrainingService/PaiMode.md)