"docs/en_US/reference.rst" did not exist on "abc221589c65d75b494407c60a81ca87c3020463"
PBTTuner.rst 4.23 KB
Newer Older
liuzhe-lz's avatar
liuzhe-lz committed
1
2
PBT Tuner
=========

Population Based Training (PBT) comes from `Population Based Training of Neural Networks <https://arxiv.org/abs/1711.09846v1>`__. It's a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. 


.. image:: ../../img/pbt.jpg
   :target: ../../img/pbt.jpg
   :alt: 


PBTTuner initializes a population of trials (its size is ``population_size``). The figure above shows four steps; each trial runs exactly one step at a time. How long a step lasts is controlled by the trial code, e.g., one epoch. When a trial starts, it loads a checkpoint specified by PBTTuner, runs one step, saves a checkpoint to a directory specified by PBTTuner, and exits. The trials in a population run their steps synchronously: only after all trials have finished the ``i``-th step can the ``(i+1)``-th step start. PBT's exploitation and exploration are executed between two consecutive steps.
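
As a toy, self-contained illustration of this schedule (this is *not* PBTTuner's implementation; the objective function and all constants below are made up), note how every member of the population finishes the current step before anyone is replaced or perturbed:

.. code-block:: python

   import random

   population_size, num_steps = 4, 3
   population = [{'lr': random.uniform(1e-3, 1e-1), 'score': 0.0}
                 for _ in range(population_size)]

   for step in range(num_steps):
       # every member runs exactly one step; a real NNI trial would resume
       # from the checkpoint assigned by the tuner, train for e.g. one epoch,
       # save a checkpoint and exit
       for member in population:
           member['score'] = -abs(member['lr'] - 0.01)   # toy objective

       # only after the whole population has finished this step does PBT
       # exploit (copy from a better member) and explore (perturb the copy)
       population.sort(key=lambda m: m['score'], reverse=True)
       worst, best = population[-1], population[0]
       worst['lr'] = best['lr'] * random.choice((1.2, 0.8))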

Usage
-----

Provide checkpoint directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since some trials need to load other trials' checkpoints, users should provide a directory (i.e., ``all_checkpoint_dir``) that is accessible by every trial. This is easy in local mode: users can simply use the default directory or specify any directory on the local machine. For other training services, users should follow `the document of those training services <../TrainingService/Overview.rst>`__ to provide a directory in shared storage, such as NFS or Azure storage.
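
As an optional sanity check before launching the experiment, you can verify from the machines that will run trials that the shared directory is reachable and writable (the mount point below is only an example; use whatever you plan to pass as ``all_checkpoint_dir``):

.. code-block:: python

   import os

   # replace with the directory you will pass as `all_checkpoint_dir`
   all_checkpoint_dir = '/mnt/nfs/nni/checkpoints'

   os.makedirs(all_checkpoint_dir, exist_ok=True)
   probe = os.path.join(all_checkpoint_dir, '.write_test')
   with open(probe, 'w') as f:
       f.write('ok')
   os.remove(probe)
   print('checkpoint directory is writable:', all_checkpoint_dir)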

Modify your trial code
^^^^^^^^^^^^^^^^^^^^^^

Before running a step, a trial needs to load a checkpoint; the checkpoint directory is specified in the hyperparameter configuration generated by PBTTuner, i.e., ``params['load_checkpoint_dir']``. Similarly, the directory for saving a checkpoint is also included in the configuration, i.e., ``params['save_checkpoint_dir']``. Here, ``all_checkpoint_dir`` is the base folder of ``load_checkpoint_dir`` and ``save_checkpoint_dir``, which follow the format ``all_checkpoint_dir/<population-id>/<step>``.

.. code-block:: python

   import os
   import nni

   params = nni.get_next_parameter()
   # the path of the checkpoint to load
   load_path = os.path.join(params['load_checkpoint_dir'], 'model.pth')
   # load checkpoint from `load_path`
   ...
   # run one step
   ...
   # the path for saving a checkpoint
   save_path = os.path.join(params['save_checkpoint_dir'], 'model.pth')
   # save checkpoint to `save_path`
   ...

The complete example code can be found :githublink:`here <examples/trials/mnist-pbt-tuner-pytorch>`.
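
For reference, below is a fuller, self-contained sketch of the same flow, assuming a PyTorch model; the model, data, and the ``lr`` search-space parameter are placeholders, and the linked example above is the authoritative version:

.. code-block:: python

   import os

   import nni
   import torch
   import torch.nn as nn

   params = nni.get_next_parameter()

   # placeholder model and optimizer; 'lr' is assumed to be in the search space
   model = nn.Linear(10, 2)
   optimizer = torch.optim.SGD(model.parameters(), lr=params.get('lr', 0.01))

   # resume from the checkpoint chosen by PBTTuner, if one exists
   # (at the very first step there may be nothing to load yet)
   load_path = os.path.join(params['load_checkpoint_dir'], 'model.pth')
   if os.path.isfile(load_path):
       checkpoint = torch.load(load_path)
       model.load_state_dict(checkpoint['model'])
       optimizer.load_state_dict(checkpoint['optimizer'])

   # run one step -- here a single pass over random data stands in for an epoch
   inputs, targets = torch.randn(32, 10), torch.randint(0, 2, (32,))
   loss = nn.functional.cross_entropy(model(inputs), targets)
   optimizer.zero_grad()
   loss.backward()
   optimizer.step()

   # report the metric PBTTuner uses to rank this member of the population
   accuracy = (model(inputs).argmax(dim=1) == targets).float().mean().item()
   nni.report_final_result(accuracy)

   # save a checkpoint where PBTTuner told us to
   os.makedirs(params['save_checkpoint_dir'], exist_ok=True)
   save_path = os.path.join(params['save_checkpoint_dir'], 'model.pth')
   torch.save({'model': model.state_dict(), 'optimizer': optimizer.state_dict()},
              save_path)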

classArgs requirements
^^^^^^^^^^^^^^^^^^^^^^

* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner tries to maximize the metric; if 'minimize', it tries to minimize the metric.
* **all_checkpoint_dir** (*str, optional, default = None*) - Directory for trials to load and save checkpoints. If not specified, the directory defaults to ``~/nni/checkpoint/<exp-id>``. Note that if the experiment is not running in local mode, users should provide a path in shared storage which can be accessed by all the trials.
* **population_size** (*int, optional, default = 10*) - Number of trials in a population. Each step runs this number of trials. In our implementation, one step means running each trial for a specific number of training epochs set by the user.
* **factors** (*tuple, optional, default = (1.2, 0.8)*) - Factors for the perturbation of hyperparameters.
* **fraction** (*float, optional, default = 0.2*) - Fraction for selecting the bottom and top trials (a short numeric illustration of ``factors`` and ``fraction`` follows this list).
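
As a rough numeric reading of the defaults (assuming the multiplicative perturbation described in the PBT paper; this is not a transcript of PBTTuner's code):

.. code-block:: python

   factors, fraction, population_size = (1.2, 0.8), 0.2, 10

   # with these defaults, the bottom 2 of 10 trials copy checkpoints and
   # hyperparameters from the top 2 trials after each step ...
   n_exchanged = int(population_size * fraction)    # -> 2
   # ... and a copied numerical value, e.g. a learning rate of 0.01,
   # is perturbed by one of the two factors
   perturbed = [0.01 * f for f in factors]          # -> [0.012, 0.008]
   print(n_exchanged, perturbed)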

Experiment config
^^^^^^^^^^^^^^^^^

Below is an example of the PBTTuner configuration in the experiment config file. **Note that Assessor is not allowed if PBTTuner is used.**

.. code-block:: yaml

   # config.yml
   tuner:
     name: PBTTuner
     classArgs:
       optimize_mode: maximize
       all_checkpoint_dir: /the/path/to/store/checkpoints
       population_size: 10

Example Configuration
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   # config.yml
   tuner:
     name: PBTTuner
     classArgs:
       optimize_mode: maximize