"vscode:/vscode.git/clone" did not exist on "35f709a210feaebfc7c1ce02c6564e290ab08c7a"
nnictl.rst 5.93 KB
Newer Older
qianyj's avatar
qianyj committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
Run HPO Experiment with nnictl
==============================

This tutorial has exactly the same effect as :doc:`../hpo_quickstart_pytorch/main`.

Both tutorials optimize the model from the `official PyTorch quickstart
<https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__ with auto-tuning,
but this one manages the experiment with the command line tool and a YAML config file, instead of pure Python code.

The tutorial consists of 4 steps: 

1. Modify the model for auto-tuning.
2. Define the hyperparameters' search space.
3. Create the config file.
4. Run the experiment.

The first two steps are identical to those in the quickstart.

Step 1: Prepare the model
-------------------------
In the first step, we prepare the model to be tuned.

The model should be put in a separate script,
because it will be evaluated many times concurrently,
and possibly trained on distributed platforms.

In this tutorial, the model is defined in :doc:`model.py <model>`.

In short, it is a PyTorch model with 3 additional API calls (sketched below):

1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
3. Use :func:`nni.report_final_result` to report final accuracy.
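
The following is a minimal sketch of how these calls typically fit together.
The training loop and metric values here are placeholders for illustration, not the actual code in ``model.py``:

.. code-block:: python

    import nni

    # Fetch the hyperparameter set assigned to this trial by the tuner.
    params = nni.get_next_parameter()
    print('Trial parameters:', params)

    accuracy = 0.0
    for epoch in range(3):
        accuracy = 0.1 * (epoch + 1)  # placeholder for real per-epoch accuracy
        # Per-epoch metrics show up as intermediate results in the web portal.
        nni.report_intermediate_result(accuracy)

    # The final metric is what the tuner uses to compare trials.
    nni.report_final_result(accuracy)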

Please make sure you understand the model code before continuing to the next step.

Step 2: Define search space
---------------------------
In the model code, we have prepared 3 hyperparameters to be tuned:
*features*, *lr*, and *momentum*.

Here we need to define their *search space* so the tuning algorithm can sample them in the desired ranges.

Assume we have the following prior knowledge about these hyperparameters:

1. *features* should be one of 128, 256, 512, 1024.
2. *lr* should be a float between 0.0001 and 0.1, and it should be explored on a logarithmic scale.
3. *momentum* should be a float between 0 and 1.

In NNI, the space of *features* is called ``choice``;
the space of *lr* is called ``loguniform``;
and the space of *momentum* is called ``uniform``.
You may have noticed that these names are derived from ``numpy.random``.

For the full specification of the search space, check :doc:`the reference </hpo/search_space>`.

Now we can define the search space as follows:

.. code-block:: yaml

    search_space:
      features:
        _type: choice
        _value: [ 128, 256, 512, 1024 ]
      lr:
        _type: loguniform
        _value: [ 0.0001, 0.1 ]
      momentum:
        _type: uniform
        _value: [ 0, 1 ]
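
With this search space, each trial receives one concrete combination.
For illustration only (the exact values depend on the tuner), :func:`nni.get_next_parameter` might return:

.. code-block:: python

    {'features': 512, 'lr': 0.003, 'momentum': 0.7}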

Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage the HPO process.
The *experiment config* defines how to train the models and how to explore the search space.

In this tutorial we use a YAML file ``config.yaml`` to define the experiment.

Configure trial code
^^^^^^^^^^^^^^^^^^^^
In NNI, the evaluation of each hyperparameter set is called a *trial*.
So the model script is called *trial code*.

.. code-block:: yaml

    trial_command: python model.py
    trial_code_directory: .

When ``trial_code_directory`` is a relative path, it is resolved relative to the config file's location.
So in this case we need to put ``config.yaml`` and ``model.py`` in the same directory.
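
For example, a minimal working directory for this tutorial could look like this:

.. code-block:: text

    project/
    ├── config.yaml
    └── model.py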

.. attention::

    The rules for resolving relative paths differ between the YAML config file and the :doc:`Python experiment API </reference/experiment>`.
    In the Python experiment API, relative paths are resolved against the current working directory.
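
For reference, the corresponding settings in the Python experiment API look roughly like this
(a sketch based on the companion quickstart; see the linked reference for details):

.. code-block:: python

    from nni.experiment import Experiment

    experiment = Experiment('local')
    experiment.config.trial_command = 'python model.py'
    # Resolved against the current working directory, not a config file.
    experiment.config.trial_code_directory = '.'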

Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we evaluate 10 sets of hyperparameters in total, with 2 trials running concurrently.

.. code-block:: yaml

    max_trial_number: 10
    trial_concurrency: 2

You may also set ``max_experiment_duration: 1h`` to limit the running time.

If neither ``max_trial_number`` nor ``max_experiment_duration`` is set,
the experiment will run until you stop it manually.
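
For example, to stop after 10 trials or 1 hour of running time, whichever comes first:

.. code-block:: yaml

    max_trial_number: 10
    max_experiment_duration: 1h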

.. note::

    ``max_trial_number`` is set to 10 here for a fast example.
    In real-world use, it should be set to a larger number.
    With the default configuration, the TPE tuner requires 20 trials to warm up.


Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use :doc:`TPE tuner </hpo/tuners>`.

.. code-block:: yaml

    tuner:
      name: TPE
      class_args:
        optimize_mode: maximize
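
Other built-in tuners are configured the same way.
For example, to swap in random search (using the built-in ``Random`` tuner, which does not require ``class_args``):

.. code-block:: yaml

    tuner:
      name: Random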

Configure training service
^^^^^^^^^^^^^^^^^^^^^^^^^^

In this tutorial we use *local* mode,
which means the models will be trained on the local machine, without any special training platform.

.. code-block:: yaml

    training_service:
      platform: local

Wrap up
^^^^^^^

The full content of ``config.yaml`` is as follows:

.. code-block:: yaml

    search_space:
      features:
        _type: choice
        _value: [ 128, 256, 512, 1024 ]
      lr:
        _type: loguniform
        _value: [ 0.0001, 0.1 ]
      momentum:
        _type: uniform
        _value: [ 0, 1 ]
    
    trial_command: python model.py
    trial_code_directory: .

    trial_concurrency: 2
    max_trial_number: 10
    
    tuner:
      name: TPE
      class_args:
        optimize_mode: maximize
    
    training_service:
      platform: local

Step 4: Run the experiment
--------------------------
Now the experiment is ready. Launch it with the ``nnictl create`` command:

.. code-block:: bash

    $ nnictl create --config config.yaml --port 8080

You can use the web portal to view experiment status: http://localhost:8080.

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [2022-04-01 12:00:00] Creating experiment, Experiment ID: p43ny6ew
    [2022-04-01 12:00:00] Starting web server...
    [2022-04-01 12:00:01] Setting up...
    [2022-04-01 12:00:01] Web portal URLs: http://127.0.0.1:8080 http://192.168.1.1:8080
    [2022-04-01 12:00:01] To stop experiment run "nnictl stop p43ny6ew" or "nnictl stop --all"
    [2022-04-01 12:00:01] Reference: https://nni.readthedocs.io/en/stable/reference/nnictl.html
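
While the experiment is running, you can also inspect it from the command line,
for example with the ``nnictl experiment list`` command:

.. code-block:: bash

    $ nnictl experiment list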

When the experiment is done, use the ``nnictl stop`` command to stop it.

.. code-block:: bash

    $ nnictl stop p43ny6ew

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    INFO:  Stopping experiment p43ny6ew
    INFO:  Stop experiment success.
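
Stopping an experiment does not delete its results.
To reopen the web portal of a stopped experiment later, use the ``nnictl view`` command with the same experiment ID:

.. code-block:: bash

    $ nnictl view p43ny6ew --port 8080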