
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hello_nas.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_tutorials_hello_nas.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_hello_nas.py:


Hello, NAS!
===========

This is the introductory (101) tutorial for Neural Architecture Search (NAS) on NNI.
In this tutorial, we will search for a neural architecture on the MNIST dataset with the help of NNI's NAS framework, i.e., *Retiarii*.
We use multi-trial NAS as an example to show how to construct and explore a model space.

A neural architecture search task has three crucial components:

* A model search space that defines the set of models to explore.
* A strategy that decides how to explore this model space.
* A model evaluator that reports the performance of every model in the space.

Currently, PyTorch is the only framework supported by Retiarii, and we have only tested **PyTorch 1.7 to 1.10**.
This tutorial assumes a PyTorch context, but it should also apply to other frameworks, support for which is in our future plan.

Define your Model Space
-----------------------

A model space is defined by users to express the set of models they want to explore, which should contain potentially good-performing models.
In this framework, a model space is defined with two parts: a base model and possible mutations on the base model.

.. GENERATED FROM PYTHON SOURCE LINES 26-34

Define Base Model
^^^^^^^^^^^^^^^^^

Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model.
Usually, you only need to replace the code ``import torch.nn as nn`` with
``import nni.retiarii.nn.pytorch as nn`` to use our wrapped PyTorch modules.

Below is a very simple example of defining a base model.

.. GENERATED FROM PYTHON SOURCE LINES 35-61

.. code-block:: default


    import torch
    import torch.nn.functional as F
    import nni.retiarii.nn.pytorch as nn
    from nni.retiarii import model_wrapper


    @model_wrapper      # this decorator should be put on the outermost class
    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.max_pool2d(self.conv2(x), 2)
            x = torch.flatten(self.dropout1(x), 1)
            x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
            output = F.log_softmax(x, dim=1)
            return output
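
The ``9216`` input size of ``fc1`` follows from the layer shapes. As a rough sanity check, the sketch below recomputes it in plain Python, assuming 28x28 MNIST inputs and no padding in either convolution. ``conv_out`` is a helper introduced here for illustration only; it is not part of NNI or PyTorch.

.. code-block:: python

    def conv_out(size, kernel, stride=1, padding=0):
        """Output side length of a square convolution (hypothetical helper)."""
        return (size + 2 * padding - kernel) // stride + 1


    side = 28                 # MNIST images are 28x28 (assumed input size)
    side = conv_out(side, 3)  # conv1: 3x3 kernel, stride 1 -> 26
    side = conv_out(side, 3)  # conv2: 3x3 kernel, stride 1 -> 24
    side = side // 2          # max_pool2d with window 2 -> 12
    flat_features = 64 * side * side  # conv2 produces 64 channels
    print(flat_features)      # 9216

If you change a kernel size or the input resolution, the first argument of ``fc1`` has to be updated accordingly.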








.. GENERATED FROM PYTHON SOURCE LINES 62-104

.. tip:: Always keep in mind that you should use ``import nni.retiarii.nn.pytorch as nn`` and :meth:`nni.retiarii.model_wrapper`.
         Many mistakes are a result of forgetting one of those.
         Also, please use ``torch.nn`` for submodules of ``nn`` such as ``nn.init``, i.e., ``torch.nn.init`` instead of ``nn.init``.

Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^

A base model is only one concrete model, not a model space. We provide :doc:`API and Primitives </nas/construct_space>`
for users to express how the base model can be mutated, that is, to build a model space that includes many models.

Based on the above base model, we can define a model space as below.

.. code-block:: diff

  @model_wrapper
  class Net(nn.Module):
    def __init__(self):
      super().__init__()
      self.conv1 = nn.Conv2d(1, 32, 3, 1)
  -   self.conv2 = nn.Conv2d(32, 64, 3, 1)
  +   self.conv2 = nn.LayerChoice([
  +       nn.Conv2d(32, 64, 3, 1),
  +       DepthwiseSeparableConv(32, 64)
  +   ])
  -   self.dropout1 = nn.Dropout(0.25)
  +   self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))
      self.dropout2 = nn.Dropout(0.5)
  -   self.fc1 = nn.Linear(9216, 128)
  -   self.fc2 = nn.Linear(128, 10)
  +   feature = nn.ValueChoice([64, 128, 256])
  +   self.fc1 = nn.Linear(9216, feature)
  +   self.fc2 = nn.Linear(feature, 10)

    def forward(self, x):
      x = F.relu(self.conv1(x))
      x = F.max_pool2d(self.conv2(x), 2)
      x = torch.flatten(self.dropout1(x), 1)
      x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
      output = F.log_softmax(x, dim=1)
      return output

This results in the following code:

.. GENERATED FROM PYTHON SOURCE LINES 104-147

.. code-block:: default



    class DepthwiseSeparableConv(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, groups=in_ch)
            self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))


    @model_wrapper
    class ModelSpace(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            # LayerChoice is used to select a layer between Conv2d and DepthwiseSeparableConv.
            self.conv2 = nn.LayerChoice([
                nn.Conv2d(32, 64, 3, 1),
                DepthwiseSeparableConv(32, 64)
            ])
            # ValueChoice is used to select a dropout rate.
            # ValueChoice can be used as parameter of modules wrapped in `nni.retiarii.nn.pytorch`
            # or customized modules wrapped with `@basic_unit`.
            self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))  # choose dropout rate from 0.25, 0.5 and 0.75
            self.dropout2 = nn.Dropout(0.5)
            feature = nn.ValueChoice([64, 128, 256])
            self.fc1 = nn.Linear(9216, feature)
            self.fc2 = nn.Linear(feature, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.max_pool2d(self.conv2(x), 2)
            x = torch.flatten(self.dropout1(x), 1)
            x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
            output = F.log_softmax(x, dim=1)
            return output


    model_space = ModelSpace()
    model_space





.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none


    ModelSpace(
      (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
      (conv2): LayerChoice([Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1)), DepthwiseSeparableConv(
        (depthwise): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32)
        (pointwise): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
      )], label='model_1')
      (dropout1): Dropout(p=0.25, inplace=False)
      (dropout2): Dropout(p=0.5, inplace=False)
      (fc1): Linear(in_features=9216, out_features=64, bias=True)
      (fc2): Linear(in_features=64, out_features=10, bias=True)
    )
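
To get a feel for how large this model space is, the sketch below enumerates it in plain Python. It does not use NNI; the dictionary keys are illustrative names chosen here, and the candidate lists are copied from ``ModelSpace``. Note that the single ``feature`` choice is shared by ``fc1`` and ``fc2``, so it counts as one decision.

.. code-block:: python

    from itertools import product

    # Candidate lists copied from ModelSpace; the keys are illustrative names.
    space = {
        "conv2": ["Conv2d", "DepthwiseSeparableConv"],  # nn.LayerChoice
        "dropout1": [0.25, 0.5, 0.75],                  # nn.ValueChoice
        "feature": [64, 128, 256],                      # nn.ValueChoice shared by fc1 and fc2
    }

    # Every combination of one candidate per decision is one concrete model.
    candidates = [dict(zip(space, combo)) for combo in product(*space.values())]
    print(len(candidates))  # 18

With only 18 candidates, the space is small enough to evaluate exhaustively; realistic spaces grow multiplicatively with every added choice, which is why an exploration strategy is needed.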



.. GENERATED FROM PYTHON SOURCE LINES 148-182

This example uses two mutation APIs,
:class:`nn.LayerChoice <nni.retiarii.nn.pytorch.LayerChoice>` and
:class:`nn.ValueChoice <nni.retiarii.nn.pytorch.ValueChoice>`.
:class:`nn.LayerChoice <nni.retiarii.nn.pytorch.LayerChoice>`
takes a list of candidate modules (two in this example); one will be chosen for each sampled model.
It can be used like a normal PyTorch module.
:class:`nn.ValueChoice <nni.retiarii.nn.pytorch.ValueChoice>` takes a list of candidate values;
one will be chosen to take effect for each sampled model.

More detailed API descriptions and usage can be found :doc:`here </nas/construct_space>`.

.. note::

    We are actively enriching the mutation APIs to facilitate easy construction of model spaces.
    If the currently supported mutation APIs cannot express your model space,
    please refer to :doc:`this doc </nas/mutator>` for customizing mutators.

Explore the Defined Model Space
-------------------------------

There are basically two exploration approaches: (1) search by evaluating each sampled model independently,
which is the search approach in :ref:`multi-trial NAS <multi-trial-nas>`
and (2) one-shot weight-sharing based search, which is used in one-shot NAS.
We demonstrate the first approach in this tutorial. Users can refer to :ref:`here <one-shot-nas>` for the second approach.

First, users need to pick a proper exploration strategy to explore the defined model space.
Second, users need to pick or customize a model evaluator to evaluate the performance of each explored model.

Pick an exploration strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Retiarii supports many :doc:`exploration strategies </nas/exploration_strategy>`.

Simply choose (i.e., instantiate) an exploration strategy as below.

.. GENERATED FROM PYTHON SOURCE LINES 182-186

.. code-block:: default


    import nni.retiarii.strategy as strategy
    search_strategy = strategy.Random(dedup=True)  # dedup=False if deduplication is not wanted





.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/yugzhan/miniconda3/envs/cu102/lib/python3.8/site-packages/ray/autoscaler/_private/cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
      warnings.warn(
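
The effect of ``dedup=True`` can be pictured with a small framework-free sketch: sample at random, but never return the same configuration twice. This is only an illustration of the idea under the assumption that models are identified by their choices; it is not NNI's implementation, and ``sample_without_repeats`` and the ``space`` dictionary are names made up for this example.

.. code-block:: python

    import random
    from math import prod

    # Candidate lists copied from ModelSpace; the keys are illustrative names.
    space = {
        "conv2": ["Conv2d", "DepthwiseSeparableConv"],
        "dropout1": [0.25, 0.5, 0.75],
        "feature": [64, 128, 256],
    }


    def sample_without_repeats(space, max_models, seed=0):
        """Yield up to max_models distinct random configurations (hypothetical helper)."""
        rng = random.Random(seed)
        total = prod(len(choices) for choices in space.values())
        seen = set()
        # Stop once enough models were produced or the space is exhausted.
        while len(seen) < min(max_models, total):
            sample = tuple(rng.choice(choices) for choices in space.values())
            if sample not in seen:  # skip configurations that were already sampled
                seen.add(sample)
                yield dict(zip(space, sample))


    models = list(sample_without_repeats(space, max_models=4))
    print(len(models))  # 4 distinct configurations

Without deduplication, a small space like this one would quickly produce repeated trials that spend compute on models already evaluated.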




.. GENERATED FROM PYTHON SOURCE LINES 187-200

Pick or customize a model evaluator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training
and validating each generated model to obtain the model's performance.
The performance is sent to the exploration strategy for the strategy to generate better models.

Retiarii has provided :doc:`built-in model evaluators </nas/evaluator>`, but to start with,
it is recommended to use :class:`FunctionalEvaluator <nni.retiarii.evaluator.FunctionalEvaluator>`,
that is, to wrap your own training and evaluation code with one single function.
This function should receive one single model class and use :func:`nni.report_final_result` to report the final score of this model.

An example here creates a simple evaluator that runs on the MNIST dataset, trains for 3 epochs, and reports its validation accuracy.

.. GENERATED FROM PYTHON SOURCE LINES 200-268

.. code-block:: default


    import nni

    from torchvision import transforms
    from torchvision.datasets import MNIST
    from torch.utils.data import DataLoader


    def train_epoch(model, device, train_loader, optimizer, epoch):
        loss_fn = torch.nn.CrossEntropyLoss()
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = loss_fn(output, target)
            loss.backward()
            optimizer.step()
            if batch_idx % 10 == 0:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(data), len(train_loader.dataset),
                    100. * batch_idx / len(train_loader), loss.item()))


    def test_epoch(model, device, test_loader):
        model.eval()
        test_loss = 0
        correct = 0
        with torch.no_grad():
            for data, target in test_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()

        test_loss /= len(test_loader.dataset)
        accuracy = 100. * correct / len(test_loader.dataset)

        print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.format(
              correct, len(test_loader.dataset), accuracy))

        return accuracy


    def evaluate_model(model_cls):
        # "model_cls" is a class, need to instantiate
        model = model_cls()

        device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
        model.to(device)

        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        transf = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
        train_loader = DataLoader(MNIST('data/mnist', download=True, transform=transf), batch_size=64, shuffle=True)
        test_loader = DataLoader(MNIST('data/mnist', download=True, train=False, transform=transf), batch_size=64)

        for epoch in range(3):
            # train the model for one epoch
            train_epoch(model, device, train_loader, optimizer, epoch)
            # test the model for one epoch
            accuracy = test_epoch(model, device, test_loader)
            # call report intermediate result. Result can be float or dict
            nni.report_intermediate_result(accuracy)

        # report final test result
        nni.report_final_result(accuracy)
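
One detail of this evaluator is worth a note: the model's ``forward`` already returns ``log_softmax`` output, while ``train_epoch`` uses ``torch.nn.CrossEntropyLoss``, which applies log-softmax internally. This is harmless because log-softmax is idempotent. The plain-Python check below illustrates that property; ``log_softmax`` here is a small helper written for this example, not the PyTorch function.

.. code-block:: python

    import math


    def log_softmax(xs):
        """Numerically stable log-softmax over a list of floats (illustrative helper)."""
        m = max(xs)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in xs))
        return [x - log_sum_exp for x in xs]


    logits = [2.0, -1.0, 0.5]
    once = log_softmax(logits)
    twice = log_softmax(once)  # applying it again leaves the values unchanged

    same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(once, twice))
    print(same)  # True

Using ``torch.nn.NLLLoss`` on the log-probabilities would be an equivalent and more explicit choice.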









.. GENERATED FROM PYTHON SOURCE LINES 269-270

Create the evaluator

.. GENERATED FROM PYTHON SOURCE LINES 270-274

.. code-block:: default


    from nni.retiarii.evaluator import FunctionalEvaluator
    evaluator = FunctionalEvaluator(evaluate_model)








.. GENERATED FROM PYTHON SOURCE LINES 275-286

The ``train_epoch`` and ``test_epoch`` here can be any customized function,
where users can write their own training recipe.

It is recommended that the ``evaluate_model`` here accepts no additional arguments other than ``model_cls``.
However, in the :doc:`advanced tutorial </nas/evaluator>`, we will show how to use additional arguments in case you actually need those.
In the future, we will support mutation on the arguments of evaluators, which is commonly called "hyper-parameter tuning".

Launch an Experiment
--------------------

After all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.

.. GENERATED FROM PYTHON SOURCE LINES 287-293

.. code-block:: default


    from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig
    exp = RetiariiExperiment(model_space, evaluator, [], search_strategy)
    exp_config = RetiariiExeConfig('local')
    exp_config.experiment_name = 'mnist_search'








.. GENERATED FROM PYTHON SOURCE LINES 294-295

The following configurations control the maximum number of trials to run and how many trials run at the same time.

.. GENERATED FROM PYTHON SOURCE LINES 295-299

.. code-block:: default


    exp_config.max_trial_number = 4   # spawn 4 trials at most
    exp_config.trial_concurrency = 2  # will run two trials concurrently
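
As a loose analogy for how these two settings interact, the sketch below runs four dummy "trials" on a thread pool with two workers. It only mimics the bookkeeping (a total budget and a cap on simultaneous work) and says nothing about how NNI actually schedules trials; ``fake_trial`` and ``state`` are names invented for this example.

.. code-block:: python

    import threading
    import time
    from concurrent.futures import ThreadPoolExecutor

    max_trial_number = 4   # total budget of trials
    trial_concurrency = 2  # trials allowed to run at the same time

    lock = threading.Lock()
    state = {"running": 0, "peak": 0}


    def fake_trial(trial_id):
        """Stand-in for training one model; records how many trials overlap."""
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.05)  # pretend to train
        with lock:
            state["running"] -= 1
        return trial_id


    with ThreadPoolExecutor(max_workers=trial_concurrency) as pool:
        finished = list(pool.map(fake_trial, range(max_trial_number)))

    print(finished)                            # [0, 1, 2, 3]
    print(state["peak"] <= trial_concurrency)  # True

With this budget, only 4 of the candidate models in the space are ever evaluated, so the result of such a short search should be read as a demonstration rather than a tuned architecture.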








.. GENERATED FROM PYTHON SOURCE LINES 300-302

Remember to set the following config if you want to use GPUs.
``use_active_gpu`` should be set to ``True`` if you wish to use an occupied GPU (possibly running a GUI).

.. GENERATED FROM PYTHON SOURCE LINES 302-306

.. code-block:: default


    exp_config.trial_gpu_number = 1
    exp_config.training_service.use_active_gpu = True








.. GENERATED FROM PYTHON SOURCE LINES 307-308

Launch the experiment. The experiment should take several minutes to finish on a workstation with 2 GPUs.

.. GENERATED FROM PYTHON SOURCE LINES 308-311

.. code-block:: default


    exp.run(exp_config, 8081)








.. GENERATED FROM PYTHON SOURCE LINES 312-330

Users can also run Retiarii Experiment with :doc:`different training services </experiment/training_service/overview>`
besides ``local`` training service.

Visualize the Experiment
------------------------

Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment.
For example, open ``localhost:8081`` in your browser, where 8081 is the port that you set in ``exp.run``.
Please refer to :doc:`here </experiment/web_portal/web_portal>` for details.

We support visualizing models with 3rd-party visualization engines (like `Netron <https://netron.app/>`__).
This can be used by clicking ``Visualization`` in the detail panel for each trial.
Note that the current visualization is based on `ONNX <https://onnx.ai/>`__,
thus visualization is not feasible if the model cannot be exported to ONNX.

Built-in evaluators (e.g., Classification) will automatically export the model into a file.
For your own evaluator, you need to save your file into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.
For instance,

.. GENERATED FROM PYTHON SOURCE LINES 330-344

.. code-block:: default


    import os
    from pathlib import Path


    def evaluate_model_with_visualization(model_cls):
        model = model_cls()
        # dump the model into an ONNX file
        if 'NNI_OUTPUT_DIR' in os.environ:
            # the dummy input must match the model's input: one 1-channel 28x28 MNIST image
            dummy_input = torch.zeros(1, 1, 28, 28)
            torch.onnx.export(model, (dummy_input, ),
                              Path(os.environ['NNI_OUTPUT_DIR']) / 'model.onnx')
        evaluate_model(model_cls)
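
The path logic in this evaluator can be isolated into a small helper, sketched below with the standard library only. ``onnx_output_path`` is a hypothetical name introduced here; the only facts taken from the text above are the ``NNI_OUTPUT_DIR`` environment variable and the ``model.onnx`` file name.

.. code-block:: python

    import os
    from pathlib import Path


    def onnx_output_path(env=None):
        """Return $NNI_OUTPUT_DIR/model.onnx, or None outside of a trial (hypothetical helper)."""
        env = os.environ if env is None else env
        output_dir = env.get("NNI_OUTPUT_DIR")
        if not output_dir:
            return None  # variable not set: skip the export
        return Path(output_dir) / "model.onnx"


    print(onnx_output_path({}))                                  # None
    print(onnx_output_path({"NNI_OUTPUT_DIR": "/tmp/trial_0"}))  # /tmp/trial_0/model.onnx (on POSIX)

Returning ``None`` when the variable is missing keeps the evaluator usable when it is run by hand for debugging.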








.. GENERATED FROM PYTHON SOURCE LINES 345-353

Relaunch the experiment, and a button is shown on the web portal.

.. image:: ../../img/netron_entrance_webui.png

Export Top Models
-----------------

Users can export top models after the exploration is done using ``export_top_models``.

.. GENERATED FROM PYTHON SOURCE LINES 353-365

.. code-block:: default


    for model_dict in exp.export_top_models(formatter='dict'):
        print(model_dict)

    # The output is a `json` object which records the mutation actions of the top model.
    # If users want to output the source code of the top model, they can use the graph-based execution engine for the experiment,
    # by simply adding the following two lines.
    #
    # .. code-block:: python
    #
    #   exp_config.execution_engine = 'base'
    #   export_formatter = 'code'




.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'model_1': '0', 'model_2': 0.75, 'model_3': 128}
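
The exported dictionary can be read back against the model space by hand. The sketch below does this in plain Python under two assumptions that are inferred rather than documented here: ``model_1`` is the ``LayerChoice`` (its label appears in the printed model space) with the value being the index of the chosen candidate, and ``model_2`` and ``model_3`` are the dropout rate and the ``feature`` size, judged from which candidate list each value belongs to.

.. code-block:: python

    # Output copied from the run above.
    exported = {'model_1': '0', 'model_2': 0.75, 'model_3': 128}

    # Candidate lists copied from ModelSpace (layers written as plain strings).
    layer_candidates = ['Conv2d(32, 64, 3, 1)', 'DepthwiseSeparableConv(32, 64)']
    dropout_candidates = [0.25, 0.5, 0.75]
    feature_candidates = [64, 128, 256]

    # Assumed label-to-choice mapping; see the caveats in the paragraph above.
    decoded = {
        'conv2': layer_candidates[int(exported['model_1'])],
        'dropout1': exported['model_2'],
        'feature': exported['model_3'],
    }

    # Sanity-check that each value really comes from the expected candidate list.
    assert decoded['dropout1'] in dropout_candidates
    assert decoded['feature'] in feature_candidates
    print(decoded)

Under these assumptions, the top model keeps the plain ``Conv2d`` layer, uses a dropout rate of 0.75, and uses 128 hidden features.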





.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 2 minutes  15.810 seconds)


.. _sphx_glr_download_tutorials_hello_nas.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: hello_nas.py <hello_nas.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: hello_nas.ipynb <hello_nas.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_