aml.rst 3.78 KB
Newer Older
1
2
3
AML Training Service
====================

4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
To run your trials on `AzureML <https://azure.microsoft.com/en-us/services/machine-learning/>`__, you can use AML training service. AML training service can programmatically submit runs to AzureML platform and collect their metrics.

Prerequisite
------------

1. Create an Azure account/subscription using this `link <https://azure.microsoft.com/en-us/free/services/machine-learning/>`__. If you already have an Azure account/subscription, skip this step.
2. Install the Azure CLI on your machine, follow the install guide `here <https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`__.
3. Authenticate to your Azure subscription from the CLI. To authenticate interactively, open a command line or terminal and use the following command:

   .. code-block:: bash

      az login

4. Log into your Azure account with a web browser and create a Machine Learning resource. You will need to choose a resource group and specific a workspace name. Then download ``config.json`` which will be used later.

19
   .. image:: ../../../img/aml_workspace.png
20
21
22

5. Create an AML cluster as the compute target.

23
   .. image:: ../../../img/aml_cluster.png
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79

6. Open a command line and install AML package environment.

   .. code-block:: bash

      python3 -m pip install azureml
      python3 -m pip install azureml-sdk

Usage
-----

We show an example configuration here with YAML (Python configuration should be similar).

.. code-block:: yaml

   trialConcurrency: 1
   maxTrialNumber: 10
   ...
   trainingService:
     platform: aml
     dockerImage: msranni/nni
     subscriptionId: ${your subscription ID}
     resourceGroup: ${your resource group}
     workspaceName: ${your workspace name}
     computeTarget: ${your compute target}

Configuration References
------------------------

Compared with :doc:`local` and :doc:`remote`, OpenPAI training service supports the following additional configurations.

.. list-table::
   :header-rows: 1
   :widths: auto

   * - Field name
     - Description
   * - dockerImage
     - Required field. The docker image name used in job. If you don't want to build your own, NNI has provided a docker image `msranni/nni <https://hub.docker.com/r/msranni/nni>`__, which is up-to-date with every NNI release.
   * - subscriptionId
     - Required field. The subscription id of your account, can be found in ``config.json`` described above.
   * - resourceGroup
     - Required field. The resource group of your account, can be found in ``config.json`` described above.
   * - workspaceName
     - Required field. The workspace name of your account, can be found in ``config.json`` described above.
   * - computeTarget
     - Required field. The compute cluster name you want to use in your AML workspace. See `reference <https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target>`__ and Step 5 above.
   * - maxTrialNumberPerGpu
     - Optional field. Default 1. Used to specify the max concurrency trial number on a GPU device.
   * - useActiveGpu
     - Optional field. Default false. Used to specify whether to use a GPU if there is another process. By default, NNI will use the GPU only if there is no other active process in the GPU. See :doc:`local` for details.

Monitor your trial on the cloud by using AML studio
---------------------------------------------------

To see your trial job's detailed status on the cloud, you need to visit your studio which you create at Step 5 above. Once the job completes, go to the **Outputs + logs** tab. There you can see a ``70_driver_log.txt`` file, This file contains the standard output from a run and can be useful when you're debugging remote runs in the cloud. Learn more about aml from `here <https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-hello-world>`__.