Commit b7dfc7cf authored by QuanluZhang, committed by GitHub

fix broken links (#4761)

parent fb62e80f
@@ -6,7 +6,7 @@ An experiment can be created with command line tool ``nnictl`` or python APIs. N
 Management with ``nnictl``
 --------------------------
-The ability of ``nnictl`` on experiment management is almost equivalent to :doc:`web_portal/web_portal`. Users can refer to :doc:`../reference/nnictl` for detailed usage. It is highly suggested when visualization is not well supported in your environment (e.g., no GUI on your machine).
+The ability of ``nnictl`` on experiment management is almost equivalent to :doc:`web_portal/web_portal`. Users can refer to :doc:`../reference/nnictl` for detailed usage. It is highly suggested when visualization is not well supported in your environment (e.g., web browser is not supported in your environment).
 Management with web portal
 --------------------------
......
@@ -37,7 +37,7 @@ Verify the Prerequisites
 Usage
 -----
-We have a CIFAR10 example that fully leverages the AdaptDL scheduler under :githublink:`examples/trials/cifar10_pytorch` folder. (:githublink:`main_adl.py <examples/trials/cifar10_pytorch/main_adl.py>` and :githublink:`config_adl.yaml <examples/trials/cifar10_pytorch/config_adl.yaml>`)
+We have a CIFAR10 example that fully leverages the AdaptDL scheduler under :githublink:`examples/trials/cifar10_pytorch` folder. (:githublink:`main_adl.py <examples/trials/cifar10_pytorch/main_adl.py>` and :githublink:`config_adl.yaml <examples/trials/cifar10_pytorch/config_adl.yml>`)
 Here is a template configuration specification to use AdaptDL as a training service.
......
@@ -15,7 +15,7 @@ System architecture
 :alt:
-The brief system architecture of NNI is shown in the picture. NNIManager is the core management module of system, in charge of calling TrainingService to manage trial jobs and the communication between different modules. Dispatcher is a message processing center responsible for message dispatch. TrainingService is a module to manage trial jobs, it communicates with nniManager module, and has different instance according to different training platform. For the time being, NNI supports `local platfrom <LocalMode.rst>`__\ , `remote platfrom <RemoteMachineMode.rst>`__\ , `PAI platfrom <PaiMode.rst>`__\ , `kubeflow platform <KubeflowMode.rst>`__ and `FrameworkController platfrom <FrameworkControllerMode.rst>`__.
+The brief system architecture of NNI is shown in the picture. NNIManager is the core management module of system, in charge of calling TrainingService to manage trial jobs and the communication between different modules. Dispatcher is a message processing center responsible for message dispatch. TrainingService is a module to manage trial jobs, it communicates with nniManager module, and has different instance according to different training platform. For the time being, NNI supports :doc:`./local`, :doc:`./remote`, :doc:`./openpai`, :doc:`./kubeflow` and :doc:`./frameworkcontroller`.
 In this document, we introduce the brief design of TrainingService. If users want to add a new TrainingService instance, they just need to complete a child class to implement TrainingService, don't need to understand the code detail of NNIManager, Dispatcher or other modules.
@@ -185,6 +185,4 @@ When users submit a trial job to cloud platform, they should wrap their trial co
 Reference
 ---------
-For more information about how to debug, please `refer <../Tutorial/HowToDebug.rst>`__.
-The guideline of how to contribute, please `refer <../Tutorial/Contributing.rst>`__.
+The guideline of how to contribute, please refer to :doc:`/notes/contributing`.
@@ -60,9 +60,7 @@ Follow the `guideline <https://github.com/Microsoft/frameworkcontroller/tree/mas
 to set up FrameworkController in the Kubernetes cluster, NNI supports FrameworkController by the stateful set mode.
 If your cluster enforces authorization, you need to create a service account with granted permission for FrameworkController,
 and then pass the name of the FrameworkController service account to the NNI Experiment Config.
-`refer <https://github.com/Microsoft/frameworkcontroller/tree/master/example/run#run-by-kubernetes-statefulset>`__.
-If the k8s cluster enforces Authorization, you also need to create a ServiceAccount with granted permission for FrameworkController,
-`refer <https://github.com/microsoft/frameworkcontroller/tree/master/example/run#prerequisite>`__.
+If the k8s cluster enforces Authorization, you also need to create a ServiceAccount with granted permission for FrameworkController.
 Design
 ------
......
@@ -6,7 +6,7 @@ NNI supports running an experiment on `OpenPAI <https://github.com/Microsoft/pai
 Prerequisite
 ------------
-1. Before starting to use OpenPAI training service, you should have an account to access an `OpenPAI <https://github.com/Microsoft/pai>`__ cluster. See `here <https://github.com/Microsoft/pai#how-to-deploy>`__ if you don't have any OpenPAI account and want to deploy an OpenPAI cluster. Please note that, on OpenPAI, your trial program will run in Docker containers.
+1. Before starting to use OpenPAI training service, you should have an account to access an `OpenPAI <https://github.com/Microsoft/pai>`__ cluster. See `here <https://github.com/Microsoft/pai>`__ if you don't have any OpenPAI account and want to deploy an OpenPAI cluster. Please note that, on OpenPAI, your trial program will run in Docker containers.
 2. Get token. Open web portal of OpenPAI, and click ``My profile`` button in the top-right side.
......
@@ -12,7 +12,7 @@ NNI has supported many training services listed below. Users can go through each
 * - Local
 - The whole experiment runs on your dev machine (i.e., a single local machine)
 * - Remote
-- The trials are dispatched to your configured remote servers
+- The trials are dispatched to your configured SSH servers
 * - OpenPAI
 - Running trials on OpenPAI, a DNN model training platform based on Kubernetes
 * - Kubeflow
@@ -22,7 +22,7 @@ NNI has supported many training services listed below. Users can go through each
 * - FrameworkController
 - Running trials with FrameworkController, a DNN model training framework on Kubernetes
 * - AML
-- Running trials on AML cloud service
+- Running trials on Azure Machine Learning (AML) cloud service
 * - PAI-DLC
 - Running trials on PAI-DLC, which is deep learning containers based on Alibaba ACK
 * - Hybrid
......
@@ -8,7 +8,7 @@ PAI-DSW server performs the role to submit a job while PAI-DLC is where the trai
 Prerequisite
 ------------
-Step 1. Install NNI, follow the install guide `here <../Tutorial/QuickStart.rst>`__.
+Step 1. Install NNI, follow the :doc:`install guide </installation>`.
 Step 2. Create PAI-DSW server following this `link <https://help.aliyun.com/document_detail/163684.html?section-2cw-lsi-es9#title-ji9-re9-88x>`__. Note as the training service will be run on PAI-DLC, it won't cost many resources to run and you may just need a PAI-DSW server with CPU.
......
@@ -21,18 +21,18 @@ In addition, there are several steps for Windows server.
 1. Install and start ``OpenSSH Server``.
 1) Open ``Settings`` app on Windows.
 2) Click ``Apps``\ , then click ``Optional features``.
 3) Click ``Add a feature``\ , search and select ``OpenSSH Server``\ , and then click ``Install``.
 4) Once it's installed, run below command to start and set to automatic start.
 .. code-block:: bat
 sc config sshd start=auto
 net start sshd
 2. Make sure remote account is administrator, so that it can stop running trials.
@@ -85,7 +85,7 @@ You can run below command on Windows, Linux, or macOS to spawn trials on remote
 .. _nniignore:
-.. Note:: If you are planning to use remote machines or clusters as your training service, to avoid too much pressure on network, NNI limits the number of files to 2000 and total size to 300MB. If your codeDir contains too many files, you can choose which files and subfolders should be excluded by adding a ``.nniignore`` file that works like a ``.gitignore`` file. For more details on how to write this file, see the `git documentation <https://git-scm.com/docs/gitignore#_pattern_format>`__.
+.. Note:: If you are planning to use remote machines or clusters as your training service, to avoid too much pressure on network, NNI limits the number of files to 2000 and total size to 300MB. If your trial code directory contains too many files, you can choose which files and subfolders should be excluded by adding a ``.nniignore`` file that works like a ``.gitignore`` file. For more details on how to write this file, see the `git documentation <https://git-scm.com/docs/gitignore#_pattern_format>`__.
 *Example:* :githublink:`config_detailed.yml <examples/trials/mnist-pytorch/config_detailed.yml>` and :githublink:`.nniignore <examples/trials/mnist-pytorch/.nniignore>`
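The limits quoted in the note above (2000 files, 300MB) can be checked locally before an experiment is submitted. The following is a minimal sketch and not NNI's own implementation: the helper names (``load_patterns``, ``is_ignored``, ``summarize``, ``within_limits``) are hypothetical, "300MB" is read as 300 MiB as an assumption, and pattern matching uses ``fnmatch``, which only approximates ``.gitignore`` semantics (no negation with ``!``, no anchoring with a leading ``/``).

```python
import fnmatch
import os

MAX_FILES = 2000                 # file-count limit quoted in the note
MAX_BYTES = 300 * 1024 * 1024    # "300MB", read here as MiB (an assumption)


def load_patterns(code_dir):
    """Read non-empty, non-comment lines from code_dir/.nniignore, if present."""
    path = os.path.join(code_dir, ".nniignore")
    if not os.path.isfile(path):
        return []
    with open(path, encoding="utf-8") as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith("#")]


def is_ignored(rel_path, patterns):
    """Simplified matching: a pattern hits the whole relative path or any one component."""
    rel_path = rel_path.replace(os.sep, "/")
    parts = rel_path.split("/")
    for pattern in patterns:
        pattern = pattern.rstrip("/")
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


def summarize(code_dir):
    """Return (file_count, total_bytes) over the files that are not ignored."""
    patterns = load_patterns(code_dir)
    count = 0
    total = 0
    for root, _dirs, files in os.walk(code_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, code_dir)
            if is_ignored(rel, patterns):
                continue
            count += 1
            total += os.path.getsize(full)
    return count, total


def within_limits(code_dir):
    """True when the non-ignored files fit both limits."""
    count, total = summarize(code_dir)
    return count <= MAX_FILES and total <= MAX_BYTES
```

Calling ``within_limits("path/to/trial/code")`` gives a quick yes/no answer; ``summarize`` returns the raw numbers when the directory needs trimming. Because the matching is simplified, the result is an estimate of what a real ``.nniignore`` would exclude.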
@@ -111,4 +111,4 @@ Remote training service support shared storage, which can help use your own stor
 Monitor via TensorBoard
 ^^^^^^^^^^^^^^^^^^^^^^^
-Remote training service support trial visualization via TensorBoard. Follow the guide `here <./tensorboard.rst>`__ to learn how to use TensorBoard.
+Remote training service support trial visualization via TensorBoard. Follow the guide :doc:`/experiment/web_portal/tensorboard` to learn how to use TensorBoard.
@@ -7,7 +7,7 @@ All the information generated by the experiment will be stored under ``/nni`` fo
 All the output produced by the trial will be located under ``/nni/{EXPERIMENT_ID}/trials/{TRIAL_ID}/nnioutput`` folder in your shared storage.
 This saves you from finding for experiment-related information in various places.
 Remember that your trial working directory is ``/nni/{EXPERIMENT_ID}/trials/{TRIAL_ID}``, so if you upload your data in this shared storage, you can open it like a local file in your trial code without downloading it.
-And we will develop more practical features in the future based on shared storage. The config reference can be found `here <../reference/experiment_config.html#sharedstorageconfig>`_.
+And we will develop more practical features in the future based on shared storage. The config reference can be found :ref:`here <reference-sharedstorage-config-label>`.
 .. note::
 Shared storage is currently in the experimental stage. We suggest use AzureBlob under Ubuntu/CentOS/RHEL, and NFS under Ubuntu/CentOS/RHEL/Fedora/Debian for remote.
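The directory layout described in this hunk can be written down as two small path helpers. This is an illustrative sketch only: the function names and the example IDs are hypothetical, and the paths are taken to be relative to the root of the shared storage, as the documentation text suggests.

```python
from pathlib import PurePosixPath

# Root folder inside the shared storage, as quoted in the documentation text.
SHARED_ROOT = PurePosixPath("/nni")


def trial_dir(experiment_id, trial_id):
    """Working directory of one trial: /nni/{EXPERIMENT_ID}/trials/{TRIAL_ID}."""
    return SHARED_ROOT / experiment_id / "trials" / trial_id


def trial_output_dir(experiment_id, trial_id):
    """Output folder of one trial: the working directory plus 'nnioutput'."""
    return trial_dir(experiment_id, trial_id) / "nnioutput"


# Example with made-up IDs.
print(trial_output_dir("exp1", "t1"))
```

``PurePosixPath`` is used so the helpers only build path strings and never touch the file system, which keeps them usable on a machine where the shared storage is not mounted.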
......
@@ -608,6 +608,8 @@ HybridConfig
 Currently only support `LocalConfig`_, `RemoteConfig`_, `OpenpaiConfig`_ and `AmlConfig`_ . Detailed usage can be found :doc:`here </experiment/training_service/hybrid>`.
+.. _reference-sharedstorage-config-label:
 SharedStorageConfig
 ^^^^^^^^^^^^^^^^^^^
......