Commit 7fc1c51c authored by SparkSnail, committed by fishyds

Update document v0.4 (#437)

move the nnictl home folder to ~/.local/nnictl
delete kubernetsServer in nnictl
refactor AKS document
add a warning when a relative path is expanded
update experiment status when the experiment has crashed
parent 4537e633
......@@ -150,7 +150,7 @@ machineList:
 * __pai__ submit trial jobs to [OpenPai](https://github.com/Microsoft/pai) of Microsoft. For more details of pai configuration, please reference [PAIMOdeDoc](./PAIMode.md)
-* __kubeflow__ submit trial jobs to [kubeflow](https://www.kubeflow.org/docs/about/kubeflow/), nni support kubeflow based on normal kubernets and [azure kubernets](https://azure.microsoft.com/en-us/services/kubernetes-service/).
+* __kubeflow__ submit trial jobs to [kubeflow](https://www.kubeflow.org/docs/about/kubeflow/), nni support kubeflow based on normal kubernetes and [azure kubernetes](https://azure.microsoft.com/en-us/services/kubernetes-service/).
 * __searchSpacePath__
 * Description
......@@ -377,13 +377,9 @@ machineList:
 __path__ is the mounted path of nfs
-* __kubernetsServer__
-__kubernetsServer__ set the host of kubernets service.
 * __keyVault__
-If users want to use azure kubernets service, they should set keyVault to storage the private key of your azure storage account. Refer: https://docs.microsoft.com/en-us/azure/key-vault/key-vault-manage-with-cli2
+If users want to use azure kubernetes service, they should set keyVault to storage the private key of your azure storage account. Refer: https://docs.microsoft.com/en-us/azure/key-vault/key-vault-manage-with-cli2
 * __vaultName__
......@@ -393,6 +389,18 @@ machineList:
 __name__ is the value of ```--name``` used in az command.
+* __azureStorage__
+If users use azure kubernetes service, they should set azure storage account to store code files.
+* __accountName__
+__accountName__ is the name of azure storage account.
+* __azureShare__
+__azureShare__ is the share of the azure file storage.
 * __paiConfig__
 * __userName__
......@@ -407,18 +415,6 @@ machineList:
 __host__ is the host of pai.
-* __azureStorage__
-If users use azure kubernets service, they should set azure storage account to store code files.
-* __accountName__
-__accountName__ is the name of azure storage account.
-* __azureShare__
-__azureShare__ is the share of the azure file storage.
......
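The documentation changes above give `kubeflowConfig` two shapes: an `nfs` section (`server`, `path`) for on-premises Kubernetes, and `keyVault` (`vaultName`, `name`) plus `azureStorage` (`accountName`, `azureShare`) for Azure Kubernetes Service. A minimal Python sketch of checking a loaded config dict for the keys each shape needs follows. `REQUIRED_KEYS` and `missing_kubeflow_keys` are hypothetical names, not part of nnictl (which imports its own `KUBEFLOW_CONFIG_SCHEMA` for validation), and treating "no `nfs` section" as "AKS" is an assumption made for the sketch.

```python
# Hypothetical checker for the two kubeflowConfig shapes described above.
# Not nnictl's validation; nnictl imports KUBEFLOW_CONFIG_SCHEMA for that.
REQUIRED_KEYS = {
    'nfs': [('nfs', 'server'), ('nfs', 'path')],
    'aks': [('keyVault', 'vaultName'), ('keyVault', 'name'),
            ('azureStorage', 'accountName'), ('azureStorage', 'azureShare')],
}


def missing_kubeflow_keys(kubeflow_config):
    """Return dotted names of required keys absent from a kubeflowConfig dict.

    Assumption: a config with an 'nfs' section targets on-premises Kubernetes;
    anything else is treated as AKS and needs keyVault and azureStorage.
    """
    mode = 'nfs' if 'nfs' in kubeflow_config else 'aks'
    missing = []
    if 'operator' not in kubeflow_config:
        missing.append('operator')
    for section, field in REQUIRED_KEYS[mode]:
        value = kubeflow_config.get(section)
        if not isinstance(value, dict) or field not in value:
            missing.append(section + '.' + field)
    return missing


aks_config = {
    'operator': 'tf-operator',
    'keyVault': {'vaultName': 'my-vault', 'name': 'storage-key'},
    'azureStorage': {'accountName': 'mystorageacct'},  # azureShare omitted on purpose
}
print(missing_kubeflow_keys(aks_config))  # ['azureStorage.azureShare']
```

All values in `aks_config` are placeholders. Reporting dotted key names rather than raising on the first problem lets a user fix every missing field in one pass.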
......@@ -2,7 +2,7 @@
 ===
 Now NNI supports running experiment on [Kubeflow](https://github.com/kubeflow/kubeflow), called kubeflow mode. Before starting to use NNI kubeflow mode, you should have a kubernetes cluster, either on-prem or [Azure Kubernetes Service(AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/), a Ubuntu machine on which [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) is installed and configured to connect to your kubernetes cluster. If you are not familiar with kubernetes, [here](https://kubernetes.io/docs/tutorials/kubernetes-basics/) is a goot start. In kubeflow mode, your trial program will run as kubeflow job in kubernetes cluster.
-## Prerequisite
+## Prerequisite for on-premises Kubernetes Service
 1. A **Kubernetes** cluster using Kubernetes 1.8 or later. Follow this [guideline](https://kubernetes.io/docs/setup/) to set up Kubernetes
 2. Download, set up, and deploy **Kubelow** to your Kubernetes cluster. Follow this [guideline](https://www.kubeflow.org/docs/started/getting-started/) to set up Kubeflow
 3. Install **kubectl**, and configure to connect to your Kubernetes API server. Follow this [guideline](https://kubernetes.io/docs/tasks/tools/install-kubectl/) to install kubectl on Ubuntu
......@@ -15,13 +15,12 @@ Now NNI supports running experiment on [Kubeflow](https://github.com/kubeflow/ku
 7. Install **NNI**, follow the install guide [here](GetStarted.md).
-## Prerequisite for Azure Kubernets Service
-1. NNI support kubeflow based on Azure Kubernets Service, follow the [guideline](https://azure.microsoft.com/en-us/services/kubernetes-service/) to set up Azure Kubernets Service.
-2. Deploy kubeflow on Azure Kubernets Service.
-3. Install __kubectl__ and [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest). Connect kubectl client to Azure K8S, and use `az login` to set azure account.
-4. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create azure file storage account. If you use Azure Kubernets Service, nni need Azure Storage Service to store code files and the output files.
-5. Set up Azure Key Vault Service, add a secret to Key Vault
-to store the private key of Azure account.
+## Prerequisite for Azure Kubernetes Service
+1. NNI support kubeflow based on Azure Kubernetes Service, follow the [guideline](https://azure.microsoft.com/en-us/services/kubernetes-service/) to set up Azure Kubernetes Service.
+2. Install [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and __kubectl__. Use `az login` to set azure account, and connect kubectl client to AKS, [refer](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#connect-to-the-cluster).
+3. Deploy kubeflow on Azure Kubernetes Service, follow the [guideline](https://www.kubeflow.org/docs/started/getting-started/).
+4. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create azure file storage account. If you use Azure Kubernetes Service, nni need Azure Storage Service to store code files and the output files.
+5. To access Azure storage service, nni need the access key of the storage account, and nni use [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/) Service to protect your private key. Set up Azure Key Vault Service, add a secret to Key Vault to store the access key of Azure storage account. Follow this [guideline](https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli) to store the access key.
 ## Design
 TODO
......@@ -68,7 +67,7 @@ kubeflowConfig:
     server: {your_nfs_server}
     path: {your_nfs_server_exported_path}
 ```
-If you use Azure Kubernets Service, you should set `kubeflowConfig` in your config yaml file as follows:
+If you use Azure Kubernetes Service, you should set `kubeflowConfig` in your config yaml file as follows:
 ```
 kubeflowConfig:
   operator: tf-operator
......
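Step 5 of the AKS prerequisites stores the storage account's access key as a Key Vault secret, and `keyVault.vaultName` and `keyVault.name` in `kubeflowConfig` presumably point at that secret. A minimal Python sketch of that step through the Azure CLI follows, assuming `az login` has already been run and that you know the storage account's resource group (the `resource_group` parameter is an assumption; the NNI docs above do not mention it). The helper names are hypothetical and not part of NNI; only the `az` subcommands and flags are real.

```python
import subprocess


def storage_key_command(account_name, resource_group):
    """az command that prints the first access key of a storage account."""
    return ['az', 'storage', 'account', 'keys', 'list',
            '--account-name', account_name,
            '--resource-group', resource_group,
            '--query', '[0].value', '--output', 'tsv']


def keyvault_secret_command(vault_name, secret_name, secret_value):
    """az command that stores secret_value as a Key Vault secret.

    vault_name and secret_name are the values that keyVault.vaultName and
    keyVault.name in kubeflowConfig would then refer to.
    """
    return ['az', 'keyvault', 'secret', 'set',
            '--vault-name', vault_name,
            '--name', secret_name,
            '--value', secret_value]


def store_storage_key(account_name, resource_group, vault_name, secret_name):
    """Copy a storage account key into Key Vault; requires a prior `az login`."""
    result = subprocess.run(storage_key_command(account_name, resource_group),
                            check=True, capture_output=True, text=True)
    access_key = result.stdout.strip()
    # Capture output so the JSON echoed by `secret set` (which includes the
    # secret value) is not printed to the terminal.
    subprocess.run(keyvault_secret_command(vault_name, secret_name, access_key),
                   check=True, capture_output=True, text=True)
```

Usage, with placeholder names: `store_storage_key('mystorageacct', 'my-resource-group', 'my-vault', 'storage-key')`, then set `vaultName: my-vault` and `name: storage-key` under `keyVault`. Passing the key through `--value` briefly exposes it in the local process list, which is tolerable for a one-off setup step.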
......@@ -20,7 +20,7 @@
 import os
-NNICTL_HOME_DIR = os.path.join(os.environ['HOME'], '.local', 'nni', 'nnictl')
+NNICTL_HOME_DIR = os.path.join(os.environ['HOME'], '.local', 'nnictl')
 ERROR_INFO = 'ERROR: %s'
......
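This change moves `NNICTL_HOME_DIR` from `~/.local/nni/nnictl` to `~/.local/nnictl`, so files an older nnictl wrote under the old folder are presumably no longer read. The short sketch below prints both locations for the current user. The function names are hypothetical, and `os.path.expanduser('~')` replaces `os.environ['HOME']` only so the sketch does not raise `KeyError` when `HOME` is unset.

```python
import os


def nnictl_home_dir(home):
    """nnictl home directory after this change: <home>/.local/nnictl."""
    return os.path.join(home, '.local', 'nnictl')


def legacy_nnictl_home_dir(home):
    """nnictl home directory before this change: <home>/.local/nni/nnictl."""
    return os.path.join(home, '.local', 'nni', 'nnictl')


# On POSIX, expanduser falls back to the password database when HOME is unset,
# whereas os.environ['HOME'] would raise KeyError.
home = os.path.expanduser('~')
print('old:', legacy_nnictl_home_dir(home))
print('new:', nnictl_home_dir(home))
```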
......@@ -21,7 +21,7 @@
 import os
 import json
 from .config_schema import LOCAL_CONFIG_SCHEMA, REMOTE_CONFIG_SCHEMA, PAI_CONFIG_SCHEMA, KUBEFLOW_CONFIG_SCHEMA
-from .common_utils import get_json_content, print_error
+from .common_utils import get_json_content, print_error, print_warning

 def expand_path(experiment_config, key):
     '''Change '~' to user home directory'''
......@@ -31,7 +31,10 @@ def expand_path(experiment_config, key):
 def parse_relative_path(root_path, experiment_config, key):
     '''Change relative path to absolute path'''
     if experiment_config.get(key) and not os.path.isabs(experiment_config.get(key)):
-        experiment_config[key] = os.path.join(root_path, experiment_config.get(key))
+        absolute_path = os.path.join(root_path, experiment_config.get(key))
+        print_warning('expand %s: %s to %s ' % (key, experiment_config[key], absolute_path))
+        experiment_config[key] = absolute_path

 def parse_time(experiment_config):
     '''Parse time format'''
......
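With this change, `parse_relative_path` warns before it rewrites a relative path into an absolute one. A standalone sketch of the same logic follows. `print_warning` here is a local stand-in for nnictl's helper in `common_utils` (its real output format may differ), and `searchSpacePath` is used only as an example key.

```python
import os
import sys


def print_warning(content):
    """Local stand-in for nnictl's print_warning (the real format may differ)."""
    print('WARNING: %s' % content, file=sys.stderr)


def parse_relative_path(root_path, experiment_config, key):
    """Rewrite experiment_config[key] as an absolute path under root_path.

    Absolute paths and missing or empty values are left untouched; every
    rewrite is reported with a warning, mirroring the change above.
    """
    value = experiment_config.get(key)
    if value and not os.path.isabs(value):
        absolute_path = os.path.join(root_path, value)
        print_warning('expand %s: %s to %s' % (key, value, absolute_path))
        experiment_config[key] = absolute_path


config = {'searchSpacePath': 'search_space.json'}
parse_relative_path('/home/user/experiment', config, 'searchSpacePath')
print(config['searchSpacePath'])  # /home/user/experiment/search_space.json
```

Reading the value once into `value` avoids the repeated `experiment_config.get(key)` lookups of the original without changing behavior.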
......@@ -31,9 +31,24 @@ from .constants import NNICTL_HOME_DIR, EXPERIMENT_INFORMATION_FORMAT, EXPERIMEN
 import time
 from .common_utils import print_normal, print_error, print_warning, detect_process

+def update_experiment_status():
+    '''Update the experiment status in config file'''
+    experiment_config = Experiments()
+    experiment_dict = experiment_config.get_all_experiments()
+    if not experiment_dict:
+        return None
+    for key in experiment_dict.keys():
+        if isinstance(experiment_dict[key], dict):
+            if experiment_dict[key].get('status') == 'running':
+                nni_config = Config(experiment_dict[key]['fileName'])
+                rest_pid = nni_config.get_config('restServerPid')
+                if not detect_process(rest_pid):
+                    experiment_config.update_experiment(key, 'status', 'stopped')
+
 def check_experiment_id(args):
     '''check if the id is valid
     '''
+    update_experiment_status()
     experiment_config = Experiments()
     experiment_dict = experiment_config.get_all_experiments()
     if not experiment_dict:
......@@ -76,6 +91,7 @@ def parse_ids(args):
     5.If the id does not exist but match the prefix of an experiment id, nnictl will return the matched id
     6.If the id does not exist but match multiple prefix of the experiment ids, nnictl will give id information
     '''
+    update_experiment_status()
     experiment_config = Experiments()
     experiment_dict = experiment_config.get_all_experiments()
     if not experiment_dict:
......@@ -175,11 +191,6 @@ def stop_experiment(args):
             nni_config = Config(experiment_dict[experiment_id]['fileName'])
             rest_port = nni_config.get_config('restServerPort')
             rest_pid = nni_config.get_config('restServerPid')
-            if not detect_process(rest_pid):
-                print_normal('Experiment is not running...')
-                experiment_config.update_experiment(experiment_id, 'status', 'stopped')
-                return
-            rest_pid = nni_config.get_config('restServerPid')
             if rest_pid:
                 stop_rest_cmds = ['kill', str(rest_pid)]
                 call(stop_rest_cmds)
......
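The new `update_experiment_status` marks an experiment `stopped` when it is recorded as `running` but its REST server process no longer exists, and both `check_experiment_id` and `parse_ids` now call it first. The sketch below mirrors that loop on a plain in-memory dict instead of nnictl's `Experiments` and `Config` classes; that flattened store is an assumption made to keep the example self-contained. Its `detect_process` is a POSIX-only stand-in built on `os.kill(pid, 0)`, not nnictl's implementation, and the liveness check is injectable so the loop can be tested without real processes.

```python
import os


def detect_process(pid):
    """Return True if a process with this pid appears to exist (POSIX only).

    Stand-in for nnictl's detect_process: signal 0 checks the pid without
    delivering a signal.
    """
    if os.name != 'posix':
        raise NotImplementedError('os.kill(pid, 0) probing is POSIX-only')
    if not isinstance(pid, int) or pid <= 0:
        return False  # no usable pid was recorded
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def update_experiment_status(experiments, is_alive=detect_process):
    """Mark 'running' experiments whose REST server has died as 'stopped'.

    `experiments` is a hypothetical flattened store:
    {experiment_id: {'status': ..., 'restServerPid': ...}}.
    Returns the ids whose status was changed.
    """
    changed = []
    for experiment_id, info in experiments.items():
        if isinstance(info, dict) and info.get('status') == 'running':
            if not is_alive(info.get('restServerPid')):
                info['status'] = 'stopped'
                changed.append(experiment_id)
    return changed


experiments = {
    'exp_alive': {'status': 'running', 'restServerPid': os.getpid()},  # this interpreter
    'exp_crashed': {'status': 'running', 'restServerPid': None},  # no live pid
}
print(update_experiment_status(experiments))  # ['exp_crashed']
```

Refreshing statuses up front is what lets `stop_experiment` drop its own `detect_process` check: by the time ids are parsed, a crashed experiment is already recorded as `stopped`.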