Unverified commit 1d74ae5e authored by SparkSnail, committed by GitHub

Merge pull request #234 from microsoft/master

merge master
parents aa316742 ff2728c0
......@@ -124,10 +124,12 @@ Within the following table, we summarized the current NNI capabilities, we are g
<a href="docs/en_US/NAS/Overview.md">Neural Architecture Search</a>
<ul>
<ul>
<li><a href="docs/en_US/NAS/Overview.md#enas">ENAS</a></li>
<li><a href="docs/en_US/NAS/Overview.md#darts">DARTS</a></li>
<li><a href="docs/en_US/NAS/Overview.md#p-darts">P-DARTS</a></li>
<li><a href="docs/en_US/NAS/Overview.md#cdarts">CDARTS</a></li>
<li><a href="docs/en_US/NAS/ENAS.md">ENAS</a></li>
<li><a href="docs/en_US/NAS/DARTS.md">DARTS</a></li>
<li><a href="docs/en_US/NAS/PDARTS.md">P-DARTS</a></li>
<li><a href="docs/en_US/NAS/CDARTS.md">CDARTS</a></li>
<li><a href="docs/en_US/NAS/SPOS.md">SPOS</a></li>
<li><a href="docs/en_US/NAS/Proxylessnas.md">ProxylessNAS</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#NetworkMorphism">Network Morphism</a> </li>
</ul>
</ul>
......@@ -224,7 +226,7 @@ Note:
* If you run into a permission issue, add `--user` to install NNI in the user directory.
* Currently, NNI on Windows supports local, remote, and pai modes. Anaconda or Miniconda is highly recommended for installing NNI on Windows.
* If there is any error like `Segmentation fault`, please refer to [FAQ](docs/en_US/Tutorial/FAQ.md). For FAQ on Windows, please refer to [NNI on Windows](docs/en_US/Tutorial/NniOnWindows.md).
* If there is any error like `Segmentation fault`, please refer to [FAQ](docs/en_US/Tutorial/FAQ.md). For FAQ on Windows, please refer to [NNI on Windows](docs/en_US/Tutorial/InstallationWin.md#faq).
### **Verify installation**
......@@ -288,7 +290,7 @@ You can use these commands to get more information about the experiment
## **Documentation**
* To learn what NNI is, read the [NNI Overview](https://nni.readthedocs.io/en/latest/Overview.html).
* To familiarize yourself with how to use NNI, read the [documentation](https://nni.readthedocs.io/en/latest/index.html).
* To get started and install NNI on your system, please refer to [Install NNI](docs/en_US/Tutorial/Installation.md).
* To get started and install NNI on your system, please refer to [Install NNI](https://nni.readthedocs.io/en/latest/installation.html).
## **Contributing**
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
......@@ -304,7 +306,7 @@ After getting familiar with contribution agreements, you are ready to create you
* If you have any questions on usage, review the [FAQ](https://github.com/microsoft/nni/blob/master/docs/en_US/Tutorial/FAQ.md) first; if there is no relevant issue or answer to your question, try contacting the NNI dev team and users on [Gitter](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) or [file an issue](https://github.com/microsoft/nni/issues/new/choose) on GitHub.
* [Customize your own Tuner](docs/en_US/Tuner/CustomizeTuner.md)
* [Implement customized TrainingService](docs/en_US/TrainingService/HowToImplementTrainingService.md)
* [Implement a new NAS trainer on NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/NAS/NasInterface.md#implement-a-new-nas-trainer-on-nni)
* [Implement a new NAS trainer on NNI](docs/en_US/NAS/Advanced.md)
* [Customize your own Advisor](docs/en_US/Tuner/CustomizeAdvisor.md)
## **External Repositories and References**
......
......@@ -14,7 +14,7 @@ There are two types of pruning. One is fine-grained pruning, it does not change
## Design and Implementation
To speed up a model, the pruned layers should be replaced, either replaced with smaller layer for coarse-grained mask, or replaced with sparse kernel for fine-grained mask. Coarse-grained mask usually changes the shape of weights or input/output tensors, thus, we should do shape inference to check are there other unpruned layers should be replaced as well due to shape change. Therefore, in our design, there are two main steps: first, do shape inference to find out all the modules that should be replaced; second, replace the modules. The first step requires topology (i.e., connections) of the model, we use `jit.trace` to obtain the model grpah for PyTorch.
To speed up a model, the pruned layers should be replaced: with a smaller layer for a coarse-grained mask, or with a sparse kernel for a fine-grained mask. A coarse-grained mask usually changes the shape of weights or input/output tensors, so we should do shape inference to check whether other unpruned layers also need to be replaced because of the shape change. Therefore, our design has two main steps: first, do shape inference to find all the modules that should be replaced; second, replace the modules. The first step requires the topology (i.e., connections) of the model; we use `jit.trace` to obtain the model graph for PyTorch.
For each module, we should prepare four functions: three for shape inference and one for module replacement. The three shape inference functions are: given the weight shape, infer the input/output shape; given the input shape, infer the weight/output shape; and given the output shape, infer the weight/input shape. The module replacement function returns a newly created, smaller module.
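For illustration only, here is a minimal sketch of what these four functions might look like for a `Conv2d` module with a coarse-grained channel mask; the function names and signatures below are hypothetical assumptions, not NNI's actual API.

```python
# Illustrative sketch only -- names and signatures are hypothetical, not NNI's real API.
# Only the channel dimensions are tracked, for brevity.
import torch.nn as nn

def conv2d_shape_from_weight(weight_shape):
    # given the (pruned) weight shape (out_ch, in_ch, kH, kW), infer input/output channels
    out_ch, in_ch = weight_shape[0], weight_shape[1]
    return in_ch, out_ch

def conv2d_shape_from_input(input_channels, weight_shape):
    # given the input channels, infer the new weight shape / output channels
    out_ch = weight_shape[0]
    new_weight_shape = (out_ch, input_channels) + tuple(weight_shape[2:])
    return new_weight_shape, out_ch

def conv2d_shape_from_output(output_channels, weight_shape):
    # given the output channels, infer the new weight shape / input channels
    in_ch = weight_shape[1]
    new_weight_shape = (output_channels, in_ch) + tuple(weight_shape[2:])
    return new_weight_shape, in_ch

def conv2d_replace(old_conv, in_channels, out_channels):
    # module replacement: build a smaller Conv2d with the pruned channel counts,
    # keeping the remaining hyper-parameters of the original layer
    return nn.Conv2d(in_channels, out_channels,
                     kernel_size=old_conv.kernel_size,
                     stride=old_conv.stride,
                     padding=old_conv.padding,
                     dilation=old_conv.dilation,
                     groups=old_conv.groups,
                     bias=old_conv.bias is not None)
```

The speedup step would then call the shape inference functions to propagate shape changes through the graph obtained from `jit.trace`, and finally call the replacement function on every affected module.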
......
......@@ -41,7 +41,7 @@ def test(model, device, test_loader):
def main():
torch.manual_seed(0)
device = torch.device('cuda')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
......
......@@ -105,7 +105,7 @@ def adjust_learning_rate(optimizer, epoch):
def main():
torch.manual_seed(0)
device = torch.device('cuda')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
......
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.torch import DoReFaQuantizer


class Mnist(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
        self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
        self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
        self.fc2 = torch.nn.Linear(500, 10)
        self.relu1 = torch.nn.ReLU6()
        self.relu2 = torch.nn.ReLU6()
        self.relu3 = torch.nn.ReLU6()

    def forward(self, x):
        x = self.relu1(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = self.relu2(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)
        x = self.relu3(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(model, quantizer, device, train_loader, optimizer):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('Loss: {} Accuracy: {}%\n'.format(
        test_loss, 100 * correct / len(test_loader.dataset)))


def main():
    torch.manual_seed(0)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=True, download=True, transform=trans),
        batch_size=64, shuffle=True)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=False, transform=trans),
        batch_size=1000, shuffle=True)

    model = Mnist()
    model = model.to(device)

    # quantize the weights of all Conv2d and Linear layers to 8 bits
    configure_list = [{
        'quant_types': ['weight'],
        'quant_bits': {
            'weight': 8,
        },  # you can just use `int` here because all `quant_types` share the same bit length, see the config for `ReLU6` below.
        'op_types': ['Conv2d', 'Linear']
    }]
    quantizer = DoReFaQuantizer(model, configure_list)
    quantizer.compress()

    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.5)
    for epoch in range(10):
        print('# Epoch {} #'.format(epoch))
        train(model, quantizer, device, train_loader, optimizer)
        test(model, device, test_loader)


if __name__ == '__main__':
    main()
\ No newline at end of file
......@@ -41,7 +41,7 @@ def test(model, device, test_loader):
def main():
torch.manual_seed(0)
device = torch.device('cuda')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
......
import math
import os
import argparse
import torch
import torch.nn as nn
......@@ -48,7 +49,7 @@ def main():
args = parser.parse_args()
torch.manual_seed(0)
device = torch.device('cuda')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
......@@ -79,10 +80,11 @@ def main():
test(model, device, test_loader)
lr_scheduler.step(epoch)
torch.save(model.state_dict(), 'vgg16_cifar10.pth')
else:
assert os.path.isfile('vgg16_cifar10.pth'), "can not find checkpoint 'vgg16_cifar10.pth'"
model.load_state_dict(torch.load('vgg16_cifar10.pth'))
# Test base model accuracy
print('=' * 10 + 'Test on the original model' + '=' * 10)
model.load_state_dict(torch.load('vgg16_cifar10.pth'))
test(model, device, test_loader)
# top1 = 93.51%
......
......@@ -56,7 +56,7 @@ def test(model, device, test_loader):
def main():
torch.manual_seed(0)
device = torch.device('cpu')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
......@@ -67,7 +67,6 @@ def main():
batch_size=1000, shuffle=True)
model = Mnist()
'''you can change this to DoReFaQuantizer to implement it
DoReFaQuantizer(configure_list).compress(model)
'''
......@@ -86,6 +85,7 @@ def main():
quantizer = QAT_Quantizer(model, configure_list)
quantizer.compress()
model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
for epoch in range(10):
print('# Epoch {} #'.format(epoch))
......
......@@ -72,7 +72,7 @@ def test(model, device, test_loader):
def main():
torch.manual_seed(0)
device = torch.device('cpu')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
......@@ -83,6 +83,7 @@ def main():
batch_size=1000, shuffle=True)
model = Mnist()
model.to(device)
model.print_conv_filter_sparsity()
configure_list = [{
......@@ -92,7 +93,7 @@ def main():
pruner = FPGMPruner(model, configure_list)
pruner.compress()
model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
for epoch in range(10):
pruner.update_epoch(epoch)
......
......@@ -55,7 +55,7 @@ def test(model, device, test_loader):
def main():
torch.manual_seed(0)
device = torch.device('cuda')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
......
......@@ -49,7 +49,7 @@ def test(model, device, test_loader):
def main():
torch.manual_seed(0)
device = torch.device('cuda')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
......
import math
import os
import argparse
import torch
import torch.nn as nn
......@@ -57,7 +58,7 @@ def main():
args = parser.parse_args()
torch.manual_seed(0)
device = torch.device('cuda')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
......@@ -90,10 +91,11 @@ def main():
train(model, device, train_loader, optimizer, True)
test(model, device, test_loader)
torch.save(model.state_dict(), 'vgg19_cifar10.pth')
else:
assert os.path.isfile('vgg19_cifar10.pth'), "can not find checkpoint 'vgg19_cifar10.pth'"
model.load_state_dict(torch.load('vgg19_cifar10.pth'))
# Test base model accuracy
print('=' * 10 + 'Test the original model' + '=' * 10)
model.load_state_dict(torch.load('vgg19_cifar10.pth'))
test(model, device, test_loader)
# top1 = 93.60%
......
......@@ -120,7 +120,7 @@ if __name__ == "__main__":
dataset_train, dataset_valid = datasets.get_dataset("cifar10", cutout_length=16)
model = CNN(32, 3, 36, 10, args.layers, auxiliary=True)
apply_fixed_architecture(model, args.arc_checkpoint, device=device)
apply_fixed_architecture(model, args.arc_checkpoint)
criterion = nn.CrossEntropyLoss()
model.to(device)
......
......@@ -115,7 +115,7 @@ class ENASLayer(nn.Module):
nodes_used_mask = torch.zeros(self.num_nodes + 2, dtype=torch.bool, device=prev.device)
for i in range(self.num_nodes):
node_out, mask = self.nodes[i](prev_nodes_out)
nodes_used_mask[:mask.size(0)] |= mask
nodes_used_mask[:mask.size(0)] |= mask.to(node_out.device)
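# mask may live on a different device than node_out (e.g. under DataParallel), so move it over before the in-place OR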
prev_nodes_out.append(node_out)
unused_nodes = torch.cat([out for used, out in zip(nodes_used_mask, prev_nodes_out) if not used], 1)
......
......@@ -101,6 +101,6 @@ if __name__ == "__main__":
from nni.nas.pytorch.fixed import apply_fixed_architecture
assert os.path.isfile(args.exported_arch_path), \
"exported_arch_path {} should be a file.".format(args.exported_arch_path)
apply_fixed_architecture(model, args.exported_arch_path, device=device)
apply_fixed_architecture(model, args.exported_arch_path)
trainer = Retrain(model, optimizer, device, data_provider, n_epochs=300)
trainer.run()
{
    "kind": "CustomResourceDefinition",
    "spec": {
        "scope": "Namespaced",
        "version": "v1",
        "group": "kubeflow.org",
        "names": {
            "kind": "PyTorchJob",
            "plural": "pytorchjobs",
            "singular": "pytorchjob"
        }
    },
    "apiVersion": "kubeflow.org/v1",
    "metadata": {
        "name": "pytorchjobs.kubeflow.org"
    }
}
\ No newline at end of file
......@@ -83,7 +83,24 @@ class TFOperatorClientV1 extends KubernetesCRDClient {
return 'tensorflow';
}
}
class PyTorchOperatorClientV1 extends KubernetesCRDClient {
    /**
     * constructor, to initialize pytorchjob CRD definition
     */
    public constructor() {
        super();
        this.crdSchema = JSON.parse(fs.readFileSync('./config/kubeflow/pytorchjob-crd-v1.json', 'utf8'));
        this.client.addCustomResourceDefinition(this.crdSchema);
    }

    protected get operator(): any {
        return this.client.apis['kubeflow.org'].v1.namespaces('default').pytorchjobs;
    }

    public get containerName(): string {
        return 'pytorch';
    }
}
class PyTorchOperatorClientV1Alpha2 extends KubernetesCRDClient {
/**
* constructor, to initialize pytorchjob CRD definition
......@@ -179,6 +196,9 @@ class KubeflowOperatorClientFactory {
case 'v1beta2': {
return new PyTorchOperatorClientV1Beta2();
}
case 'v1': {
return new PyTorchOperatorClientV1();
}
default:
throw new Error(`Invalid pytorch-operator apiVersion ${operatorApiVersion}`);
}
......
......@@ -13,7 +13,7 @@ import { AzureStorage, KeyVaultConfig, KubernetesClusterConfig, KubernetesCluste
export type KubeflowOperator = 'tf-operator' | 'pytorch-operator' ;
export type DistTrainRole = 'worker' | 'ps' | 'master';
export type KubeflowJobStatus = 'Created' | 'Running' | 'Failed' | 'Succeeded';
export type OperatorApiVersion = 'v1alpha2' | 'v1beta1' | 'v1beta2';
export type OperatorApiVersion = 'v1alpha2' | 'v1beta1' | 'v1beta2' | 'v1';
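// 'v1' matches the kubeflow.org/v1 API version served by the V1 operator clients (see pytorchjob-crd-v1.json)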
/**
* Kubeflow Cluster Configuration
......