Add notebook for tabular data classification in AML (#3599)

Co-authored-by: kvartet <xuyiru1999@gmail.com>

Add notebook for tabular data classification in AML (#3599)
Co-authored-by: kvartet <xuyiru1999@gmail.com>
02eab99b · kvartet · GitHub · a2146d50 · a2146d50 · 02eab99b
Unverified Commit 02eab99b authored Apr 30, 2021 by kvartet Committed by GitHub Apr 30, 2021
2 changed files
--- a/examples/notebooks/retrieve_nni_info_with_python.ipynb
+++ b/examples/notebooks/retrieve_nni_info_with_python.ipynb
--- a/examples/notebooks/tabular_data_classification_in_AML.ipynb
+++ b/examples/notebooks/tabular_data_classification_in_AML.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Tabular Data Classification with NNI in AML"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This simple example is to use NNI NAS 2.0(Retiarii) framework to search for the best neural architecture for tabular data classification task in Azure Machine Learning training platform.\n",
+    "\n",
+    "The video demo is https://www.youtube.com/watch?v=PDVqBmm7Cro and https://www.bilibili.com/video/BV1oy4y1W7GF."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 1: Prepare the dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The first step is to prepare the dataset. Here we use the Titanic dataset as an example."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import torch\n",
+    "import pandas as pd\n",
+    "\n",
+    "from sklearn.preprocessing import LabelEncoder\n",
+    "from torchvision.datasets.utils import download_url\n",
+    "\n",
+    "class TitanicDataset(torch.utils.data.Dataset):\n",
+    "    def __init__(self, root: str, train: bool = True):\n",
+    "        filename = 'train.csv' if train else 'eval.csv'\n",
+    "        if not os.path.exists(os.path.join(root, filename)):\n",
+    "            download_url(os.path.join(\n",
+    "                'https://storage.googleapis.com/tf-datasets/titanic/', filename), root, filename)\n",
+    "\n",
+    "        df = pd.read_csv(os.path.join(root, filename))\n",
+    "        object_colunmns = df.select_dtypes(include='object').columns.values\n",
+    "        for idx in df.columns:\n",
+    "            if idx in object_colunmns:\n",
+    "                df[idx] = LabelEncoder().fit_transform(df[idx])\n",
+    "        \n",
+    "        self.x = torch.tensor(df.iloc[:, 1:].values)\n",
+    "        self.y = torch.tensor(df.iloc[:, 0].values)\n",
+    "\n",
+    "    def __len__(self):\n",
+    "        return len(self.y)\n",
+    "\n",
+    "    def __getitem__(self, idx):\n",
+    "        return self.x[idx], self.y[idx]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_dataset = TitanicDataset('./data', train=True)\n",
+    "test_dataset = TitanicDataset('./data', train=False)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 2: Define the Model Space"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Model space is defined by users to express a set of models that they want to explore, which contains potentially good-performing models. In Retiarii(NNI NAS 2.0) framework, a model space is defined with two parts: a base model and possible mutations on the base model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 2.1: Define the Base Model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model. Usually, you only need to replace the code ``import torch.nn as nn`` with ``import nni.retiarii.nn.pytorch as nn`` to use NNI wrapped PyTorch modules. Below is a very simple example of defining a base model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import nni.retiarii.nn.pytorch as nn\n",
+    "import torch.nn.functional as F\n",
+    "\n",
+    "class Net(nn.Module):\n",
+    "\n",
+    "    def __init__(self, input_size):\n",
+    "        super().__init__()\n",
+    "\n",
+    "        self.fc1 = nn.Linear(input_size, 16)\n",
+    "        self.bn1 = nn.BatchNorm1d(16)\n",
+    "        self.dropout1 = nn.Dropout(0.0)\n",
+    "\n",
+    "        self.fc2 = nn.Linear(16, 16)\n",
+    "        self.bn2 = nn.BatchNorm1d(16)\n",
+    "        self.dropout2 = nn.Dropout(0.0)\n",
+    "\n",
+    "        self.fc3 = nn.Linear(16, 2)\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "\n",
+    "        x = self.dropout1(F.relu(self.bn1(self.fc1(x))))\n",
+    "        x = self.dropout2(F.relu(self.bn2(self.fc2(x))))\n",
+    "        x = F.sigmoid(self.fc3(x))\n",
+    "        return x\n",
+    "    \n",
+    "model_space = Net(len(train_dataset.__getitem__(0)[0]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 2.2: Define the Model Mutations"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A base model is only one concrete model, not a model space. NNI provides APIs and primitives for users to express how the base model can be mutated, i.e., a model space that includes many models. The following will use inline Mutation APIs as a simple example. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import nni.retiarii.nn.pytorch as nn\n",
+    "import torch.nn.functional as F\n",
+    "\n",
+    "class Net(nn.Module):\n",
+    "\n",
+    "    def __init__(self, input_size):\n",
+    "        super().__init__()\n",
+    "\n",
+    "        self.hidden_dim1 = nn.ValueChoice(\n",
+    "            [16, 32, 64, 128, 256, 512, 1024], label='hidden_dim1')\n",
+    "        self.hidden_dim2 = nn.ValueChoice(\n",
+    "            [16, 32, 64, 128, 256, 512, 1024], label='hidden_dim2')\n",
+    "\n",
+    "        self.fc1 = nn.Linear(input_size, self.hidden_dim1)\n",
+    "        self.bn1 = nn.BatchNorm1d(self.hidden_dim1)\n",
+    "        self.dropout1 = nn.Dropout(nn.ValueChoice([0.0, 0.25, 0.5]))\n",
+    "\n",
+    "        self.fc2 = nn.Linear(self.hidden_dim1, self.hidden_dim2)\n",
+    "        self.bn2 = nn.BatchNorm1d(self.hidden_dim2)\n",
+    "        self.dropout2 = nn.Dropout(nn.ValueChoice([0.0, 0.25, 0.5]))\n",
+    "\n",
+    "        self.fc3 = nn.Linear(self.hidden_dim2, 2)\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "\n",
+    "        x = self.dropout1(F.relu(self.bn1(self.fc1(x))))\n",
+    "        x = self.dropout2(F.relu(self.bn2(self.fc2(x))))\n",
+    "        x = F.sigmoid(self.fc3(x))\n",
+    "        return x\n",
+    "\n",
+    "model_space = Net(len(train_dataset.__getitem__(0)[0]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Besides inline mutations, Retiarii also provides ``mutator``, a more general approach to express complex model space."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 3: Explore the Defined Model Space"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the NAS process, the search strategy repeatedly generates new models, and the model evaluator is for training and validating each generated model. The obtained performance of a generated model is collected and sent to the search strategy for generating better models.\n",
+    "\n",
+    "Users can choose a proper search strategy to explore the model space, and use a chosen or user-defined model evaluator to evaluate the performance of each sampled model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 3.1: Choose a Search Strategy"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import nni.retiarii.strategy as strategy\n",
+    "\n",
+    "simple_strategy = strategy.TPEStrategy()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 3.2: Choose or Write a Model Evaluator"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the context of PyTorch, Retiarii has provided two built-in model evaluators, designed for simple use cases: classification and regression. These two evaluators are built upon the awesome library PyTorch-Lightning."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import nni.retiarii.evaluator.pytorch.lightning as pl\n",
+    "\n",
+    "trainer = pl.Classification(train_dataloader=pl.DataLoader(train_dataset, batch_size=16),\n",
+    "                                val_dataloaders=pl.DataLoader(\n",
+    "                                test_dataset, batch_size=16),\n",
+    "                                max_epochs=20)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 4: Configure the Experiment"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "After all the above are prepared, it is time to configure an experiment to do the model search. The basic experiment configuration is as follows: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from nni.retiarii.experiment.pytorch import RetiariiExeConfig, RetiariiExperiment\n",
+    "\n",
+    "exp = RetiariiExperiment(model_space, trainer, [], simple_strategy)\n",
+    "\n",
+    "exp_config = RetiariiExeConfig('aml')\n",
+    "exp_config.experiment_name = 'titanic_example'\n",
+    "exp_config.trial_concurrency = 2\n",
+    "exp_config.max_trial_number = 20\n",
+    "exp_config.max_experiment_duration = '2h'\n",
+    "exp_config.trial_gpu_number = 1\n",
+    "exp_config.nni_manager_ip = '' # your nni_manager_ip"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Running NNI experiments on the AML(Azure Machine Learning) training service is also simple, you only need to configure the following additional fields:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "exp_config.training_service.use_active_gpu = True\n",
+    "exp_config.training_service.subscription_id = '' # your subscription id\n",
+    "exp_config.training_service.resource_group = '' # your resource group\n",
+    "exp_config.training_service.workspace_name = '' # your workspace name\n",
+    "exp_config.training_service.compute_target = '' # your compute target\n",
+    "exp_config.training_service.docker_image = ''  # your docker image"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 5: Run and View the Experiment"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can launch the experiment now! \n",
+    "\n",
+    "Besides, NNI provides WebUI to help users view the experiment results and make more advanced analysis."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "exp.run(exp_config, 8081 + random.randint(0, 100))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 6: Export the top Model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Exporting the top model script is also very convenient."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print('Final model:')\n",
+    "for model_code in exp.export_top_models():\n",
+    "    print(model_code)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}