Unverified Commit 6634b984 authored by Minjie Wang, committed by GitHub

[Test] Basic regression test setup. (#2415)



* add machine name

* update scripts

* update script

* test commit

* change run.sh

* model acc bench for gcn and sage

* get basic pipeline setup for local benchmarking

* try to bridge pytest with asv

* fix deps

* move asv to other folders

* move dir

* update script

* new setup

* delete useless file

* delete outputs

* remove dependency on pytest

* update script

* test commit

* stuck by torch version in dgl-ci-gpu

* update readme

* update asv conf

* missing files

* remove the old regression folder

* api bench

* add batch api bench
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
parent 8ff47980
DGL Benchmarks
====
Benchmarking DGL with Airspeed Velocity.
Usage
---
Before beginning, ensure that Airspeed Velocity is installed:
```bash
pip install asv
```
To run all benchmarks locally, build the project first and then run:
```bash
asv run -n -e --python=same --verbose
```
Note that a local run will not write any benchmark results to disk.
To change the device used for benchmarking, set the `DGL_BENCH_DEVICE` environment variable.
Any valid PyTorch device string is allowed.
```bash
export DGL_BENCH_DEVICE=cuda:0
```
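For reference, the benchmark scripts read this variable through `utils.get_bench_device()`, which is added in this commit and simply falls back to `cpu` when the variable is unset:
```python
# utils.get_bench_device() as added in this commit; benchmark scripts pass the
# returned string directly to `.to(device)` on graphs and models.
import os

def get_bench_device():
    return os.environ.get('DGL_BENCH_DEVICE', 'cpu')
```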
DGL runs all benchmarks automatically in a Docker container. To run all benchmarks in Docker,
use the `publish.sh` script. It accepts two arguments: a name identifying the test machine
and a device name.
```bash
bash publish.sh dev-machine cuda:0
```
The script outputs two folders, `results` and `html`. The `html` folder contains the
generated static web pages. View them with:
```bash
asv preview
```
Adding a new benchmark suite
---
The benchmark folder is organized as follows:
```
|-- benchmarks/
|   |-- model_acc/          # benchmarks for model accuracy
|   |   |-- bench_gcn.py
|   |   |-- bench_gat.py
|   |   |-- bench_sage.py
|   |   ...
|   |-- model_speed/        # benchmarks for model training speed
|   |   |-- bench_gat.py
|   |   |-- bench_sage.py
|   |   ...
|   ...                     # other types of benchmarks
|-- html/                   # generated html files
|-- results/                # generated result files
|-- asv.conf.json           # asv config file
|-- build_dgl_asv.sh        # script for building dgl in asv
|-- install_dgl_asv.sh      # script for installing dgl in asv
|-- publish.sh              # script for running benchmarks in docker
|-- README.md               # this readme
|-- run.sh                  # script for calling asv in docker
|-- ...                     # other aux files
```
To add a new benchmark, pick a suitable benchmark type and create a Python script under
the corresponding folder. We prefer names with the `bench_` prefix. Here is a toy example:
```python
# bench_range.py
import time

from .. import utils


@utils.benchmark('time')
@utils.parametrize('l', [10, 100, 1000])
@utils.parametrize('u', [10, 100, 1000])
def track_time(l, u):
    t0 = time.time()
    for i in range(l, u):
        pass
    return time.time() - t0
```
* The main entry point of each benchmark script is a `track_*` function. The function
can have arbitrary arguments and must return the benchmark result.
* There are two useful decorators: `utils.benchmark` and `utils.parametrize`.
* `utils.benchmark` indicates the type of this benchmark. Currently supported types are
  `'time'` and `'acc'`. The decorator performs the necessary setup and finalization
  steps, such as fixing the random seed for the `'acc'` type.
* `utils.parametrize` specifies the parameters to test.
  Multiple `parametrize` decorators benchmark all combinations of the parameters
  (see the sketch after this list).
* Check out `model_acc/bench_gcn.py` and `model_speed/bench_sage.py`.
* ASV's [official guide on writing benchmarks](https://asv.readthedocs.io/en/stable/writing_benchmarks.html)
is also very helpful.
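Under the hood, these decorators (defined in `utils.py`, shown later in this commit) simply attach the attributes that ASV looks for. Roughly, the toy benchmark above ends up carrying:
```python
# A rough sketch of what the decorators attach (see utils.py in this commit).
# Decorators apply bottom-up, so the innermost @parametrize ('u') is recorded first.
track_time.unit = 's'                        # from @utils.benchmark('time')
track_time.setup = utils.setup_track_time    # fixes numpy/torch random seeds
track_time.params = [[10, 100, 1000],        # values for 'u'
                     [10, 100, 1000]]        # values for 'l'
track_time.param_names = ['u', 'l']
# ASV then calls track_time once per parameter combination (9 runs here).
```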
Tips
----
* Pass the `-e --verbose` flags to `asv run` to print stderr and more information. Use the `--bench` flag
  to run specific benchmarks (see the example after this list).
* When running benchmarks locally (e.g., with `--python=same`), ASV will not write results to disk,
  so `asv publish` will not generate plots.
* When running benchmarks in Docker, ASV pulls the code from the remote and builds it in a conda
  environment. The repository to pull from is determined by `origin`, so it also works with forked repositories.
  The branches are configured in `asv.conf.json`. If you wish to test the performance impact of your local
  source changes in Docker, remember to do the following before running `publish.sh`:
  - Commit your local changes and push them to the remote `origin`.
  - Add the corresponding branch to `asv.conf.json`.
* Try to make your benchmarks compatible with all the versions being tested.
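For example, to run only the GCN benchmarks locally (the `--bench` value is a regular expression matched against benchmark names, so adjust it to the benchmark of interest):
```bash
asv run -n -e --python=same --verbose --bench bench_gcn
```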
@@ -5,10 +5,10 @@
     // The name of the project being benchmarked
     "project": "dgl",
     // The project's homepage
-    "project_url": "https://github.com/dmlc/dgl",
+    "project_url": "https://www.dgl.ai",
     // The URL or local path of the source code repository for the
     // project being benchmarked
-    "repo": ".",
+    "repo": "..",
     // The Python project's subdirectory in your repo. If missing or
     // the empty string, the project is assumed to be located at the root
     // of the repository.
@@ -16,34 +16,29 @@
     // Customizable commands for building, installing, and
     // uninstalling the project. See asv.conf.json documentation.
     //
-    "install_command": [
-        "/bin/bash {build_dir}/tests/regression/install_dgl_asv.sh"
-    ],
     "build_command": [
-        "/bin/bash {build_dir}/tests/regression/build_dgl_asv.sh"
+        "/bin/bash {conf_dir}/build_dgl_asv.sh"
+    ],
+    "install_command": [
+        "/bin/bash {conf_dir}/install_dgl_asv.sh"
     ],
     "uninstall_command": [
-        "return-code=any python -mpip uninstall -y dgl"
+        "return-code=any python -m pip uninstall -y dgl"
     ],
-    // "build_command": [
-    //     "python setup.py build",
-    //     "PIP_NO_BUILD_ISOLATION=false python -mpip wheel --no-deps --no-index -w {build_cache_dir} {build_dir}"
-    // ],
     // List of branches to benchmark. If not provided, defaults to "master"
     // (for git) or "default" (for mercurial).
-    "branches": ["master"], // for git
-    // "branches": ["default"], // for mercurial
+    "branches": ["master", "0.5.0", "0.5.2", "0.5.3", "0.4.3.post2"], // for git
     // The DVCS being used. If not set, it will be automatically
     // determined from "repo" by looking at the protocol in the URL
     // (if remote), or by looking for special directories, such as
     // ".git" (if local).
-    // "dvcs": "git",
+    "dvcs": "git",
     // The tool to use to create environments. May be "conda",
     // "virtualenv" or other value depending on the plugins in use.
     // If missing or the empty string, the tool will be automatically
     // determined by looking for tools on the PATH environment
     // variable.
-    // "environment_type": "conda",
+    "environment_type": "conda",
     // timeout in seconds for installing any dependencies in environment
     // defaults to 10 min
     "install_timeout": 600,
@@ -104,16 +99,16 @@
     // ],
     // The directory (relative to the current directory) that benchmarks are
     // stored in. If not provided, defaults to "benchmarks"
-    "benchmark_dir": "tests/regression",
+    // "benchmark_dir": "benchmarks",
     // The directory (relative to the current directory) to cache the Python
     // environments in. If not provided, defaults to "env"
-    "env_dir": ".asv/env",
+    "env_dir": "env",
     // The directory (relative to the current directory) that raw benchmark
     // results are stored in. If not provided, defaults to "results".
-    "results_dir": "asv/results",
+    "results_dir": "results",
     // The directory (relative to the current directory) that the html tree
     // should be written to. If not provided, defaults to "html".
-    "html_dir": "asv/html",
+    "html_dir": "html",
     // The number of characters to retain in the commit hashes.
     // "hash_length": 8,
     // `asv` will cache results of the recent builds in each
...
import time

import dgl
import torch

from .. import utils


@utils.benchmark('time')
@utils.parametrize('batch_size', [4, 32, 256])
def track_time(batch_size):
    device = utils.get_bench_device()
    # prepare graph
    graphs = []
    for i in range(batch_size):
        u = torch.randint(20, (40,))
        v = torch.randint(20, (40,))
        graphs.append(dgl.graph((u, v)).to(device))
    # dry run
    for i in range(10):
        g = dgl.batch(graphs)
    # timing
    t0 = time.time()
    for i in range(100):
        g = dgl.batch(graphs)
    t1 = time.time()
    return (t1 - t0) / 100
import dgl
from dgl.nn.pytorch import GATConv
import torch
import torch.nn as nn
import torch.nn.functional as F

from .. import utils


class GAT(nn.Module):
    def __init__(self,
                 num_layers,
                 in_dim,
                 num_hidden,
                 num_classes,
                 heads,
                 activation,
                 feat_drop,
                 attn_drop,
                 negative_slope,
                 residual):
        super(GAT, self).__init__()
        self.num_layers = num_layers
        self.gat_layers = nn.ModuleList()
        self.activation = activation
        # input projection (no residual)
        self.gat_layers.append(GATConv(
            in_dim, num_hidden, heads[0],
            feat_drop, attn_drop, negative_slope, False, self.activation))
        # hidden layers
        for l in range(1, num_layers):
            # due to multi-head, the in_dim = num_hidden * num_heads
            self.gat_layers.append(GATConv(
                num_hidden * heads[l-1], num_hidden, heads[l],
                feat_drop, attn_drop, negative_slope, residual, self.activation))
        # output projection
        self.gat_layers.append(GATConv(
            num_hidden * heads[-2], num_classes, heads[-1],
            feat_drop, attn_drop, negative_slope, residual, None))

    def forward(self, g, inputs):
        h = inputs
        for l in range(self.num_layers):
            h = self.gat_layers[l](g, h).flatten(1)
        # output projection
        logits = self.gat_layers[-1](g, h).mean(1)
        return logits


def evaluate(model, g, features, labels, mask):
    model.eval()
    with torch.no_grad():
        logits = model(g, features)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels) * 100


@utils.benchmark('acc')
@utils.parametrize('data', ['cora', 'pubmed'])
def track_acc(data):
    data = utils.process_data(data)
    device = utils.get_bench_device()
    g = data[0].to(device)

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    in_feats = features.shape[1]
    n_classes = data.num_labels

    g = dgl.remove_self_loop(g)
    g = dgl.add_self_loop(g)

    # create model
    model = GAT(1, in_feats, 8, n_classes, [8, 1], F.elu,
                0.6, 0.6, 0.2, False)
    loss_fcn = torch.nn.CrossEntropyLoss()
    model = model.to(device)
    model.train()

    # optimizer
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=1e-2,
                                 weight_decay=5e-4)

    for epoch in range(200):
        logits = model(g, features)
        loss = loss_fcn(logits[train_mask], labels[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    acc = evaluate(model, g, features, labels, test_mask)
    return acc
import dgl
from dgl.nn.pytorch import GraphConv
import torch
import torch.nn as nn
import torch.nn.functional as F

from .. import utils


class GCN(nn.Module):
    def __init__(self,
                 in_feats,
                 n_hidden,
                 n_classes,
                 n_layers,
                 activation,
                 dropout):
        super(GCN, self).__init__()
        self.layers = nn.ModuleList()
        # input layer
        self.layers.append(GraphConv(in_feats, n_hidden, activation=activation))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GraphConv(n_hidden, n_hidden, activation=activation))
        # output layer
        self.layers.append(GraphConv(n_hidden, n_classes))
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, g, features):
        h = features
        for i, layer in enumerate(self.layers):
            if i != 0:
                h = self.dropout(h)
            h = layer(g, h)
        return h


def evaluate(model, g, features, labels, mask):
    model.eval()
    with torch.no_grad():
        logits = model(g, features)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels) * 100


@utils.benchmark('acc')
@utils.parametrize('data', ['cora', 'pubmed'])
def track_acc(data):
    data = utils.process_data(data)
    device = utils.get_bench_device()
    g = data[0].to(device).int()

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    in_feats = features.shape[1]
    n_classes = data.num_labels

    g = dgl.remove_self_loop(g)
    g = dgl.add_self_loop(g)

    # normalization
    degs = g.in_degrees().float()
    norm = torch.pow(degs, -0.5)
    norm[torch.isinf(norm)] = 0
    g.ndata['norm'] = norm.unsqueeze(1)

    # create GCN model
    model = GCN(in_feats, 16, n_classes, 1, F.relu, 0.5)
    loss_fcn = torch.nn.CrossEntropyLoss()
    model = model.to(device)
    model.train()

    # optimizer
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=1e-2,
                                 weight_decay=5e-4)

    for epoch in range(200):
        logits = model(g, features)
        loss = loss_fcn(logits[train_mask], labels[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    acc = evaluate(model, g, features, labels, test_mask)
    return acc
import dgl
from dgl.nn.pytorch import SAGEConv
import torch
import torch.nn as nn
import torch.nn.functional as F

from .. import utils


class GraphSAGE(nn.Module):
    def __init__(self,
                 in_feats,
                 n_hidden,
                 n_classes,
                 n_layers,
                 activation,
                 dropout,
                 aggregator_type):
        super(GraphSAGE, self).__init__()
        self.layers = nn.ModuleList()
        self.dropout = nn.Dropout(dropout)
        self.activation = activation
        # input layer
        self.layers.append(SAGEConv(in_feats, n_hidden, aggregator_type))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(SAGEConv(n_hidden, n_hidden, aggregator_type))
        # output layer
        self.layers.append(SAGEConv(n_hidden, n_classes, aggregator_type))  # activation None

    def forward(self, graph, inputs):
        h = self.dropout(inputs)
        for l, layer in enumerate(self.layers):
            h = layer(graph, h)
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
        return h


def evaluate(model, g, features, labels, mask):
    model.eval()
    with torch.no_grad():
        logits = model(g, features)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels) * 100


@utils.benchmark('acc')
@utils.parametrize('data', ['cora', 'pubmed'])
def track_acc(data):
    data = utils.process_data(data)
    device = utils.get_bench_device()
    g = data[0].to(device)

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    in_feats = features.shape[1]
    n_classes = data.num_labels

    g = dgl.remove_self_loop(g)
    g = dgl.add_self_loop(g)

    # create model
    model = GraphSAGE(in_feats, 16, n_classes, 1, F.relu, 0.5, 'gcn')
    loss_fcn = torch.nn.CrossEntropyLoss()
    model = model.to(device)
    model.train()

    # optimizer
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=1e-2,
                                 weight_decay=5e-4)

    for epoch in range(200):
        logits = model(g, features)
        loss = loss_fcn(logits[train_mask], labels[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    acc = evaluate(model, g, features, labels, test_mask)
    return acc
import time

import dgl
from dgl.nn.pytorch import GATConv
import torch
import torch.nn as nn
import torch.nn.functional as F

from .. import utils


class GAT(nn.Module):
    def __init__(self,
                 num_layers,
                 in_dim,
                 num_hidden,
                 num_classes,
                 heads,
                 activation,
                 feat_drop,
                 attn_drop,
                 negative_slope,
                 residual):
        super(GAT, self).__init__()
        self.num_layers = num_layers
        self.gat_layers = nn.ModuleList()
        self.activation = activation
        # input projection (no residual)
        self.gat_layers.append(GATConv(
            in_dim, num_hidden, heads[0],
            feat_drop, attn_drop, negative_slope, False, self.activation))
        # hidden layers
        for l in range(1, num_layers):
            # due to multi-head, the in_dim = num_hidden * num_heads
            self.gat_layers.append(GATConv(
                num_hidden * heads[l-1], num_hidden, heads[l],
                feat_drop, attn_drop, negative_slope, residual, self.activation))
        # output projection
        self.gat_layers.append(GATConv(
            num_hidden * heads[-2], num_classes, heads[-1],
            feat_drop, attn_drop, negative_slope, residual, None))

    def forward(self, g, inputs):
        h = inputs
        for l in range(self.num_layers):
            h = self.gat_layers[l](g, h).flatten(1)
        # output projection
        logits = self.gat_layers[-1](g, h).mean(1)
        return logits


@utils.benchmark('time')
@utils.parametrize('data', ['cora', 'pubmed'])
def track_time(data):
    data = utils.process_data(data)
    device = utils.get_bench_device()
    num_epochs = 200

    g = data[0].to(device)

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    in_feats = features.shape[1]
    n_classes = data.num_labels

    g = dgl.remove_self_loop(g)
    g = dgl.add_self_loop(g)

    # create model
    model = GAT(1, in_feats, 8, n_classes, [8, 1], F.elu,
                0.6, 0.6, 0.2, False)
    loss_fcn = torch.nn.CrossEntropyLoss()
    model = model.to(device)
    model.train()

    # optimizer
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=1e-2,
                                 weight_decay=5e-4)

    # dry run
    for epoch in range(10):
        logits = model(g, features)
        loss = loss_fcn(logits[train_mask], labels[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # timing
    t0 = time.time()
    for epoch in range(num_epochs):
        logits = model(g, features)
        loss = loss_fcn(logits[train_mask], labels[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    t1 = time.time()

    return t1 - t0
import time

import dgl
from dgl.nn.pytorch import SAGEConv
import torch
import torch.nn as nn
import torch.nn.functional as F

from .. import utils


class GraphSAGE(nn.Module):
    def __init__(self,
                 in_feats,
                 n_hidden,
                 n_classes,
                 n_layers,
                 activation,
                 dropout,
                 aggregator_type):
        super(GraphSAGE, self).__init__()
        self.layers = nn.ModuleList()
        self.dropout = nn.Dropout(dropout)
        self.activation = activation
        # input layer
        self.layers.append(SAGEConv(in_feats, n_hidden, aggregator_type))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(SAGEConv(n_hidden, n_hidden, aggregator_type))
        # output layer
        self.layers.append(SAGEConv(n_hidden, n_classes, aggregator_type))  # activation None

    def forward(self, graph, inputs):
        h = self.dropout(inputs)
        for l, layer in enumerate(self.layers):
            h = layer(graph, h)
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
        return h


@utils.benchmark('time')
@utils.parametrize('data', ['cora', 'pubmed'])
def track_time(data):
    data = utils.process_data(data)
    device = utils.get_bench_device()
    num_epochs = 200

    g = data[0].to(device)

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    in_feats = features.shape[1]
    n_classes = data.num_labels

    g = dgl.remove_self_loop(g)
    g = dgl.add_self_loop(g)

    # create model
    model = GraphSAGE(in_feats, 16, n_classes, 1, F.relu, 0.5, 'gcn')
    loss_fcn = torch.nn.CrossEntropyLoss()
    model = model.to(device)
    model.train()

    # optimizer
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=1e-2,
                                 weight_decay=5e-4)

    # dry run
    for i in range(10):
        logits = model(g, features)
        loss = loss_fcn(logits[train_mask], labels[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # timing
    t0 = time.time()
    for epoch in range(num_epochs):
        logits = model(g, features)
        loss = loss_fcn(logits[train_mask], labels[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    t1 = time.time()

    return t1 - t0
@@ -4,6 +4,7 @@ import requests
 import numpy as np
 import pandas
 import dgl
+import torch

 def _download(url, path, filename):
     fn = os.path.join(path, filename)
@@ -35,3 +36,53 @@ def get_graph(name):
     else:
         print(name + " doesn't exist")
         return None
+
+def process_data(name):
+    if name == 'cora':
+        return dgl.data.CoraGraphDataset()
+    elif name == 'pubmed':
+        return dgl.data.PubmedGraphDataset()
+    else:
+        raise ValueError('Invalid dataset name:', name)
+
+def get_bench_device():
+    return os.environ.get('DGL_BENCH_DEVICE', 'cpu')
+
+def setup_track_time(*args, **kwargs):
+    # fix random seed
+    np.random.seed(42)
+    torch.random.manual_seed(42)
+
+def setup_track_acc(*args, **kwargs):
+    # fix random seed
+    np.random.seed(42)
+    torch.random.manual_seed(42)
+
+TRACK_UNITS = {
+    'time' : 's',
+    'acc' : '%',
+}
+
+TRACK_SETUP = {
+    'time' : setup_track_time,
+    'acc' : setup_track_acc,
+}
+
+def parametrize(param_name, params):
+    def _wrapper(func):
+        if getattr(func, 'params', None) is None:
+            func.params = []
+        func.params.append(params)
+        if getattr(func, 'param_names', None) is None:
+            func.param_names = []
+        func.param_names.append(param_name)
+        return func
+    return _wrapper
+
+def benchmark(track_type):
+    assert track_type in ['time', 'acc']
+    def _wrapper(func):
+        func.unit = TRACK_UNITS[track_type]
+        func.setup = TRACK_SETUP[track_type]
+        return func
+    return _wrapper
-mkdir build
-CMAKE_VARS="-DUSE_CUDA=ON"
-rm -rf _download
+#!/bin/bash
+set -e
+. /opt/conda/etc/profile.d/conda.sh
+# build
+CMAKE_VARS="-DUSE_CUDA=ON"
+mkdir -p build
 pushd build
 cmake $CMAKE_VARS ..
-make -j4
+make -j
 popd
@@ -2,21 +2,14 @@
 set -e
-python -m pip install numpy
 . /opt/conda/etc/profile.d/conda.sh
+pip install -r /asv/torch_gpu_pip.txt
+pip install pandas
+# install
 pushd python
-for backend in pytorch mxnet tensorflow
-do
-conda activate "${backend}-ci"
 rm -rf build *.egg-info dist
 pip uninstall -y dgl
-# test install
 python3 setup.py install
-# test inplace build (for cython)
-python3 setup.py build_ext --inplace
-python3 -m pip install -r /root/requirement.txt
-done
 popd
+conda deactivate
\ No newline at end of file
#!/bin/bash
if [ $# -eq 2 ]; then
    MACHINE=$1
    DEVICE=$2
else
    echo "publish.sh <machine_name> <device>"
    exit 1
fi
WS_ROOT=/asv/dgl
docker run --name dgl-reg \
--rm --runtime=nvidia \
--hostname=$MACHINE -dit dgllib/dgl-ci-gpu:conda /bin/bash
docker exec dgl-reg mkdir -p $WS_ROOT
docker cp ../.git dgl-reg:$WS_ROOT
docker cp . dgl-reg:$WS_ROOT/benchmarks/
docker cp torch_gpu_pip.txt dgl-reg:/asv
docker exec dgl-reg bash $WS_ROOT/benchmarks/run.sh $DEVICE
docker cp dgl-reg:$WS_ROOT/benchmarks/results .
docker cp dgl-reg:$WS_ROOT/benchmarks/html .
docker stop dgl-reg
#!/bin/bash
set -e
DEVICE=$1
ROOT=/asv/dgl
. /opt/conda/etc/profile.d/conda.sh
conda activate base
pip install --upgrade pip
pip install asv
pip uninstall -y dgl
export DGL_BENCH_DEVICE=$DEVICE
pushd $ROOT/benchmarks
cat asv.conf.json
asv machine --yes
asv run
asv publish
popd
--find-links https://download.pytorch.org/whl/torch_stable.html
torch==1.5.1+cu101
torchvision==0.6.1+cu101
pytest
nose
numpy
cython
scipy
networkx
matplotlib
nltk
requests[security]
tqdm
\ No newline at end of file
How to add a test to the regression suite
=================================
See the official asv guide on [writing benchmarks](https://asv.readthedocs.io/en/stable/writing_benchmarks.html).
## Add a test
DGL reuses the CI docker image for the regression test. It provides four conda envs: base, mxnet-ci, pytorch-ci, and tensorflow-ci.
The basic usage is to execute a script and extract the needed results from its printed output.
- Create a new file in `tests/regression/`
- Follow the example `bench_gcn.py` or the [official instructions](https://asv.readthedocs.io/en/stable/writing_benchmarks.html)
- Functions whose names start with `track` are used to generate the statistics from their return values
- The `setup` function is executed every time before a `track` function runs
- `params` can be used to pass parameters into the `setup` and `track_` functions
## Run locally
The default regression branch in asv is `master`. If you need to run on another branch of your fork, change the `branches` value in the `asv.conf.json` at the root of your repo.
```bash
bash ./publish.sh <repo> <branch>
```
The results will be written to `./asv_data/`. You can run `python -m http.server` inside the `html` folder to start a server and view them.
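For example (assuming the generated `html` folder ends up under `./asv_data/`):
```bash
# serve the generated report, then open http://localhost:8000 in a browser
cd ./asv_data/html
python -m http.server 8000
```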