[Fix] Various Fix after Bug Bash (#2620)

* update
* Misc fix
* Update

Unverified Commit 3ca52b1e authored Feb 05, 2021 by Mufei Li Committed by GitHub Feb 05, 2021
16 changed files
--- a/docs/source/guide/message-edge.rst
+++ b/docs/source/guide/message-edge.rst
@@ -20,9 +20,10 @@ For example:
    import dgl.function as fn
-    graph.edata['a'] = affinity
+    # Suppose eweight is a tensor of shape (E, *), where E is the number of edges.
+    graph.edata['a'] = eweight
    graph.update_all(fn.u_mul_e('ft', 'a', 'm'),
                     fn.sum('m', 'ft'))
-The example above uses affinity as the edge weight. The edge weight should
+The example above uses eweight as the edge weight. The edge weight should
 usually be a scalar.
\ No newline at end of file
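The pair `fn.u_mul_e('ft', 'a', 'm')` and `fn.sum('m', 'ft')` in the snippet above scales each source node's feature by the weight on the edge and sums the resulting messages at each destination node. The following is a minimal pure-Python sketch of that computation over an explicit edge list, assuming scalar edge weights and list-valued node features. The helper `weighted_sum_update` is hypothetical and not part of DGL, which performs this with builtin kernels rather than Python loops.

```python
def weighted_sum_update(src, dst, node_feat, eweight, num_nodes):
    """Pure-Python sketch of u_mul_e followed by sum aggregation.

    src, dst: source/destination node IDs, one entry per edge.
    node_feat: per-node feature vectors (lists of floats).
    eweight: one scalar weight per edge.
    Nodes without incoming edges end up with zero vectors in this sketch.
    """
    dim = len(node_feat[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for u, v, w in zip(src, dst, eweight):
        # Message on edge (u, v): source feature scaled by the edge weight.
        msg = [w * x for x in node_feat[u]]
        # Sum incoming messages at the destination node.
        out[v] = [a + b for a, b in zip(out[v], msg)]
    return out


# Tiny graph with edges 0->1, 0->2, 1->2.
ft = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
new_ft = weighted_sum_update([0, 0, 1], [1, 2, 2], ft, [0.5, 1.0, 2.0], 3)
print(new_ft)  # [[0.0, 0.0], [0.5, 1.0], [7.0, 10.0]]
```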
--- a/docs/source/guide/mixed_precision.rst
+++ b/docs/source/guide/mixed_precision.rst
@@ -5,7 +5,7 @@ Chapter 8: Mixed Precision Training
 DGL is compatible with `PyTorch's automatic mixed precision package
 <https://pytorch.org/docs/stable/amp.html>`_
 for mixed precision training, thus saving both training time and GPU memory
-consumption. To enable this feature, users need to install PyTorch 1.6+ and
+consumption. To enable this feature, users need to install PyTorch 1.6+ with Python 3.7+ and
 build DGL from source file to support ``float16`` data type (this feature is
 still in its beta stage and we do not provide official pre-built pip wheels).
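A minimal sketch of checking the stated version requirements (PyTorch 1.6+, Python 3.7+) before enabling mixed precision. It compares numeric version components only, so `1.10` correctly counts as newer than `1.6`; it does not verify that DGL itself was built with `float16` support. The helpers `parse_version` and `meets_minimum` are hypothetical; in a real environment one would pass `torch.__version__`.

```python
import sys


def parse_version(version):
    """Leading numeric components of a version string.

    '1.7.1+cu101' -> (1, 7, 1); local suffixes such as '+cu101' are dropped.
    """
    release = version.split('+')[0]
    parts = []
    for piece in release.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum):
    """True if `version` is at least `minimum` (numeric tuple comparison)."""
    return parse_version(version) >= parse_version(minimum)


print(meets_minimum('1.7.1+cu101', '1.6'))  # True
print(meets_minimum('1.5.0', '1.6'))        # False
python_ok = sys.version_info >= (3, 7)      # environment-dependent
```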

--- a/docs/source/guide/training-node.rst
+++ b/docs/source/guide/training-node.rst
@@ -157,8 +157,7 @@ all node types.
 .. code:: python
    # Define a Heterograph Conv model
-    import dgl.nn as dglnn
    class RGCN(nn.Module):
        def __init__(self, in_feats, hid_feats, out_feats, rel_names):
            super().__init__()

--- a/docs/source/guide_cn/distributed.rst
+++ b/docs/source/guide_cn/distributed.rst
@@ -12,7 +12,7 @@ DGL采用完全分布式的方法，可将数据和计算同时分布在一组
 对于训练脚本，DGL提供了分布式的API。它们与小批次训练的API相似。用户仅需对单机小批次训练的代码稍作修改就可实现分布式训练。
 以下代码给出了一个用分布式方式训练GraphSage的示例。仅有的代码修改出现在第4-7行：1)初始化DGL的分布式模块，2)创建分布式图对象，以及
-3)拆分训练集，并计算本地进程的节点。其余代码保持不变，与 :ref:`mini_cn-batch training <guide-minibatch>` 类似，
+3)拆分训练集，并计算本地进程的节点。其余代码保持不变，与 :ref:`mini_cn-batch training <guide_cn-minibatch>` 类似，
 包括：创建采样器，模型定义，模型训练的循环。
 .. code:: python

--- a/docs/source/guide_cn/message-edge.rst
+++ b/docs/source/guide_cn/message-edge.rst
@@ -18,8 +18,9 @@ DGL的处理方法是：
    import dgl.function as fn
-    graph.edata['a'] = affinity
+    # 假定eweight是一个形状为(E, *)的张量，E是边的数量。
+    graph.edata['a'] = eweight
    graph.update_all(fn.u_mul_e('ft', 'a', 'm'),
                     fn.sum('m', 'ft'))
-在以上代码中，affinity被用作边的权重。边权重通常是一个标量。
+在以上代码中，eweight被用作边的权重。边权重通常是一个标量。
\ No newline at end of file
--- a/docs/source/guide_cn/training-node.rst
+++ b/docs/source/guide_cn/training-node.rst
@@ -134,8 +134,7 @@ DGL提供了一些内置的图卷积模块，可以完成一轮消息传递计
 .. code:: python
    # Define a Heterograph Conv model
-    import dgl.nn as dglnn
    class RGCN(nn.Module):
        def __init__(self, in_feats, hid_feats, out_feats, rel_names):
            super().__init__()

--- a/examples/README.md
+++ b/examples/README.md
@@ -48,6 +48,36 @@ The folder contains example implementations of selected research papers related
 | [Neural Graph Collaborative Filtering](#ngcf) |             | :heavy_check_mark: |                     |                   |                   |
 | [Graph Cross Networks with Vertex Infomax Pooling](#gxn)                                   |                     |                                  | :heavy_check_mark:        |                    |                    |
 | [Towards Deeper Graph Neural Networks](#dagnn) | :heavy_check_mark:  |                                  |                           |                    |                    |
+| [The PageRank Citation Ranking: Bringing Order to the Web](#pagerank) |  |                                  |                           |                    |                    |
+| [Fast Suboptimal Algorithms for the Computation of Graph Edit Distance](#beam) |  |                                  |                           |                    |                    |
+| [Speeding Up Graph Edit Distance Computation with a Bipartite Heuristic](#astar) |  |                                  |                           |                    |                    |
+| [A Three-Way Model for Collective Learning on Multi-Relational Data](#rescal) |  |                                  |                           |                    |                    |
+| [Speeding Up Graph Edit Distance Computation through Fast Bipartite Matching](#bipartite) |  |                                  |                           |                    |                    |
+| [Translating Embeddings for Modeling Multi-relational Data](#transe) |  |                                  |                           |                    |                    |
+| [A Hausdorff Heuristic for Efficient Computation of Graph Edit Distance](#hausdorff) |  |                                  |                           |                    |                    |
+| [Embedding Entities and Relations for Learning and Inference in Knowledge Bases](#distmul) |  |                                  |                           |                    |                    |
+| [Learning Entity and Relation Embeddings for Knowledge Graph Completion](#transr) |  |                                  |                           |                    |                    |
+| [Order Matters: Sequence to sequence for sets](#seq2seq) |                     |                                  | :heavy_check_mark:        |                    |                    |
+| [Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks](#treelstm) |  |                                  |                           |                    |                    |
+| [Complex Embeddings for Simple Link Prediction](#complex) |  |                                  |                           |                    |                    |
+| [Gated Graph Sequence Neural Networks](#ggnn) |  |                                  |                           |                    |                    |
+| [Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity](#acnn) |  |                                  |                           |                    |                    |
+| [Attention Is All You Need](#transformer) |  |                                  |                           |                    |                    |
+| [PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space](#pointnet++) |  |                                  |                           |                    |                    |
+| [PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation](#pointnet) |  |                                  |                           |                    |                    |
+| [Dynamic Routing Between Capsules](#capsule) |  |                                  |                           |                    |                    |
+| [An End-to-End Deep Learning Architecture for Graph Classification](#dgcnn) |                     |                                  | :heavy_check_mark:        |                    |                    |
+| [Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting](#stgcn) |  |                                  |                           |                    |                    |
+| [Recurrent Relational Networks](#rrn) |  |                                  |                           |                    |                    |
+| [Junction Tree Variational Autoencoder for Molecular Graph Generation](#jtvae) |  |                                  |                           |                    |                    |
+| [Learning Deep Generative Models of Graphs](#dgmg) |  |                                  |                           |                    |                    |
+| [RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space](#rotate) |  |                                  |                           |                    |                    |
+| [A graph-convolutional neural network model for the prediction of chemical reactivity](#wln) |  |                                  |                           |                    |                    |
+| [Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks](#settrans) |                     |                                  | :heavy_check_mark:        |                    |                    |
+| [Graphical Contrastive Losses for Scene Graph Parsing](#scenegraph) |  |                                  |                           |                    |                    |
+| [Dynamic Graph CNN for Learning on Point Clouds](#dgcnnpoint) |  |                                  |                           |                    |                    |
+| [Supervised Community Detection with Line Graph Neural Networks](#lgnn) |  |                                  |                           |                    |                    |
+| [Text Generation from Knowledge Graphs with Graph Transformers](#graphwriter) |  |                                  |                           |                    |                    |
 ## 2020

--- a/examples/pytorch/ogb/ogbn-products/gat/main.py
+++ b/examples/pytorch/ogb/ogbn-products/gat/main.py
@@ -245,5 +245,5 @@ if __name__ == '__main__':
    # Run 10 times
    test_accs = []
    for i in range(10):
-        test_accs.append(run(args, device, data))
+        test_accs.append(run(args, device, data).cpu().numpy())
        print('Average test accuracy:', np.mean(test_accs), '±', np.std(test_accs))
--- a/examples/pytorch/ogb/ogbn-products/graphsage/main.py
+++ b/examples/pytorch/ogb/ogbn-products/graphsage/main.py
@@ -233,5 +233,5 @@ if __name__ == '__main__':
    # Run 10 times
    test_accs = []
    for i in range(10):
-        test_accs.append(run(args, device, data))
+        test_accs.append(run(args, device, data).cpu().numpy())
        print('Average test accuracy:', np.mean(test_accs), '±', np.std(test_accs))
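The change above converts each returned accuracy with `.cpu().numpy()` before it is collected; presumably this is because NumPy cannot convert a tensor that lives on a CUDA device. Once the values are plain host-side numbers, the summary reduces to a mean and a standard deviation. Below is a minimal sketch with the standard library, assuming `run` has already returned floats; `summarize` is a hypothetical helper, and the accuracies are made-up inputs, not results. `np.std` defaults to the population standard deviation (`ddof=0`), which `statistics.pstdev` matches.

```python
import statistics


def summarize(accs):
    """Return (mean, population std) of repeated-run test accuracies."""
    return statistics.mean(accs), statistics.pstdev(accs)


# Illustrative inputs only, not real experiment results.
test_accs = [0.78, 0.80, 0.79, 0.81]
mean, std = summarize(test_accs)
print('Average test accuracy:', round(mean, 4), '±', round(std, 4))
```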
--- a/examples/pytorch/ogb/sign/sign.py
+++ b/examples/pytorch/ogb/sign/sign.py
 import argparse
-import os
 import time
 import numpy as np
 import torch
 import torch.nn as nn
-import torch.nn.functional as F
 import dgl
 import dgl.function as fn
 from dataset import load_dataset
@@ -139,7 +137,6 @@ def train(model, feats, labels, loss_fcn, optimizer, train_loader):
 def test(model, feats, labels, test_loader, evaluator,
         train_nid, val_nid, test_nid):
    model.eval()
-    num_nodes = labels.shape[0]
    device = labels.device
    preds = []
    for batch in test_loader:

--- a/examples/tensorflow/sgc/README.md
+++ b/examples/tensorflow/sgc/README.md
@@ -8,6 +8,9 @@
 Note: TensorFlow uses a different implementation of weight decay in AdamW to PyTorch. This results in differences in performance. You can see this by manually adding the L2 of the weights to the loss like [this](https://github.com/dmlc/dgl/blob/d696558b0bbcb60f1c4cf68dc93cd22c1077ce06/examples/tensorflow/gcn/train.py#L99) for comparison.
 ## Requirements
+This example is tested with TensorFlow 2.3.0.
 ```bash
 $ pip install dgl tensorflow tensorflow_addons
 ```

--- a/new-tutorial/3_message_passing.py
+++ b/new-tutorial/3_message_passing.py
@@ -76,7 +76,7 @@ import torch.nn.functional as F
 # opposite direction.
 # 
 # Although DGL has builtin support of GraphSAGE via
-# :class:```dgl.nn.SAGEConv`` <dgl.nn.pytorch.SAGEConv>`,
+# :class:`dgl.nn.SAGEConv <dgl.nn.pytorch.SAGEConv>`,
 # here is how you can implement GraphSAGE convolution in DGL by your own.
 # 
@@ -118,7 +118,7 @@ class SAGEConv(nn.Module):
 ######################################################################
 # The central piece in this code is the
-# :func:```g.update_all`` <dgl.DGLGraph.update_all>`
+# :func:`g.update_all <dgl.DGLGraph.update_all>`
 # function, which gathers and averages the neighbor features. There are
 # three concepts here:
 #
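To make the "gathers and averages the neighbor features" step concrete, here is a minimal pure-Python sketch of what `update_all` computes when the message function copies source features and the reducer takes the mean at each destination. The helper `mean_aggregate` is hypothetical; DGL's builtin functions do this without Python loops.

```python
def mean_aggregate(src, dst, h, num_nodes):
    """Each node averages the features of its in-neighbors.

    Nodes without incoming edges keep zero vectors in this sketch.
    """
    dim = len(h[0])
    total = [[0.0] * dim for _ in range(num_nodes)]
    count = [0] * num_nodes
    for u, v in zip(src, dst):
        # Copy the source feature as the message and accumulate it at v.
        total[v] = [a + b for a, b in zip(total[v], h[u])]
        count[v] += 1
    return [[x / c for x in row] if c else row for row, c in zip(total, count)]


h = [[1.0], [3.0], [5.0]]
# Edges 0->2, 1->2, 2->0; node 1 has no in-edges.
print(mean_aggregate([0, 1, 2], [2, 2, 0], h, 3))  # [[5.0], [0.0], [2.0]]
```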

--- a/new-tutorial/5_graph_classification.py
+++ b/new-tutorial/5_graph_classification.py
@@ -64,7 +64,7 @@ print('Number of graph categories:', dataset.gclasses)
 # dataset. In DGL, you can use the ``GraphDataLoader``.
 # 
 # You can also use various dataset samplers provided in
-# ```torch.utils.data.sampler`` <https://pytorch.org/docs/stable/data.html#data-loading-order-and-sampler>`__.
+# `torch.utils.data.sampler <https://pytorch.org/docs/stable/data.html#data-loading-order-and-sampler>`__.
 # For example, this tutorial creates a training ``GraphDataLoader`` and
 # test ``GraphDataLoader``, using ``SubsetRandomSampler`` to tell PyTorch
 # to sample from only a subset of the dataset.
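The idea behind using `SubsetRandomSampler` for a train/test split can be sketched without PyTorch: shuffle the dataset indices once, split them, and then, for each epoch, visit a fresh random permutation of only the training indices. The helper names and the 80/20 ratio below are illustrative assumptions, not the tutorial's code.

```python
import random


def split_indices(num_items, train_fraction, seed=0):
    """Shuffle 0..num_items-1 once and split into train/test index lists."""
    rng = random.Random(seed)
    idx = list(range(num_items))
    rng.shuffle(idx)
    num_train = int(num_items * train_fraction)
    return idx[:num_train], idx[num_train:]


def subset_random_order(indices, rng):
    """One epoch's visiting order: a random permutation of the subset only.

    This mirrors what a subset random sampler yields each epoch.
    """
    order = list(indices)
    rng.shuffle(order)
    return order


train_idx, test_idx = split_indices(10, 0.8)
epoch_order = subset_random_order(train_idx, random.Random(1))
```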

--- a/python/dgl/nn/pytorch/conv/graphconv.py
+++ b/python/dgl/nn/pytorch/conv/graphconv.py
@@ -250,7 +250,7 @@ class GraphConv(nn.Module):
    >>> # Case 2: Unidirectional bipartite graph
    >>> u = [0, 1, 0, 0, 1]
    >>> v = [0, 1, 2, 3, 2]
-    >>> g = dgl.bipartite((u, v))
+    >>> g = dgl.heterograph({('_U', '_E', '_V') : (u, v)})
    >>> u_fea = th.rand(2, 5)
    >>> v_fea = th.rand(4, 5)
    >>> conv = GraphConv(5, 2, norm='both', weight=True, bias=True)
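In the bipartite example above, the graph has 2 source nodes and 4 destination nodes, matching `u_fea` of shape (2, 5) and `v_fea` of shape (4, 5). As we read the `GraphConv` documentation, `norm='both'` scales each message by `1 / sqrt(out_deg(src) * in_deg(dst))`. The sketch below computes only these per-edge factors for the same `u` and `v` lists and ignores the learnable weight and bias; `both_norm_coefficients` is a hypothetical helper, not a DGL API.

```python
import math
from collections import Counter


def both_norm_coefficients(src, dst):
    """Per-edge factor 1 / sqrt(out_deg(src) * in_deg(dst))."""
    out_deg = Counter(src)  # out-degree of each source node
    in_deg = Counter(dst)   # in-degree of each destination node
    return [1.0 / math.sqrt(out_deg[a] * in_deg[b]) for a, b in zip(src, dst)]


u = [0, 1, 0, 0, 1]
v = [0, 1, 2, 3, 2]
coef = both_norm_coefficients(u, v)
print([round(c, 4) for c in coef])  # [0.5774, 0.7071, 0.4082, 0.5774, 0.5]
```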

--- a/python/dgl/nn/pytorch/conv/sageconv.py
+++ b/python/dgl/nn/pytorch/conv/sageconv.py
@@ -28,11 +28,11 @@ class SAGEConv(nn.Module):
    If a weight tensor on each edge is provided, the aggregation becomes:
    .. math::
-        h_{\mathcal{N}(i)}^{(l+1)} &= \mathrm{aggregate}
+        h_{\mathcal{N}(i)}^{(l+1)} = \mathrm{aggregate}
        \left(\{e_{ji} h_{j}^{l}, \forall j \in \mathcal{N}(i) \}\right)
    where :math:`e_{ji}` is the scalar weight on the edge from node :math:`j` to node :math:`i`.
-    Please make sure that `e_{ji}` is broadcastable with `h_j^{l}`.
+    Please make sure that :math:`e_{ji}` is broadcastable with :math:`h_j^{l}`.
    Parameters
    ----------

--- a/python/dgl/ops/edge_softmax.py
+++ b/python/dgl/ops/edge_softmax.py
@@ -23,21 +23,21 @@ def edge_softmax(graph, logits, eids=ALL, norm_by='dst'):
    By default edge softmax is normalized by destination nodes(i.e. :math:`ij`
    are incoming edges of `i` in the formula above). We also support edge
    softmax normalized by source nodes(i.e. :math:`ij` are outgoing edges of
-    `i` in the formula). The previous case correspond to softmax in GAT and
+    `i` in the formula). The former case corresponds to softmax in GAT and
-    Transformer, and the later case correspond to softmax in Capsule network.
+    Transformer, and the latter case corresponds to softmax in Capsule network.
    An example of using edge softmax is in
    `Graph Attention Network <https://arxiv.org/pdf/1710.10903.pdf>`__ where
-    the attention weights are computed with such an edge softmax operation.
+    the attention weights are computed with this operation.
    Parameters
    ----------
    graph : DGLGraph
-        The graph to perform edge softmax on.
+        The graph over which edge softmax will be performed.
    logits : torch.Tensor
        The input edge feature.
    eids : torch.Tensor or ALL, optional
-        A tensor of edge index on which to apply edge softmax. If ALL, apply edge
+        The IDs of the edges to which edge softmax is applied. If ALL, it will apply edge
-        softmax on all edges in the graph. Default: ALL.
+        softmax to all edges in the graph. Default: ALL.
    norm_by : str, could be `src` or `dst`
        Normalized by source nodes or destination nodes. Default: `dst`.
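The difference between `norm_by='dst'` and `norm_by='src'` is only which endpoint groups the edges before normalizing. Below is a minimal pure-Python sketch for scalar logits that reproduces the docstring example on the same six-edge graph. The helper `edge_softmax_py` is hypothetical; it subtracts a per-group maximum for numerical stability, a common choice that does not change the result.

```python
import math
from collections import defaultdict


def edge_softmax_py(src, dst, logits, norm_by='dst'):
    """Softmax of scalar edge logits, grouped by destination or source node."""
    if norm_by not in ('src', 'dst'):
        raise ValueError("norm_by must be 'src' or 'dst'")
    keys = dst if norm_by == 'dst' else src
    # Per-group max, subtracted before exponentiating for stability.
    group_max = defaultdict(lambda: -math.inf)
    for k, x in zip(keys, logits):
        group_max[k] = max(group_max[k], x)
    exps = [math.exp(x - group_max[k]) for k, x in zip(keys, logits)]
    group_sum = defaultdict(float)
    for k, e in zip(keys, exps):
        group_sum[k] += e
    return [e / group_sum[k] for k, e in zip(keys, exps)]


# Same graph as the docstring example, with all logits equal to 1.
src = [0, 0, 0, 1, 1, 2]
dst = [0, 1, 2, 1, 2, 2]
by_dst = [round(x, 4) for x in edge_softmax_py(src, dst, [1.0] * 6)]
by_src = [round(x, 4) for x in edge_softmax_py(src, dst, [1.0] * 6, norm_by='src')]
print(by_dst)  # [1.0, 0.5, 0.3333, 0.5, 0.3333, 0.3333]
print(by_src)  # [0.3333, 0.3333, 0.3333, 0.5, 0.5, 1.0]
```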
@@ -58,15 +58,13 @@ def edge_softmax(graph, logits, eids=ALL, norm_by='dst'):
    --------
    The following example uses PyTorch backend.
-    >>> from dgl.ops import edge_softmax
+    >>> from dgl.nn.functional import edge_softmax
    >>> import dgl
    >>> import torch as th
-    Create a :code:`DGLGraph` object g and initialize its edge features.
+    Create a :code:`DGLGraph` object and initialize its edge features.
-    >>> g = dgl.DGLGraph()
+    >>> g = dgl.graph((th.tensor([0, 0, 0, 1, 1, 2]), th.tensor([0, 1, 2, 1, 2, 2])))
-    >>> g.add_nodes(3)
-    >>> g.add_edges([0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2])
    >>> edata = th.ones(6, 1).float()
    >>> edata
        tensor([[1.],
@@ -76,7 +74,7 @@ def edge_softmax(graph, logits, eids=ALL, norm_by='dst'):
                [1.],
                [1.]])
-    Apply edge softmax on g:
+    Apply edge softmax over g:
    >>> edge_softmax(g, edata)
        tensor([[1.0000],
@@ -86,7 +84,7 @@ def edge_softmax(graph, logits, eids=ALL, norm_by='dst'):
                [0.3333],
                [0.3333]])
-    Apply edge softmax on g normalized by source nodes:
+    Apply edge softmax over g normalized by source nodes:
    >>> edge_softmax(g, edata, norm_by='src')
        tensor([[0.3333],
@@ -96,7 +94,7 @@ def edge_softmax(graph, logits, eids=ALL, norm_by='dst'):
                [0.5000],
                [1.0000]])
-    Apply edge softmax on first 4 edges of g:
+    Apply edge softmax to the first 4 edges of g:
    >>> edge_softmax(g, edata[:4], th.Tensor([0,1,2,3]))
        tensor([[1.0000],