Unverified Commit 3f6f6941 authored by Mufei Li, committed by GitHub

[Bug fix] Various fix from bug bash (#3133)



* Update

* Update

* Update dependencies

* Update

* Update

* Fix ogbn-products gat

* Update

* Update

* Reformat

* Fix typo in node2vec_random_walk

* Specify file encoding

* Working for 6.7

* Update

* Fix subgraph

* Fix doc for sample_neighbors_biased

* Fix hyperlink

* Add example for udf cross reducer

* Fix

* Add example for slice_batch

* Replace dgl.bipartite

* Fix GATConv

* Fix math rendering

* Fix doc
Co-authored-by: Ubuntu <ubuntu@ip-172-31-28-17.us-west-2.compute.internal>
Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-22-156.us-west-2.compute.internal>
parent 5f2639e2
......@@ -49,7 +49,7 @@ the same as the other user guides and tutorials.
GPU-based neighbor sampling also works for custom neighborhood samplers as long as
(1) your sampler is subclassed from :class:`~dgl.dataloading.BlockSampler`, and (2)
-your code in the sampler entirely works on GPU.
+your sampler entirely works on GPU.
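A minimal sketch of such a custom sampler (assuming the DGL 0.6-era ``BlockSampler`` API, where subclasses implement ``sample_frontier``; the full-neighbor logic here mirrors the built-in ``MultiLayerFullNeighborSampler``):

```python
import dgl

class FullNeighborSampler(dgl.dataloading.BlockSampler):
    """Sketch of a GPU-compatible custom sampler."""
    def __init__(self, num_layers):
        super().__init__(num_layers)

    def sample_frontier(self, block_id, g, seed_nodes):
        # dgl.in_subgraph is a pure graph operation: it runs on whatever
        # device g and seed_nodes live on, so the whole sampler stays on
        # GPU when both the graph and the seeds are on GPU.
        return dgl.in_subgraph(g, seed_nodes)
```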
.. note::
......
......@@ -153,7 +153,7 @@ of MFGs, we:
training.
If the features are stored in ``g.ndata``, then the labels
-can be loaded by accessing the features in ``blocks[-1].srcdata``,
+can be loaded by accessing the features in ``blocks[-1].dstdata``,
the features of destination nodes of the last MFG, which is identical to
the nodes we wish to compute the final representation.
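A minimal sketch of this step (assuming the sampler copied ``g.ndata`` onto the MFGs; the field names ``'feat'`` and ``'label'`` are placeholders):

```python
# Inputs come from the source nodes of the first MFG; labels come from
# the destination nodes of the last MFG -- the nodes being predicted.
input_features = blocks[0].srcdata['feat']
output_labels = blocks[-1].dstdata['label']
```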
......
......@@ -120,7 +120,7 @@ DGL provides several neighbor sampling classes that generate the computation dependencies needed at each
3. Load the node labels corresponding to the output nodes onto the GPU. Likewise, the node labels can be stored in memory or in external storage.
Once again, note that users only need to load the labels of the output nodes, not the labels of all nodes as in full-graph training.
-If the features are stored in ``g.ndata``, then the labels can be loaded by accessing the features in ``blocks[-1].srcdata``,
+If the features are stored in ``g.ndata``, then the labels can be loaded by accessing the features in ``blocks[-1].dstdata``,
which are the features of the output nodes of the last block; these nodes are the same as the ones for which users want to compute the final representations.
4. Compute the loss and backpropagate.
......
......@@ -51,7 +51,7 @@ def main(args):
tr_loss.backward()
optimizer.step()
-train_loss = np.sum(train_loss)
+train_loss = torch.stack(train_loss).sum().cpu().item()
model.eval()
with torch.no_grad():
......@@ -67,7 +67,7 @@ def main(args):
validate_loss.append(val_loss)
validate_acc.append(val_acc)
-validate_loss = np.sum(validate_loss)
+validate_loss = torch.stack(validate_loss).sum().cpu().item()
validate_acc = np.mean(validate_acc)
#validate
......@@ -112,7 +112,7 @@ def main(args):
test_f1.append(f1)
test_logloss.append(log_loss)
-test_loss = np.sum(test_loss)
+test_loss = torch.stack(test_loss).sum().cpu().item()
test_acc = np.mean(test_acc)
test_auc = np.mean(test_auc)
test_f1 = np.mean(test_f1)
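The pattern behind these three fixes: ``np.sum`` cannot reduce a Python list of (possibly CUDA) torch tensors, so the reduction is kept inside torch and only the final scalar is moved to CPU. A self-contained sketch:

```python
import torch

# A list of scalar loss tensors collected over minibatches (may live on GPU).
losses = [torch.tensor(0.5), torch.tensor(0.25)]

# Stack into one tensor, reduce on-device, then pull out a Python float.
total = torch.stack(losses).sum().cpu().item()
print(total)  # 0.75
```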
......
......@@ -10,6 +10,7 @@ Dependencies
----------------------
- pytorch 1.7.1
- dgl 0.6.0
+- sklearn 0.22.1
Datasets
---------------------------------------
......
......@@ -466,11 +466,11 @@ class MovieLens(object):
file_path = os.path.join(self._dir, 'u.item')
self.movie_info = pd.read_csv(file_path, sep='|', header=None,
names=['id', 'title', 'release_date', 'video_release_date', 'url'] + GENRES,
-engine='python')
+encoding='iso-8859-1')
elif self._name == 'ml-1m' or self._name == 'ml-10m':
file_path = os.path.join(self._dir, 'movies.dat')
movie_info = pd.read_csv(file_path, sep='::', header=None,
-names=['id', 'title', 'genres'], engine='python')
+names=['id', 'title', 'genres'], encoding='iso-8859-1')
genre_map = {ele: i for i, ele in enumerate(GENRES)}
genre_map['Children\'s'] = genre_map['Children']
genre_map['Childrens'] = genre_map['Children']
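The encoding fix matters because the MovieLens metadata files contain Latin-1 characters (accented movie titles), which make pandas' default UTF-8 codec raise ``UnicodeDecodeError``. A minimal repro sketch (file path assumed):

```python
import pandas as pd

# Fails on ML-100K's 'u.item' under the default utf-8 codec:
# pd.read_csv('u.item', sep='|', header=None)  # UnicodeDecodeError

# Works once the encoding is stated explicitly:
movie_info = pd.read_csv('u.item', sep='|', header=None,
                         encoding='iso-8859-1')
```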
......
......@@ -12,6 +12,7 @@ This example was implemented by [Hengrui Zhang](https://github.com/hengruizhang9
- Python 3.7
- PyTorch 1.7.1
- dgl 0.6.0
+- sklearn 0.22.1
## Datasets
......
......@@ -9,9 +9,9 @@ Requirements
------------
- requests
-``bash
+```bash
pip install requests
-``
+```
Results
......@@ -34,10 +34,14 @@ Train w/ mini-batch sampling (on the Reddit dataset)
```bash
python3 train_sampling.py --num-epochs 30 # neighbor sampling
python3 train_sampling.py --num-epochs 30 --inductive # inductive learning with neighbor sampling
-python3 train_sampling_multi_gpu.py --num-epochs 30 # neighbor sampling with multi GPU
-python3 train_sampling_multi_gpu.py --num-epochs 30 --inductive # inductive learning with neighbor sampling, multi GPU
python3 train_cv.py --num-epochs 30 # control variate sampling
-python3 train_cv_multi_gpu.py --num-epochs 30 # control variate sampling with multi GPU
```
+For multi-gpu training
+```bash
+python3 train_sampling_multi_gpu.py --num-epochs 30 --gpu 0,1,... # neighbor sampling
+python3 train_sampling_multi_gpu.py --num-epochs 30 --inductive --gpu 0,1,... # inductive learning
+python3 train_cv_multi_gpu.py --num-epochs 30 --gpu 0,1,... # control variate sampling
+```
Accuracy:
......
......@@ -24,6 +24,8 @@ All datasets used are provided by Author's [code](https://github.com/GraphSAINT/
| PPI | 14,755 | 225,270 | 15 | 50 | 121(m) | 0.66/0.12/0.22 |
| Flickr | 89,250 | 899,756 | 10 | 500 | 7(s) | 0.50/0.25/0.25 |
+Note that the PPI dataset here is different from DGL's built-in variant.
## Minibatch training
Run with following:
......
......@@ -11,6 +11,8 @@ The authors' implementation can be found [here](https://github.com/Jhy1993/HAN).
[here](https://github.com/Jhy1993/HAN/tree/master/data/acm). The dataset is noisy
because the same author can occur multiple times as different nodes.
+For sampling-based training, run `python train_sampling.py`.
## Performance
Reference performance numbers for the ACM dataset:
......
......@@ -92,7 +92,7 @@ def gen_model(args):
input_drop=args.input_drop,
attn_drop=args.attn_dropout,
edge_drop=args.edge_drop,
-use_attn_dst=not args.use_attn_dst,
+use_attn_dst=not args.no_attn_dst,
allow_zero_in_degree=True,
residual=False,
)
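The one-line fix above suggests the command-line option is a negative ``store_true`` flag, so the model argument must negate it. A hedged sketch of the assumed argparse setup (flag name inferred from ``args.no_attn_dst``):

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag mirroring the fix: passing --no-attn-dst disables
# attention over destination nodes.
parser.add_argument('--no-attn-dst', action='store_true')
args = parser.parse_args([])

use_attn_dst = not args.no_attn_dst  # True by default, False with the flag
```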
......
......@@ -433,6 +433,28 @@ def slice_batch(g, gid, store_ids=False):
-------
DGLGraph
Retrieved graph.
+Examples
+--------
+The following example uses the PyTorch backend.
+>>> import dgl
+>>> import torch
+Create a batched graph.
+>>> g1 = dgl.graph(([0, 1], [2, 3]))
+>>> g2 = dgl.graph(([1], [2]))
+>>> bg = dgl.batch([g1, g2])
+Get the second component graph.
+>>> g = dgl.slice_batch(bg, 1)
+>>> print(g)
+Graph(num_nodes=3, num_edges=1,
+      ndata_schemes={}
+      edata_schemes={})
"""
start_nid = []
num_nodes = []
......
......@@ -4951,6 +4951,18 @@ class DGLHeteroGraph(object):
>>> g.nodes['user'].data['h']
tensor([[0.],
[4.]])
+User-defined cross reducer equivalent to "sum".
+>>> def cross_sum(flist):
+...     return torch.sum(torch.stack(flist, dim=0), dim=0) if len(flist) > 1 else flist[0]
+Use the user-defined cross reducer.
+>>> g.multi_update_all(
+...     {'follows': (fn.copy_src('h', 'm'), fn.sum('m', 'h')),
+...      'attracts': (fn.copy_src('h', 'm'), fn.sum('m', 'h'))},
+...     cross_sum)
"""
all_out = defaultdict(list)
merge_order = defaultdict(list)
......
......@@ -117,7 +117,7 @@ class GATConv(nn.Block):
>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
->>> g = dgl.bipartite((u, v))
+>>> g = dgl.heterograph({('A', 'r', 'B'): (u, v)})
>>> u_feat = mx.nd.random.randn(2, 5)
>>> v_feat = mx.nd.random.randn(4, 10)
>>> gatconv = GATConv((5,10), 2, 3)
......
......@@ -21,7 +21,7 @@ from .densesageconv import DenseSAGEConv
from .atomicconv import AtomicConv
from .cfconv import CFConv
from .dotgatconv import DotGatConv
-from .twirlsconv import TWIRLSConv, UnfoldingAndAttention as TWIRLSUnfoldingAndAttention
+from .twirlsconv import TWIRLSConv, TWIRLSUnfoldingAndAttention
from .gcn2conv import GCN2Conv
__all__ = ['GraphConv', 'EdgeWeightNorm', 'GATConv', 'TAGConv', 'RelGraphConv', 'SAGEConv',
......
......@@ -115,7 +115,7 @@ class GATConv(nn.Module):
>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
->>> g = dgl.bipartite((u, v))
+>>> g = dgl.heterograph({('A', 'r', 'B'): (u, v)})
>>> u_feat = th.tensor(np.random.rand(2, 5).astype(np.float32))
>>> v_feat = th.tensor(np.random.rand(4, 10).astype(np.float32))
>>> gatconv = GATConv((5,10), 2, 3)
......
......@@ -20,7 +20,9 @@ class GCN2Conv(nn.Module):
and Identity mapping (GCNII) was introduced in `"Simple and Deep Graph Convolutional
Networks" <https://arxiv.org/abs/2007.02133>`_ paper.
It is mathematically defined as follows:
.. math::
\mathbf{h}^{(l+1)} =\left( (1 - \alpha)(\mathbf{D}^{-1/2} \mathbf{\hat{A}}
\mathbf{D}^{-1/2})\mathbf{h}^{(l)} + \alpha {\mathbf{h}^{(0)}} \right)
\left( (1 - \beta_l) \mathbf{I} + \beta_l \mathbf{W} \right)
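For concreteness, a direct PyTorch transcription of the formula above (a sketch using a dense normalized adjacency for simplicity, not DGL's actual ``GCN2Conv`` implementation):

```python
import torch

def gcnii_layer(A_norm, h, h0, W, alpha, beta):
    # A_norm: dense (N, N) tensor for D^{-1/2} (A + I) D^{-1/2}
    # h: current features h^(l); h0: initial features h^(0)
    smooth = (1 - alpha) * (A_norm @ h) + alpha * h0  # initial residual
    # Right-multiply by (1 - beta_l) I + beta_l W (identity mapping)
    return (1 - beta) * smooth + beta * (smooth @ W)
```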
......
......@@ -444,7 +444,7 @@ def D_power_bias_X(graph, X, power, coeff, bias):
return Y
-class UnfoldingAndAttention(nn.Module):
+class TWIRLSUnfoldingAndAttention(nn.Module):
r"""
Description
......
......@@ -117,7 +117,7 @@ class GATConv(layers.Layer):
>>> # Case 2: Unidirectional bipartite graph
>>> u = [0, 1, 0, 0, 1]
>>> v = [0, 1, 2, 3, 2]
->>> g = dgl.bipartite((u, v))
+>>> g = dgl.heterograph({('A', 'r', 'B'): (u, v)})
>>> with tf.device("CPU:0"):
>>> u_feat = tf.convert_to_tensor(np.random.rand(2, 5))
>>> v_feat = tf.convert_to_tensor(np.random.rand(4, 10))
......
......@@ -302,8 +302,9 @@ def sample_neighbors_biased(g, nodes, fanout, bias, edge_dir='in',
[0, 1, 2]])
Set the probability of each tag:
>>> bias = torch.tensor([1.0, 0.001])
-# node 2 is almost impossible to be sampled because it has tag 1.
+>>> # node 2 is almost impossible to be sampled because it has tag 1.
To sample one outbound edge each for node 0 and node 2:
......