Commit 99751d49 authored by Quan (Andy) Gan, committed by GitHub

[Doc] Rename block to message flow graph (#2702)

* rename block to mfg

* revert

* rename
parent 491d908b
@@ -39,6 +39,19 @@ the ``sample_blocks`` methods.
.. autoclass:: MultiLayerFullNeighborSampler
   :show-inheritance:
.. _api-dataloading-collators:
Collators
---------
.. currentmodule:: dgl.dataloading
Collators are platform-agnostic classes that generate the mini-batches
given the graphs and the indices to sample from.
.. autoclass:: NodeCollator
.. autoclass:: EdgeCollator
.. autoclass:: GraphCollator
.. _api-dataloading-negative-sampling:
Negative Samplers for Link Prediction
......
@@ -148,40 +148,47 @@ Since the number of nodes
for input and output is different, we need to perform message passing on
a small, bipartite-structured graph instead. We call such a
bipartite-structured graph, which contains only the necessary input nodes
(referred to as *source* nodes) and output nodes (referred to as *destination*
nodes), a *message flow graph* (MFG). The following figure shows the MFG of
the second GNN layer for node 8.
.. figure:: https://data.dgl.ai/asset/image/guide_6_4_4.png
   :alt: Imgur

.. note::

   See the :doc:`Stochastic Training Tutorial
   <tutorials/large/L0_neighbor_sampling_overview>` for the concept of
   message flow graphs.
Note that the destination nodes also appear in the source nodes. The reason is
that representations of the destination nodes from the previous layer are needed
for feature combination after message passing (i.e. :math:`\phi^{(2)}`).

DGL provides :func:`dgl.to_block` to convert any frontier to an MFG, where the
first argument specifies the frontier and the second argument specifies the
destination nodes. For instance, the frontier above can be converted to an MFG
with destination node 8 with the following code.
.. code:: python

   dst_nodes = torch.LongTensor([8])
   block = dgl.to_block(frontier, dst_nodes)
To find the number of source nodes and destination nodes of a given node type,
one can use :meth:`dgl.DGLHeteroGraph.number_of_src_nodes` and
:meth:`dgl.DGLHeteroGraph.number_of_dst_nodes` methods.

.. code:: python

   num_src_nodes, num_dst_nodes = block.number_of_src_nodes(), block.number_of_dst_nodes()
   print(num_src_nodes, num_dst_nodes)
The MFG's source node features can be accessed via
:attr:`dgl.DGLHeteroGraph.srcdata` and :attr:`dgl.DGLHeteroGraph.srcnodes`, and
its destination node features can be accessed via
:attr:`dgl.DGLHeteroGraph.dstdata` and :attr:`dgl.DGLHeteroGraph.dstnodes`. The
syntax of ``srcdata``/``dstdata`` and ``srcnodes``/``dstnodes`` is
identical to :attr:`dgl.DGLHeteroGraph.ndata` and
@@ -189,46 +196,36 @@ identical to :attr:`dgl.DGLHeteroGraph.ndata` and
.. code:: python

   block.srcdata['h'] = torch.randn(num_src_nodes, 5)
   block.dstdata['h'] = torch.randn(num_dst_nodes, 5)
If an MFG is converted from a frontier, which is in turn converted from
a graph, one can directly read the features of the MFG's source and
destination nodes via

.. code:: python

   print(block.srcdata['x'])
   print(block.dstdata['y'])
.. note::

   The original node IDs of the source nodes and destination nodes in the MFG
   can be found as the feature ``dgl.NID``, and the mapping from the
   MFG's edge IDs to the input frontier's edge IDs can be found as the
   feature ``dgl.EID``.
DGL ensures that the destination nodes of an MFG always appear in the
source nodes, and that they are always indexed first among the source nodes.

.. code:: python

   src_nodes = block.srcdata[dgl.NID]
   dst_nodes = block.dstdata[dgl.NID]
   assert torch.equal(src_nodes[:len(dst_nodes)], dst_nodes)
As a result, the destination nodes must cover all nodes that are the
destination of an edge in the frontier.

For example, consider the following frontier
@@ -240,15 +237,15 @@ For example, consider the following frontier
where the red and green nodes (i.e. nodes 4, 5, 7, 8, and 11) are all
nodes that are the destination of an edge. The following code will then
raise an error because the destination nodes do not cover all those nodes.

.. code:: python

   dgl.to_block(frontier2, torch.LongTensor([4, 5]))  # ERROR
However, the destination nodes can contain more nodes than the ones above.
In this case, the MFG will have isolated nodes that do not have any edges
connected to them. The isolated nodes are included in both the source nodes
and the destination nodes.

.. code:: python
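   # Illustrative sketch (the extra node ID below is an assumption, not from
   # the original text): besides the required destination nodes 4, 5, 7, 8 and
   # 11, we also pass node 3, which is not the destination of any edge in
   # ``frontier2``.  It simply becomes an isolated node on both sides of the MFG.
   block3 = dgl.to_block(frontier2, torch.LongTensor([4, 5, 7, 8, 11, 3]))
   print(block3.number_of_dst_nodes())  # counts the isolated node as well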
@@ -261,7 +258,7 @@ nodes.
Heterogeneous Graphs
^^^^^^^^^^^^^^^^^^^^

MFGs also work on heterogeneous graphs. Let's say that we have the
following frontier:
.. code:: python
@@ -272,20 +269,20 @@ following frontier:
        ('game', 'played-by', 'user'): ([2], [6])
    }, num_nodes_dict={'user': 10, 'game': 10})
One can also create an MFG with destination nodes User #3, #6, and #8, as
well as Game #2 and #6.

.. code:: python

   hetero_block = dgl.to_block(hetero_frontier, {'user': [3, 6, 8], 'game': [2, 6]})
One can also get the source nodes and destination nodes by type:

.. code:: python

   # source users and games
   print(hetero_block.srcnodes['user'].data[dgl.NID], hetero_block.srcnodes['game'].data[dgl.NID])
   # destination users and games
   print(hetero_block.dstnodes['user'].data[dgl.NID], hetero_block.dstnodes['game'].data[dgl.NID])
@@ -307,10 +304,10 @@ see what :class:`~dgl.dataloading.dataloader.BlockSampler`, the parent class of
:class:`~dgl.dataloading.neighbor.MultiLayerFullNeighborSampler`, is.

:class:`~dgl.dataloading.dataloader.BlockSampler` is responsible for
generating the list of MFGs starting from the last layer, with method
:meth:`~dgl.dataloading.dataloader.BlockSampler.sample_blocks`. The default implementation of
``sample_blocks`` is to iterate backwards, generating the frontiers and
converting them to MFGs.
Therefore, for neighborhood sampling, **you only need to implement
the**\ :meth:`~dgl.dataloading.dataloader.BlockSampler.sample_frontier`\ **method**. Given which
@@ -386,7 +383,7 @@ nodes with a probability, one can simply define the sampler as follows:

        return self.n_layers
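For orientation, here is a minimal sketch of a custom sampler written against the
same ``BlockSampler`` interface (the constructor plus ``sample_frontier``); it keeps
every inbound edge, mirroring what ``MultiLayerFullNeighborSampler`` does. The class
name is illustrative.

.. code:: python

   import dgl

   class FullFrontierSampler(dgl.dataloading.BlockSampler):
       """Sketch: every layer's frontier is the in-subgraph of the seed nodes."""
       def __init__(self, n_layers):
           super().__init__(n_layers, return_eids=False)

       def sample_frontier(self, block_id, g, seed_nodes):
           # The frontier contains all inbound edges of the seed nodes,
           # i.e. full neighborhood sampling for this layer.
           return dgl.in_subgraph(g, seed_nodes)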
After implementing your sampler, you can create a data loader that takes
in your sampler and keeps generating lists of MFGs while
iterating over the seed nodes as usual.

.. code:: python
......
@@ -22,11 +22,11 @@ To use the neighborhood sampler provided by DGL for edge classification,
one needs to instead combine it with
:class:`~dgl.dataloading.pytorch.EdgeDataLoader`, which iterates
over a set of edges in minibatches, yielding the subgraph induced by the
edge minibatch and *message flow graphs* (MFGs) to be consumed by the module below.

For example, the following code creates a PyTorch DataLoader that
iterates over the training edge ID array ``train_eids`` in batches,
putting the list of generated MFGs onto GPU.
.. code:: python
@@ -37,12 +37,18 @@ putting the list of generated blocks onto GPU.
        drop_last=False,
        num_workers=4)
.. note::

   See the :doc:`Stochastic Training Tutorial
   <tutorials/large/L0_neighbor_sampling_overview>` for the concept of
   message flow graphs.

For a complete list of supported builtin samplers, please refer to the
:ref:`neighborhood sampler API reference <api-dataloading-neighbor-sampling>`.

If you wish to develop your own neighborhood sampler or you want a more
detailed explanation of the concept of MFGs, please refer to
:ref:`guide-minibatch-customizing-neighborhood-sampler`.
Removing edges in the minibatch from the original graph for neighbor sampling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -92,7 +98,7 @@ The edge classification model usually consists of two parts:
The former part is exactly the same as
:ref:`that from node classification <guide-minibatch-node-classification-model>`
and we can simply reuse it. The input is still the list of
MFGs generated from a data loader provided by DGL, as well as the
input features.

.. code:: python
@@ -135,7 +141,7 @@ layer.
        edge_subgraph.apply_edges(self.apply_edges)
        return edge_subgraph.edata['score']

The entire model will take the list of MFGs and the edge subgraph
generated by the data loader, as well as the input node features as
follows:
@@ -153,14 +159,14 @@ follows:
        return self.predictor(edge_subgraph, x)

DGL ensures that the nodes in the edge subgraph are the same as the
output nodes of the last MFG in the generated list of MFGs.
Training Loop
~~~~~~~~~~~~~

The training loop is very similar to node classification. You can
iterate over the dataloader and get a subgraph induced by the edges in
the minibatch, as well as the list of MFGs necessary for computing
their incident node representations.

.. code:: python
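   # Illustrative sketch only: ``model``, ``opt``, ``compute_loss`` and the
   # feature/label keys ('feat', 'label') are assumptions, not part of the
   # original text; the model's argument order is assumed to match the
   # forward signature sketched above.
   for input_nodes, edge_subgraph, blocks in dataloader:
       blocks = [b.to(torch.device('cuda')) for b in blocks]
       edge_subgraph = edge_subgraph.to(torch.device('cuda'))
       input_features = blocks[0].srcdata['feat']
       edge_labels = edge_subgraph.edata['label']
       edge_predictions = model(edge_subgraph, blocks, input_features)
       loss = compute_loss(edge_labels, edge_predictions)
       opt.zero_grad()
       loss.backward()
       opt.step()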
......
@@ -109,9 +109,11 @@ When a negative sampler is provided, DGL’s data loader will generate
three items per minibatch:

- A positive graph containing all the edges sampled in the minibatch.
- A negative graph containing all the non-existent edges generated by
  the negative sampler.
- A list of *message flow graphs* (MFGs) generated by the neighborhood sampler.
One can then define the link prediction model as follows; it takes in the
three items as well as the input features.
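A hedged sketch of such a model is below; the submodule names (``self.gcn``,
``self.predictor``) are illustrative and assumed to behave like the
node-representation and score-prediction modules described earlier.

.. code:: python

   import torch.nn as nn

   class LinkPredictionModel(nn.Module):
       def __init__(self, gcn, predictor):
           super().__init__()
           self.gcn = gcn              # computes node representations from MFGs
           self.predictor = predictor  # scores node pairs on a given graph

       def forward(self, positive_graph, negative_graph, blocks, x):
           h = self.gcn(blocks, x)
           # Score the sampled (positive) and non-existent (negative) edges.
           pos_score = self.predictor(positive_graph, h)
           neg_score = self.predictor(negative_graph, h)
           return pos_score, neg_score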
......
@@ -5,10 +5,16 @@
:ref:`(中文版) <guide_cn-minibatch-custom-gnn-module>`

.. note::

   :doc:`This tutorial <tutorials/large/L4_message_passing>` has similar
   content to this section for the homogeneous graph case.
If you are familiar with how to write a custom GNN module for updating
the entire graph for homogeneous or heterogeneous graphs (see
:ref:`guide-nn`), the code for computing on
MFGs is similar, with the exception that the nodes are divided into
input nodes and output nodes.

For example, consider the following custom graph convolution module
@@ -30,7 +36,7 @@ like.

        return self.W(torch.cat([g.ndata['h'], g.ndata['h_neigh']], 1))
If you have a custom message passing NN module for the full graph and
you would like to make it work for MFGs, you only need to rewrite the
forward function as follows. The corresponding statements from
the full-graph implementation are kept as comments so that you can compare
the original statements with the new ones.
@@ -62,7 +68,7 @@ original statements with the new statements.
            [block.dstdata['h'], block.dstdata['h_neigh']], 1))
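For context, here is a sketch of what the complete rewritten ``forward`` could
look like. It assumes the module uses the usual ``copy_u``/``mean`` message and
reduce functions of the full-graph version above; treat it as an illustration,
not the exact code from the guide.

.. code:: python

   import torch
   import dgl.function as fn

   def forward(self, block, h):
       with block.local_scope():
           h_src = h
           # Destination nodes are guaranteed to come first among the source
           # nodes, so their input features are the first few rows of ``h``.
           h_dst = h[:block.number_of_dst_nodes()]
           block.srcdata['h'] = h_src
           block.dstdata['h'] = h_dst
           # g.update_all(...) in the full-graph version
           block.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))
           # return self.W(torch.cat([g.ndata['h'], g.ndata['h_neigh']], 1))
           return self.W(torch.cat(
               [block.dstdata['h'], block.dstdata['h_neigh']], 1))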
In general, you need to do the following to make your NN module work for
MFGs.

- Obtain the features for output nodes from the input features by
  slicing the first few rows. The number of rows can be obtained by
@@ -149,22 +155,22 @@ serve for input or output.
        return {ntype: g.dstnodes[ntype].data['h_dst']
                for ntype in g.ntypes}
Writing modules that work on homogeneous graphs, bipartite graphs, and MFGs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All message passing modules in DGL work on homogeneous graphs,
unidirectional bipartite graphs (that have two node types and one edge
type), and MFGs with one edge type. Essentially, the input graph and
feature of a builtin DGL neural network module must satisfy one of
the following cases.
- If the input feature is a pair of tensors, then the input graph must
  be unidirectional bipartite.
- If the input feature is a single tensor and the input graph is an
  MFG, DGL will automatically set the feature on the output nodes as
  the first few rows of the input node features.
- If the input feature is a single tensor and the input graph is
  not an MFG, then the input graph must be homogeneous.
For example, the following is simplified from the PyTorch implementation
of :class:`dgl.nn.pytorch.SAGEConv` (also available in MXNet and Tensorflow)
@@ -194,6 +200,6 @@ of :class:`dgl.nn.pytorch.SAGEConv` (also available in MXNet and Tensorflow)
            self.W(torch.cat([g.dstdata['h'], g.dstdata['h_neigh']], 1)))

:ref:`guide-nn` also provides a walkthrough on :class:`dgl.nn.pytorch.SAGEConv`,
which works on unidirectional bipartite graphs, homogeneous graphs, and MFGs.
@@ -31,7 +31,7 @@ over a set of nodes in minibatches.
For example, the following code creates a PyTorch DataLoader that
iterates over the training node ID array ``train_nids`` in batches,
putting the list of generated MFGs onto GPU.

.. code:: python
@@ -51,7 +51,7 @@ putting the list of generated blocks onto GPU.
Iterating over the DataLoader will yield a list of specially created
graphs representing the computation dependencies on each layer. They are
called *message flow graphs* (MFGs) in DGL.

.. code:: python
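   # Illustrative sketch: the loader yields the input nodes, the output
   # (seed) nodes, and one MFG per GNN layer, ordered from the input layer
   # to the output layer.
   input_nodes, output_nodes, blocks = next(iter(dataloader))
   print(blocks)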
@@ -65,12 +65,19 @@ be computed as output, which node representations are needed as input,
and how representations from the input nodes propagate to the output
nodes.
.. note::

   See the :doc:`Stochastic Training Tutorial
   <tutorials/large/L0_neighbor_sampling_overview>` for the concept of
   message flow graphs.

For a complete list of supported builtin samplers, please refer to the
:ref:`neighborhood sampler API reference <api-dataloading-neighbor-sampling>`.

If you wish to develop your own neighborhood sampler or you want a more
detailed explanation of the concept of MFGs, please refer to
:ref:`guide-minibatch-customizing-neighborhood-sampler`.
.. _guide-minibatch-node-classification-model:
@@ -114,7 +121,7 @@ The DGL ``GraphConv`` modules above accepts an element in ``blocks``
generated by the data loader as an argument.

:ref:`The API reference of each NN module <apinn>` will tell you
whether it supports accepting an MFG as an argument.
If you wish to use your own message passing module, please refer to
:ref:`guide-minibatch-custom-gnn-module`.
@@ -124,7 +131,7 @@ Training Loop
The training loop simply consists of iterating over the dataset with the
customized batching iterator. During each iteration that yields a list
of MFGs, we do the following (a sketch of the resulting loop follows the list):

1. Load the node features corresponding to the input nodes onto GPU. The
   node features can be stored in either memory or external storage.
@@ -133,10 +140,10 @@ of blocks, we:
   If the features are stored in ``g.ndata``, then they can be loaded
   by accessing ``blocks[0].srcdata``, the features of the source nodes
   of the first MFG, which include all the nodes needed for computing
   the final representations.
2. Feed the list of MFGs and the input node features to the multilayer
   GNN and get the outputs.

3. Load the node labels corresponding to the output nodes onto GPU.
@@ -147,7 +154,7 @@ of blocks, we:
   If the labels are stored in ``g.ndata``, then they can be loaded
   by accessing ``blocks[-1].dstdata``, the features of the destination
   nodes of the last MFG, which are identical to the nodes whose final
   representations we wish to compute.
4. Compute the loss and backpropagate.
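Putting the four steps together, a hedged sketch of one iteration is shown
below; the ``model``, ``opt``, and ``compute_loss`` objects and the feature
and label keys (``'feat'``, ``'label'``) are illustrative assumptions.

.. code:: python

   import torch

   for input_nodes, output_nodes, blocks in dataloader:
       blocks = [b.to(torch.device('cuda')) for b in blocks]
       input_features = blocks[0].srcdata['feat']                # step 1
       output_labels = blocks[-1].dstdata['label']               # step 3
       output_predictions = model(blocks, input_features)        # step 2
       loss = compute_loss(output_labels, output_predictions)    # step 4
       opt.zero_grad()
       loss.backward()
       opt.step()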
......
@@ -166,7 +166,7 @@ def batch(graphs, ndata=ALL, edata=ALL, *,
        raise DGLError('Invalid argument edata: must be a string list but got {}.'.format(
            type(edata)))
    if any(g.is_block for g in graphs):
        raise DGLError("Batching an MFG is not supported.")
    relations = list(sorted(graphs[0].canonical_etypes))
    relation_ids = [graphs[0].get_etype_id(r) for r in relations]
......
@@ -358,14 +358,14 @@ def heterograph(data_dict,
    return retg.to(device)

def create_block(data_dict, num_src_nodes=None, num_dst_nodes=None, idtype=None, device=None):
    """Create a message flow graph (MFG) as a :class:`DGLBlock` object.
    Parameters
    ----------
    data_dict : graph data
        The dictionary data for constructing an MFG. The keys are in the form of
        string triplets (src_type, edge_type, dst_type), specifying the source node type,
        edge type, and destination node type. The values are graph data in the form of
        :math:`(U, V)`, where :math:`(U[i], V[i])` forms the edge with ID :math:`i`.
        The allowed graph data formats are:
@@ -376,35 +376,35 @@ def create_block(data_dict, num_src_nodes=None, num_dst_nodes=None, idtype=None,
        - ``(iterable[int], iterable[int])``: Similar to the tuple of node-tensors
          format, but stores node IDs in two sequences (e.g. list, tuple, numpy.ndarray).
        If you would like to create an MFG with a single source node type, a single destination
        node type, and a single edge type, then you can pass in the graph data directly
        without wrapping it as a dictionary.
    num_src_nodes : dict[str, int] or int, optional
        The number of nodes for each source node type, which is a dictionary mapping a node type
        :math:`T` to the number of :math:`T`-typed source nodes.

        If not given for a node type :math:`T`, DGL finds the largest ID appearing in *every*
        graph data whose source node type is :math:`T`, and sets the number of nodes to
        be that ID plus one. If given and the value is no greater than the largest ID for some
        source node type, DGL will raise an error. By default, DGL infers the number of nodes for
        all source node types.

        If you would like to create an MFG with a single source node type, a single destination
        node type, and a single edge type, then you can pass in an integer to directly
        represent the number of source nodes.
    num_dst_nodes : dict[str, int] or int, optional
        The number of nodes for each destination node type, which is a dictionary mapping a node
        type :math:`T` to the number of :math:`T`-typed destination nodes.

        If not given for a node type :math:`T`, DGL finds the largest ID appearing in *every*
        graph data whose destination node type is :math:`T`, and sets the number of nodes to
        be that ID plus one. If given and the value is no greater than the largest ID for some
        destination node type, DGL will raise an error. By default, DGL infers the number of nodes
        for all destination node types.

        If you would like to create an MFG with a single source node type, a single
        destination node type, and a single edge type, then you can pass in an integer to directly
        represent the number of destination nodes.
    idtype : int32 or int64, optional
        The data type for storing the structure-related graph information such as node and
        edge IDs. It should be a framework-specific data type object (e.g., ``torch.int32``).
@@ -419,7 +419,7 @@ def create_block(data_dict, num_src_nodes=None, num_dst_nodes=None, idtype=None,
    Returns
    -------
    DGLBlock
        The created MFG.
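    A minimal usage sketch (the edge list and node counts here are made up for
    illustration):

    >>> block = dgl.create_block(([0, 1, 2], [1, 2, 3]), num_src_nodes=4, num_dst_nodes=5)
    >>> block.number_of_src_nodes(), block.number_of_dst_nodes()
    (4, 5)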
    Notes
    -----
@@ -501,12 +501,12 @@ def create_block(data_dict, num_src_nodes=None, num_dst_nodes=None, idtype=None,
            num_dst_nodes[dty] = max(num_dst_nodes[dty], vrange)
        else:  # sanity check
            if num_src_nodes[sty] < urange:
                raise DGLError('The given number of nodes of source node type {} must be larger'
                               ' than the max ID in the data, but got {} and {}.'.format(
                                   sty, num_src_nodes[sty], urange - 1))
            if num_dst_nodes[dty] < vrange:
                raise DGLError('The given number of nodes of destination node type {} must be'
                               ' larger than the max ID in the data, but got {} and {}.'.format(
                                   dty, num_dst_nodes[dty], vrange - 1))

    # Create the graph
@@ -546,17 +546,17 @@ def create_block(data_dict, num_src_nodes=None, num_dst_nodes=None, idtype=None,
    return retg.to(device)
def block_to_graph(block):
    """Convert a message flow graph (MFG), represented as a :class:`DGLBlock` object,
    to a :class:`DGLGraph`.

    DGL will rename all the source node types by suffixing with ``_src``, and
    all the destination node types by suffixing with ``_dst``.
    Features on the returned graph will be preserved.
    Parameters
    ----------
    block : DGLBlock
        The MFG.

    Returns
    -------
......
@@ -15,7 +15,7 @@ from ..distributed.dist_graph import DistGraph
# pylint: disable=unused-argument
def assign_block_eids(block, frontier):
    """Assigns edge IDs from the original graph to the message flow graph (MFG).

    See also
    --------
@@ -117,8 +117,8 @@ class BlockSampler(object):
    """Abstract class specifying the neighborhood sampling strategy for DGL data loaders.

    The main method for BlockSampler is :meth:`sample_blocks`,
    which generates a list of message flow graphs (MFGs) for a multi-layer GNN given a set of
    seed nodes to have their outputs computed.

    The default implementation of :meth:`sample_blocks` is
    to repeat :attr:`num_layers` times the following procedure from the last layer to the first
@@ -133,13 +133,13 @@ class BlockSampler(object):
      reverse edges. This is controlled by the argument :attr:`exclude_eids` in
      :meth:`sample_blocks` method.

    * Convert the frontier into an MFG.

    * Optionally assign the IDs of the edges in the original graph selected in the first step
      to the MFG, controlled by the argument ``return_eids`` in
      :meth:`sample_blocks` method.

    * Prepend the MFG to the MFG list to be returned.

    All subclasses should override the :meth:`sample_frontier`
    method while specifying the number of layers to sample in the :attr:`num_layers` argument.
@@ -149,19 +149,21 @@ class BlockSampler(object):
    num_layers : int
        The number of layers to sample.
    return_eids : bool, default False
        Whether to return the edge IDs involved in message passing in the MFG.
        If True, the edge IDs will be stored as an edge feature named ``dgl.EID``.

    Notes
    -----
    For the concept of frontiers and MFGs, please refer to
    :ref:`User Guide Section 6 <guide-minibatch>` and
    :doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`.
""" """
def __init__(self, num_layers, return_eids): def __init__(self, num_layers, return_eids):
self.num_layers = num_layers self.num_layers = num_layers
self.return_eids = return_eids self.return_eids = return_eids
def sample_frontier(self, block_id, g, seed_nodes): def sample_frontier(self, block_id, g, seed_nodes):
"""Generate the frontier given the output nodes. """Generate the frontier given the destination nodes.
The subclasses should override this function. The subclasses should override this function.
@@ -172,7 +174,7 @@ class BlockSampler(object):
        g : DGLGraph
            The original graph.
        seed_nodes : Tensor or dict[ntype, Tensor]
            The destination nodes by node type.

            If the graph only has one node type, one can just specify a single tensor
            of node IDs.
@@ -184,19 +186,21 @@ class BlockSampler(object):
        Notes
        -----
        For the concept of frontiers and MFGs, please refer to
        :ref:`User Guide Section 6 <guide-minibatch>` and
        :doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`.
        """
        raise NotImplementedError
    def sample_blocks(self, g, seed_nodes, exclude_eids=None):
        """Generate a list of MFGs given the destination nodes.
        Parameters
        ----------
        g : DGLGraph
            The original graph.
        seed_nodes : Tensor or dict[ntype, Tensor]
            The destination nodes by node type.

            If the graph only has one node type, one can just specify a single tensor
            of node IDs.
@@ -206,11 +210,13 @@ class BlockSampler(object):
        Returns
        -------
        list[DGLGraph]
            The MFGs generated for computing the multi-layer GNN output.
        Notes
        -----
        For the concept of frontiers and MFGs, please refer to
        :ref:`User Guide Section 6 <guide-minibatch>` and
        :doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`.
        """
        blocks = []
        exclude_eids = (
@@ -259,11 +265,13 @@ class Collator(ABC):
    Provides a :attr:`dataset` object containing the collection of all nodes or edges,
    as well as a :attr:`collate` method that combines a set of items from
    :attr:`dataset` and obtains the message flow graphs (MFGs).

    Notes
    -----
    For the concept of MFGs, please refer to
    :ref:`User Guide Section 6 <guide-minibatch>` and
    :doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`.
    """
    @abstractproperty
    def dataset(self):
@@ -272,7 +280,7 @@ class Collator(ABC):
    @abstractmethod
    def collate(self, items):
        """Combines the items from the dataset object and obtains the list of MFGs.

        Parameters
        ----------
@@ -281,7 +289,9 @@ class Collator(ABC):
        Notes
        -----
        For the concept of MFGs, please refer to
        :ref:`User Guide Section 6 <guide-minibatch>` and
        :doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`.
        """
        raise NotImplementedError
@@ -330,6 +340,12 @@ class NodeCollator(Collator):
    ...     batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
    >>> for input_nodes, output_nodes, blocks in dataloader:
    ...     train_on(input_nodes, output_nodes, blocks)

    Notes
    -----
    For the concept of MFGs, please refer to
    :ref:`User Guide Section 6 <guide-minibatch>` and
    :doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`.
""" """
def __init__(self, g, nids, block_sampler): def __init__(self, g, nids, block_sampler):
self.g = g self.g = g
...@@ -351,7 +367,7 @@ class NodeCollator(Collator): ...@@ -351,7 +367,7 @@ class NodeCollator(Collator):
return self._dataset return self._dataset
def collate(self, items): def collate(self, items):
"""Find the list of blocks necessary for computing the representation of given """Find the list of MFGs necessary for computing the representation of given
nodes for a node classification/regression task. nodes for a node classification/regression task.
        Parameters
@@ -372,8 +388,8 @@ class NodeCollator(Collator):
            If the original graph has multiple node types, return a dictionary of
            node type names and node ID tensors. Otherwise, return a single tensor.
        MFGs : list[DGLGraph]
            The list of MFGs necessary for computing the representation.
        """
        if isinstance(items[0], tuple):
            # returns a list of pairs: group them by node types into a dict
@@ -404,7 +420,7 @@ class EdgeCollator(Collator):
    * If a negative sampler is given, another graph that contains the "negative edges",
      connecting the source and destination nodes yielded from the given negative sampler.

    * A list of MFGs necessary for computing the representation of the incident nodes
      of the edges in the minibatch.
    Parameters
@@ -552,6 +568,12 @@ class EdgeCollator(Collator):
    ...     batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
    >>> for input_nodes, pos_pair_graph, neg_pair_graph, blocks in dataloader:
    ...     train_on(input_nodes, pos_pair_graph, neg_pair_graph, blocks)

    Notes
    -----
    For the concept of MFGs, please refer to
    :ref:`User Guide Section 6 <guide-minibatch>` and
    :doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`.
""" """
def __init__(self, g, eids, block_sampler, g_sampling=None, exclude=None, def __init__(self, g, eids, block_sampler, g_sampling=None, exclude=None,
reverse_eids=None, reverse_etypes=None, negative_sampler=None): reverse_eids=None, reverse_etypes=None, negative_sampler=None):
...@@ -690,7 +712,7 @@ class EdgeCollator(Collator): ...@@ -690,7 +712,7 @@ class EdgeCollator(Collator):
Note that the metagraph of this graph will be identical to that of the original Note that the metagraph of this graph will be identical to that of the original
graph. graph.
blocks : list[DGLGraph] blocks : list[DGLGraph]
The list of blocks necessary for computing the representation of the edges. The list of MFGs necessary for computing the representation of the edges.
""" """
if self.negative_sampler is None: if self.negative_sampler is None:
return self._collate(items) return self._collate(items)
......
@@ -25,7 +25,7 @@ class MultiLayerNeighborSampler(BlockSampler):
    replace : bool, default True
        Whether to sample with replacement
    return_eids : bool, default False
        Whether to return the edge IDs involved in message passing in the MFG.
        If True, the edge IDs will be stored as an edge feature named ``dgl.EID``.

    Examples
@@ -50,6 +50,12 @@ class MultiLayerNeighborSampler(BlockSampler):
    ...     {('user', 'follows', 'user'): 5,
    ...      ('user', 'plays', 'game'): 4,
    ...      ('game', 'played-by', 'user'): 3}] * 3)
    Notes
    -----
    For the concept of MFGs, please refer to
    :ref:`User Guide Section 6 <guide-minibatch>` and
    :doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`.
    """
    def __init__(self, fanouts, replace=False, return_eids=False):
        super().__init__(len(fanouts), return_eids)
@@ -84,7 +90,7 @@ class MultiLayerFullNeighborSampler(MultiLayerNeighborSampler):
    n_layers : int
        The number of GNN layers to sample.
    return_eids : bool, default False
        Whether to return the edge IDs involved in message passing in the MFG.
        If True, the edge IDs will be stored as an edge feature named ``dgl.EID``.

    Examples
@@ -100,6 +106,12 @@ class MultiLayerFullNeighborSampler(MultiLayerNeighborSampler):
    ...     batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
    >>> for blocks in dataloader:
    ...     train_on(blocks)
    Notes
    -----
    For the concept of MFGs, please refer to
    :ref:`User Guide Section 6 <guide-minibatch>` and
    :doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`.
    """
    def __init__(self, n_layers, return_eids=False):
        super().__init__([None] * n_layers, return_eids=return_eids)
@@ -16,8 +16,8 @@ def _remove_kwargs_dist(kwargs):
# The following code is a fix to the PyTorch-specific issue in
# https://github.com/dmlc/dgl/issues/2137
#
# Basically the sampled MFGs/subgraphs contain the features extracted from the
# parent graph. In DGL, the MFGs/subgraphs will hold a reference to the parent
# graph feature tensor and an index tensor, so that the features could be extracted upon
# request. However, in the context of multiprocessed sampling, we do not need to
# transmit the parent graph feature tensor from the subprocess to the main process,
@@ -26,13 +26,13 @@ def _remove_kwargs_dist(kwargs):
# it with the following trick:
#
# In the collator running in the sampler processes:
# For each frame in the MFG, we check each column and the column with the same name
# in the corresponding parent frame. If the storage of the former column is the
# same object as the latter column, we are sure that the former column is a
# subcolumn of the latter, and set the storage of the former column as None.
#
# In the iterator of the main process:
# For each frame in the MFG, we check each column and the column with the same name
# in the corresponding parent frame. If the storage of the former column is None,
# we replace it with the storage of the latter column.
@@ -118,7 +118,7 @@ def _restore_blocks_storage(blocks, g):
class _NodeCollator(NodeCollator):
    def collate(self, items):
        # input_nodes, output_nodes, blocks
        result = super().collate(items)
        _pop_blocks_storage(result[-1], self.g)
        return result
@@ -173,7 +173,7 @@ class _EdgeDataLoaderIter:
        result_ = next(self.iter_)
        if self.edge_dataloader.collator.negative_sampler is not None:
            # input_nodes, pair_graph, neg_pair_graph, blocks if a negative
            # sampler is given.  Otherwise, input_nodes, pair_graph, blocks.
            _restore_subgraph_storage(result_[2], self.edge_dataloader.collator.g)
        _restore_subgraph_storage(result_[1], self.edge_dataloader.collator.g)
@@ -184,7 +184,7 @@ class _EdgeDataLoaderIter:
class NodeDataLoader:
    """PyTorch dataloader for batch-iterating over a set of nodes, generating the list
    of message flow graphs (MFGs) as computation dependency of the said minibatch.

    Parameters
    ----------
@@ -195,7 +195,7 @@ class NodeDataLoader:
    block_sampler : dgl.dataloading.BlockSampler
        The neighborhood sampler.
    device : device context, optional
        The device of the generated MFGs in each iteration, which should be a
        PyTorch device object (e.g., ``torch.device``).
    kwargs : dict
        Arguments being passed to :py:class:`torch.utils.data.DataLoader`.
...@@ -212,6 +212,12 @@ class NodeDataLoader: ...@@ -212,6 +212,12 @@ class NodeDataLoader:
... batch_size=1024, shuffle=True, drop_last=False, num_workers=4) ... batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
>>> for input_nodes, output_nodes, blocks in dataloader: >>> for input_nodes, output_nodes, blocks in dataloader:
... train_on(input_nodes, output_nodes, blocks) ... train_on(input_nodes, output_nodes, blocks)
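If training runs on a GPU while the graph stays on CPU, the ``device`` argument places the generated MFGs there. A minimal sketch, not part of the original docstring, reusing ``g``, ``train_nid`` and ``sampler`` from the example above:

>>> dataloader = dgl.dataloading.NodeDataLoader(
...     g, train_nid, sampler, device=torch.device('cuda:0'),
...     batch_size=1024, shuffle=True, drop_last=False, num_workers=0)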
Notes
-----
Please refer to
:doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`
and :ref:`User Guide Section 6 <guide-minibatch>` for usage.
""" """
collator_arglist = inspect.getfullargspec(NodeCollator).args collator_arglist = inspect.getfullargspec(NodeCollator).args
...@@ -261,8 +267,8 @@ class NodeDataLoader: ...@@ -261,8 +267,8 @@ class NodeDataLoader:
class EdgeDataLoader: class EdgeDataLoader:
"""PyTorch dataloader for batch-iterating over a set of edges, generating the list """PyTorch dataloader for batch-iterating over a set of edges, generating the list
of blocks as computation dependency of the said minibatch for edge classification, of message flow graphs (MFGs) as computation dependency of the said minibatch for
edge regression, and link prediction. edge classification, edge regression, and link prediction.
For each iteration, the object will yield For each iteration, the object will yield
...@@ -275,7 +281,7 @@ class EdgeDataLoader: ...@@ -275,7 +281,7 @@ class EdgeDataLoader:
* If a negative sampler is given, another graph that contains the "negative edges", * If a negative sampler is given, another graph that contains the "negative edges",
connecting the source and destination nodes yielded from the given negative sampler. connecting the source and destination nodes yielded from the given negative sampler.
* A list of blocks necessary for computing the representation of the incident nodes * A list of MFGs necessary for computing the representation of the incident nodes
of the edges in the minibatch. of the edges in the minibatch.
For more details, please refer to :ref:`guide-minibatch-edge-classification-sampler` For more details, please refer to :ref:`guide-minibatch-edge-classification-sampler`
...@@ -290,7 +296,7 @@ class EdgeDataLoader: ...@@ -290,7 +296,7 @@ class EdgeDataLoader:
block_sampler : dgl.dataloading.BlockSampler block_sampler : dgl.dataloading.BlockSampler
The neighborhood sampler. The neighborhood sampler.
device : device context, optional device : device context, optional
The device of the generated blocks and graphs in each iteration, which should be a The device of the generated MFGs and graphs in each iteration, which should be a
PyTorch device object (e.g., ``torch.device``). PyTorch device object (e.g., ``torch.device``).
g_sampling : DGLGraph, optional g_sampling : DGLGraph, optional
The graph where neighborhood sampling is performed. The graph where neighborhood sampling is performed.
...@@ -406,11 +412,17 @@ class EdgeDataLoader: ...@@ -406,11 +412,17 @@ class EdgeDataLoader:
... negative_sampler=neg_sampler, ... negative_sampler=neg_sampler,
... batch_size=1024, shuffle=True, drop_last=False, num_workers=4) ... batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
>>> for input_nodes, pos_pair_graph, neg_pair_graph, blocks in dataloader: >>> for input_nodes, pos_pair_graph, neg_pair_graph, blocks in dataloader:
... train_on(input_nodse, pair_graph, neg_pair_graph, blocks) ... train_on(input_nodes, pos_pair_graph, neg_pair_graph, blocks)
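Without a negative sampler (for example, edge classification or regression), each iteration yields three items instead of four. A minimal sketch under the same assumptions as the example above (``g``, ``train_eid``, ``sampler`` and ``train_on`` are placeholders carried over from it):

>>> dataloader = dgl.dataloading.EdgeDataLoader(
...     g, train_eid, sampler,
...     batch_size=1024, shuffle=True, drop_last=False, num_workers=4)
>>> for input_nodes, pair_graph, blocks in dataloader:
...     train_on(input_nodes, pair_graph, blocks)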
See also See also
-------- --------
:class:`~dgl.dataloading.dataloader.EdgeCollator` dgl.dataloading.dataloader.EdgeCollator
Notes
-----
Please refer to
:doc:`Minibatch Training Tutorials <tutorials/large/L0_neighbor_sampling_overview>`
and :ref:`User Guide Section 6 <guide-minibatch>` for usage.
For end-to-end usages, please refer to the following tutorial/examples: For end-to-end usages, please refer to the following tutorial/examples:
......
...@@ -1668,35 +1668,35 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True): ...@@ -1668,35 +1668,35 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True):
"""Convert a graph into a bipartite-structured *block* for message passing. """Convert a graph into a bipartite-structured *block* for message passing.
A block is a graph consisting of two sets of nodes: the A block is a graph consisting of two sets of nodes: the
*input* nodes and *output* nodes. The input and output nodes can have multiple *source* nodes and *destination* nodes. The source and destination nodes can have multiple
node types. All the edges connect from input nodes to output nodes. node types. All the edges connect from source nodes to destination nodes.
Specifically, the input nodes and output nodes will have the same node types as the Specifically, the source nodes and destination nodes will have the same node types as the
ones in the original graph. DGL maps each edge ``(u, v)`` with edge type ones in the original graph. DGL maps each edge ``(u, v)`` with edge type
``(utype, etype, vtype)`` in the original graph to the edge with type ``(utype, etype, vtype)`` in the original graph to the edge with type
``etype`` connecting from node ID ``u`` of type ``utype`` in the input side to node ``etype`` connecting from node ID ``u`` of type ``utype`` in the source side to node
ID ``v`` of type ``vtype`` in the output side. ID ``v`` of type ``vtype`` in the destination side.
For blocks returned by :func:`to_block`, the output nodes of the block will only For blocks returned by :func:`to_block`, the destination nodes of the block will only
contain the nodes that have at least one inbound edge of any type. The input nodes contain the nodes that have at least one inbound edge of any type. The source nodes
of the block will only contain the nodes that appear in the output nodes, as well of the block will only contain the nodes that appear in the destination nodes, as well
as the nodes that have at least one outbound edge connecting to one of the output nodes. as the nodes that have at least one outbound edge connecting to one of the destination nodes.
If the :attr:`dst_nodes` argument is not None, it specifies the output nodes instead. The destination nodes are specified by the :attr:`dst_nodes` argument if it is not None.
Parameters Parameters
---------- ----------
graph : DGLGraph graph : DGLGraph
The graph. The graph.
dst_nodes : Tensor or dict[str, Tensor], optional dst_nodes : Tensor or dict[str, Tensor], optional
The list of output nodes. The list of destination nodes.
If a tensor is given, the graph must have only one node type. If a tensor is given, the graph must have only one node type.
If given, it must be a superset of all the nodes that have at least one inbound If given, it must be a superset of all the nodes that have at least one inbound
edge. An error will be raised otherwise. edge. An error will be raised otherwise.
include_dst_in_src : bool include_dst_in_src : bool
If False, do not include output nodes in input nodes. If False, do not include destination nodes in source nodes.
(Default: True) (Default: True)
...@@ -1734,13 +1734,13 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True): ...@@ -1734,13 +1734,13 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True):
>>> g = dgl.graph(([1, 2], [2, 3])) >>> g = dgl.graph(([1, 2], [2, 3]))
>>> block = dgl.to_block(g, torch.LongTensor([3, 2])) >>> block = dgl.to_block(g, torch.LongTensor([3, 2]))
The output nodes would be exactly the same as the ones given: [3, 2]. The destination nodes would be exactly the same as the ones given: [3, 2].
>>> induced_dst = block.dstdata[dgl.NID] >>> induced_dst = block.dstdata[dgl.NID]
>>> induced_dst >>> induced_dst
tensor([3, 2]) tensor([3, 2])
The first few input nodes would also be exactly the same as The first few source nodes would also be exactly the same as
the ones given. The rest of the nodes are the ones necessary for message passing the ones given. The rest of the nodes are the ones necessary for message passing
into nodes 3, 2. This means that the node 1 would be included. into nodes 3, 2. This means that node 1 would be included.
...@@ -1749,7 +1749,7 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True): ...@@ -1749,7 +1749,7 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True):
tensor([3, 2, 1]) tensor([3, 2, 1])
You can notice that the first two nodes are identical to the given nodes as well as You can notice that the first two nodes are identical to the given nodes as well as
the output nodes. the destination nodes.
The induced edges can also be obtained by the following: The induced edges can also be obtained by the following:
...@@ -1764,20 +1764,20 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True): ...@@ -1764,20 +1764,20 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True):
>>> induced_src[src], induced_dst[dst] >>> induced_src[src], induced_dst[dst]
(tensor([2, 1]), tensor([3, 2])) (tensor([2, 1]), tensor([3, 2]))
The output nodes specified must be a superset of the nodes that have edges connecting The destination nodes specified must be a superset of the nodes that have edges connecting
to them. For example, the following will raise an error since the output nodes to them. For example, the following will raise an error since the destination nodes
does not contain node 3, which has an edge connecting to it. do not contain node 3, which has an edge connecting to it.
>>> g = dgl.graph(([1, 2], [2, 3])) >>> g = dgl.graph(([1, 2], [2, 3]))
>>> dgl.to_block(g, torch.LongTensor([2])) # error >>> dgl.to_block(g, torch.LongTensor([2])) # error
Converting a heterogeneous graph to a block is similar, except that when specifying Converting a heterogeneous graph to a block is similar, except that when specifying
the output nodes, you have to give a dict: the destination nodes, you have to give a dict:
>>> g = dgl.heterograph({('A', '_E', 'B'): ([1, 2], [2, 3])}) >>> g = dgl.heterograph({('A', '_E', 'B'): ([1, 2], [2, 3])})
If you don't specify any node of type A on the output side, the node type ``A`` If you don't specify any node of type A on the destination side, the node type ``A``
in the block would have zero nodes on the output side. in the block would have zero nodes on the destination side.
>>> block = dgl.to_block(g, {'B': torch.LongTensor([3, 2])}) >>> block = dgl.to_block(g, {'B': torch.LongTensor([3, 2])})
>>> block.number_of_dst_nodes('A') >>> block.number_of_dst_nodes('A')
...@@ -1787,12 +1787,12 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True): ...@@ -1787,12 +1787,12 @@ def to_block(g, dst_nodes=None, include_dst_in_src=True):
>>> block.dstnodes['B'].data[dgl.NID] >>> block.dstnodes['B'].data[dgl.NID]
tensor([3, 2]) tensor([3, 2])
The input side would contain all the nodes on the output side: The source side would contain all the nodes on the destination side:
>>> block.srcnodes['B'].data[dgl.NID] >>> block.srcnodes['B'].data[dgl.NID]
tensor([3, 2]) tensor([3, 2])
As well as all the nodes that have connections to the nodes on the output side: As well as all the nodes that have connections to the nodes on the destination side:
>>> block.srcnodes['A'].data[dgl.NID] >>> block.srcnodes['A'].data[dgl.NID]
tensor([2, 1]) tensor([2, 1])
......
...@@ -93,15 +93,16 @@ By the end of this tutorial, you will be able to ...@@ -93,15 +93,16 @@ By the end of this tutorial, you will be able to
###################################################################### ######################################################################
# You can also notice in the animation above that the computation # You can also notice in the animation above that the computation
# dependencies in the animation above can be described as a series of # dependencies in the animation above can be described as a series of
# *bipartite graphs*. # bipartite graphs.
# The output nodes are on one side and all the nodes necessary for inputs # The output nodes (called *destination nodes*) are on one side and all the
# are on the other side. The arrows indicate how the sampled neighbors # nodes necessary for inputs (called *source nodes*) are on the other side.
# propagates messages to the nodes. # The arrows indicate how the sampled neighbors propagate messages to the nodes.
# DGL calls such graphs *message flow graphs* (MFGs).
# #
# Note that some GNN modules, such as `SAGEConv`, need to use the output # Note that some GNN modules, such as `SAGEConv`, need to use the destination
# nodes' features on the previous layer to compute the outputs. Without # nodes' features on the previous layer to compute the outputs. Without
# loss of generality, DGL always includes the output nodes themselves # loss of generality, DGL always includes the destination nodes themselves
# in the input nodes. # in the source nodes.
# #
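######################################################################
# A minimal sketch (not part of the original tutorial) of the property above:
# after a sampled frontier is converted into an MFG with ``dgl.to_block``, the
# destination nodes always form a prefix of the source nodes.
#

import dgl
import torch

g = dgl.graph(([1, 2, 4], [3, 3, 3]))
frontier = dgl.sampling.sample_neighbors(g, torch.LongTensor([3]), 2)
mfg = dgl.to_block(frontier, torch.LongTensor([3]))
print(torch.equal(mfg.srcdata[dgl.NID][:mfg.num_dst_nodes()], mfg.dstdata[dgl.NID]))  # True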
......
...@@ -70,7 +70,7 @@ test_nids = idx_split['test'] ...@@ -70,7 +70,7 @@ test_nids = idx_split['test']
# #
# In the :doc:`previous tutorial <L0_neighbor_sampling_overview>`, you # In the :doc:`previous tutorial <L0_neighbor_sampling_overview>`, you
# have seen that the computation dependency for message passing of a # have seen that the computation dependency for message passing of a
# single node can be described as a series of bipartite graphs. # single node can be described as a series of *message flow graphs* (MFGs).
# #
# |image1| # |image1|
# #
...@@ -84,10 +84,10 @@ test_nids = idx_split['test'] ...@@ -84,10 +84,10 @@ test_nids = idx_split['test']
# #
# DGL provides tools to iterate over the dataset in minibatches # DGL provides tools to iterate over the dataset in minibatches
# while generating the computation dependencies to compute their outputs # while generating the computation dependencies to compute their outputs
# with the bipartite graphs above. For node classification, you can use # with the MFGs above. For node classification, you can use
# ``dgl.dataloading.NodeDataLoader`` for iterating over the dataset. # ``dgl.dataloading.NodeDataLoader`` for iterating over the dataset.
# It accepts a sampler object to control how to generate the computation # It accepts a sampler object to control how to generate the computation
# dependencies in the form of bipartite graphs. DGL provides # dependencies in the form of MFGs. DGL provides
# implementations of common sampling algorithms such as # implementations of common sampling algorithms such as
# ``dgl.dataloading.MultiLayerNeighborSampler`` which randomly picks # ``dgl.dataloading.MultiLayerNeighborSampler`` which randomly picks
# a fixed number of neighbors for each node. # a fixed number of neighbors for each node.
...@@ -113,7 +113,7 @@ train_dataloader = dgl.dataloading.NodeDataLoader( ...@@ -113,7 +113,7 @@ train_dataloader = dgl.dataloading.NodeDataLoader(
graph, # The graph graph, # The graph
train_nids, # The node IDs to iterate over in minibatches train_nids, # The node IDs to iterate over in minibatches
sampler, # The neighbor sampler sampler, # The neighbor sampler
device=device, # Put the sampled bipartite graphs on CPU or GPU device=device, # Put the sampled MFGs on CPU or GPU
# The following arguments are inherited from PyTorch DataLoader. # The following arguments are inherited from PyTorch DataLoader.
batch_size=1024, # Batch size batch_size=1024, # Batch size
shuffle=True, # Whether to shuffle the nodes for every epoch shuffle=True, # Whether to shuffle the nodes for every epoch
...@@ -126,7 +126,7 @@ train_dataloader = dgl.dataloading.NodeDataLoader( ...@@ -126,7 +126,7 @@ train_dataloader = dgl.dataloading.NodeDataLoader(
# You can iterate over the data loader and see what it yields. # You can iterate over the data loader and see what it yields.
# #
input_nodes, output_nodes, bipartites = example_minibatch = next(iter(train_dataloader)) input_nodes, output_nodes, mfgs = example_minibatch = next(iter(train_dataloader))
print(example_minibatch) print(example_minibatch)
print("To compute {} nodes' outputs, we need {} nodes' input features".format(len(output_nodes), len(input_nodes))) print("To compute {} nodes' outputs, we need {} nodes' input features".format(len(output_nodes), len(input_nodes)))
...@@ -138,24 +138,24 @@ print("To compute {} nodes' outputs, we need {} nodes' input features".format(le ...@@ -138,24 +138,24 @@ print("To compute {} nodes' outputs, we need {} nodes' input features".format(le
# are needed on the first GNN layer for this minibatch. # are needed on the first GNN layer for this minibatch.
# - An ID tensor for the output nodes, i.e. nodes whose representations # - An ID tensor for the output nodes, i.e. nodes whose representations
# are to be computed. # are to be computed.
# - A list of bipartite graphs storing the computation dependencies # - A list of MFGs storing the computation dependencies
# for each GNN layer. # for each GNN layer.
# #
###################################################################### ######################################################################
# You can get the input and output node IDs of the bipartite graphs # You can get the source and destination node IDs of the MFGs
# and verify that the first few input nodes are always the same as the output # and verify that the first few source nodes are always the same as the destination
# nodes. As we described in the :doc:`overview <L0_neighbor_sampling_overview>`, # nodes. As we described in the :doc:`overview <L0_neighbor_sampling_overview>`,
# output nodes' own features from the previous layer may also be necessary in # destination nodes' own features from the previous layer may also be necessary in
# the computation of the new features. # the computation of the new features.
# #
bipartite_0_src = bipartites[0].srcdata[dgl.NID] mfg_0_src = mfgs[0].srcdata[dgl.NID]
bipartite_0_dst = bipartites[0].dstdata[dgl.NID] mfg_0_dst = mfgs[0].dstdata[dgl.NID]
print(bipartite_0_src) print(mfg_0_src)
print(bipartite_0_dst) print(mfg_0_dst)
print(torch.equal(bipartite_0_src[:bipartites[0].num_dst_nodes()], bipartite_0_dst)) print(torch.equal(mfg_0_src[:mfgs[0].num_dst_nodes()], mfg_0_dst))
###################################################################### ######################################################################
...@@ -177,14 +177,14 @@ class Model(nn.Module): ...@@ -177,14 +177,14 @@ class Model(nn.Module):
self.conv2 = SAGEConv(h_feats, num_classes, aggregator_type='mean') self.conv2 = SAGEConv(h_feats, num_classes, aggregator_type='mean')
self.h_feats = h_feats self.h_feats = h_feats
def forward(self, bipartites, x): def forward(self, mfgs, x):
# Lines that are changed are marked with an arrow: "<---" # Lines that are changed are marked with an arrow: "<---"
h_dst = x[:bipartites[0].num_dst_nodes()] # <--- h_dst = x[:mfgs[0].num_dst_nodes()] # <---
h = self.conv1(bipartites[0], (x, h_dst)) # <--- h = self.conv1(mfgs[0], (x, h_dst)) # <---
h = F.relu(h) h = F.relu(h)
h_dst = h[:bipartites[1].num_dst_nodes()] # <--- h_dst = h[:mfgs[1].num_dst_nodes()] # <---
h = self.conv2(bipartites[1], (h, h_dst)) # <--- h = self.conv2(mfgs[1], (h, h_dst)) # <---
return h return h
model = Model(num_features, 128, num_classes).to(device) model = Model(num_features, 128, num_classes).to(device)
...@@ -195,44 +195,44 @@ model = Model(num_features, 128, num_classes).to(device) ...@@ -195,44 +195,44 @@ model = Model(num_features, 128, num_classes).to(device)
# :doc:`introduction <../blitz/1_introduction>`, you will notice several # :doc:`introduction <../blitz/1_introduction>`, you will notice several
# differences: # differences:
# #
# - **DGL GNN layers on bipartite graphs**. Instead of computing on the # - **DGL GNN layers on MFGs**. Instead of computing on the
# full graph: # full graph:
# #
# .. code:: python # .. code:: python
# #
# h = self.conv1(g, x) # h = self.conv1(g, x)
# #
# you only compute on the sampled bipartite graph: # you only compute on the sampled MFG:
# #
# .. code:: python # .. code:: python
# #
# h = self.conv1(bipartites[0], (x, h_dst)) # h = self.conv1(mfgs[0], (x, h_dst))
# #
# All DGL’s GNN modules support message passing on bipartite graphs, # All DGL’s GNN modules support message passing on MFGs,
# where you supply a pair of features, one for input nodes and another # where you supply a pair of features, one for source nodes and another
# for output nodes. # for destination nodes.
# #
# - **Feature slicing for self-dependency**. There are statements that # - **Feature slicing for self-dependency**. There are statements that
# perform slicing to obtain the previous-layer representation of the # perform slicing to obtain the previous-layer representation of the
# output nodes: # destination nodes:
# #
# .. code:: python # .. code:: python
# #
# h_dst = x[:bipartites[0].num_dst_nodes()] # h_dst = x[:mfgs[0].num_dst_nodes()]
# #
# ``num_dst_nodes`` method works with bipartite graphs, where it will # The ``num_dst_nodes`` method works with MFGs, where it will
# return the number of output nodes. # return the number of destination nodes.
# #
# Since the first few input nodes of the yielded bipartite graph are # Since the first few source nodes of the yielded MFG are
# always the same as the output nodes, these statements obtain the # always the same as the destination nodes, these statements obtain the
# representations of the output nodes on the previous layer. They are # representations of the destination nodes on the previous layer. They are
# then combined with neighbor aggregation in ``dgl.nn.SAGEConv`` layer. # then combined with neighbor aggregation in ``dgl.nn.SAGEConv`` layer.
# #
# .. note:: # .. note::
# #
# See the :doc:`custom message passing # See the :doc:`custom message passing
# tutorial <L4_message_passing>` for more details on how to # tutorial <L4_message_passing>` for more details on how to
# manipulate bipartite graphs produced in this way, such as the usage # manipulate MFGs produced in this way, such as the usage
# of ``num_dst_nodes``. # of ``num_dst_nodes``.
# #
...@@ -277,12 +277,12 @@ for epoch in range(10): ...@@ -277,12 +277,12 @@ for epoch in range(10):
model.train() model.train()
with tqdm.tqdm(train_dataloader) as tq: with tqdm.tqdm(train_dataloader) as tq:
for step, (input_nodes, output_nodes, bipartites) in enumerate(tq): for step, (input_nodes, output_nodes, mfgs) in enumerate(tq):
# feature copy from CPU to GPU takes place here # feature copy from CPU to GPU takes place here
inputs = bipartites[0].srcdata['feat'] inputs = mfgs[0].srcdata['feat']
labels = bipartites[-1].dstdata['label'] labels = mfgs[-1].dstdata['label']
predictions = model(bipartites, inputs) predictions = model(mfgs, inputs)
loss = F.cross_entropy(predictions, labels) loss = F.cross_entropy(predictions, labels)
opt.zero_grad() opt.zero_grad()
...@@ -298,10 +298,10 @@ for epoch in range(10): ...@@ -298,10 +298,10 @@ for epoch in range(10):
predictions = [] predictions = []
labels = [] labels = []
with tqdm.tqdm(valid_dataloader) as tq, torch.no_grad(): with tqdm.tqdm(valid_dataloader) as tq, torch.no_grad():
for input_nodes, output_nodes, bipartites in tq: for input_nodes, output_nodes, mfgs in tq:
inputs = bipartites[0].srcdata['feat'] inputs = mfgs[0].srcdata['feat']
labels.append(bipartites[-1].dstdata['label'].cpu().numpy()) labels.append(mfgs[-1].dstdata['label'].cpu().numpy())
predictions.append(model(bipartites, inputs).argmax(1).cpu().numpy()) predictions.append(model(mfgs, inputs).argmax(1).cpu().numpy())
predictions = np.concatenate(predictions) predictions = np.concatenate(predictions)
labels = np.concatenate(labels) labels = np.concatenate(labels)
accuracy = sklearn.metrics.accuracy_score(labels, predictions) accuracy = sklearn.metrics.accuracy_score(labels, predictions)
......
...@@ -117,7 +117,7 @@ train_dataloader = dgl.dataloading.EdgeDataLoader( ...@@ -117,7 +117,7 @@ train_dataloader = dgl.dataloading.EdgeDataLoader(
torch.arange(graph.number_of_edges()), # The edges to iterate over torch.arange(graph.number_of_edges()), # The edges to iterate over
sampler, # The neighbor sampler sampler, # The neighbor sampler
negative_sampler=negative_sampler, # The negative sampler negative_sampler=negative_sampler, # The negative sampler
device=device, # Put the bipartite graphs on CPU or GPU device=device, # Put the MFGs on CPU or GPU
# The following arguments are inherited from PyTorch DataLoader. # The following arguments are inherited from PyTorch DataLoader.
batch_size=1024, # Batch size batch_size=1024, # Batch size
shuffle=True, # Whether to shuffle the nodes for every epoch shuffle=True, # Whether to shuffle the nodes for every epoch
...@@ -131,11 +131,11 @@ train_dataloader = dgl.dataloading.EdgeDataLoader( ...@@ -131,11 +131,11 @@ train_dataloader = dgl.dataloading.EdgeDataLoader(
# will give you. # will give you.
# #
input_nodes, pos_graph, neg_graph, bipartites = next(iter(train_dataloader)) input_nodes, pos_graph, neg_graph, mfgs = next(iter(train_dataloader))
print('Number of input nodes:', len(input_nodes)) print('Number of input nodes:', len(input_nodes))
print('Positive graph # nodes:', pos_graph.number_of_nodes(), '# edges:', pos_graph.number_of_edges()) print('Positive graph # nodes:', pos_graph.number_of_nodes(), '# edges:', pos_graph.number_of_edges())
print('Negative graph # nodes:', neg_graph.number_of_nodes(), '# edges:', neg_graph.number_of_edges()) print('Negative graph # nodes:', neg_graph.number_of_nodes(), '# edges:', neg_graph.number_of_edges())
print(bipartites) print(mfgs)
###################################################################### ######################################################################
...@@ -152,9 +152,9 @@ print(bipartites) ...@@ -152,9 +152,9 @@ print(bipartites)
# necessary for computing the pair-wise scores of positive and negative examples # necessary for computing the pair-wise scores of positive and negative examples
# in the current minibatch. # in the current minibatch.
# #
# The last element is a list of bipartite graphs storing the computation # The last element is a list of :doc:`MFGs <L0_neighbor_sampling_overview>`
# dependencies for each GNN layer. # storing the computation dependencies for each GNN layer.
# The bipartite graphs are used to compute the GNN outputs of the nodes # The MFGs are used to compute the GNN outputs of the nodes
# involved in positive/negative graph. # involved in positive/negative graph.
# #
...@@ -180,12 +180,12 @@ class Model(nn.Module): ...@@ -180,12 +180,12 @@ class Model(nn.Module):
self.conv2 = SAGEConv(h_feats, h_feats, aggregator_type='mean') self.conv2 = SAGEConv(h_feats, h_feats, aggregator_type='mean')
self.h_feats = h_feats self.h_feats = h_feats
def forward(self, bipartites, x): def forward(self, mfgs, x):
h_dst = x[:bipartites[0].num_dst_nodes()] h_dst = x[:mfgs[0].num_dst_nodes()]
h = self.conv1(bipartites[0], (x, h_dst)) h = self.conv1(mfgs[0], (x, h_dst))
h = F.relu(h) h = F.relu(h)
h_dst = h[:bipartites[1].num_dst_nodes()] h_dst = h[:mfgs[1].num_dst_nodes()]
h = self.conv2(bipartites[1], (h, h_dst)) h = self.conv2(mfgs[1], (h, h_dst))
return h return h
model = Model(num_features, 128).to(device) model = Model(num_features, 128).to(device)
...@@ -256,10 +256,10 @@ def inference(model, graph, node_features): ...@@ -256,10 +256,10 @@ def inference(model, graph, node_features):
device=device) device=device)
result = [] result = []
for input_nodes, output_nodes, bipartites in train_dataloader: for input_nodes, output_nodes, mfgs in train_dataloader:
# feature copy from CPU to GPU takes place here # feature copy from CPU to GPU takes place here
inputs = bipartites[0].srcdata['feat'] inputs = mfgs[0].srcdata['feat']
result.append(model(bipartites, inputs)) result.append(model(mfgs, inputs))
return torch.cat(result) return torch.cat(result)
...@@ -324,11 +324,11 @@ best_accuracy = 0 ...@@ -324,11 +324,11 @@ best_accuracy = 0
best_model_path = 'model.pt' best_model_path = 'model.pt'
for epoch in range(1): for epoch in range(1):
with tqdm.tqdm(train_dataloader) as tq: with tqdm.tqdm(train_dataloader) as tq:
for step, (input_nodes, pos_graph, neg_graph, bipartites) in enumerate(tq): for step, (input_nodes, pos_graph, neg_graph, mfgs) in enumerate(tq):
# feature copy from CPU to GPU takes place here # feature copy from CPU to GPU takes place here
inputs = bipartites[0].srcdata['feat'] inputs = mfgs[0].srcdata['feat']
outputs = model(bipartites, inputs) outputs = model(mfgs, inputs)
pos_score = predictor(pos_graph, outputs) pos_score = predictor(pos_graph, outputs)
neg_score = predictor(neg_graph, outputs) neg_score = predictor(neg_graph, outputs)
......
...@@ -38,33 +38,34 @@ train_dataloader = dgl.dataloading.NodeDataLoader( ...@@ -38,33 +38,34 @@ train_dataloader = dgl.dataloading.NodeDataLoader(
num_workers=0 num_workers=0
) )
input_nodes, output_nodes, bipartites = next(iter(train_dataloader)) input_nodes, output_nodes, mfgs = next(iter(train_dataloader))
###################################################################### ######################################################################
# DGL Bipartite Graph Introduction # DGL Bipartite Graph Introduction
# -------------------------------- # --------------------------------
# #
# In the previous tutorials, you have seen the concept *bipartite graph*, # In the previous tutorials, you have seen the concept *message flow graph*
# where nodes are divided into two parts. # (MFG), where nodes are divided into two parts. It is a kind of (directional)
# bipartite graph.
# This section introduces how you can manipulate (directional) bipartite # This section introduces how you can manipulate (directional) bipartite
# graphs. # graphs.
# #
# You can access the input node features and output node features via # You can access the source node features and destination node features via
# ``srcdata`` and ``dstdata`` attributes: # ``srcdata`` and ``dstdata`` attributes:
# #
bipartite = bipartites[0] mfg = mfgs[0]
print(bipartite.srcdata) print(mfg.srcdata)
print(bipartite.dstdata) print(mfg.dstdata)
###################################################################### ######################################################################
# It also has ``num_src_nodes`` and ``num_dst_nodes`` functions to query # It also has ``num_src_nodes`` and ``num_dst_nodes`` functions to query
# how many input nodes and output nodes exist in the bipartite graph: # how many source nodes and destination nodes exist in the bipartite graph:
# #
print(bipartite.num_src_nodes(), bipartite.num_dst_nodes()) print(mfg.num_src_nodes(), mfg.num_dst_nodes())
###################################################################### ######################################################################
...@@ -72,18 +73,18 @@ print(bipartite.num_src_nodes(), bipartite.num_dst_nodes()) ...@@ -72,18 +73,18 @@ print(bipartite.num_src_nodes(), bipartite.num_dst_nodes())
# will do with ``ndata`` on the graphs you have seen earlier: # will do with ``ndata`` on the graphs you have seen earlier:
# #
bipartite.srcdata['x'] = torch.zeros(bipartite.num_src_nodes(), bipartite.num_dst_nodes()) mfg.srcdata['x'] = torch.zeros(mfg.num_src_nodes(), mfg.num_dst_nodes())
dst_feat = bipartite.dstdata['feat'] dst_feat = mfg.dstdata['feat']
###################################################################### ######################################################################
# Also, since the bipartite graphs are constructed by DGL, you can # Also, since the bipartite graphs are constructed by DGL, you can
# retrieve the input node IDs (i.e. those that are required to compute the # retrieve the source node IDs (i.e. those that are required to compute the
# output) and output node IDs (i.e. those whose representations the # output) and destination node IDs (i.e. those whose representations the
# current GNN layer should compute) as follows. # current GNN layer should compute) as follows.
# #
bipartite.srcdata[dgl.NID], bipartite.dstdata[dgl.NID] mfg.srcdata[dgl.NID], mfg.dstdata[dgl.NID]
###################################################################### ######################################################################
...@@ -93,30 +94,30 @@ bipartite.srcdata[dgl.NID], bipartite.dstdata[dgl.NID] ...@@ -93,30 +94,30 @@ bipartite.srcdata[dgl.NID], bipartite.dstdata[dgl.NID]
###################################################################### ######################################################################
# Recall that the bipartite graphs yielded by the ``NodeDataLoader`` and # Recall that the MFGs yielded by the ``NodeDataLoader`` and
# ``EdgeDataLoader`` have the property that the first few input nodes are # ``EdgeDataLoader`` have the property that the first few source nodes are
# always identical to the output nodes: # always identical to the destination nodes:
# #
# |image1| # |image1|
# #
# .. |image1| image:: https://data.dgl.ai/tutorial/img/bipartite.gif # .. |image1| image:: https://data.dgl.ai/tutorial/img/bipartite.gif
# #
print(torch.equal(bipartite.srcdata[dgl.NID][:bipartite.num_dst_nodes()], bipartite.dstdata[dgl.NID])) print(torch.equal(mfg.srcdata[dgl.NID][:mfg.num_dst_nodes()], mfg.dstdata[dgl.NID]))
###################################################################### ######################################################################
# Suppose you have obtained the input node representations # Suppose you have obtained the source node representations
# :math:`h_u^{(l-1)}`: # :math:`h_u^{(l-1)}`:
# #
bipartite.srcdata['h'] = torch.randn(bipartite.num_src_nodes(), 10) mfg.srcdata['h'] = torch.randn(mfg.num_src_nodes(), 10)
###################################################################### ######################################################################
# Recall that DGL provides the `update_all` interface for expressing how # Recall that DGL provides the `update_all` interface for expressing how
# to compute messages and how to aggregate them on the nodes that receive # to compute messages and how to aggregate them on the nodes that receive
# them. This concept naturally applies to bipartite graphs -- message # them. This concept naturally applies to bipartite graphs like MFGs -- message
# computation happens on the edges between source and destination nodes of # computation happens on the edges between source and destination nodes of
# the edges, and message aggregation happens on the destination nodes. # the edges, and message aggregation happens on the destination nodes.
# #
...@@ -129,8 +130,8 @@ bipartite.srcdata['h'] = torch.randn(bipartite.num_src_nodes(), 10) ...@@ -129,8 +130,8 @@ bipartite.srcdata['h'] = torch.randn(bipartite.num_src_nodes(), 10)
import dgl.function as fn import dgl.function as fn
bipartite.update_all(message_func=fn.copy_u('h', 'm'), reduce_func=fn.mean('m', 'h')) mfg.update_all(message_func=fn.copy_u('h', 'm'), reduce_func=fn.mean('m', 'h'))
m_v = bipartite.dstdata['h'] m_v = mfg.dstdata['h']
m_v m_v
...@@ -165,9 +166,9 @@ class SAGEConv(nn.Module): ...@@ -165,9 +166,9 @@ class SAGEConv(nn.Module):
Parameters Parameters
---------- ----------
g : Graph g : Graph
The input bipartite graph. The input MFG.
h : (Tensor, Tensor) h : (Tensor, Tensor)
The feature of input nodes and output nodes as a pair of Tensors. The features of source nodes and destination nodes as a pair of Tensors.
""" """
with g.local_scope(): with g.local_scope():
h_src, h_dst = h h_src, h_dst = h
...@@ -185,12 +186,12 @@ class Model(nn.Module): ...@@ -185,12 +186,12 @@ class Model(nn.Module):
self.conv1 = SAGEConv(in_feats, h_feats) self.conv1 = SAGEConv(in_feats, h_feats)
self.conv2 = SAGEConv(h_feats, num_classes) self.conv2 = SAGEConv(h_feats, num_classes)
def forward(self, bipartites, x): def forward(self, mfgs, x):
h_dst = x[:bipartites[0].num_dst_nodes()] h_dst = x[:mfgs[0].num_dst_nodes()]
h = self.conv1(bipartites[0], (x, h_dst)) h = self.conv1(mfgs[0], (x, h_dst))
h = F.relu(h) h = F.relu(h)
h_dst = h[:bipartites[1].num_dst_nodes()] h_dst = h[:mfgs[1].num_dst_nodes()]
h = self.conv2(bipartites[1], (h, h_dst)) h = self.conv2(mfgs[1], (h, h_dst))
return h return h
sampler = dgl.dataloading.MultiLayerNeighborSampler([4, 4]) sampler = dgl.dataloading.MultiLayerNeighborSampler([4, 4])
...@@ -205,15 +206,15 @@ train_dataloader = dgl.dataloading.NodeDataLoader( ...@@ -205,15 +206,15 @@ train_dataloader = dgl.dataloading.NodeDataLoader(
model = Model(graph.ndata['feat'].shape[1], 128, dataset.num_classes).to(device) model = Model(graph.ndata['feat'].shape[1], 128, dataset.num_classes).to(device)
with tqdm.tqdm(train_dataloader) as tq: with tqdm.tqdm(train_dataloader) as tq:
for step, (input_nodes, output_nodes, bipartites) in enumerate(tq): for step, (input_nodes, output_nodes, mfgs) in enumerate(tq):
inputs = bipartites[0].srcdata['feat'] inputs = mfgs[0].srcdata['feat']
labels = bipartites[-1].dstdata['label'] labels = mfgs[-1].dstdata['label']
predictions = model(bipartites, inputs) predictions = model(mfgs, inputs)
###################################################################### ######################################################################
# Both ``update_all`` and the functions in ``nn.functional`` namespace # Both ``update_all`` and the functions in ``nn.functional`` namespace
# support bipartite graphs, so you can migrate the code working for small # support MFGs, so you can migrate the code working for small
# graphs to large graph training with minimal changes introduced above. # graphs to large-graph training with the minimal changes introduced above.
# #
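######################################################################
# A minimal sketch (not in the original tutorial) of that point: the same
# ``update_all``-based aggregation runs unchanged on a full graph and on an MFG,
# because messages are always aggregated onto the destination nodes.
#

import dgl.function as fn

def mean_neighbor_feat(g, src_feat):
    with g.local_scope():
        g.srcdata['h'] = src_feat
        g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_agg'))
        return g.dstdata['h_agg']

# Works alike on a full homogeneous graph (srcdata/dstdata alias ndata) and on an MFG:
# mean_neighbor_feat(graph, graph.ndata['feat'])
# mean_neighbor_feat(mfg, mfg.srcdata['feat'])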
......