The degree encoder is a learnable embedding layer that encodes the degree of each node into a vector. It takes as input the batched input and output degrees of graph nodes, and outputs the degree embeddings of the nodes.

.. code:: python

    degree_encoder = dgl.nn.DegreeEncoder(
        max_degree=8,  # the maximum degree to cut off
        embedding_dim=512  # the dimension of the degree embedding
    )
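
As a minimal usage sketch (the tensor names ``in_degree``, ``out_degree`` and ``node_feat`` are illustrative, assuming the degrees have already been batched and padded to shape ``(num_graphs, max_num_nodes)``), the degree embedding is typically added to the node features before the first Graphormer layer:

.. code:: python

    import torch as th

    # stack in- and out-degrees so the encoder embeds both directions
    # (stacking convention assumed from the default direction="both")
    deg_emb = degree_encoder(th.stack((in_degree, out_degree)))

    # inject degree information into the input node features
    node_feat = node_feat + deg_emb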

Path Encoding
-------------

The path encoder encodes the edge features on the shortest path between two nodes to get attention bias for the self-attention module. It takes as input the batched edge features along the shortest paths and outputs the attention bias based on path encoding.

.. code:: python

    path_encoder = PathEncoder(
        max_len=5,  # the maximum length of the shortest path
        feat_dim=512,  # the dimension of the edge feature
        num_heads=8,  # the number of attention heads
    )
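
A rough usage sketch, assuming ``dist`` holds the batched shortest-path distances of shape ``(num_graphs, max_num_nodes, max_num_nodes)`` and ``path_data`` holds the edge features along those paths (both names are illustrative; producing them is covered in the data preparation section):

.. code:: python

    # returns a per-head attention bias derived from the edge features
    # on the shortest path between every pair of nodes
    path_bias = path_encoder(dist, path_data)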

Spatial Encoding
----------------

The spatial encoder encodes the shortest distance between two nodes to get attention bias for the self-attention module. It takes as input the shortest distance between two nodes and outputs the attention bias based on spatial encoding.

.. code:: python

    spatial_encoder = SpatialEncoder(
        max_dist=5,  # the maximum distance between two nodes
        num_heads=8,  # the number of attention heads
    )
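
Since both encoders produce a per-head bias over node pairs, a common pattern (sketched below with the same illustrative ``dist`` and ``path_bias`` tensors as above) is to sum the two encodings into a single attention bias for the Graphormer layers:

.. code:: python

    spatial_bias = spatial_encoder(dist)

    # combine spatial and path encodings into one attention bias of shape
    # (num_graphs, max_num_nodes, max_num_nodes, num_heads)
    attn_bias = path_bias + spatial_bias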

Graphormer Layers
-----------------

The Graphormer layer is like a Transformer encoder layer with the Multi-head Attention part replaced by ``dgl.nn.BiasedMHA``, which supports the attention bias computed above. We can stack multiple Graphormer layers as a list just like implementing a Transformer encoder in PyTorch.

.. code:: python

    layers = th.nn.ModuleList([
        GraphormerLayer(
            feat_size=512,  # the dimension of the input node features
            ...
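
For reference, a fuller sketch of the layer stack is below; ``hidden_size``, ``num_heads``, the dropout rate and the number of layers are illustrative hyperparameters, not values prescribed by this guide:

.. code:: python

    layers = th.nn.ModuleList([
        GraphormerLayer(
            feat_size=512,     # input/output node feature dimension
            hidden_size=1024,  # hidden dimension of the feed-forward network
            num_heads=8,       # number of attention heads
            dropout=0.1,       # dropout rate (illustrative)
        )
        for _ in range(6)      # number of stacked layers (illustrative)
    ])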

Model Forward
-------------

Grouping the modules above defines the primary components of the Graphormer model. We can then define the forward process as follows:

.. code:: python

    next(iter(dataloader))  # we will use the first batch as an example
    num_graphs, max_num_nodes, _ = node_feat.shape
    ...
        attn_bias=attn_bias,
    )
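
To make the omitted steps concrete, here is a minimal end-to-end sketch of the forward pass. It assumes the dataloader yields padded node features, in- and out-degrees, an attention mask, shortest-path edge features and shortest-path distances for each batch; the field names and the exact batching scheme are illustrative, and details such as the graph-level token are still left out:

.. code:: python

    import torch as th

    node_feat, in_degree, out_degree, attn_mask, path_data, dist = \
        next(iter(dataloader))  # illustrative batch layout
    num_graphs, max_num_nodes, _ = node_feat.shape

    # degree encoding is added to the input node features
    deg_emb = degree_encoder(th.stack((in_degree, out_degree)))
    x = node_feat + deg_emb

    # path and spatial encodings together form the attention bias
    attn_bias = path_encoder(dist, path_data) + spatial_encoder(dist)

    # run the stacked Graphormer layers
    for layer in layers:
        x = layer(
            x,
            attn_mask=attn_mask,
            attn_bias=attn_bias,
        )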

For simplicity, we omit some details in the forward process. For the complete implementation, please refer to the `Graphormer example <https://github.com/dmlc/dgl/tree/master/examples/core/Graphormer>`_.

You can also explore other `utility modules <https://docs.dgl.ai/api/python/nn-pytorch.html#utility-modules-for-graph-transformer>`_ to customize your own graph transformer model. In the next section, we will show how to prepare the data for training.