Commit f4a9a455 authored by Lingfan Yu, committed by GitHub

[Doc] Fix hyperlinks in tutorials (#260)

* fix all code tutorial links and typos in texts

* sse readme format

* fix

* sse paper link

* gat readme

* fix
Parent commit: e557ed89
Benchmark SSE on multi-GPUs
===========================

Paper link:
[http://proceedings.mlr.press/v80/dai18a/dai18a.pdf](http://proceedings.mlr.press/v80/dai18a/dai18a.pdf)
Use a small embedding
---------------------
```bash
DGLBACKEND=mxnet python3 sse_batch.py --graph-file ../../data/5_5_csr.nd \
--n-epochs 1 \
--lr 0.0005 \
--batch-size 1024 \
--use-spmv \
--dgl \
--num-parallel-subgraphs 32 \
--gpu 1 \
--num-feats 100 \
--n-hidden 100
```
Test convergence
----------------
```bash
DGLBACKEND=mxnet python3 sse_batch.py --dataset "pubmed" \
--n-epochs 100 \
--lr 0.001 \
--batch-size 1024 \
--dgl \
--use-spmv \
--neigh-expand 4
```
Graph Attention Networks (GAT)
==============================
- Paper link: [https://arxiv.org/abs/1710.10903](https://arxiv.org/abs/1710.10903)
- Author's code repo:
[https://github.com/PetarV-/GAT](https://github.com/PetarV-/GAT).
Note that the original code for the paper is implemented in TensorFlow.
Results
-------
Run with the following command (available datasets: "cora", "citeseer", "pubmed"):
```bash
python gat.py --dataset cora --gpu 0 --num-heads 8
```
Graph Neural Network and its variant
------------------------------------

* **GCN** `[paper] <https://arxiv.org/abs/1609.02907>`__ `[tutorial]
  <1_gnn/1_gcn.html>`__ `[code]
  <https://github.com/jermainewang/dgl/blob/master/examples/pytorch/gcn>`__:
  this is the vanilla GCN. The tutorial covers the basic uses of DGL APIs.
* **GAT** `[paper] <https://arxiv.org/abs/1710.10903>`__ `[code]
  <https://github.com/jermainewang/dgl/blob/master/examples/pytorch/gat>`__:
  the key extension of GAT w.r.t. the vanilla GCN is deploying multi-head
  attention over the neighborhood of a node, which greatly enhances the
  capacity and expressiveness of the model.
* **R-GCN** `[paper] <https://arxiv.org/abs/1703.06103>`__ `[tutorial]
  <1_gnn/4_rgcn.html>`__ `[code]
  <https://github.com/jermainewang/dgl/tree/master/examples/pytorch/rgcn>`__:
  the key difference of R-GCN is that it allows multiple edges between two
  entities of a graph, and edges with distinct relationships are encoded
  differently. This is an interesting extension of GCN that can have a lot of
  applications of its own.
* **LGNN** `[paper] <https://arxiv.org/abs/1705.08415>`__ `[tutorial]
  <1_gnn/6_line_graph.html>`__ `[code]
  <https://github.com/jermainewang/dgl/tree/master/examples/pytorch/line_graph>`__:
  this model focuses on community detection by inspecting graph structures. It
  uses representations of both the original graph and its line-graph
  companion. In addition to demonstrating how an algorithm can harness
  multiple graphs, our implementation shows how one can judiciously mix
  vanilla tensor operations, sparse-matrix tensor operations, and message
  passing with DGL.
* **SSE** `[paper] <http://proceedings.mlr.press/v80/dai18a/dai18a.pdf>`__
  `[tutorial (wip)]` `[code]
  <https://github.com/jermainewang/dgl/blob/master/examples/mxnet/sse>`__:
  the emphasis here is a *giant* graph that cannot fit comfortably on one GPU
  card. SSE is an example that illustrates the co-design of both algorithm and
  system: sampling to guarantee asymptotic convergence while lowering the
  complexity, and batching across samples for maximum parallelism.
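The graph convolution behind the vanilla GCN entry above reduces to a normalized sparse matrix product, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). A minimal NumPy sketch of one layer, purely for illustration (the linked DGL example expresses the same computation via message passing; the toy graph and weights here are made up):

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One simplified GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)            # symmetric normalization
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(norm_adj @ feats @ weight, 0.0)  # ReLU

# Toy 3-node path graph with 2-d features and an identity weight matrix.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
feats = np.eye(3, 2)
weight = np.eye(2)
out = gcn_layer(adj, feats, weight)   # shape (3, 2)
```

Each output row is a degree-weighted average of the node's own features and its neighbors', followed by the linear projection and nonlinearity.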
Dealing with many small graphs
------------------------------

* **Tree-LSTM** `[paper] <https://arxiv.org/abs/1503.00075>`__ `[tutorial]
  <2_small_graph/3_tree-lstm.html>`__ `[code]
  <https://github.com/jermainewang/dgl/blob/master/examples/pytorch/tree_lstm>`__:
  sentences of natural languages have inherent structures, which are thrown
  away by treating them simply as sequences. Tree-LSTM is a powerful model
  that learns the representation by leveraging prior syntactic structures
  (e.g. parse trees). The challenge in training it well is that simply padding
  a sentence to the maximum length no longer works, since trees of different
  sentences have different sizes and topologies. DGL solves this problem by
  throwing the trees into a bigger "container" graph and using message passing
  to explore maximum parallelism. The key API we use is batching.
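The "container" graph idea above can be pictured with plain adjacency matrices: stacking the small graphs block-diagonally yields a single graph with no cross-graph edges, so one round of message passing covers every tree at once. A toy sketch with a hypothetical helper (not DGL's actual batching API):

```python
import numpy as np

def batch_graphs(adjs):
    """Merge small graphs into one block-diagonal 'container' graph.

    Because no edges cross the blocks, message passing on the batched
    graph processes all of the original graphs in parallel.
    """
    n = sum(a.shape[0] for a in adjs)
    out = np.zeros((n, n))
    offset = 0
    for a in adjs:
        k = a.shape[0]
        out[offset:offset + k, offset:offset + k] = a
        offset += k
    return out

tree1 = np.array([[0., 1.],
                  [1., 0.]])                    # 2-node tree
tree2 = np.array([[0., 1., 1.],
                  [1., 0., 0.],
                  [1., 0., 0.]])                # 3-node star
big = batch_graphs([tree1, tree2])              # shape (5, 5)
```

The off-diagonal blocks stay zero, which is exactly why the batched computation cannot mix information between sentences.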
Generative models
-----------------

* **DGMG** `[paper] <https://arxiv.org/abs/1803.03324>`__ `[tutorial]
  <3_generative_model/5_dgmg.html>`__ `[code]
  <https://github.com/jermainewang/dgl/tree/master/examples/pytorch/dgmg>`__:
  this model belongs to the important family that deals with structural
  generation. DGMG is interesting because its state-machine approach is the
  most general. It is also very challenging because, unlike Tree-LSTM, every
  sample has a dynamic, probability-driven structure that is not available
  before training. We are able to progressively leverage intra- and
  inter-graph parallelism to steadily improve the performance.
* **JTNN** `[paper] <https://arxiv.org/abs/1802.04364>`__ `[code]
  <https://github.com/jermainewang/dgl/tree/master/examples/pytorch/jtnn>`__:
  unlike DGMG, this paper generates molecular graphs using the framework of
  variational auto-encoders. Perhaps more interesting is its approach of
  building structure hierarchically, in the case of molecules, with a junction
  tree as the middle scaffolding.
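The state-machine view of generation mentioned above can be caricatured as a loop of "add a node?" / "add an edge?" decisions. In the actual DGMG model every decision comes from a neural network conditioned on the partial graph; the sketch below replaces them with fixed coin flips purely to show the control flow (all names and probabilities are invented for illustration):

```python
import random

def generate_graph(max_nodes, p_node=0.9, p_edge=0.5, seed=0):
    """Caricature of DGMG-style sequential generation.

    Repeatedly decide whether to add a node, then whether to wire it to
    each existing node. Real DGMG parameterizes these decisions with
    graph-conditioned networks; here they are plain coin flips.
    """
    rng = random.Random(seed)
    num_nodes, edges = 0, []
    while num_nodes < max_nodes and (num_nodes == 0 or rng.random() < p_node):
        new = num_nodes
        num_nodes += 1
        for other in range(new):        # one edge decision per candidate
            if rng.random() < p_edge:
                edges.append((other, new))
    return num_nodes, edges

n, e = generate_graph(6)
```

Because the structure unfolds decision by decision, no two samples need have the same shape, which is precisely what makes batching DGMG harder than batching Tree-LSTM.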
Old (new) wines in new bottle
-----------------------------

* **Capsule** `[paper] <https://arxiv.org/abs/1710.09829>`__ `[tutorial]
  <4_old_wines/2_capsule.html>`__ `[code]
  <https://github.com/jermainewang/dgl/tree/master/examples/pytorch/capsule>`__:
  this new computer vision model has two key ideas -- enhancing the feature
  representation in a vector form (instead of a scalar) called a *capsule*,
  and replacing max-pooling with dynamic routing. The idea of dynamic routing
  is to integrate a lower-level capsule into one (or several) higher-level
  capsules with non-parametric message passing. We show how the latter can be
  nicely implemented with DGL APIs.
* **Transformer** `[paper] <https://arxiv.org/abs/1706.03762>`__ `[tutorial
  (wip)]` `[code (wip)]` and **Universal Transformer** `[paper]
  <https://arxiv.org/abs/1807.03819>`__ `[tutorial (wip)]` `[code (wip)]`:
  these two models replace the RNN with several layers of multi-head attention
  to encode and discover structures among the tokens of a sentence. These
  attention mechanisms can similarly be formulated as graph operations with
  message passing.
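The last point -- attention as a graph operation -- is easy to see on a complete graph over the tokens: edge weights are softmax-normalized dot products, and each node aggregates a weighted sum of its neighbors' features. A single-head NumPy sketch, simplified by omitting the learned query/key/value projections:

```python
import numpy as np

def attention_as_message_passing(h):
    """Self-attention viewed as message passing on a complete token graph.

    Scores on each (i, j) edge are scaled dot products; a per-node softmax
    turns them into edge weights, and each node aggregates the weighted
    sum of all token features as its incoming messages.
    """
    scores = h @ h.T / np.sqrt(h.shape[1])            # pairwise edge scores
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = scores / scores.sum(axis=1, keepdims=True)  # softmax over neighbors
    return attn @ h                                    # aggregate messages

tokens = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, 8-d features
out = attention_as_message_passing(tokens)             # shape (4, 8)
```

Multi-head attention simply runs several such message-passing rounds in parallel with different learned projections and concatenates the results.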
We categorize the models below, providing links to the original code and
tutorial when appropriate. As will become apparent, these models stress the use
of different DGL APIs.