"vscode:/vscode.git/clone" did not exist on "3ffe0c09b20e0c0367b7835ebf985198d20910a6"
Unverified commit c7e3754d authored by Minjie Wang, committed by GitHub

[Doc] fix colab tutorials (#5205)

* Created using Colaboratory

* Created using Colaboratory

* Created using Colaboratory

* Created using Colaboratory

* minor tweak

* minor fix

* Created using Colaboratory

* Update notebooks/sparse/quickstart.ipynb
parent d965acab
......@@ -4,7 +4,7 @@
TODO(minjie): intro for the new library.
.. toctree::
:maxdepth: 2
:maxdepth: 3
:titlesonly:
quickstart.nblink
......
......@@ -45,16 +45,16 @@
" installed = True\n",
"except ImportError:\n",
" installed = False\n",
"print(\"DGL installed!\" if installed else \"Failed to install DGL!\")"
"print(\"DGL installed!\" if installed else \"DGL not found!\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FTqB360eRvya",
"outputId": "f5cfb27c-82ba-43af-fb58-3fdc62cec193"
"outputId": "df54b94e-fd1b-4b96-fca1-21948284254c"
},
"execution_count": 1,
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
......@@ -75,11 +75,12 @@
"with $\\hat{A} = A + I$, where $A$ denotes the adjacency matrix and $I$ denotes the identity matrix, $\\hat{D}$ refers to the diagonal node degree matrix of $\\hat{A}$ and $W^{(l)}$ denotes a trainable weight matrix. $\\sigma$ refers to a non-linear activation (e.g. relu).\n",
"\n",
"The code below shows how to implement it using the `dgl.sparse` package. The core operations are:\n",
"\n",
"* `dgl.sparse.identity` creates the identity matrix $I$.\n",
"* The augmented adjacency matrix $\\hat{A}$ is then computed by adding the identity matrix to the adjacency matrix $A$.\n",
"* `A_hat.sum(0)` aggregates the augmented adjacency matrix $\\hat{A}$ along the first dimension which gives the degree vector of the augmented graph.\n",
"* `dgl.sparse.diag` creates the diagonal degree matrix $\\hat{D}$ from the degree vector.\n",
"* `D_hat @ A_hat @_hat` computes the convolution matrix which is then multiplied by the linearly transformed node features."
"* `A_hat.sum(0)` aggregates the augmented adjacency matrix $\\hat{A}$ along the first dimension which gives the degree vector of the augmented graph. The diagonal degree matrix $\\hat{D}$ is then created by `dgl.sparse.diag`.\n",
"* Compute $\\hat{D}^{-\\frac{1}{2}}$.\n",
"* `D_hat_invsqrt @ A_hat @ D_hat_invsqrt` computes the convolution matrix which is then multiplied by the linearly transformed node features."
],
"metadata": {
"id": "r3qB1atg_ld0"
......@@ -104,14 +105,16 @@
" # (HIGHLIGHT) Compute the symmetrically normalized adjacency matrix with\n",
" # Sparse Matrix API\n",
" ########################################################################\n",
" A_hat = A + dglsp.identity(A.shape)\n",
" D_hat = dglsp.diag(A_hat.sum(0)) ** -0.5\n",
" return D_hat @ A_hat @ D_hat @ self.W(X)"
" I = dglsp.identity(A.shape)\n",
" A_hat = A + I\n",
" D_hat = dglsp.diag(A_hat.sum(0))\n",
" D_hat_invsqrt = D_hat ** -0.5\n",
" return D_hat_invsqrt @ A_hat @ D_hat_invsqrt @ self.W(X)"
],
"metadata": {
"id": "Y4I4EhHQ_kKb"
},
"execution_count": 2,
"execution_count": 3,
"outputs": []
},
{
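
The hunk above factors the symmetric normalization into explicit steps. As a quick sanity check, here is a minimal, self-contained sketch of the same computation on a toy 3-node graph; the toy edges and feature sizes are illustrative assumptions, while the `dglsp` calls mirror the notebook's own code.

import torch
import dgl.sparse as dglsp

# Toy 3-node graph: edges 0->1, 1->2, 2->0 (illustrative only).
row = torch.tensor([0, 1, 2])
col = torch.tensor([1, 2, 0])
A = dglsp.from_coo(row, col, torch.ones(3))      # adjacency matrix A
I = dglsp.identity(A.shape)                      # identity matrix I
A_hat = A + I                                    # augmented adjacency A + I
D_hat = dglsp.diag(A_hat.sum(0))                 # diagonal degree matrix of A_hat
D_hat_invsqrt = D_hat ** -0.5                    # D_hat^(-1/2)
X = torch.randn(3, 4)                            # toy node features
H = D_hat_invsqrt @ A_hat @ D_hat_invsqrt @ X    # normalized graph convolution
print(H.shape)                                   # torch.Size([3, 4])
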
......@@ -141,7 +144,7 @@
"metadata": {
"id": "BHX3vRjDWJTO"
},
"execution_count": 3,
"execution_count": 4,
"outputs": []
},
{
......@@ -224,9 +227,9 @@
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8ea64434-1b03-4c4e-8a07-752b438c9603"
"outputId": "552e2c22-44f4-4495-c7f9-a57f13484270"
},
"execution_count": 4,
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
......@@ -243,26 +246,26 @@
" NumValidationSamples: 500\n",
" NumTestSamples: 1000\n",
"Done saving data into cached files.\n",
"In epoch 0, loss: 1.957, val acc: 0.122, test acc: 0.130\n",
"In epoch 5, loss: 1.932, val acc: 0.200, test acc: 0.210\n",
"In epoch 10, loss: 1.897, val acc: 0.386, test acc: 0.433\n",
"In epoch 15, loss: 1.851, val acc: 0.518, test acc: 0.571\n",
"In epoch 20, loss: 1.788, val acc: 0.542, test acc: 0.569\n",
"In epoch 25, loss: 1.706, val acc: 0.710, test acc: 0.729\n",
"In epoch 30, loss: 1.606, val acc: 0.746, test acc: 0.780\n",
"In epoch 35, loss: 1.491, val acc: 0.756, test acc: 0.787\n",
"In epoch 40, loss: 1.366, val acc: 0.770, test acc: 0.789\n",
"In epoch 45, loss: 1.237, val acc: 0.768, test acc: 0.789\n",
"In epoch 50, loss: 1.111, val acc: 0.772, test acc: 0.795\n",
"In epoch 55, loss: 0.995, val acc: 0.770, test acc: 0.796\n",
"In epoch 60, loss: 0.891, val acc: 0.772, test acc: 0.801\n",
"In epoch 65, loss: 0.801, val acc: 0.776, test acc: 0.806\n",
"In epoch 70, loss: 0.723, val acc: 0.774, test acc: 0.807\n",
"In epoch 75, loss: 0.657, val acc: 0.780, test acc: 0.810\n",
"In epoch 80, loss: 0.600, val acc: 0.782, test acc: 0.811\n",
"In epoch 85, loss: 0.551, val acc: 0.788, test acc: 0.811\n",
"In epoch 90, loss: 0.510, val acc: 0.788, test acc: 0.814\n",
"In epoch 95, loss: 0.475, val acc: 0.788, test acc: 0.819\n"
"In epoch 0, loss: 1.954, val acc: 0.114, test acc: 0.103\n",
"In epoch 5, loss: 1.921, val acc: 0.158, test acc: 0.147\n",
"In epoch 10, loss: 1.878, val acc: 0.288, test acc: 0.283\n",
"In epoch 15, loss: 1.822, val acc: 0.344, test acc: 0.353\n",
"In epoch 20, loss: 1.751, val acc: 0.388, test acc: 0.389\n",
"In epoch 25, loss: 1.663, val acc: 0.406, test acc: 0.410\n",
"In epoch 30, loss: 1.562, val acc: 0.472, test acc: 0.481\n",
"In epoch 35, loss: 1.450, val acc: 0.558, test acc: 0.573\n",
"In epoch 40, loss: 1.333, val acc: 0.636, test acc: 0.641\n",
"In epoch 45, loss: 1.216, val acc: 0.684, test acc: 0.683\n",
"In epoch 50, loss: 1.102, val acc: 0.726, test acc: 0.713\n",
"In epoch 55, loss: 0.996, val acc: 0.740, test acc: 0.740\n",
"In epoch 60, loss: 0.899, val acc: 0.754, test acc: 0.760\n",
"In epoch 65, loss: 0.813, val acc: 0.762, test acc: 0.771\n",
"In epoch 70, loss: 0.737, val acc: 0.768, test acc: 0.781\n",
"In epoch 75, loss: 0.671, val acc: 0.776, test acc: 0.786\n",
"In epoch 80, loss: 0.614, val acc: 0.784, test acc: 0.790\n",
"In epoch 85, loss: 0.566, val acc: 0.780, test acc: 0.788\n",
"In epoch 90, loss: 0.524, val acc: 0.780, test acc: 0.791\n",
"In epoch 95, loss: 0.489, val acc: 0.772, test acc: 0.795\n"
]
}
]
......@@ -270,11 +273,11 @@
{
"cell_type": "markdown",
"source": [
"Check out the full example script [here](https://github.com/dmlc/dgl/blob/master/examples/sparse/gcn.py)."
"*Check out the full example script* [here](https://github.com/dmlc/dgl/blob/master/examples/sparse/gcn.py)."
],
"metadata": {
"id": "yQnJZvE9ZduM"
}
}
]
}
\ No newline at end of file
}
This source diff could not be displayed because it is too large.
......@@ -62,19 +62,21 @@
"source": [
"## Sparse Multi-head Attention\n",
"\n",
"Unlike the all-pairs scaled-dot-product attention mechanism in vanillar Transformer:\n",
"Recall the all-pairs scaled-dot-product attention mechanism in vanillar Transformer:\n",
"\n",
"$$\\text{Attn}=\\text{softmax}(\\dfrac{QK^T} {\\sqrt{d}})V,$$\n",
"\n",
"the graph transformer (GT) model employs a Sparse Multi-head Attention block for message passing:\n",
"The graph transformer (GT) model employs a Sparse Multi-head Attention block:\n",
"\n",
"$$\\text{SparseAttn}(Q, K, V, A) = \\text{softmax}(\\frac{(QK^T) \\circ A}{\\sqrt{d}})V,$$\n",
"\n",
"where $Q, K, V ∈\\mathbb{R}^{N\\times d}$ are query feature, key feature, and value feature, respectively. $A\\in[0,1]^{N\\times N}$ is the adjacency matrix of the input graph. $(QK^T)\\circ A$ denotes Sampled Dense Dense Matrix Multiplication (SDDMM) as shown in the figure below:\n",
"where $Q, K, V ∈\\mathbb{R}^{N\\times d}$ are query feature, key feature, and value feature, respectively. $A\\in[0,1]^{N\\times N}$ is the adjacency matrix of the input graph. $(QK^T)\\circ A$ means that the multiplication of query matrix and key matrix is followed by a Hadamard product (or element-wise multiplication) with the sparse adjacency matrix as illustrated in the figure below:\n",
"\n",
"<img src=\"https://drive.google.com/uc?id=1OgMAewLR3Z1vz5y4J8aPRSeaU3g8iQfX\" width=\"500\">\n",
"\n",
"Only attention scores between connected nodes are computed according to the sparsity of $A$. Enjoying the [batched SDDMM API](https://docs.dgl.ai/en/latest/generated/dgl.sparse.bsddmm.html) in DGL, we can parallel the computation on multiple attention heads (different representation subspaces).\n",
"Essentially, only the attention scores between connected nodes are computed according to the sparsity of $A$. This operation is also called *Sampled Dense Dense Matrix Multiplication (SDDMM)*.\n",
"\n",
"Enjoying the [batched SDDMM API](https://docs.dgl.ai/en/latest/generated/dgl.sparse.bsddmm.html) in DGL, we can parallel the computation on multiple attention heads (different representation subspaces).\n",
"\n"
]
},
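
To make the sparse attention step above concrete, here is a small hedged sketch on a toy 4-node graph with 2 attention heads; the graph, sizes, and head count are assumptions for illustration, while `bsddmm`, `softmax`, and `bspmm` are the DGL sparse APIs referenced in this notebook.

import torch
import dgl.sparse as dglsp

N, dh, nh = 4, 8, 2                              # nodes, head dim, heads (toy sizes)
row = torch.tensor([0, 1, 1, 2, 3])
col = torch.tensor([1, 0, 2, 3, 0])
A = dglsp.from_coo(row, col, shape=(N, N))       # sparse adjacency with 5 edges

q = torch.randn(N, dh, nh)
k = torch.randn(N, dh, nh)
v = torch.randn(N, dh, nh)

# SDDMM: (Q K^T) o A -- scores are computed only at the 5 nonzero positions of A,
# batched over the nh attention heads.
attn = dglsp.bsddmm(A, q, k.transpose(1, 0))     # sparse [N, N, nh]
attn = attn.softmax()                            # row-wise softmax over nonzeros
out = dglsp.bspmm(attn, v)                       # dense [N, dh, nh]
print(out.shape)                                 # torch.Size([4, 8, 2])
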
......@@ -129,8 +131,9 @@
" ######################################################################\n",
" # (HIGHLIGHT) Compute the multi-head attention with Sparse Matrix API\n",
" ######################################################################\n",
" attn = dglsp.bsddmm(A, q, k.transpose(1, 0)) # [N, N, nh]\n",
" attn = attn.softmax() # [N, N, nh]\n",
" attn = dglsp.bsddmm(A, q, k.transpose(1, 0)) # (sparse) [N, N, nh]\n",
" # Sparse softmax by default applies on the last sparse dimension.\n",
" attn = attn.softmax() # (sparse) [N, N, nh]\n",
" out = dglsp.bspmm(attn, v) # [N, dh, nh]\n",
"\n",
" return self.out_proj(out.reshape(N, -1))"
......@@ -188,11 +191,7 @@
"source": [
"## Graph Transformer Model\n",
"\n",
"The GT model is constructed by stacking GT layers.\n",
"\n",
"The input positional encoding of vanilla transformer is replaced with Laplacian positional encoding [(Dwivedi et al. 2020)](https://arxiv.org/abs/2003.00982).\n",
"\n",
"For the graph-level prediction task, an extra pooler is stacked on top of GT layers to aggregate node feature of the same graph."
"The GT model is constructed by stacking GT layers. The input positional encoding of vanilla transformer is replaced with Laplacian positional encoding [(Dwivedi et al. 2020)](https://arxiv.org/abs/2003.00982). For the graph-level prediction task, an extra pooler is stacked on top of GT layers to aggregate node feature of the same graph."
]
},
{
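
The last sentence above describes a graph-level readout. As a small illustration, the sketch below uses DGL's `SumPooling` to reduce per-node features to one vector per graph; the toy graphs, feature size, and the choice of sum pooling are assumptions for illustration, not necessarily the notebook's exact readout.

import dgl
import torch
from dgl.nn import SumPooling

# Two toy graphs batched together; the pooler aggregates node features per graph.
g1 = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])), num_nodes=3)
g2 = dgl.graph((torch.tensor([0]), torch.tensor([1])), num_nodes=2)
bg = dgl.batch([g1, g2])
h = torch.randn(bg.num_nodes(), 16)              # toy node features
pooler = SumPooling()
print(pooler(bg, h).shape)                        # torch.Size([2, 16]), one vector per graph
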
......@@ -247,7 +246,9 @@
"source": [
"## Training\n",
"\n",
"We train the GT model on [ogbg-molhiv](https://ogb.stanford.edu/docs/graphprop/#ogbg-mol) benchmark. The Laplacian positional encoding of each graph is pre-computed (with the API [here](https://docs.dgl.ai/en/latest/generated/dgl.laplacian_pe.html)) as part of the input to the model."
"We train the GT model on [ogbg-molhiv](https://ogb.stanford.edu/docs/graphprop/#ogbg-mol) benchmark. The Laplacian positional encoding of each graph is pre-computed (with the API [here](https://docs.dgl.ai/en/latest/generated/dgl.laplacian_pe.html)) as part of the input to the model.\n",
"\n",
"*Note that we down-sample the dataset to make this demo runs faster. See the* [*example script*](https://github.com/dmlc/dgl/blob/master/examples/sparse/graph_transformer.py) *for the performance on the full dataset.*"
]
},
{
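
As a small illustration of the pre-computation mentioned above, the sketch below computes a Laplacian positional encoding for a toy graph with the `dgl.laplacian_pe` API linked in the text; the toy cycle graph and the choice of k are assumptions.

import dgl
import torch

# Toy 8-node cycle, made bidirected (illustrative only).
src = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7])
dst = torch.tensor([1, 2, 3, 4, 5, 6, 7, 0])
g = dgl.graph((src, dst), num_nodes=8)
g = dgl.add_reverse_edges(g)
pos_enc = dgl.laplacian_pe(g, k=3)    # k smallest non-trivial Laplacian eigenvectors
print(pos_enc.shape)                  # torch.Size([8, 3])
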
......@@ -398,15 +399,6 @@
"# Kick off training.\n",
"train(model, dataset, evaluator, dev)"
]
},
{
"cell_type": "markdown",
"source": [
"*Check out the full example script [here](https://github.com/dmlc/dgl/blob/master/examples/sparse/graph_transformer.py).*"
],
"metadata": {
"id": "mifdq1Ftc-Nz"
}
}
],
"metadata": {
......@@ -415,8 +407,7 @@
},
"orig_nbformat": 4,
"colab": {
"provenance": [],
"toc_visible": true
"provenance": []
},
"gpuClass": "standard",
"kernelspec": {
......@@ -427,4 +418,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
\ No newline at end of file
}
......@@ -5,8 +5,7 @@
"colab": {
"provenance": [],
"private_outputs": true,
"toc_visible": true,
"include_colab_link": true
"toc_visible": true
},
"kernelspec": {
"name": "python3",
......@@ -15,8 +14,7 @@
"language_info": {
"name": "python"
},
"gpuClass": "standard",
"accelerator": "GPU"
"gpuClass": "standard"
},
"cells": [
{
......@@ -51,7 +49,7 @@
" installed = True\n",
"except ImportError:\n",
" installed = False\n",
"print(\"DGL installed!\" if installed else \"Failed to install DGL!\")"
"print(\"DGL installed!\" if installed else \"DGL not found!\")"
],
"metadata": {
"id": "19UZd7wyWzpT"
......@@ -465,7 +463,7 @@
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"A \\\\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"----------------|---------------|----------------|----------\n",
"**DiagMatrix** |✅ |✅ |🚫\n",
"**SparseMatrix**|✅ |✅ |🚫\n",
......@@ -520,14 +518,14 @@
{
"cell_type": "markdown",
"source": [
"**sub(A, B) equivalent to A - B**\n",
"**sub(A, B), equivalent to A - B**\n",
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"A \\\\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"----------------|---------------|----------------|----------\n",
"**DiagMatrix** |✅ |🚫 |🚫\n",
"**SparseMatrix**|🚫 |🚫 |🚫\n",
"**DiagMatrix** |✅ | |🚫\n",
"**SparseMatrix**| | |🚫\n",
"**scalar** |🚫 |🚫 |🚫"
],
"metadata": {
......@@ -586,7 +584,7 @@
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"A \\\\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"----------------|---------------|----------------|----------\n",
"**DiagMatrix** |✅ |🚫 |✅\n",
"**SparseMatrix**|🚫 |🚫 |✅\n",
......@@ -642,7 +640,7 @@
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"A \\\\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"----------------|---------------|----------------|----------\n",
"**DiagMatrix** |✅ |🚫 |✅\n",
"**SparseMatrix**|🚫 |🚫 |✅\n",
......@@ -694,7 +692,7 @@
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"A \\\\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"----------------|---------------|----------------|----------\n",
"**DiagMatrix** |🚫 |🚫 |✅\n",
"**SparseMatrix**|🚫 |🚫 |✅\n",
......@@ -878,10 +876,10 @@
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **Tensor**|**DiagMatrix**|**SparseMatrix**\n",
"A \\\\ B | **Tensor**|**DiagMatrix**|**SparseMatrix**\n",
"----------------|-----------|--------------|----------\n",
"**Tensor** |✅ |🚫 |🚫\n",
"**DiagMatrix**. |✅ |✅ |✅\n",
"**DiagMatrix** |✅ |✅ |✅\n",
"**SparseMatrix**|✅ |✅ |✅"
],
"metadata": {
......@@ -1023,7 +1021,7 @@
{
"cell_type": "markdown",
"source": [
"**Sampled-Dense-Dense Matrix Multiplication**\n",
"**Sampled-Dense-Dense Matrix Multiplication (SDDMM)**\n",
"\n",
"``sddmm`` matrix-multiplies two dense matrices X1 and X2, then elementwise-multiplies the result with sparse matrix A at the nonzero locations. This is designed for sparse matrix with scalar values.\n",
"\n",
......@@ -1101,18 +1099,76 @@
{
"cell_type": "markdown",
"source": [
"## Non-linear activation functions\n",
"## Non-linear activation functions"
],
"metadata": {
"id": "fVkbTT28ZzPr"
}
},
{
"cell_type": "markdown",
"source": [
"### Element-wise functions\n",
"\n",
"* `softmax()`"
"Most activation functions are element-wise and can be further grouped into two categories:\n",
"\n",
"**Sparse-preserving functions** such as `sin()`, `tanh()`, `sigmoid()`, `relu()`, etc. You can directly apply them on the `val` tensor of the sparse matrix and then recreate a new matrix of the same sparsity using `val_like`."
],
"metadata": {
"id": "XuaNdFO7XG2r"
}
},
{
"cell_type": "code",
"source": [
"row = torch.tensor([0, 1, 1, 2])\n",
"col = torch.tensor([1, 0, 2, 0])\n",
"val = torch.randn(4)\n",
"A = dglsp.from_coo(row, col, val)\n",
"print(A.to_dense())\n",
"\n",
"print(\"Apply tanh.\")\n",
"A_new = dglsp.val_like(A, torch.tanh(A.val))\n",
"print(A_new.to_dense())"
],
"metadata": {
"id": "GZkCJJ0TX0cI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"**Non-sparse-preserving functions** such as `exp()`, `cos()`, etc. You can first convert the sparse matrix to dense before applying the functions."
],
"metadata": {
"id": "i92lhMEnYas3"
}
},
{
"cell_type": "code",
"source": [
"row = torch.tensor([0, 1, 1, 2])\n",
"col = torch.tensor([1, 0, 2, 0])\n",
"val = torch.randn(4)\n",
"A = dglsp.from_coo(row, col, val)\n",
"print(A.to_dense())\n",
"\n",
"print(\"Apply exp.\")\n",
"A_new = A.to_dense().exp()\n",
"print(A_new)"
],
"metadata": {
"id": "sroJpzRNYZq5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"**Softmax**\n",
"### Softmax\n",
"\n",
"Apply row-wise softmax to the nonzero entries of the sparse matrix."
],
......
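
A brief sketch of the row-wise behaviour described above (toy values, assuming the `dglsp` alias):

import torch
import dgl.sparse as dglsp

row = torch.tensor([0, 0, 1, 2])
col = torch.tensor([0, 2, 1, 0])
val = torch.tensor([1.0, 2.0, 3.0, 4.0])
A = dglsp.from_coo(row, col, val)
# Softmax normalizes each row over its nonzero entries only.
print(A.softmax().to_dense())
# Row 0 normalizes [1, 2]; rows 1 and 2 each have a single nonzero, which maps to 1.0.
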