"vscode:/vscode.git/clone" did not exist on "3ffe0c09b20e0c0367b7835ebf985198d20910a6"
Unverified commit c7e3754d authored by Minjie Wang, committed by GitHub

[Doc] fix colab tutorials (#5205)

* Created using Colaboratory

* Created using Colaboratory

* Created using Colaboratory

* Created using Colaboratory

* minor tweak

* minor fix

* Created using Colaboratory

* Update notebooks/sparse/quickstart.ipynb
parent d965acab
......@@ -4,7 +4,7 @@
TODO(minjie): intro for the new library.
.. toctree::
:maxdepth: 2
:maxdepth: 3
:titlesonly:
quickstart.nblink
......
......@@ -45,16 +45,16 @@
" installed = True\n",
"except ImportError:\n",
" installed = False\n",
"print(\"DGL installed!\" if installed else \"Failed to install DGL!\")"
"print(\"DGL installed!\" if installed else \"DGL not found!\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FTqB360eRvya",
"outputId": "f5cfb27c-82ba-43af-fb58-3fdc62cec193"
"outputId": "df54b94e-fd1b-4b96-fca1-21948284254c"
},
"execution_count": 1,
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
......@@ -75,11 +75,12 @@
"with $\\hat{A} = A + I$, where $A$ denotes the adjacency matrix and $I$ denotes the identity matrix, $\\hat{D}$ refers to the diagonal node degree matrix of $\\hat{A}$ and $W^{(l)}$ denotes a trainable weight matrix. $\\sigma$ refers to a non-linear activation (e.g. relu).\n",
"\n",
"The code below shows how to implement it using the `dgl.sparse` package. The core operations are:\n",
"\n",
"* `dgl.sparse.identity` creates the identity matrix $I$.\n",
"* The augmented adjacency matrix $\\hat{A}$ is then computed by adding the identity matrix to the adjacency matrix $A$.\n",
"* `A_hat.sum(0)` aggregates the augmented adjacency matrix $\\hat{A}$ along the first dimension which gives the degree vector of the augmented graph.\n",
"* `dgl.sparse.diag` creates the diagonal degree matrix $\\hat{D}$ from the degree vector.\n",
"* `D_hat @ A_hat @_hat` computes the convolution matrix which is then multiplied by the linearly transformed node features."
"* `A_hat.sum(0)` aggregates the augmented adjacency matrix $\\hat{A}$ along the first dimension which gives the degree vector of the augmented graph. The diagonal degree matrix $\\hat{D}$ is then created by `dgl.sparse.diag`.\n",
"* Compute $\\hat{D}^{-\\frac{1}{2}}$.\n",
"* `D_hat_invsqrt @ A_hat @ D_hat_invsqrt` computes the convolution matrix which is then multiplied by the linearly transformed node features."
],
"metadata": {
"id": "r3qB1atg_ld0"
......@@ -104,14 +105,16 @@
" # (HIGHLIGHT) Compute the symmetrically normalized adjacency matrix with\n",
" # Sparse Matrix API\n",
" ########################################################################\n",
" A_hat = A + dglsp.identity(A.shape)\n",
" D_hat = dglsp.diag(A_hat.sum(0)) ** -0.5\n",
" return D_hat @ A_hat @ D_hat @ self.W(X)"
" I = dglsp.identity(A.shape)\n",
" A_hat = A + I\n",
" D_hat = dglsp.diag(A_hat.sum(0))\n",
" D_hat_invsqrt = D_hat ** -0.5\n",
" return D_hat_invsqrt @ A_hat @ D_hat_invsqrt @ self.W(X)"
],
"metadata": {
"id": "Y4I4EhHQ_kKb"
},
"execution_count": 2,
"execution_count": 3,
"outputs": []
},
{
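
The hunk above factors the symmetric normalization into explicit steps. As a quick sanity check, here is a minimal, self-contained sketch of the same computation on a toy 3-node graph; the toy edges and feature sizes are illustrative assumptions, while the `dglsp` calls mirror the notebook's own code.

import torch
import dgl.sparse as dglsp

# Toy 3-node graph: edges 0->1, 1->2, 2->0 (illustrative only).
row = torch.tensor([0, 1, 2])
col = torch.tensor([1, 2, 0])
A = dglsp.from_coo(row, col, torch.ones(3))      # adjacency matrix A
I = dglsp.identity(A.shape)                      # identity matrix I
A_hat = A + I                                    # augmented adjacency A + I
D_hat = dglsp.diag(A_hat.sum(0))                 # diagonal degree matrix of A_hat
D_hat_invsqrt = D_hat ** -0.5                    # D_hat^(-1/2)
X = torch.randn(3, 4)                            # toy node features
H = D_hat_invsqrt @ A_hat @ D_hat_invsqrt @ X    # normalized graph convolution
print(H.shape)                                   # torch.Size([3, 4])
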
......@@ -141,7 +144,7 @@
"metadata": {
"id": "BHX3vRjDWJTO"
},
"execution_count": 3,
"execution_count": 4,
"outputs": []
},
{
......@@ -224,9 +227,9 @@
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8ea64434-1b03-4c4e-8a07-752b438c9603"
"outputId": "552e2c22-44f4-4495-c7f9-a57f13484270"
},
"execution_count": 4,
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
......@@ -243,26 +246,26 @@
" NumValidationSamples: 500\n",
" NumTestSamples: 1000\n",
"Done saving data into cached files.\n",
"In epoch 0, loss: 1.957, val acc: 0.122, test acc: 0.130\n",
"In epoch 5, loss: 1.932, val acc: 0.200, test acc: 0.210\n",
"In epoch 10, loss: 1.897, val acc: 0.386, test acc: 0.433\n",
"In epoch 15, loss: 1.851, val acc: 0.518, test acc: 0.571\n",
"In epoch 20, loss: 1.788, val acc: 0.542, test acc: 0.569\n",
"In epoch 25, loss: 1.706, val acc: 0.710, test acc: 0.729\n",
"In epoch 30, loss: 1.606, val acc: 0.746, test acc: 0.780\n",
"In epoch 35, loss: 1.491, val acc: 0.756, test acc: 0.787\n",
"In epoch 40, loss: 1.366, val acc: 0.770, test acc: 0.789\n",
"In epoch 45, loss: 1.237, val acc: 0.768, test acc: 0.789\n",
"In epoch 50, loss: 1.111, val acc: 0.772, test acc: 0.795\n",
"In epoch 55, loss: 0.995, val acc: 0.770, test acc: 0.796\n",
"In epoch 60, loss: 0.891, val acc: 0.772, test acc: 0.801\n",
"In epoch 65, loss: 0.801, val acc: 0.776, test acc: 0.806\n",
"In epoch 70, loss: 0.723, val acc: 0.774, test acc: 0.807\n",
"In epoch 75, loss: 0.657, val acc: 0.780, test acc: 0.810\n",
"In epoch 80, loss: 0.600, val acc: 0.782, test acc: 0.811\n",
"In epoch 85, loss: 0.551, val acc: 0.788, test acc: 0.811\n",
"In epoch 90, loss: 0.510, val acc: 0.788, test acc: 0.814\n",
"In epoch 95, loss: 0.475, val acc: 0.788, test acc: 0.819\n"
"In epoch 0, loss: 1.954, val acc: 0.114, test acc: 0.103\n",
"In epoch 5, loss: 1.921, val acc: 0.158, test acc: 0.147\n",
"In epoch 10, loss: 1.878, val acc: 0.288, test acc: 0.283\n",
"In epoch 15, loss: 1.822, val acc: 0.344, test acc: 0.353\n",
"In epoch 20, loss: 1.751, val acc: 0.388, test acc: 0.389\n",
"In epoch 25, loss: 1.663, val acc: 0.406, test acc: 0.410\n",
"In epoch 30, loss: 1.562, val acc: 0.472, test acc: 0.481\n",
"In epoch 35, loss: 1.450, val acc: 0.558, test acc: 0.573\n",
"In epoch 40, loss: 1.333, val acc: 0.636, test acc: 0.641\n",
"In epoch 45, loss: 1.216, val acc: 0.684, test acc: 0.683\n",
"In epoch 50, loss: 1.102, val acc: 0.726, test acc: 0.713\n",
"In epoch 55, loss: 0.996, val acc: 0.740, test acc: 0.740\n",
"In epoch 60, loss: 0.899, val acc: 0.754, test acc: 0.760\n",
"In epoch 65, loss: 0.813, val acc: 0.762, test acc: 0.771\n",
"In epoch 70, loss: 0.737, val acc: 0.768, test acc: 0.781\n",
"In epoch 75, loss: 0.671, val acc: 0.776, test acc: 0.786\n",
"In epoch 80, loss: 0.614, val acc: 0.784, test acc: 0.790\n",
"In epoch 85, loss: 0.566, val acc: 0.780, test acc: 0.788\n",
"In epoch 90, loss: 0.524, val acc: 0.780, test acc: 0.791\n",
"In epoch 95, loss: 0.489, val acc: 0.772, test acc: 0.795\n"
]
}
]
......@@ -270,11 +273,11 @@
{
"cell_type": "markdown",
"source": [
"Check out the full example script [here](https://github.com/dmlc/dgl/blob/master/examples/sparse/gcn.py)."
"*Check out the full example script* [here](https://github.com/dmlc/dgl/blob/master/examples/sparse/gcn.py)."
],
"metadata": {
"id": "yQnJZvE9ZduM"
}
}
]
}
\ No newline at end of file
}
This source diff could not be displayed because it is too large.
......@@ -62,19 +62,21 @@
"source": [
"## Sparse Multi-head Attention\n",
"\n",
"Unlike the all-pairs scaled-dot-product attention mechanism in vanillar Transformer:\n",
"Recall the all-pairs scaled-dot-product attention mechanism in vanillar Transformer:\n",
"\n",
"$$\\text{Attn}=\\text{softmax}(\\dfrac{QK^T} {\\sqrt{d}})V,$$\n",
"\n",
"the graph transformer (GT) model employs a Sparse Multi-head Attention block for message passing:\n",
"The graph transformer (GT) model employs a Sparse Multi-head Attention block:\n",
"\n",
"$$\\text{SparseAttn}(Q, K, V, A) = \\text{softmax}(\\frac{(QK^T) \\circ A}{\\sqrt{d}})V,$$\n",
"\n",
"where $Q, K, V ∈\\mathbb{R}^{N\\times d}$ are query feature, key feature, and value feature, respectively. $A\\in[0,1]^{N\\times N}$ is the adjacency matrix of the input graph. $(QK^T)\\circ A$ denotes Sampled Dense Dense Matrix Multiplication (SDDMM) as shown in the figure below:\n",
"where $Q, K, V ∈\\mathbb{R}^{N\\times d}$ are query feature, key feature, and value feature, respectively. $A\\in[0,1]^{N\\times N}$ is the adjacency matrix of the input graph. $(QK^T)\\circ A$ means that the multiplication of query matrix and key matrix is followed by a Hadamard product (or element-wise multiplication) with the sparse adjacency matrix as illustrated in the figure below:\n",
"\n",
"<img src=\"https://drive.google.com/uc?id=1OgMAewLR3Z1vz5y4J8aPRSeaU3g8iQfX\" width=\"500\">\n",
"\n",
"Only attention scores between connected nodes are computed according to the sparsity of $A$. Enjoying the [batched SDDMM API](https://docs.dgl.ai/en/latest/generated/dgl.sparse.bsddmm.html) in DGL, we can parallel the computation on multiple attention heads (different representation subspaces).\n",
"Essentially, only the attention scores between connected nodes are computed according to the sparsity of $A$. This operation is also called *Sampled Dense Dense Matrix Multiplication (SDDMM)*.\n",
"\n",
"Enjoying the [batched SDDMM API](https://docs.dgl.ai/en/latest/generated/dgl.sparse.bsddmm.html) in DGL, we can parallel the computation on multiple attention heads (different representation subspaces).\n",
"\n"
]
},
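
To make the sparse attention step above concrete, here is a small hedged sketch on a toy 4-node graph with 2 attention heads; the graph, sizes, and head count are assumptions for illustration, while `bsddmm`, `softmax`, and `bspmm` are the DGL sparse APIs referenced in this notebook.

import torch
import dgl.sparse as dglsp

N, dh, nh = 4, 8, 2                              # nodes, head dim, heads (toy sizes)
row = torch.tensor([0, 1, 1, 2, 3])
col = torch.tensor([1, 0, 2, 3, 0])
A = dglsp.from_coo(row, col, shape=(N, N))       # sparse adjacency with 5 edges

q = torch.randn(N, dh, nh)
k = torch.randn(N, dh, nh)
v = torch.randn(N, dh, nh)

# SDDMM: (Q K^T) o A -- scores are computed only at the 5 nonzero positions of A,
# batched over the nh attention heads.
attn = dglsp.bsddmm(A, q, k.transpose(1, 0))     # sparse [N, N, nh]
attn = attn.softmax()                            # row-wise softmax over nonzeros
out = dglsp.bspmm(attn, v)                       # dense [N, dh, nh]
print(out.shape)                                 # torch.Size([4, 8, 2])
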
......@@ -129,8 +131,9 @@
" ######################################################################\n",
" # (HIGHLIGHT) Compute the multi-head attention with Sparse Matrix API\n",
" ######################################################################\n",
" attn = dglsp.bsddmm(A, q, k.transpose(1, 0)) # [N, N, nh]\n",
" attn = attn.softmax() # [N, N, nh]\n",
" attn = dglsp.bsddmm(A, q, k.transpose(1, 0)) # (sparse) [N, N, nh]\n",
" # Sparse softmax by default applies on the last sparse dimension.\n",
" attn = attn.softmax() # (sparse) [N, N, nh]\n",
" out = dglsp.bspmm(attn, v) # [N, dh, nh]\n",
"\n",
" return self.out_proj(out.reshape(N, -1))"
......@@ -188,11 +191,7 @@
"source": [
"## Graph Transformer Model\n",
"\n",
"The GT model is constructed by stacking GT layers.\n",
"\n",
"The input positional encoding of vanilla transformer is replaced with Laplacian positional encoding [(Dwivedi et al. 2020)](https://arxiv.org/abs/2003.00982).\n",
"\n",
"For the graph-level prediction task, an extra pooler is stacked on top of GT layers to aggregate node feature of the same graph."
"The GT model is constructed by stacking GT layers. The input positional encoding of vanilla transformer is replaced with Laplacian positional encoding [(Dwivedi et al. 2020)](https://arxiv.org/abs/2003.00982). For the graph-level prediction task, an extra pooler is stacked on top of GT layers to aggregate node feature of the same graph."
]
},
{
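
The last sentence above describes a graph-level readout. As a small illustration, the sketch below uses DGL's `SumPooling` to reduce per-node features to one vector per graph; the toy graphs, feature size, and the choice of sum pooling are assumptions for illustration, not necessarily the notebook's exact readout.

import dgl
import torch
from dgl.nn import SumPooling

# Two toy graphs batched together; the pooler aggregates node features per graph.
g1 = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])), num_nodes=3)
g2 = dgl.graph((torch.tensor([0]), torch.tensor([1])), num_nodes=2)
bg = dgl.batch([g1, g2])
h = torch.randn(bg.num_nodes(), 16)              # toy node features
pooler = SumPooling()
print(pooler(bg, h).shape)                        # torch.Size([2, 16]), one vector per graph
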
......@@ -247,7 +246,9 @@
"source": [
"## Training\n",
"\n",
"We train the GT model on [ogbg-molhiv](https://ogb.stanford.edu/docs/graphprop/#ogbg-mol) benchmark. The Laplacian positional encoding of each graph is pre-computed (with the API [here](https://docs.dgl.ai/en/latest/generated/dgl.laplacian_pe.html)) as part of the input to the model."
"We train the GT model on [ogbg-molhiv](https://ogb.stanford.edu/docs/graphprop/#ogbg-mol) benchmark. The Laplacian positional encoding of each graph is pre-computed (with the API [here](https://docs.dgl.ai/en/latest/generated/dgl.laplacian_pe.html)) as part of the input to the model.\n",
"\n",
"*Note that we down-sample the dataset to make this demo runs faster. See the* [*example script*](https://github.com/dmlc/dgl/blob/master/examples/sparse/graph_transformer.py) *for the performance on the full dataset.*"
]
},
{
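
As a small illustration of the pre-computation mentioned above, the sketch below computes a Laplacian positional encoding for a toy graph with the `dgl.laplacian_pe` API linked in the text; the toy cycle graph and the choice of k are assumptions.

import dgl
import torch

# Toy 8-node cycle, made bidirected (illustrative only).
src = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7])
dst = torch.tensor([1, 2, 3, 4, 5, 6, 7, 0])
g = dgl.graph((src, dst), num_nodes=8)
g = dgl.add_reverse_edges(g)
pos_enc = dgl.laplacian_pe(g, k=3)    # k smallest non-trivial Laplacian eigenvectors
print(pos_enc.shape)                  # torch.Size([8, 3])
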
......@@ -398,15 +399,6 @@
"# Kick off training.\n",
"train(model, dataset, evaluator, dev)"
]
},
{
"cell_type": "markdown",
"source": [
"*Check out the full example script [here](https://github.com/dmlc/dgl/blob/master/examples/sparse/graph_transformer.py).*"
],
"metadata": {
"id": "mifdq1Ftc-Nz"
}
}
],
"metadata": {
......@@ -415,8 +407,7 @@
},
"orig_nbformat": 4,
"colab": {
"provenance": [],
"toc_visible": true
"provenance": []
},
"gpuClass": "standard",
"kernelspec": {
......@@ -427,4 +418,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
\ No newline at end of file
}
......@@ -5,8 +5,7 @@
"colab": {
"provenance": [],
"private_outputs": true,
"toc_visible": true,
"include_colab_link": true
"toc_visible": true
},
"kernelspec": {
"name": "python3",
......@@ -15,8 +14,7 @@
"language_info": {
"name": "python"
},
"gpuClass": "standard",
"accelerator": "GPU"
"gpuClass": "standard"
},
"cells": [
{
......@@ -51,7 +49,7 @@
" installed = True\n",
"except ImportError:\n",
" installed = False\n",
"print(\"DGL installed!\" if installed else \"Failed to install DGL!\")"
"print(\"DGL installed!\" if installed else \"DGL not found!\")"
],
"metadata": {
"id": "19UZd7wyWzpT"
......@@ -465,7 +463,7 @@
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"A \\\\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"----------------|---------------|----------------|----------\n",
"**DiagMatrix** |✅ |✅ |🚫\n",
"**SparseMatrix**|✅ |✅ |🚫\n",
......@@ -520,14 +518,14 @@
{
"cell_type": "markdown",
"source": [
"**sub(A, B) equivalent to A - B**\n",
"**sub(A, B), equivalent to A - B**\n",
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"A \\\\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"----------------|---------------|----------------|----------\n",
"**DiagMatrix** |✅ |🚫 |🚫\n",
"**SparseMatrix**|🚫 |🚫 |🚫\n",
"**DiagMatrix** |✅ | |🚫\n",
"**SparseMatrix**| | |🚫\n",
"**scalar** |🚫 |🚫 |🚫"
],
"metadata": {
......@@ -586,7 +584,7 @@
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"A \\\\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"----------------|---------------|----------------|----------\n",
"**DiagMatrix** |✅ |🚫 |✅\n",
"**SparseMatrix**|🚫 |🚫 |✅\n",
......@@ -642,7 +640,7 @@
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"A \\\\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"----------------|---------------|----------------|----------\n",
"**DiagMatrix** |✅ |🚫 |✅\n",
"**SparseMatrix**|🚫 |🚫 |✅\n",
......@@ -694,7 +692,7 @@
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"A \\\\ B | **DiagMatrix**|**SparseMatrix**|**scalar**\n",
"----------------|---------------|----------------|----------\n",
"**DiagMatrix** |🚫 |🚫 |✅\n",
"**SparseMatrix**|🚫 |🚫 |✅\n",
......@@ -878,10 +876,10 @@
"\n",
"The supported combinations are shown as follows.\n",
"\n",
"A \\ B | **Tensor**|**DiagMatrix**|**SparseMatrix**\n",
"A \\\\ B | **Tensor**|**DiagMatrix**|**SparseMatrix**\n",
"----------------|-----------|--------------|----------\n",
"**Tensor** |✅ |🚫 |🚫\n",
"**DiagMatrix**. |✅ |✅ |✅\n",
"**DiagMatrix** |✅ |✅ |✅\n",
"**SparseMatrix**|✅ |✅ |✅"
],
"metadata": {
......@@ -1023,7 +1021,7 @@
{
"cell_type": "markdown",
"source": [
"**Sampled-Dense-Dense Matrix Multiplication**\n",
"**Sampled-Dense-Dense Matrix Multiplication (SDDMM)**\n",
"\n",
"``sddmm`` matrix-multiplies two dense matrices X1 and X2, then elementwise-multiplies the result with sparse matrix A at the nonzero locations. This is designed for sparse matrix with scalar values.\n",
"\n",
......@@ -1101,18 +1099,76 @@
{
"cell_type": "markdown",
"source": [
"## Non-linear activation functions\n",
"## Non-linear activation functions"
],
"metadata": {
"id": "fVkbTT28ZzPr"
}
},
{
"cell_type": "markdown",
"source": [
"### Element-wise functions\n",
"\n",
"* `softmax()`"
"Most activation functions are element-wise and can be further grouped into two categories:\n",
"\n",
"**Sparse-preserving functions** such as `sin()`, `tanh()`, `sigmoid()`, `relu()`, etc. You can directly apply them on the `val` tensor of the sparse matrix and then recreate a new matrix of the same sparsity using `val_like`."
],
"metadata": {
"id": "XuaNdFO7XG2r"
}
},
{
"cell_type": "code",
"source": [
"row = torch.tensor([0, 1, 1, 2])\n",
"col = torch.tensor([1, 0, 2, 0])\n",
"val = torch.randn(4)\n",
"A = dglsp.from_coo(row, col, val)\n",
"print(A.to_dense())\n",
"\n",
"print(\"Apply tanh.\")\n",
"A_new = dglsp.val_like(A, torch.tanh(A.val))\n",
"print(A_new.to_dense())"
],
"metadata": {
"id": "GZkCJJ0TX0cI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"**Non-sparse-preserving functions** such as `exp()`, `cos()`, etc. You can first convert the sparse matrix to dense before applying the functions."
],
"metadata": {
"id": "i92lhMEnYas3"
}
},
{
"cell_type": "code",
"source": [
"row = torch.tensor([0, 1, 1, 2])\n",
"col = torch.tensor([1, 0, 2, 0])\n",
"val = torch.randn(4)\n",
"A = dglsp.from_coo(row, col, val)\n",
"print(A.to_dense())\n",
"\n",
"print(\"Apply exp.\")\n",
"A_new = A.to_dense().exp()\n",
"print(A_new)"
],
"metadata": {
"id": "sroJpzRNYZq5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"**Softmax**\n",
"### Softmax\n",
"\n",
"Apply row-wise softmax to the nonzero entries of the sparse matrix."
],
......
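
A brief sketch of the row-wise behaviour described above (toy values, assuming the `dglsp` alias):

import torch
import dgl.sparse as dglsp

row = torch.tensor([0, 0, 1, 2])
col = torch.tensor([0, 2, 1, 0])
val = torch.tensor([1.0, 2.0, 3.0, 4.0])
A = dglsp.from_coo(row, col, val)
# Softmax normalizes each row over its nonzero entries only.
print(A.softmax().to_dense())
# Row 0 normalizes [1, 2]; rows 1 and 2 each have a single nonzero, which maps to 1.0.
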