"[](https://colab.research.google.com/github/dmlc/dgl/blob/master/notebooks/stochastic_training/ondisk_dataset_heterograph.ipynb) [](https://github.com/dmlc/dgl/blob/master/notebooks/stochastic_training/ondisk_dataset_heterograph.ipynb)\n",
"[](https://colab.research.google.com/github/dmlc/dgl/blob/master/notebooks/stochastic_training/ondisk_dataset_heterograph.ipynb) [](https://github.com/dmlc/dgl/blob/master/notebooks/stochastic_training/ondisk_dataset_heterograph.ipynb)\n",
"\n",
"\n",
"This tutorial shows how to create `OnDiskDataset` for heterogeneous graph that could be used in **GraphBolt** framework.\n",
"This tutorial shows how to create `OnDiskDataset` for heterogeneous graph that could be used in **GraphBolt** framework. The major difference from creating dataset for homogeneous graph is that we need to specify node/edge types for edges, feature data, training/validation/test sets.\n",
"\n",
"\n",
"By the end of this tutorial, you will be able to\n",
"By the end of this tutorial, you will be able to\n",
"- organize graph structure data.\n",
"- organize graph structure data.\n",
...
@@ -102,10 +102,10 @@
...
@@ -102,10 +102,10 @@
"cell_type": "markdown",
"cell_type": "markdown",
"source": [
"source": [
"### Generate graph structure data\n",
"### Generate graph structure data\n",
"For heterogeneous graph, we just need to save edges(namely node pairs) into **CSV** file.\n",
"For heterogeneous graph, we need to save different edge edges(namely node pairs) into separate **CSV** files.\n",
"\n",
"\n",
"Note:\n",
"Note:\n",
"when saving to file, do not save index and header.*italicized text*\n"
"when saving to file, do not save index and header.\n"
],
],
"metadata": {
"metadata": {
"id": "qhNtIn_xhlnl"
"id": "qhNtIn_xhlnl"
...
@@ -116,17 +116,31 @@
...
@@ -116,17 +116,31 @@
"source": [
"source": [
"import numpy as np\n",
"import numpy as np\n",
"import pandas as pd\n",
"import pandas as pd\n",
"\n",
"# For simplicity, we create a heterogeneous graph with\n",
"print(f\"LP test negative dsts[user:follow:user] are saved to {lp_test_follow_neg_dsts_path}\")"
],
],
"metadata": {
"metadata": {
"id": "u0jCnXIcAQy4"
"id": "u0jCnXIcAQy4"
...
@@ -310,7 +445,9 @@
...
@@ -310,7 +445,9 @@
"cell_type": "markdown",
"cell_type": "markdown",
"source": [
"source": [
"## Organize Data into YAML File\n",
"## Organize Data into YAML File\n",
"Now we need to create a `metadata.yaml` file which contains the paths, dadta types of graph structure, feature data, training/validation/test sets. Please note that all path should be relative to `metadata.yaml`."
"Now we need to create a `metadata.yaml` file which contains the paths, dadta types of graph structure, feature data, training/validation/test sets. Please note that all path should be relative to `metadata.yaml`.\n",
"\n",
"For heterogeneous graph, we need to specify the node/edge type in **type** fields. For edge type, canonical etype is required which is a string that's concatenated by source node type, etype, and destination node type together with `:`."