Unverified Commit a7b5085a authored by esang, committed by GitHub

Add a note about the order of TUDataset (#3549)

parent cd6d1138
@@ -34,6 +34,14 @@ class LegacyTUDataset(DGLBuiltinDataset):
num_labels : int
Number of classes
Notes
-----
LegacyTUDataset uses the provided node features by default. If no features are provided, it uses one-hot node labels instead.
If neither is provided, it uses a constant node feature.
The dataset sorts graphs by their labels.
Shuffling is preferred before a manual train/val split.
Examples
--------
>>> data = LegacyTUDataset('DD')
@@ -59,11 +67,6 @@ class LegacyTUDataset(DGLBuiltinDataset):
Graph(num_nodes=9539, num_edges=47382,
ndata_schemes={'feat': Scheme(shape=(89,), dtype=torch.float32), '_ID': Scheme(shape=(), dtype=torch.int64)}
edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)})
Notes
-----
LegacyTUDataset uses the provided node features by default. If no features are provided, it uses one-hot node labels instead.
If neither is provided, it uses a constant node feature.
"""
_url = r"https://www.chrsmrrs.com/graphkerneldatasets/{}.zip"
@@ -259,6 +262,18 @@ class TUDataset(DGLBuiltinDataset):
as per the original data. Other frameworks such as PyTorch Geometric remove the
duplicates by default. You can remove the duplicate edges with :func:`dgl.to_simple`.
Graphs may have node labels, node attributes, edge labels, and edge attributes,
varying across datasets.
Labels are mapped to :math:`\lbrace 0,\cdots,n-1 \rbrace` where :math:`n` is the
number of labels (some datasets have raw labels :math:`\lbrace -1, 1 \rbrace` which
will be mapped to :math:`\lbrace 0, 1 \rbrace`). In previous versions, the minimum
label was added so that :math:`\lbrace -1, 1 \rbrace` was mapped to
:math:`\lbrace 0, 2 \rbrace`.
The dataset sorts graphs by their labels.
Shuffling is preferred before a manual train/val split.
Examples
--------
>>> data = TUDataset('DD')
@@ -285,16 +300,6 @@ class TUDataset(DGLBuiltinDataset):
ndata_schemes={'node_labels': Scheme(shape=(1,), dtype=torch.int64), '_ID': Scheme(shape=(), dtype=torch.int64)}
edata_schemes={'_ID': Scheme(shape=(), dtype=torch.int64)})
Notes
-----
Graphs may have node labels, node attributes, edge labels, and edge attributes,
varying across datasets.
Labels are mapped to :math:`\lbrace 0,\cdots,n-1 \rbrace` where :math:`n` is the
number of labels (some datasets have raw labels :math:`\lbrace -1, 1 \rbrace` which
will be mapped to :math:`\lbrace 0, 1 \rbrace`). In previous versions, the minimum
label was added so that :math:`\lbrace -1, 1 \rbrace` was mapped to
:math:`\lbrace 0, 2 \rbrace`.
"""
_url = r"https://www.chrsmrrs.com/graphkerneldatasets/{}.zip"
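
The note added in this commit says the graphs are sorted by label, so a contiguous manual split can leave one part with a single class. Below is a minimal sketch of shuffling indices before splitting. It uses only the standard library and a toy label list in place of a real dataset. The helper name `shuffled_split`, the 80/20 fraction, and the seed are illustrative assumptions, not part of DGL.

```python
import random


def shuffled_split(num_graphs, train_frac=0.8, seed=0):
    """Return (train_idx, val_idx) taken from a shuffled permutation of indices.

    Hypothetical helper for illustration; it is not a DGL function.
    """
    indices = list(range(num_graphs))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    cut = int(train_frac * num_graphs)
    return indices[:cut], indices[cut:]


# Toy stand-in for a label-sorted dataset: six graphs of class 0, then four of class 1.
labels = [0] * 6 + [1] * 4

# A contiguous 80/20 split of the sorted data puts only class 1 in validation.
naive_val_labels = labels[8:]

# Shuffling the indices first draws both parts from a random permutation.
train_idx, val_idx = shuffled_split(len(labels), train_frac=0.8, seed=0)
train_labels = [labels[i] for i in train_idx]
val_labels = [labels[i] for i in val_idx]
```

The resulting index lists can then be used to select graphs from the dataset. A shuffle does not guarantee that each part has balanced classes, so a stratified split may be preferable for small or imbalanced datasets.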