.. _apidata:

Dataset
=======

.. currentmodule:: dgl.data

Utils
-----

.. autosummary::
    :toctree: ../../generated/

    utils.get_download_dir
    utils.download
    utils.check_sha1
    utils.extract_archive
    utils.split_dataset
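
As a rough illustration of what a checksum utility such as ``utils.check_sha1`` is for, the sketch below verifies a downloaded file against a known SHA-1 digest using only the standard library. The function name ``sha1_matches`` and its parameters are hypothetical and are not DGL's API; see the generated reference for the real signature.

```python
import hashlib
import os
import tempfile


def sha1_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the SHA-1 digest of the file equals expected_hex.

    Hypothetical helper for illustration only, not part of dgl.data.utils.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so that large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# Usage: write a small file and check it against a correct and a wrong digest.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name

expected = hashlib.sha1(b"hello").hexdigest()
ok = sha1_matches(tmp_path, expected)
bad = sha1_matches(tmp_path, "0" * 40)
os.remove(tmp_path)
```

A check of this kind is typically run after a download and before extracting the archive, so that a corrupted file is detected early.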

.. autoclass:: dgl.data.utils.Subset
    :members: __getitem__, __len__
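
The idea behind a subset object that exposes ``__getitem__`` and ``__len__`` can be sketched without DGL. The code below is a minimal stand-in, assuming a subset is a view that stores indices into the parent dataset and that a split is defined by train/validation/test fractions. The names ``IndexSubset`` and ``split_by_fraction`` are hypothetical and do not reproduce the signatures of ``Subset`` or ``utils.split_dataset``.

```python
import random


class IndexSubset:
    """Hypothetical subset view: keeps indices into the parent, not copies."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __getitem__(self, item):
        # Translate the position in the subset to a position in the parent.
        return self.dataset[self.indices[item]]

    def __len__(self):
        return len(self.indices)


def split_by_fraction(dataset, fractions=(0.8, 0.1, 0.1), shuffle=False, seed=0):
    """Split a dataset into three consecutive IndexSubset views."""
    size = len(dataset)
    order = list(range(size))
    if shuffle:
        random.Random(seed).shuffle(order)
    num_train = int(fractions[0] * size)
    num_val = int(fractions[1] * size)
    # The test split takes whatever remains after train and validation.
    bounds = [0, num_train, num_train + num_val, size]
    return tuple(
        IndexSubset(dataset, order[bounds[k]:bounds[k + 1]]) for k in range(3)
    )


# Usage: any object with __getitem__ and __len__ works as the parent dataset.
data = [x * x for x in range(10)]
train_set, val_set, test_set = split_by_fraction(data)
```

Because each subset only stores indices, the three splits share the underlying data with the parent dataset.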

Dataset Classes
---------------

Stanford sentiment treebank dataset
```````````````````````````````````

For more information about the dataset, see `Sentiment Analysis <https://nlp.stanford.edu/sentiment/index.html>`__.

.. autoclass:: SST
    :members: __getitem__, __len__

Mini graph classification dataset
`````````````````````````````````

.. autoclass:: MiniGCDataset
    :members: __getitem__, __len__, num_classes


Graph kernel dataset
````````````````````

For more information about the dataset, see `Benchmark Data Sets for Graph Kernels <https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets>`__.

.. autoclass:: TUDataset
    :members: __getitem__, __len__


Graph isomorphism network dataset
`````````````````````````````````

A compact subset of the graph kernel datasets.

.. autoclass:: GINDataset
    :members: __getitem__, __len__


Protein-Protein Interaction dataset
```````````````````````````````````

.. autoclass:: PPIDataset
    :members: __getitem__, __len__

Molecular Graphs
----------------

To work on molecular graphs, make sure you have installed `RDKit 2018.09.3 <https://www.rdkit.org/docs/Install.html>`__.

Featurization
`````````````

To apply graph neural networks to molecules, nodes (atoms) and edges (bonds) must first be featurized.
The following featurization methods and utilities are available:

.. autosummary::
    :toctree: ../../generated/

    chem.one_hot_encoding
    chem.BaseAtomFeaturizer
    chem.CanonicalAtomFeaturizer
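
To make the idea of atom featurization concrete, here is a minimal pure-Python sketch of one-hot encoding. It assumes a feature vector is built by concatenating one-hot encodings of individual atom properties. The function ``one_hot``, its ``encode_unknown`` flag, and the vocabularies are hypothetical and are not the signature of ``chem.one_hot_encoding``.

```python
def one_hot(value, allowed_values, encode_unknown=False):
    """Encode value as a list of 0/1 flags over allowed_values.

    With encode_unknown=True, an extra trailing slot flags values that
    are not in allowed_values. Hypothetical helper for illustration only.
    """
    encoding = [int(value == candidate) for candidate in allowed_values]
    if encode_unknown:
        encoding.append(int(value not in allowed_values))
    return encoding


# Hypothetical vocabularies; a real featurizer would use larger ones.
ATOM_SYMBOLS = ["C", "N", "O"]
ATOM_DEGREES = [0, 1, 2, 3, 4]

nitrogen = one_hot("N", ATOM_SYMBOLS)
sulfur = one_hot("S", ATOM_SYMBOLS, encode_unknown=True)

# Concatenate per-property encodings into a single atom feature vector.
oxygen_features = one_hot("O", ATOM_SYMBOLS) + one_hot(2, ATOM_DEGREES)
```

The length of the resulting vector is the sum of the vocabulary sizes, which fixes the input feature size of the model.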

Graph Construction
``````````````````

Several methods for constructing DGLGraphs from SMILES/RDKit molecule objects are listed below:

.. autosummary::
    :toctree: ../../generated/

    chem.mol_to_graph
    chem.smile_to_bigraph
    chem.mol_to_bigraph
    chem.smile_to_complete_graph
    chem.mol_to_complete_graph
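
The difference between the two graph topologies can be sketched with plain edge lists. The code below assumes that a "bigraph" is a bi-directed graph in which every bond contributes one edge in each direction, and that a complete graph connects every ordered pair of distinct atoms. It calls neither RDKit nor DGL, and the function names are hypothetical.

```python
def bidirected_edges(bonds):
    """Turn undirected bonds (u, v) into source and destination lists
    that contain both u -> v and v -> u."""
    src, dst = [], []
    for u, v in bonds:
        src.extend([u, v])
        dst.extend([v, u])
    return src, dst


def complete_graph_edges(num_atoms, add_self_loop=False):
    """Connect every ordered pair of atoms, optionally including u -> u."""
    pairs = [
        (u, v)
        for u in range(num_atoms)
        for v in range(num_atoms)
        if add_self_loop or u != v
    ]
    return [u for u, _ in pairs], [v for _, v in pairs]


# A three-atom chain with atoms indexed 0, 1, 2 and bonds 0-1 and 1-2.
bonds = [(0, 1), (1, 2)]
bond_src, bond_dst = bidirected_edges(bonds)
full_src, full_dst = complete_graph_edges(3)
loop_src, loop_dst = complete_graph_edges(3, add_self_loop=True)
```

Source and destination lists of this form are a common input for building a graph object. A complete graph grows quadratically with the number of atoms, so it is usually practical only for small molecules.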

Dataset Classes
```````````````

If your dataset is stored in a ``.csv`` file, you may find the following class helpful:

.. autoclass:: dgl.data.chem.CSVDataset
    :members: __getitem__, __len__
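
As a rough picture of what a CSV-backed dataset provides, the sketch below reads a SMILES column and several task columns with the standard ``csv`` module. It assumes one molecule per row and represents missing labels as ``None``. The class ``TinyCSVDataset``, the column names, and the sample rows are hypothetical. The sketch returns the raw SMILES string rather than a graph and does not reflect the constructor of ``CSVDataset``.

```python
import csv
import io

# Hypothetical file content: one SMILES string and two task labels per row.
CSV_TEXT = """smiles,task_a,task_b
CCO,1,0
c1ccccc1,0,
"""


class TinyCSVDataset:
    """Illustrative dataset: items are (smiles, labels) pairs."""

    def __init__(self, handle, smiles_column, task_columns):
        self.rows = []
        for record in csv.DictReader(handle):
            # An empty cell means the label is missing for that task.
            labels = [
                float(record[task]) if record[task] != "" else None
                for task in task_columns
            ]
            self.rows.append((record[smiles_column], labels))

    def __getitem__(self, item):
        return self.rows[item]

    def __len__(self):
        return len(self.rows)


dataset = TinyCSVDataset(io.StringIO(CSV_TEXT), "smiles", ["task_a", "task_b"])
first_smiles, first_labels = dataset[0]
second_smiles, second_labels = dataset[1]
```

Keeping missing labels explicit lets downstream code mask them out of the loss instead of treating them as negatives.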

Currently two datasets are supported:

* Tox21
* TencentAlchemyDataset

.. autoclass:: dgl.data.chem.Tox21
    :members: __getitem__, __len__, task_pos_weights

.. autoclass:: dgl.data.chem.TencentAlchemyDataset
    :members: __getitem__, __len__, set_mean_and_std