.. _apidata:

dgl.data
=========

.. currentmodule:: dgl.data

Utils
-----

.. autosummary::
    :toctree: ../../generated/

    utils.get_download_dir
    utils.download
    utils.check_sha1
    utils.extract_archive
    utils.split_dataset
    utils.save_graphs
    utils.load_graphs
    utils.load_labels

.. autoclass:: dgl.data.utils.Subset
    :members: __getitem__, __len__
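
The idea behind ``utils.split_dataset`` and ``Subset`` can be sketched in plain Python: a subset is an index view into the full dataset, and a split carves a (possibly shuffled) index list into consecutive chunks. This is a minimal sketch, not DGL's implementation; ``IndexSubset`` and ``fractional_split`` are hypothetical names, and the default 80/10/10 fractions are an assumption chosen for illustration.

```python
import random
from typing import List, Optional, Sequence


class IndexSubset:
    """Hypothetical stand-in for a Subset: a view of `dataset` restricted to `indices`."""

    def __init__(self, dataset: Sequence, indices: List[int]):
        self.dataset = dataset
        self.indices = indices

    def __getitem__(self, item):
        # Map a position inside the subset to a position in the full dataset.
        return self.dataset[self.indices[item]]

    def __len__(self):
        return len(self.indices)


def fractional_split(dataset: Sequence, frac_list=(0.8, 0.1, 0.1),
                     shuffle: bool = False, seed: Optional[int] = None):
    """Split `dataset` into consecutive chunks whose sizes follow `frac_list`."""
    assert abs(sum(frac_list) - 1.0) < 1e-8, "fractions must sum to 1"
    num_data = len(dataset)
    indices = list(range(num_data))
    if shuffle:
        random.Random(seed).shuffle(indices)
    lengths = [int(num_data * frac) for frac in frac_list[:-1]]
    lengths.append(num_data - sum(lengths))  # the last subset takes the remainder
    subsets, offset = [], 0
    for length in lengths:
        subsets.append(IndexSubset(dataset, indices[offset:offset + length]))
        offset += length
    return subsets


train, val, test = fractional_split(list(range(10)))
print(len(train), len(val), len(test))  # 8 1 1
print(list(test))  # [9]
```

Because a subset only stores indices, splitting never copies the underlying graphs.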

Dataset Classes
---------------

Stanford sentiment treebank dataset
```````````````````````````````````

For more information about the dataset, see `Sentiment Analysis <https://nlp.stanford.edu/sentiment/index.html>`__.

.. autoclass:: SST
    :members: __getitem__, __len__


Karate Club dataset
```````````````````````````````````

.. autoclass:: KarateClub
    :members: __getitem__, __len__


Citation Network dataset
```````````````````````````````````

.. autoclass:: CitationGraphDataset
    :members: __getitem__, __len__


CoraFull dataset
```````````````````````````````````

.. autoclass:: CoraFull
    :members: __getitem__, __len__


Amazon Co-Purchase dataset
```````````````````````````````````

.. autoclass:: AmazonCoBuy
    :members: __getitem__, __len__


Coauthor dataset
```````````````````````````````````

.. autoclass:: Coauthor
    :members: __getitem__, __len__


BitcoinOTC dataset
```````````````````````````````````

.. autoclass:: BitcoinOTC
    :members: __getitem__, __len__


ICEWS18 dataset
```````````````````````````````````

.. autoclass:: ICEWS18
    :members: __getitem__, __len__


QM7b dataset
```````````````````````````````````

.. autoclass:: QM7b
    :members: __getitem__, __len__



GDELT dataset
```````````````````````````````````

.. autoclass:: GDELT
    :members: __getitem__, __len__


Mini graph classification dataset
`````````````````````````````````

.. autoclass:: MiniGCDataset
    :members: __getitem__, __len__, num_classes

Graph kernel dataset
````````````````````

For more information about the dataset, see `Benchmark Data Sets for Graph Kernels <https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets>`__.

.. autoclass:: TUDataset
    :members: __getitem__, __len__


Graph isomorphism network dataset
```````````````````````````````````

A compact subset of the graph kernel datasets.

.. autoclass:: GINDataset
    :members: __getitem__, __len__


Protein-Protein Interaction dataset
```````````````````````````````````

.. autoclass:: PPIDataset
    :members: __getitem__, __len__

Molecular Graphs
----------------

To work on molecular graphs, make sure you have installed `RDKit 2018.09.3 <https://www.rdkit.org/docs/Install.html>`__.

Data Loading and Processing Utils
`````````````````````````````````

We adapt several utilities for processing molecules from
`DeepChem <https://github.com/deepchem/deepchem/blob/master/deepchem>`__.

.. autosummary::
    :toctree: ../../generated/

    chem.add_hydrogens_to_mol
    chem.get_mol_3D_coordinates
    chem.load_molecule
    chem.multiprocess_load_molecules

Featurization Utils for Single Molecule
```````````````````````````````````````

To use molecules with graph neural networks, we need to featurize their nodes (atoms) and edges (bonds).

General utils:

.. autosummary::
    :toctree: ../../generated/

    chem.one_hot_encoding
    chem.ConcatFeaturizer
    chem.ConcatFeaturizer.__call__
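
Conceptually, a node featurizer maps each atom to a fixed-length vector by one-hot encoding a handful of properties and concatenating the results. The sketch below mirrors that idea in plain Python without RDKit; ``one_hot``, ``Concat`` and the dictionary standing in for an atom are hypothetical and are not guaranteed to reproduce the exact behavior of ``chem.one_hot_encoding`` or ``chem.ConcatFeaturizer``.

```python
from typing import Any, Callable, List, Sequence


def one_hot(x: Any, allowable_set: Sequence, encode_unknown: bool = False) -> List[int]:
    """One-hot encode `x` against `allowable_set`.

    With `encode_unknown=True` an extra trailing slot is set when `x` is not in
    `allowable_set`; otherwise an unknown value encodes as all zeros.
    """
    encoding = [int(x == s) for s in allowable_set]
    if encode_unknown:
        encoding.append(int(x not in allowable_set))
    return encoding


class Concat:
    """Hypothetical featurizer: call each function on the input and concatenate the lists."""

    def __init__(self, funcs: List[Callable[[Any], List[int]]]):
        self.funcs = funcs

    def __call__(self, x: Any) -> List[int]:
        feats: List[int] = []
        for func in self.funcs:
            feats.extend(func(x))
        return feats


# A plain dict stands in for an RDKit atom here (hypothetical, for illustration only).
atom = {"symbol": "N", "degree": 3}
featurizer = Concat([
    lambda a: one_hot(a["symbol"], ["C", "N", "O"], encode_unknown=True),
    lambda a: one_hot(a["degree"], [0, 1, 2, 3, 4]),
])
print(featurizer(atom))  # [0, 1, 0, 0, 0, 0, 0, 1, 0]
```

The "unknown" slot keeps the vector length fixed even for elements outside the allowed set, which is what lets every atom share one feature dimension.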

Utils for atom featurization:

.. autosummary::
    :toctree: ../../generated/

    chem.atom_type_one_hot
    chem.atomic_number_one_hot
    chem.atomic_number
    chem.atom_degree_one_hot
    chem.atom_degree
    chem.atom_total_degree_one_hot
    chem.atom_total_degree
    chem.atom_implicit_valence_one_hot
    chem.atom_implicit_valence
    chem.atom_hybridization_one_hot
    chem.atom_total_num_H_one_hot
    chem.atom_total_num_H
    chem.atom_formal_charge_one_hot
    chem.atom_formal_charge
    chem.atom_num_radical_electrons_one_hot
    chem.atom_num_radical_electrons
    chem.atom_is_aromatic_one_hot
    chem.atom_is_aromatic
    chem.atom_chiral_tag_one_hot
    chem.atom_mass
    chem.BaseAtomFeaturizer
    chem.BaseAtomFeaturizer.feat_size
    chem.BaseAtomFeaturizer.__call__
    chem.CanonicalAtomFeaturizer
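
The atom featurizer classes apply per-atom functions to every atom of a molecule and group the results under a field name, so they can be attached to a graph as node features. A minimal plain-Python sketch of that pattern follows; ``FieldAtomFeaturizer``, its ``feat_size`` signature, and the dicts standing in for RDKit atoms are hypothetical, and ``chem.BaseAtomFeaturizer`` may differ (for example, it may return tensors rather than nested lists).

```python
from typing import Callable, Dict, List

AtomFunc = Callable[[dict], List[float]]


class FieldAtomFeaturizer:
    """Hypothetical sketch: apply one featurizer per field to every atom of a molecule.

    `featurizer_funcs` maps a field name (e.g. "h") to a function from an atom to a
    list of numbers. Calling the featurizer returns one row per atom for each field.
    """

    def __init__(self, featurizer_funcs: Dict[str, AtomFunc]):
        self.featurizer_funcs = featurizer_funcs

    def feat_size(self, field: str, example_atom: dict) -> int:
        # Length of the feature vector produced for `field`, probed on one atom.
        return len(self.featurizer_funcs[field](example_atom))

    def __call__(self, atoms: List[dict]) -> Dict[str, List[List[float]]]:
        return {field: [func(atom) for atom in atoms]
                for field, func in self.featurizer_funcs.items()}


# A toy "molecule": each atom is a dict of properties (stand-in for RDKit atoms).
atoms = [{"symbol": "C", "aromatic": 1}, {"symbol": "O", "aromatic": 0}]
featurizer = FieldAtomFeaturizer({
    "h": lambda a: [float(a["symbol"] == s) for s in ("C", "N", "O")]
                   + [float(a["aromatic"])],
})
feats = featurizer(atoms)
print(feats["h"])  # [[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
print(featurizer.feat_size("h", atoms[0]))  # 4
```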

Utils for bond featurization:

.. autosummary::
    :toctree: ../../generated/

    chem.bond_type_one_hot
    chem.bond_is_conjugated_one_hot
    chem.bond_is_conjugated
    chem.bond_is_in_ring_one_hot
    chem.bond_is_in_ring
    chem.bond_stereo_one_hot
    chem.BaseBondFeaturizer
    chem.BaseBondFeaturizer.feat_size
    chem.BaseBondFeaturizer.__call__
    chem.CanonicalBondFeaturizer
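
Bond features are attached to edges. Assuming each bond is stored as two directed edges, one per direction (as the bigraph constructors in the next section suggest), a bond featurizer has to emit every bond's feature vector twice. The sketch below illustrates this in plain Python; the ``(begin, end, properties)`` bond tuples and ``featurize_bonds`` are hypothetical and not part of DGL.

```python
from typing import Callable, Dict, List, Tuple

# (begin atom index, end atom index, bond properties); a hypothetical format.
Bond = Tuple[int, int, dict]


def featurize_bonds(bonds: List[Bond],
                    featurizer_funcs: Dict[str, Callable[[dict], List[float]]]):
    """Return directed edges and per-edge features for a bidirected graph.

    Each undirected bond (u, v) yields the two edges u->v and v->u, and both edges
    receive the same feature vector, so every feature list has 2 * len(bonds) rows.
    """
    src, dst = [], []
    feats: Dict[str, List[List[float]]] = {field: [] for field in featurizer_funcs}
    for u, v, props in bonds:
        src.extend([u, v])
        dst.extend([v, u])
        for field, func in featurizer_funcs.items():
            feat = func(props)
            feats[field].extend([feat, feat])  # same features for both directions
    return src, dst, feats


# Heavy atoms of ethanol (C-C-O): two single bonds, neither in a ring.
bonds = [(0, 1, {"type": "SINGLE", "in_ring": False}),
         (1, 2, {"type": "SINGLE", "in_ring": False})]
src, dst, feats = featurize_bonds(bonds, {
    "e": lambda b: [float(b["type"] == t) for t in ("SINGLE", "DOUBLE", "TRIPLE", "AROMATIC")]
                   + [float(b["in_ring"])],
})
print(src, dst)  # [0, 1, 1, 2] [1, 0, 2, 1]
print(len(feats["e"]))  # 4
```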

Graph Construction for Single Molecule
``````````````````````````````````````

Several methods for constructing DGLGraphs from SMILES/RDKit molecule objects are listed below:

.. autosummary::
    :toctree: ../../generated/

    chem.mol_to_graph
    chem.smiles_to_bigraph
    chem.mol_to_bigraph
    chem.smiles_to_complete_graph
    chem.mol_to_complete_graph
    chem.k_nearest_neighbors
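
The bigraph and complete-graph constructors differ in which atom pairs become edges. Assuming "bigraph" means one edge per direction for each chemical bond and "complete graph" means an edge for every ordered pair of atoms, the two edge sets can be sketched as follows; ``bigraph_edges``, ``complete_graph_edges`` and the ``add_self_loop`` flag are hypothetical helpers, not the DGL functions.

```python
from itertools import product
from typing import List, Tuple

Edge = Tuple[int, int]


def bigraph_edges(bonds: List[Edge]) -> List[Edge]:
    """Bidirected graph: one edge per direction for every chemical bond."""
    edges: List[Edge] = []
    for u, v in bonds:
        edges.extend([(u, v), (v, u)])
    return edges


def complete_graph_edges(num_atoms: int, add_self_loop: bool = False) -> List[Edge]:
    """Complete graph: an edge for every ordered pair of atoms, bonded or not."""
    return [(u, v) for u, v in product(range(num_atoms), repeat=2)
            if add_self_loop or u != v]


# Heavy atoms of ethanol (C-C-O): 3 atoms, 2 bonds.
print(len(bigraph_edges([(0, 1), (1, 2)])))              # 4
print(len(complete_graph_edges(3)))                      # 6
print(len(complete_graph_edges(3, add_self_loop=True)))  # 9
```

A bigraph grows linearly with the number of bonds, while a complete graph over N atoms has N(N - 1) edges (N² with self-loops), so the choice matters for larger molecules.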

Graph Construction and Featurization for Ligand-Protein Complex
```````````````````````````````````````````````````````````````

Construct DGLHeteroGraphs for ligand-protein complexes and featurize them.

.. autosummary::
    :toctree: ../../generated/

    chem.ACNN_graph_construction_and_featurization

Dataset Classes
```````````````

If your dataset is stored in a ``.csv`` file, you may find the following class helpful:

.. autoclass:: dgl.data.chem.MoleculeCSVDataset
    :members: __getitem__, __len__
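
A CSV-backed molecule dataset typically holds one SMILES column plus one column per prediction task, and some cells may be empty when a molecule was not measured for a task. The sketch below parses such a file with the standard library and records a mask for missing labels; the column layout, the ``load_labels_and_masks`` helper and the convention of filling missing labels with 0 are assumptions for illustration, not a description of ``MoleculeCSVDataset`` internals.

```python
import csv
import io
from typing import List, Tuple


def load_labels_and_masks(csv_text: str, smiles_column: str
                          ) -> Tuple[List[str], List[List[float]], List[List[float]]]:
    """Read SMILES strings plus one label column per task.

    Empty cells are treated as missing labels: the label is set to 0.0 and the
    matching mask entry to 0.0 so that a loss function can ignore it.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    tasks = [name for name in reader.fieldnames if name != smiles_column]
    smiles, labels, masks = [], [], []
    for row in reader:
        smiles.append(row[smiles_column])
        labels.append([float(row[t]) if row[t] != "" else 0.0 for t in tasks])
        masks.append([1.0 if row[t] != "" else 0.0 for t in tasks])
    return smiles, labels, masks


# Toy values, not real measurements; task names are placeholders.
data = """smiles,task1,task2
CCO,0,1
c1ccccc1,,0
"""
smiles, labels, masks = load_labels_and_masks(data, "smiles")
print(smiles)  # ['CCO', 'c1ccccc1']
print(labels)  # [[0.0, 1.0], [0.0, 0.0]]
print(masks)   # [[1.0, 1.0], [0.0, 1.0]]
```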

Currently four datasets are supported:

* Tox21
* TencentAlchemyDataset
* PubChemBioAssayAromaticity
* PDBBind

.. autoclass:: dgl.data.chem.Tox21
    :members: __getitem__, __len__, task_pos_weights

.. autoclass:: dgl.data.chem.TencentAlchemyDataset
    :members: __getitem__, __len__, set_mean_and_std

.. autoclass:: dgl.data.chem.PubChemBioAssayAromaticity
    :members: __getitem__, __len__

.. autoclass:: dgl.data.chem.PDBBind
    :members: __getitem__, __len__

Dataset Splitting
`````````````````

We provide support for some common data splitting methods:

* consecutive split
* random split
* molecular weight split
* Bemis-Murcko scaffold split
* single-task-stratified split
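
Several of these strategies reduce to ordering the indices and cutting the ordered list into train, validation and test chunks. The sketch below illustrates consecutive, random and molecular-weight splits plus a consecutive k-fold scheme in plain Python; scaffold and stratified splits need RDKit or label information and are omitted. All function names and the 80/10/10 default fractions are assumptions, and sorting molecular weights in ascending order (so the heaviest molecules land in the test set) is an illustrative choice, not a statement about ``MolecularWeightSplitter``.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple

Split = Tuple[List[int], List[int], List[int]]


def consecutive_split(n: int, frac_train: float = 0.8, frac_val: float = 0.1) -> Split:
    """Keep the original order: first chunk for training, then validation, then test."""
    n_train = int(n * frac_train)
    n_val = int(n * frac_val)
    idx = list(range(n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def random_split(n: int, frac_train: float = 0.8, frac_val: float = 0.1,
                 seed: Optional[int] = None) -> Split:
    """Shuffle the indices once, then cut them like a consecutive split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    train, val, test = consecutive_split(n, frac_train, frac_val)
    return [idx[i] for i in train], [idx[i] for i in val], [idx[i] for i in test]


def weight_split(weights: Sequence[float], frac_train: float = 0.8,
                 frac_val: float = 0.1) -> Split:
    """Sort by molecular weight (ascending), then cut like a consecutive split."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    train, val, test = consecutive_split(len(weights), frac_train, frac_val)
    return [order[i] for i in train], [order[i] for i in val], [order[i] for i in test]


def k_fold(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, val) index lists; each consecutive fold is the validation set once."""
    fold_size = n // k
    idx = list(range(n))
    for i in range(k):
        start = i * fold_size
        end = (i + 1) * fold_size if i < k - 1 else n  # last fold takes the remainder
        yield idx[:start] + idx[end:], idx[start:end]


print(consecutive_split(10))  # ([0, 1, 2, 3, 4, 5, 6, 7], [8], [9])
# Approximate molecular weights (g/mol) of glucose, ethanol, benzene and water.
print(weight_split([180.16, 46.07, 78.11, 18.02]))  # ([3, 1, 2], [], [0])
```

A weight-based split deliberately tests extrapolation to larger molecules, whereas a random split measures interpolation within the same distribution.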

.. autoclass:: dgl.data.chem.ConsecutiveSplitter
    :members: train_val_test_split, k_fold_split

.. autoclass:: dgl.data.chem.RandomSplitter
    :members: train_val_test_split, k_fold_split

.. autoclass:: dgl.data.chem.MolecularWeightSplitter
    :members: train_val_test_split, k_fold_split

.. autoclass:: dgl.data.chem.ScaffoldSplitter
    :members: train_val_test_split, k_fold_split

.. autoclass:: dgl.data.chem.SingleTaskStratifiedSplitter
    :members: train_val_test_split, k_fold_split