.. _apidata:

Dataset
=======

.. currentmodule:: dgl.data

Utils
-----

.. autosummary::
    :toctree: ../../generated/

    utils.get_download_dir
    utils.download
    utils.check_sha1
    utils.extract_archive
    utils.split_dataset
    utils.save_graphs
    utils.load_graphs
    utils.load_labels

.. autoclass:: dgl.data.utils.Subset
    :members: __getitem__, __len__

Dataset Classes
---------------

Stanford sentiment treebank dataset
```````````````````````````````````

For more information about the dataset, see `Sentiment Analysis `__.

.. autoclass:: SST
    :members: __getitem__, __len__

Karate Club dataset
```````````````````````````````````

.. autoclass:: KarateClub
    :members: __getitem__, __len__

Citation Network dataset
```````````````````````````````````

.. autoclass:: CitationGraphDataset
    :members: __getitem__, __len__

Cora Citation Network dataset
```````````````````````````````````

.. autoclass:: CoraDataset
    :members: __getitem__, __len__

CoraFull dataset
```````````````````````````````````

.. autoclass:: CoraFull
    :members: __getitem__, __len__

Amazon Co-Purchase dataset
```````````````````````````````````

.. autoclass:: AmazonCoBuy
    :members: __getitem__, __len__

Coauthor dataset
```````````````````````````````````

.. autoclass:: Coauthor
    :members: __getitem__, __len__

BitcoinOTC dataset
```````````````````````````````````

.. autoclass:: BitcoinOTC
    :members: __getitem__, __len__

ICEWS18 dataset
```````````````````````````````````

.. autoclass:: ICEWS18
    :members: __getitem__, __len__

QM7b dataset
```````````````````````````````````

.. autoclass:: QM7b
    :members: __getitem__, __len__

GDELT dataset
```````````````````````````````````

.. autoclass:: GDELT
    :members: __getitem__, __len__

Mini graph classification dataset
`````````````````````````````````

.. autoclass:: MiniGCDataset
    :members: __getitem__, __len__, num_classes

Graph kernel dataset
````````````````````

For more information about the dataset, see `Benchmark Data Sets for Graph Kernels `__.

.. autoclass:: TUDataset
    :members: __getitem__, __len__

Graph isomorphism network dataset
```````````````````````````````````

A compact subset of the graph kernel dataset.

.. autoclass:: GINDataset
    :members: __getitem__, __len__

Protein-Protein Interaction dataset
```````````````````````````````````

.. autoclass:: PPIDataset
    :members: __getitem__, __len__

Molecular Graphs
----------------

To work on molecular graphs, make sure you have installed `RDKit 2018.09.3 `__.

Data Loading and Processing Utils
`````````````````````````````````

We adapt several utilities for processing molecules from `DeepChem `__.

.. autosummary::
    :toctree: ../../generated/

    chem.add_hydrogens_to_mol
    chem.get_mol_3D_coordinates
    chem.load_molecule
    chem.multiprocess_load_molecules

Featurization Utils for Single Molecule
```````````````````````````````````````

To use graph neural networks on molecules, we need to featurize nodes (atoms) and edges (bonds).

General utils:

.. autosummary::
    :toctree: ../../generated/

    chem.one_hot_encoding
    chem.ConcatFeaturizer
    chem.ConcatFeaturizer.__call__

Utils for atom featurization:

.. autosummary::
    :toctree: ../../generated/

    chem.atom_type_one_hot
    chem.atomic_number_one_hot
    chem.atomic_number
    chem.atom_degree_one_hot
    chem.atom_degree
    chem.atom_total_degree_one_hot
    chem.atom_total_degree
    chem.atom_implicit_valence_one_hot
    chem.atom_implicit_valence
    chem.atom_hybridization_one_hot
    chem.atom_total_num_H_one_hot
    chem.atom_total_num_H
    chem.atom_formal_charge_one_hot
    chem.atom_formal_charge
    chem.atom_num_radical_electrons_one_hot
    chem.atom_num_radical_electrons
    chem.atom_is_aromatic_one_hot
    chem.atom_is_aromatic
    chem.atom_chiral_tag_one_hot
    chem.atom_mass
    chem.BaseAtomFeaturizer
    chem.BaseAtomFeaturizer.feat_size
    chem.BaseAtomFeaturizer.__call__
    chem.CanonicalAtomFeaturizer

Utils for bond featurization:

.. autosummary::
    :toctree: ../../generated/

    chem.bond_type_one_hot
    chem.bond_is_conjugated_one_hot
    chem.bond_is_conjugated
    chem.bond_is_in_ring_one_hot
    chem.bond_is_in_ring
    chem.bond_stereo_one_hot
    chem.BaseBondFeaturizer
    chem.BaseBondFeaturizer.feat_size
    chem.BaseBondFeaturizer.__call__
    chem.CanonicalBondFeaturizer

Graph Construction for Single Molecule
``````````````````````````````````````

Several methods for constructing DGLGraphs from SMILES strings or RDKit molecule objects are listed below:

.. autosummary::
    :toctree: ../../generated/

    chem.mol_to_graph
    chem.smiles_to_bigraph
    chem.mol_to_bigraph
    chem.smiles_to_complete_graph
    chem.mol_to_complete_graph
    chem.k_nearest_neighbors

Graph Construction and Featurization for Ligand-Protein Complex
```````````````````````````````````````````````````````````````

Utilities for constructing DGLHeteroGraphs and featurizing them.

.. autosummary::
    :toctree: ../../generated/

    chem.ACNN_graph_construction_and_featurization

Dataset Classes
```````````````

If your dataset is stored in a ``.csv`` file, you may find the following class helpful:

.. autoclass:: dgl.data.chem.CSVDataset
    :members: __getitem__, __len__

Currently four datasets are supported:

* Tox21
* TencentAlchemyDataset
* PubChemBioAssayAromaticity
* PDBBind

.. autoclass:: dgl.data.chem.Tox21
    :members: __getitem__, __len__, task_pos_weights

.. autoclass:: dgl.data.chem.TencentAlchemyDataset
    :members: __getitem__, __len__, set_mean_and_std

.. autoclass:: dgl.data.chem.PubChemBioAssayAromaticity
    :members: __getitem__, __len__

.. autoclass:: dgl.data.chem.PDBBind
    :members: __getitem__, __len__

Dataset Splitting
`````````````````

We provide support for some common data splitting methods:

* consecutive split
* random split
* molecular weight split
* Bemis-Murcko scaffold split
* single-task-stratified split

.. autoclass:: dgl.data.chem.ConsecutiveSplitter
    :members: train_val_test_split, k_fold_split

.. autoclass:: dgl.data.chem.RandomSplitter
    :members: train_val_test_split, k_fold_split

.. autoclass:: dgl.data.chem.MolecularWeightSplitter
    :members: train_val_test_split, k_fold_split

.. autoclass:: dgl.data.chem.ScaffoldSplitter
    :members: train_val_test_split, k_fold_split

.. autoclass:: dgl.data.chem.SingleTaskStratifiedSplitter
    :members: train_val_test_split, k_fold_split
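
To make the difference between the first two splitting methods concrete, here is a minimal pure-Python sketch of a consecutive split and a random split. It is an illustration only and not the implementation behind the splitter classes above: the helper names (``split_sizes``, ``consecutive_split``, ``random_split``) and the parameters ``frac_train``, ``frac_val``, ``frac_test`` and ``seed`` are assumptions chosen for this example.

```python
import random


def split_sizes(n, frac_train=0.8, frac_val=0.1, frac_test=0.1):
    """Return (n_train, n_val, n_test); the test set absorbs rounding leftovers."""
    if abs(frac_train + frac_val + frac_test - 1.0) > 1e-8:
        raise ValueError("frac_train, frac_val and frac_test must sum to 1")
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    return n_train, n_val, n - n_train - n_val


def _take(items, indices, n_train, n_val):
    """Slice an index ordering into train/validation/test lists of items."""
    train = [items[i] for i in indices[:n_train]]
    val = [items[i] for i in indices[n_train:n_train + n_val]]
    test = [items[i] for i in indices[n_train + n_val:]]
    return train, val, test


def consecutive_split(items, **fracs):
    """Keep the original order: first block trains, next validates, rest tests."""
    n_train, n_val, _ = split_sizes(len(items), **fracs)
    return _take(items, list(range(len(items))), n_train, n_val)


def random_split(items, seed=None, **fracs):
    """Shuffle the indices first, then cut them into the same three blocks."""
    n_train, n_val, _ = split_sizes(len(items), **fracs)
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    return _take(items, indices, n_train, n_val)


if __name__ == "__main__":
    data = list(range(10))  # stand-in for a dataset of 10 molecules
    print(consecutive_split(data))
    print(random_split(data, seed=0))
```

A consecutive split is reproducible without a seed but inherits any ordering bias in the dataset, whereas a random split removes that bias at the cost of needing a fixed seed for reproducibility. The molecular weight, scaffold and stratified splits order or group the molecules by a chemical property before cutting, which this sketch does not attempt.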