# DGL-LifeSci

[Documentation](https://lifesci.dgl.ai/index.html) | [Discussion Forum](https://discuss.dgl.ai)

## Introduction

Deep learning on graphs has been an emerging trend in the past few years. There are many graphs in
life science, such as molecular graphs and biological networks, making it an important area for applying
deep learning on graphs. DGL-LifeSci is a DGL-based package for various applications in life science
with graph neural networks.

We provide various functionalities, including but not limited to: methods for graph construction,
featurization and evaluation; model architectures; training scripts; and pre-trained models.

For a list of community contributors, see [here](CONTRIBUTORS.md).

**For a full list of work implemented in DGL-LifeSci, see [here](examples/README.md).**

## Installation

### Requirements

DGL-LifeSci should work on

* all Linux distributions no earlier than Ubuntu 16.04
* macOS
* Windows 10

DGL-LifeSci requires Python 3.6+, DGL 0.4.3+ and PyTorch 1.2.0+.

Additionally, we require `RDKit 2018.09.3` for cheminformatics. We recommend installing it with

```
conda install -c conda-forge rdkit==2018.09.3
```
 
For other installation recipes for RDKit, see the [official documentation](https://www.rdkit.org/docs/Install.html).
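
Before installing, it can help to see what is already available. The sketch below is our own stdlib-only helper, not part of DGL-LifeSci: it checks that Python meets the 3.6+ requirement and that the top-level modules `dgl`, `torch` and `rdkit` can be found. It does not check their versions against the minimums listed above.

```python
import importlib.util
import sys

# Minimum Python version from the requirements above.
MIN_PYTHON = (3, 6)

# Top-level import names of the dependencies ("rdkit" is RDKit's package name).
DEPENDENCIES = ["dgl", "torch", "rdkit"]


def check_environment(min_python=MIN_PYTHON, modules=DEPENDENCIES):
    """Report whether Python is recent enough and which modules can be found.

    Only checks that each module is importable in principle; installed
    versions are not compared against the required minimums.
    """
    report = {"python_ok": sys.version_info[:2] >= tuple(min_python)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing'}")
```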

### Pip installation for DGL-LifeSci

```
pip install dgllife
```

### Conda installation for DGL-LifeSci

```
conda install -c dglteam dgllife
```

### Installation from source

If you want to try experimental features, you can install from source as follows:

```
git clone https://github.com/dmlc/dgl.git
cd dgl/apps/life_sci/python
python setup.py install
```

### Verifying successful installation

Once you have installed the package, you can verify the installation with

```python
import dgllife

print(dgllife.__version__)
# 0.2.1
```

If you are new to DGL, the first time you import DGL a message like the following will pop up:

```
DGL does not detect a valid backend option. Which backend would you like to work with?
Backend choice (pytorch, mxnet or tensorflow):
```

and you need to enter `pytorch`.
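
To skip the prompt (for example in scripts or CI), DGL also reads the `DGLBACKEND` environment variable when it is first imported; from a shell you can run `DGLBACKEND=pytorch python your_script.py`, where `your_script.py` is only a placeholder. The sketch below sets the variable from Python instead. It assumes your DGL version honours `DGLBACKEND`; check the DGL documentation on backends for your release.

```python
import os

# Choose the PyTorch backend. This must happen before the first `import dgl`
# in this process, because DGL selects its backend at import time.
os.environ["DGLBACKEND"] = "pytorch"

# import dgl  # once DGL is installed, this import should no longer prompt
```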

## Example Usage

To apply graph neural networks to molecules with DGL, we first need to construct a `DGLGraph`, the
graph data structure in DGL, and prepare initial node/edge features. The example below constructs a
bi-directed graph from a molecule and featurizes it with atom and bond features such as atom type
and bond type.

```python
from dgllife.utils import smiles_to_bigraph, CanonicalAtomFeaturizer, CanonicalBondFeaturizer

# Node featurizer
node_featurizer = CanonicalAtomFeaturizer(atom_data_field='h')
# Edge featurizer
edge_featurizer = CanonicalBondFeaturizer(bond_data_field='h')
# SMILES (a string representation of a molecule) for penicillin
smiles = 'CC1(C(N2C(S1)C(C2=O)NC(=O)CC3=CC=CC=C3)C(=O)O)C'
g = smiles_to_bigraph(smiles=smiles, 
                      node_featurizer=node_featurizer,
                      edge_featurizer=edge_featurizer)
print(g)
"""
DGLGraph(num_nodes=23, num_edges=50,
         ndata_schemes={'h': Scheme(shape=(74,), dtype=torch.float32)}
         edata_schemes={'h': Scheme(shape=(12,), dtype=torch.float32)})
"""
```
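
In a bi-directed molecular graph, each chemical bond becomes two directed edges, one per direction, and the bond's features are attached to both. The pure-Python sketch below (no DGL or RDKit; the function names and the ethanol input are hypothetical) illustrates that idea. It is not the code `smiles_to_bigraph` runs internally.

```python
def bonds_to_bidirected_edges(bonds):
    """Turn undirected bonds (u, v) into directed edge lists (src, dst).

    Each bond contributes u -> v followed by v -> u, so B bonds give 2 * B edges.
    """
    src, dst = [], []
    for u, v in bonds:
        src.extend([u, v])
        dst.extend([v, u])
    return src, dst


def repeat_bond_features(bond_feats):
    """Duplicate each bond's feature vector so there is one row per directed edge."""
    edge_feats = []
    for feat in bond_feats:
        edge_feats.extend([feat, feat])
    return edge_feats


# Hypothetical toy input: ethanol (C-C-O) with heavy atoms indexed 0, 1, 2.
bonds = [(0, 1), (1, 2)]
src, dst = bonds_to_bidirected_edges(bonds)
print(src)  # [0, 1, 1, 2]
print(dst)  # [1, 0, 2, 1]
```

Penicillin G has 23 heavy atoms and 25 bonds, which matches the 23 nodes and 50 edges printed above.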

We implement various models that users can import directly. The example below defines a GCN-based model
for molecular property prediction.

```python
from dgllife.model import GCNPredictor

# in_feats must match the node feature size; CanonicalAtomFeaturizer yields 74 features per atom
model = GCNPredictor(in_feats=74)
```
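
For intuition about what a GCN-based predictor computes, here is a deliberately simplified, dependency-free sketch: each layer averages a node's features with its neighbours', applies a linear map and ReLU, and a readout sums the node features into one vector per molecule. The simplifications are our own assumptions (mean aggregation instead of GCN's symmetric degree normalization, no bias, a plain sum readout); `GCNPredictor` uses DGL's layers and its own readout, so consult the API reference for its exact behaviour.

```python
def gcn_layer(adj, feats, weight):
    """One simplified GCN-style layer.

    adj:    dict mapping each node to a list of its neighbours
    feats:  list of per-node feature vectors (lists of floats)
    weight: list of rows, shape (out_dim, in_dim)
    Returns the new per-node feature vectors.
    """
    out = []
    for v, h_v in enumerate(feats):
        # The node itself plus its neighbours (a self-loop, as in GCN).
        group = [h_v] + [feats[u] for u in adj[v]]
        # Element-wise mean over that group.
        agg = [sum(col) / len(group) for col in zip(*group)]
        # Linear transform followed by ReLU.
        out.append([max(0.0, sum(w * x for w, x in zip(row, agg))) for row in weight])
    return out


def sum_readout(feats):
    """Collapse node features into one graph-level vector by summing over nodes."""
    return [sum(col) for col in zip(*feats)]


# Hypothetical toy molecule: a 3-atom chain 0-1-2 with one feature per atom.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = [[1.0], [2.0], [3.0]]
h = gcn_layer(adj, feats, weight=[[1.0]])
print(h)               # [[1.5], [2.0], [2.5]]
print(sum_readout(h))  # [6.0]
```

A property predictor would then pass this graph-level vector through a small MLP to get one output per task.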

For a full example of applying `GCNPredictor`, run the following command:

```bash
python examples/property_prediction/classification.py -m GCN -d Tox21
```

For more examples of molecular property prediction, generative models, protein-ligand binding affinity
prediction and reaction prediction, see `examples`.

We also provide pre-trained models for most examples, which can be used off the shelf without training from scratch.
The example below loads a pre-trained `GCNPredictor` for the Tox21 molecular property prediction dataset.

```python
from dgllife.data import Tox21
from dgllife.model import load_pretrained
from dgllife.utils import smiles_to_bigraph, CanonicalAtomFeaturizer

dataset = Tox21(smiles_to_bigraph, CanonicalAtomFeaturizer())
model = load_pretrained('GCN_Tox21') # Pretrained model loaded
model.eval()

smiles, g, label, mask = dataset[0]
feats = g.ndata.pop('h')
label_pred = model(g, feats)
print(smiles)                   # CCOc1ccc2nc(S(N)(=O)=O)sc2c1
print(label_pred[:, mask != 0]) # Mask non-existing labels
# tensor([[ 1.4190, -0.1820,  1.2974,  1.4416,  0.6914,  
# 2.0957,  0.5919,  0.7715, 1.7273,  0.2070]])
```
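
Two details in that output are worth spelling out. The mask marks which of Tox21's 12 tasks actually have a label for this molecule (10 values are printed here), and the raw outputs are unnormalized scores. Assuming, as is usual for multi-label binary classification, that these scores are logits, a sigmoid turns them into probabilities. The stdlib sketch below shows both steps on made-up numbers; `masked_predictions` is a hypothetical helper, not a DGL-LifeSci function.

```python
import math


def sigmoid(x):
    """Convert a logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def masked_predictions(logits, mask):
    """Return (task_index, probability) for every task whose mask entry is non-zero."""
    return [(i, sigmoid(z)) for i, (z, m) in enumerate(zip(logits, mask)) if m != 0]


# Hypothetical outputs for a molecule with 4 tasks; task 2 has no label.
logits = [0.0, 2.0, -1.5, -3.0]
mask = [1, 1, 0, 1]
for task, prob in masked_predictions(logits, mask):
    print(task, round(prob, 3))
```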

Similarly, we can load a pre-trained model for generating molecules. If possible, we recommend running
the code block below in a Jupyter notebook.

```python
from dgllife.model import load_pretrained

model = load_pretrained('DGMG_ZINC_canonical')
model.eval()
smiles = []
for _ in range(4):
    smiles.append(model(rdkit_mol=True))

print(smiles)
# ['CC1CCC2C(CCC3C2C(NC2=CC(Cl)=CC=C2N)S3(=O)=O)O1',
# 'O=C1SC2N=CN=C(NC(SC3=CC=CC=N3)C1=CC=CO)C=2C1=CCCC1',
# 'CC1C=CC(=CC=1)C(=O)NN=C(C)C1=CC=CC2=CC=CC=C21',
# 'CCN(CC1=CC=CC=C1F)CC1CCCN(C)C1']
```

If you are running the code block above in a Jupyter notebook, you can also visualize the generated molecules with:

```python
from IPython.display import SVG
from rdkit import Chem
from rdkit.Chem import Draw

mols = [Chem.MolFromSmiles(s) for s in smiles]
SVG(Draw.MolsToGridImage(mols, molsPerRow=4, subImgSize=(180, 150), useSVG=True))
```

![Example molecules generated by the pre-trained DGMG model](https://data.dgl.ai/dgllife/dgmg/dgmg_model_zoo_example2.png)

## Speed Reference

Below we provide some reference numbers showing how much DGL speeds up model training, measured as training time per epoch in seconds.

| Model                              | Original Implementation (s/epoch) | DGL Implementation (s/epoch) | Improvement |
| ---------------------------------- | --------------------------------- | ---------------------------- | ----------- |
| GCN on Tox21                       | 5.5 (DeepChem)                    | 1.0                          | 5.5x        |
| AttentiveFP on Aromaticity         | 6.0                               | 1.2                          | 5x          |
| JTNN on ZINC                       | 1826                              | 743                          | 2.5x        |
| WLN for reaction center prediction | 11657                             | 5095                         | 2.3x        |
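
The Improvement column is the ratio of the original per-epoch time to the DGL per-epoch time. A minimal sketch that recomputes it from the numbers in the table, useful as a sanity check when adding your own measurements; rounding to one decimal is our assumption about how the column was produced.

```python
# Per-epoch training times in seconds (original, DGL), copied from the table above.
timings = {
    "GCN on Tox21": (5.5, 1.0),
    "AttentiveFP on Aromaticity": (6.0, 1.2),
    "JTNN on ZINC": (1826, 743),
    "WLN for reaction center prediction": (11657, 5095),
}


def speedup(original_time, dgl_time):
    """Improvement factor: original time divided by DGL time, rounded to one decimal."""
    return round(original_time / dgl_time, 1)


for model, (original_time, dgl_time) in timings.items():
    print(f"{model}: {speedup(original_time, dgl_time)}x")
```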