# DGL for Chemistry

With atoms as nodes and bonds as edges, molecular graphs are among the core objects of study in Chemistry. 
Deep learning on graphs can benefit various applications in Chemistry, such as drug and material discovery 
[1], [2], [12].

To make it easy for domain scientists, the DGL team releases a model zoo for Chemistry, spanning three use cases
-- property prediction, molecule generation/optimization and binding affinity prediction.

With pre-trained models and training scripts, we hope this model zoo will help both
the chemistry and the deep learning communities further their research.

## Dependencies

Before you proceed, depending on the model/task you are interested in, 
you may need to install the dependencies below:

- PyTorch 1.2
    - Check the [official website](https://pytorch.org/) for the installation guide.
- RDKit 2018.09.3
    - We recommend installation with `conda install -c conda-forge rdkit==2018.09.3`. For other installation recipes,
    see the [official documentation](https://www.rdkit.org/docs/Install.html).
- PDBFixer
    - We recommend installation with `conda install -c omnia pdbfixer`. To install from source, see the 
    [manual](http://htmlpreview.github.io/?https://raw.github.com/pandegroup/pdbfixer/master/Manual.html).
- MDTraj
    - We recommend installation with `conda install -c conda-forge mdtraj`. For alternative ways of installation, 
    see the [official documentation](http://mdtraj.org/1.9.3/installation.html).

The remaining dependencies can be installed with `pip install -r requirements.txt`.
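
A quick way to confirm that the core dependencies are importable and match the versions above is a minimal check
like the one below (PDBFixer and MDTraj are only needed for the binding affinity models):

```python
import torch
import rdkit

print(torch.__version__)  # expect something like 1.2.0
print(rdkit.__version__)  # expect 2018.09.3
```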

## Speed Reference

Below we provide some reference numbers showing how DGL speeds up model training, measured in seconds per epoch.

| Model                      | Original Implementation | DGL Implementation | Improvement |
| -------------------------- | ----------------------- | ------------------ | ----------- |
| GCN on Tox21               | 5.5 (DeepChem)          | 1.0                | 5.5x        |
| AttentiveFP on Aromaticity | 6.0                     | 1.2                | 5x          |
| JTNN on ZINC               | 1826                    | 743                | 2.5x        |   

## Featurization and Representation Learning

Fingerprints have been a widely used concept in cheminformatics. Chemists developed hand-designed rules to convert 
molecules into binary strings, where each bit indicates the presence or absence of a particular substructure. The
development of fingerprints makes the comparison of molecules a lot easier, and most earlier machine learning methods 
were developed on top of molecular fingerprints.

Graph neural networks make it possible to learn a data-driven representation of molecules from their atoms, bonds and 
graph topology, which may be viewed as a learned fingerprint [3].
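
For a concrete contrast, a hand-designed fingerprint can be computed directly with RDKit. Below is a minimal sketch; 
the molecule and the fingerprint settings (Morgan fingerprint of radius 2 with 2048 bits) are illustrative only.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('CCOc1ccc2nc(S(N)(=O)=O)sc2c1')
# Rule-based representation: a 2048-bit Morgan (circular) fingerprint
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
print(fp.GetNumOnBits())  # number of substructure bits that are set
```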

## Property Prediction

To evaluate molecules as drug candidates, we need to know their properties and activities. In practice, this is
mostly achieved via wet-lab experiments. We can cast the problem as a regression or classification task, which 
can be quite difficult in practice due to the scarcity of labeled data.

### Models
- **Graph Convolutional Networks** [3], [9]: Graph Convolutional Networks (GCNs) have been one of the most popular graph 
neural networks and can be easily extended for graph-level prediction (see the sketch after this list).
- **Graph Attention Networks** [10]: Graph Attention Networks (GATs) incorporate multi-head attention into GCNs,
explicitly modeling the interactions between adjacent atoms.
- **SchNet** [4]: SchNet is a deep learning architecture that models quantum interactions in molecules using 
continuous-filter convolutional layers.
- **Multilevel Graph Convolutional neural Network** [5]: Multilevel Graph Convolutional neural Network (MGCN) is a 
hierarchical graph neural network that directly extracts features from the conformation and spatial information, 
followed by multilevel interactions.
- **Message Passing Neural Network** [6]: Message Passing Neural Network (MPNN) uses an edge network (enn) as its 
front end and Set2Set for output prediction.
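
To make graph-level prediction concrete, here is a toy sketch of the common pattern -- message passing over atoms, 
a graph-level readout, then a task head. This is not the model zoo's actual implementation; the class name and layer 
sizes are made up for illustration.

```python
import dgl
import dgl.function as fn
import torch
import torch.nn as nn

class MiniGraphPredictor(nn.Module):
    """Toy graph-level predictor: one round of neighbor aggregation,
    mean-node readout, then a linear task head."""
    def __init__(self, in_feats, hidden_feats, n_tasks):
        super(MiniGraphPredictor, self).__init__()
        self.project = nn.Linear(in_feats, hidden_feats)
        self.predict = nn.Linear(hidden_feats, n_tasks)

    def forward(self, g, feats):
        g.ndata['h'] = feats
        # Sum neighboring atom features -- the core of graph convolution
        g.update_all(fn.copy_src('h', 'm'), fn.sum('m', 'h'))
        g.ndata['h'] = torch.relu(self.project(g.ndata['h']))
        # Graph-level readout: average the atom representations per molecule
        hg = dgl.mean_nodes(g, 'h')
        return self.predict(hg)
```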

### Example Usage of Pre-trained Models

```python
from dgl.data.chem import Tox21, smiles_to_bigraph, CanonicalAtomFeaturizer
from dgl import model_zoo

dataset = Tox21(smiles_to_bigraph, CanonicalAtomFeaturizer())
model = model_zoo.chem.load_pretrained('GCN_Tox21') # Pretrained model loaded
model.eval()

smiles, g, label, mask = dataset[0]
feats = g.ndata.pop('h')
label_pred = model(g, feats)
print(smiles)                   # CCOc1ccc2nc(S(N)(=O)=O)sc2c1
print(label_pred[:, mask != 0]) # Mask non-existing labels
# tensor([[-0.7956,  0.4054,  0.4288, -0.5565, -0.0911,  
# 0.9981, -0.1663,  0.2311, -0.2376,  0.9196]])
```
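
For more than a handful of molecules, inference is typically batched. The sketch below assumes the 
`(smiles, graph, label, mask)` item format shown above; the `collate` helper and the batch size are illustrative only.

```python
import dgl
import torch
from torch.utils.data import DataLoader

def collate(samples):
    # Merge a list of dataset items into a single batched graph
    smiles, graphs, labels, masks = map(list, zip(*samples))
    return smiles, dgl.batch(graphs), torch.stack(labels), torch.stack(masks)

loader = DataLoader(dataset, batch_size=32, collate_fn=collate)
with torch.no_grad():
    for smiles, bg, labels, masks in loader:
        feats = bg.ndata.pop('h')                # atom features of the batched graph
        probs = torch.sigmoid(model(bg, feats))  # per-task probabilities, shape (batch_size, 12)
```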

## Generative Models

We use generative models for two different purposes when it comes to molecules:
- **Distribution Learning**: Given a collection of molecules, we want to model their distribution and generate new
molecules with similar properties.
- **Goal-directed Optimization**: Find molecules with desired properties.

In this model zoo, we focus only on generative models for molecular graphs. There are also generative models 
working with alternative representations like SMILES strings.

Generative models are known to be difficult to evaluate. [GuacaMol](https://github.com/BenevolentAI/guacamol) and
[MOSES](https://github.com/molecularsets/moses) are two recent efforts to benchmark generative models, each with a
well-written accompanying review paper [7], [8].

### Models
- **Deep Generative Models of Graphs (DGMG)** [11]: A very general framework for graph distribution learning by 
progressively adding atoms and bonds.
- **Junction Tree Variational Autoencoder for Molecular Graph Generation (JTNN)** [13]: JTNNs are able to incrementally
expand molecules while maintaining chemical valency at every step. They can be used for both molecule generation and
optimization.

### Example Usage of Pre-trained Models

```python
# We recommend running the code below with Jupyter notebooks
from IPython.display import SVG
from rdkit import Chem
from rdkit.Chem import Draw

from dgl import model_zoo

model = model_zoo.chem.load_pretrained('DGMG_ZINC_canonical')
model.eval()
mols = []
for i in range(4):
    SMILES = model(rdkit_mol=True)
    mols.append(Chem.MolFromSmiles(SMILES))
# Generating 4 molecules takes less than a second.

SVG(Draw.MolsToGridImage(mols, molsPerRow=4, subImgSize=(180, 150), useSVG=True))
```

![](https://data.dgl.ai/dgllife/dgmg/dgmg_model_zoo_example2.png)
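
Since `Chem.MolFromSmiles` returns `None` for chemically invalid SMILES, a quick follow-up check of how often the 
pre-trained model produces valid molecules might look like the sketch below (the sample size is arbitrary):

```python
from rdkit import Chem

from dgl import model_zoo

model = model_zoo.chem.load_pretrained('DGMG_ZINC_canonical')
model.eval()

num_samples = 100
valid = sum(Chem.MolFromSmiles(model(rdkit_mol=True)) is not None
            for _ in range(num_samples))
print('Valid molecules: {}/{}'.format(valid, num_samples))
```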

## Binding Affinity Prediction

The interaction between a drug and a protein can be characterized in terms of binding affinity. Given a ligand 
(drug candidate) and a protein with particular conformations, we are interested in predicting the 
binding affinity between them.

### Models

- **Atomic Convolutional Networks** [14]: Atomic Convolutional Networks (ACNN) construct nearest-neighbor graphs 
separately for the ligand, protein and complex based on the 3D coordinates of the atoms and predict the binding 
free energy; a sketch of the neighbor-graph idea follows this list.
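
Below is a minimal sketch of building a k-nearest-neighbor graph from 3D atomic coordinates, assuming a NumPy array 
of positions; the helper name and the choice of `k` are illustrative only, not the model zoo's actual preprocessing.

```python
import numpy as np
import dgl

def knn_graph_from_coords(coords, k=4):
    """Build a directed k-nearest-neighbor graph from (N, 3) atomic coordinates."""
    n = coords.shape[0]
    # Pairwise Euclidean distances between atoms
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)             # exclude self-loops
    neighbors = np.argsort(dist, axis=1)[:, :k]
    src = neighbors.reshape(-1)                # edges point from each neighbor...
    dst = np.repeat(np.arange(n), k)           # ...to the center atom
    g = dgl.DGLGraph()
    g.add_nodes(n)
    g.add_edges(src.tolist(), dst.tolist())
    return g

g = knn_graph_from_coords(np.random.rand(20, 3), k=4)
print(g.number_of_nodes(), g.number_of_edges())  # 20 80
```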

## References

[1] Chen et al. (2018) The rise of deep learning in drug discovery. *Drug Discov Today* 6, 1241-1250.

[2] Vamathevan et al. (2019) Applications of machine learning in drug discovery and development. 
*Nature Reviews Drug Discovery* 18, 463-477.

[3] Duvenaud et al. (2015) Convolutional networks on graphs for learning molecular fingerprints. *Advances in neural 
information processing systems (NeurIPS)*, 2224-2232.

[4] Schütt et al. (2017) SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. 
*Advances in Neural Information Processing Systems (NeurIPS)*, 992-1002.

[5] Lu et al. (2019) Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective. 
*The 33rd AAAI Conference on Artificial Intelligence (AAAI)*.

[6] Gilmer et al. (2017) Neural Message Passing for Quantum Chemistry. *Proceedings of the 34th International 
Conference on Machine Learning (ICML)*, 1263-1272.

[7] Brown et al. (2019) GuacaMol: Benchmarking Models for de Novo Molecular Design. *J. Chem. Inf. Model.* 59, 3, 
1096-1108.

[8] Polykovskiy et al. (2019) Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. *arXiv*. 

[9] Kipf et al. (2017) Semi-Supervised Classification with Graph Convolutional Networks.
*The International Conference on Learning Representations (ICLR)*. 

[10] Veličković et al. (2018) Graph Attention Networks. 
*The International Conference on Learning Representations (ICLR)*. 

[11] Li et al. (2018) Learning Deep Generative Models of Graphs. *arXiv preprint arXiv:1803.03324*.

[12] Goh et al. (2017) Deep learning for computational chemistry. *Journal of Computational Chemistry* 16, 1291-1307.

[13] Jin et al. (2018) Junction Tree Variational Autoencoder for Molecular Graph Generation. 
*Proceedings of the 35th International Conference on Machine Learning (ICML)*, 2323-2332.

[14] Gomes et al. (2017) Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity. *arXiv preprint arXiv:1703.10603*.