# DGL for Chemistry

With atoms as nodes and bonds as edges, molecular graphs are among the core objects of study in drug discovery.
Because drug discovery is known to be costly and time-consuming, deep learning on graphs can potentially
improve its efficiency [1], [2], [9].

To make it easy for domain scientists, the DGL team releases a model zoo for chemistry, focusing on two particular
cases: property prediction and target generation/optimization.

With pre-trained models and training scripts, we hope this model zoo will be helpful for both
the chemistry community and the deep learning community to further their research.

## Dependencies

Before you proceed, make sure you have installed the dependencies below:
- PyTorch 1.2
    - See the [official website](https://pytorch.org/) for the installation guide
- RDKit 2018.09.3
    - We recommend installation with `conda install -c conda-forge rdkit==2018.09.3`. For other installation recipes,
    see the [official documentation](https://www.rdkit.org/docs/Install.html).

The remaining dependencies can be installed with `pip install -r requirements.txt`.
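
A quick way to confirm the two pinned dependencies before running the examples is to import them and compare
versions. The snippet below is a minimal sketch and not part of the model zoo: `installed_version` and
`check_dependencies` are hypothetical helper names, and it assumes both packages expose a `__version__` attribute
under the module names `torch` and `rdkit`.

```python
import importlib

# Versions named in this README, keyed by the (assumed) importable module name.
REQUIRED = {"torch": "1.2", "rdkit": "2018.09.3"}


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none, or None if not importable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def check_dependencies(required=REQUIRED):
    """Map each module name to (installed_version, matches_expected_prefix)."""
    report = {}
    for name, expected in required.items():
        installed = installed_version(name)
        # Simple prefix match, so "1.2.0" satisfies an expected "1.2".
        matches = installed is not None and installed.startswith(expected)
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    for name, (installed, matches) in check_dependencies().items():
        status = "OK" if matches else "check version"
        print(f"{name}: installed={installed} expected={REQUIRED[name]} -> {status}")
```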

## Property Prediction

[**Get started with our example code!**](https://github.com/dmlc/dgl/tree/master/examples/pytorch/model_zoo/chem/property_prediction)

To evaluate molecules as drug candidates, we need to know their properties and activities. In practice, these are
mostly measured via wet lab experiments. Predicting them instead can be cast as a regression or classification
problem, which can be quite difficult due to the scarcity of labeled data.

### Featurization and Representation Learning

Fingerprints have been widely used in cheminformatics. Chemists developed hand-designed rules to convert
molecules into binary strings where each bit indicates the presence or absence of a particular substructure.
Fingerprints make comparing molecules much easier, and earlier machine learning methods were
mostly built on them.
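
To illustrate how bit-string fingerprints simplify comparison, the sketch below computes the Tanimoto (Jaccard)
similarity between fingerprints. Tanimoto is one commonly used measure and is an assumption here, not something
this model zoo prescribes. The 8-bit fingerprints are made up for illustration.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto similarity: shared 'on' bits divided by bits that are 'on' in either fingerprint."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(a == "1" and b == "1" for a, b in zip(fp_a, fp_b))
    either = sum(a == "1" or b == "1" for a, b in zip(fp_a, fp_b))
    # Two all-zero fingerprints are treated as identical.
    return both / either if either else 1.0


# Hypothetical fingerprints; each position stands for one substructure.
mol_a = "11010010"
mol_b = "11000011"
mol_c = "00101100"

if __name__ == "__main__":
    print(tanimoto(mol_a, mol_b))  # 3 shared bits out of 5 set in either -> 0.6
    print(tanimoto(mol_a, mol_c))  # no shared bits -> 0.0
```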

Graph neural networks make it possible to learn a data-driven representation of molecules from the atoms, bonds and
molecular graph topology, which may be viewed as a learned fingerprint [3].
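
As a rough picture of the inputs such a model sees, the sketch below encodes ethanol (heavy atoms only) as a graph
in plain Python: atoms become nodes, bonds become edges, and each atom gets a small feature vector. The element
vocabulary and the choice of features (one-hot element plus degree) are illustrative assumptions; the model zoo
uses its own featurizers.

```python
# Ethanol, heavy atoms only (SMILES "CCO"): C0 - C1 - O2.
atoms = ["C", "C", "O"]                        # node i is atom i
bonds = [(0, 1, "SINGLE"), (1, 2, "SINGLE")]   # undirected edges with a bond type

ELEMENTS = ["C", "N", "O"]  # illustrative element vocabulary


def neighbors(num_atoms, bonds):
    """Build an adjacency list from the undirected bond list."""
    adj = {i: [] for i in range(num_atoms)}
    for u, v, _bond_type in bonds:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def atom_features(atoms, adj):
    """One-hot element type followed by the atom's degree."""
    feats = []
    for i, symbol in enumerate(atoms):
        one_hot = [1.0 if symbol == element else 0.0 for element in ELEMENTS]
        feats.append(one_hot + [float(len(adj[i]))])
    return feats


adj = neighbors(len(atoms), bonds)
features = atom_features(atoms, adj)

if __name__ == "__main__":
    for i, (symbol, feat) in enumerate(zip(atoms, features)):
        print(i, symbol, "neighbors:", adj[i], "features:", feat)
```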

### Models  

- **Graph Convolutional Network**: Graph Convolutional Networks (GCN) are among the most popular graph neural
networks and can be easily extended to graph-level prediction.
- **SchNet**: SchNet is a deep learning architecture that models quantum interactions in molecules using
continuous-filter convolutional layers [4].
- **Multilevel Graph Convolutional neural Network**: The Multilevel Graph Convolutional neural Network (MGCN) is a
hierarchical graph neural network that directly extracts features from the conformation and spatial information,
followed by multilevel interactions [5].
- **Message Passing Neural Network**: The Message Passing Neural Network (MPNN) uses an edge network (enn) as its
front end and Set2Set to output predictions [6].
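
The models above share a common pattern: update each atom's state from its neighbors, then pool the atom states into
one vector for a graph-level prediction. The sketch below shows that pattern in plain Python under strong
simplifications that are assumptions of this example: there are no learned weights in the update, the readout is a
plain sum (MPNN uses Set2Set instead), and the final linear weights are made up.

```python
def message_passing_step(features, adj):
    """New state of each atom = its own state plus the sum of its neighbors' states."""
    updated = []
    for i, state in enumerate(features):
        total = list(state)
        for j in adj[i]:
            total = [a + b for a, b in zip(total, features[j])]
        updated.append(total)
    return updated


def sum_readout(features):
    """Pool atom states into a single graph-level vector by summing each feature column."""
    return [sum(column) for column in zip(*features)]


def linear_predict(graph_vector, weights, bias=0.0):
    """Linear layer on top of the graph vector, standing in for a trained predictor."""
    return sum(w * x for w, x in zip(weights, graph_vector)) + bias


# Ethanol, heavy atoms only: C0 - C1 - O2, with features [is_carbon, is_oxygen].
adj = {0: [1], 1: [0, 2], 2: [1]}
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

hidden = message_passing_step(features, adj)
graph_vector = sum_readout(hidden)
prediction = linear_predict(graph_vector, weights=[0.5, -1.0])  # hypothetical weights

if __name__ == "__main__":
    print(hidden)        # [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
    print(graph_vector)  # [5.0, 2.0]
    print(prediction)    # 0.5
```

Because the readout is a sum, the graph vector does not depend on the order in which atoms are numbered, which is
the property that makes this kind of pooling suitable for graph-level prediction.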

## Generative Models

We use generative models for two different purposes when it comes to molecules:
- **Distribution Learning**: Given a collection of molecules, we want to model their distribution and generate new
molecules with similar properties.
- **Goal-directed Optimization**: Find molecules with desired properties.

For this model zoo, we focus only on generative models for molecular graphs. There are other generative models
that work with alternative representations like SMILES.

Generative models are known to be difficult to evaluate. [GuacaMol](https://github.com/BenevolentAI/guacamol) and
[MOSES](https://github.com/molecularsets/moses) are two recent efforts to benchmark generative models. There
are also two well-written accompanying papers [7], [8].

### Models
- **Deep Generative Models of Graphs (DGMG)**: A very general framework for graph distribution learning by progressively
adding atoms and bonds.
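
The idea of building a graph step by step can be sketched as a loop of decisions: add another atom or stop, then
add bonds from the new atom to existing ones. The example below is only an illustration of that control flow, not
the model zoo's implementation. Random choices stand in for the learned decision modules, the decision order and
probabilities are assumptions, every new atom is forced to bond to at least one earlier atom to keep the graph
connected, and valence rules are ignored.

```python
import random

ATOM_TYPES = ["C", "N", "O"]  # illustrative vocabulary


def generate_graph(rng, max_atoms=6, p_add_atom=0.8, p_add_bond=0.5):
    """Grow a graph by alternating 'add atom?' and 'add bond?' decisions."""
    atoms, bonds = [], set()
    while len(atoms) < max_atoms:
        # Decision 1: add another atom or stop (the first atom is always added).
        if atoms and rng.random() > p_add_atom:
            break
        atoms.append(rng.choice(ATOM_TYPES))
        new = len(atoms) - 1

        # Decisions 2 and 3: whether to add a bond, and which earlier atom to bond to.
        candidates = list(range(new))
        must_connect = True  # design choice of this sketch: keep the graph connected
        while candidates and (must_connect or rng.random() < p_add_bond):
            dst = rng.choice(candidates)
            candidates.remove(dst)
            bonds.add((dst, new))
            must_connect = False
    return atoms, sorted(bonds)


if __name__ == "__main__":
    atoms, bonds = generate_graph(random.Random(0))
    print("atoms:", atoms)
    print("bonds:", bonds)
```

In a trained model each of these decisions would be sampled from a distribution conditioned on the graph built so
far, which is what lets the generator capture the distribution of the training molecules.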

## References

[1] Chen et al. (2018) The rise of deep learning in drug discovery. *Drug Discov Today* 23 (6), 1241-1250.

[2] Vamathevan et al. (2019) Applications of machine learning in drug discovery and development. 
*Nature Reviews Drug Discovery* 18, 463-477.

[3] Duvenaud et al. (2015) Convolutional networks on graphs for learning molecular fingerprints. *Advances in neural 
information processing systems (NeurIPS)*, 2224-2232.

[4] Schütt et al. (2017) SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. 
*Advances in Neural Information Processing Systems (NeurIPS)*, 992-1002.

[5] Lu et al. (2019) Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective.
*The 33rd AAAI Conference on Artificial Intelligence*.

[6] Gilmer et al. (2017) Neural Message Passing for Quantum Chemistry. *Proceedings of the 34th International Conference 
on Machine Learning* JMLR. 1263-1272.

[7] Brown et al. (2019) GuacaMol: Benchmarking Models for de Novo Molecular Design. *J. Chem. Inf. Model.* 59 (3),
1096-1108.

[8] Polykovskiy et al. (2019) Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. *arXiv*. 

[9] Goh et al. (2017) Deep learning for computational chemistry. *Journal of Computational Chemistry* 38 (16), 1291-1307.