# Deep Graph Library (DGL)

[![PyPi Latest Release](https://img.shields.io/pypi/v/dgl.svg)](https://pypi.org/project/dgl/)
[![Conda Latest Release](https://anaconda.org/dglteam/dgl/badges/version.svg)](https://anaconda.org/dglteam/dgl)
[![Build Status](http://ci.dgl.ai:80/buildStatus/icon?job=DGL/master)](http://ci.dgl.ai:80/job/DGL/job/master/)
[![Benchmark by ASV](http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=flat)](https://asv.dgl.ai/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](./LICENSE)

Documentation ([Latest](https://docs.dgl.ai/en/latest/) | [Stable](https://docs.dgl.ai)) | [DGL at a glance](https://docs.dgl.ai/tutorials/basics/1_first.html#sphx-glr-tutorials-basics-1-first-py) | [Model Tutorials](https://docs.dgl.ai/tutorials/models/index.html) | [Discussion Forum](https://discuss.dgl.ai) | [Slack Channel](https://join.slack.com/t/deep-graph-library/shared_invite/zt-eb4ict1g-xcg3PhZAFAB8p6dtKuP6xQ)

DGL is an easy-to-use, high-performance, and scalable Python package for deep learning on graphs. DGL is framework agnostic: if a deep graph model is a component of an end-to-end application, the rest of the logic can be implemented in any major framework, such as PyTorch, Apache MXNet, or TensorFlow.

<p align="center">
  <img src="http://data.dgl.ai/asset/image/DGL-Arch.png" alt="DGL v0.4 architecture" width="600">
  <br>
  <b>Figure</b>: DGL Overall Architecture
</p>

## <img src="http://data.dgl.ai/asset/image/new.png" width="30">DGL News
**09/05/2020**: We invite you to take the survey [here](https://forms.gle/Ej3jHCocACmb49Gp8) to help DGL better fit your needs. Thanks!

**08/21/2020**: The new **v0.5.0 release** includes distributed GNN training, overhauled documentation and user guide, and several more features. We have also submitted some models to the [OGB](https://ogb.stanford.edu) leaderboard. See our [release notes](https://github.com/dmlc/dgl/releases/tag/0.5.0) for more details.

**06/11/2020**: Amazon Shanghai AI Lab and AWS Deep Engine Science team working along with academic collaborators from the University of Minnesota, The Ohio State University, and Hunan University have created the **[Drug Repurposing Knowledge Graph (DRKG)](https://github.com/gnn4dr/DRKG)** and a set of machine learning tools, [DGL-KE](https://github.com/awslabs/dgl-ke), that can be used to prioritize drugs for repurposing studies. 
DRKG is a comprehensive biological knowledge graph that relates human genes, compounds, biological processes, drug side effects, diseases and symptoms. DRKG includes, curates, and normalizes information from six publicly available databases and data that were collected from recent publications related to Covid-19. It has 97,238 entities belonging to 13 types of entities, and 5,874,261 triplets belonging to 107 types of relations. 
More about the dataset can be found in this [blog post](https://www.dgl.ai/news/2020/06/09/covid.html).

## Using DGL

**Data scientists** may want to apply a pre-trained model to their data right away. For this, you can use DGL's [application packages, formerly *Model Zoo*](https://github.com/dmlc/dgl/tree/master/apps). Application packages are developed for domain applications, as is the case for [DGL-LifeSci](https://github.com/awslabs/dgl-lifesci). We will soon add model zoos for knowledge graph embedding learning and recommender systems. Here is how to use a pretrained model:
```python
from dgllife.data import Tox21
from dgllife.model import load_pretrained
from dgllife.utils import smiles_to_bigraph, CanonicalAtomFeaturizer

dataset = Tox21(smiles_to_bigraph, CanonicalAtomFeaturizer())
model = load_pretrained('GCN_Tox21') # Pretrained model loaded
model.eval()

smiles, g, label, mask = dataset[0]
feats = g.ndata.pop('h')
label_pred = model(g, feats)
print(smiles)                   # CCOc1ccc2nc(S(N)(=O)=O)sc2c1
print(label_pred[:, mask != 0]) # Mask non-existing labels
# tensor([[ 1.4190, -0.1820,  1.2974,  1.4416,  0.6914,  
# 2.0957,  0.5919,  0.7715, 1.7273,  0.2070]])
```
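
The `mask != 0` indexing in the example keeps only the predictions for tasks that actually have a label for this molecule. A minimal pure-Python sketch of the same filtering idea follows; the scores and mask are made-up values for illustration, not output from the pretrained model.

```python
# Hypothetical per-task scores and label mask for one molecule (made-up values).
scores = [0.8, -1.2, 0.3, 2.1]
mask = [1, 0, 1, 1]  # 0 marks a task with no recorded label

# Keep only the scores whose label exists, mirroring label_pred[:, mask != 0].
kept = [s for s, m in zip(scores, mask) if m != 0]
print(kept)  # [0.8, 0.3, 2.1]
```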

**Further reading**: DGL is released as a managed service on AWS SageMaker; see the Medium posts for an easy introduction to DGL on SageMaker ([part 1](https://medium.com/@julsimon/a-primer-on-graph-neural-networks-with-amazon-neptune-and-the-deep-graph-library-5ce64984a276) and [part 2](https://medium.com/@julsimon/deep-graph-library-part-2-training-on-amazon-sagemaker-54d318dfc814)).

**Researchers** can start from the growing list of [models implemented in DGL](https://github.com/dmlc/dgl/tree/master/examples). Developing new models does not mean that you have to start from scratch. Instead, you can reuse many [pre-built modules](https://docs.dgl.ai/api/python/nn.html). Here is how to get a standard two-layer graph convolutional model with a pre-built GraphConv module:
```python
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn.pytorch import GraphConv

# build a two-layer GCN with ReLU as the activation in between
class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.gcn_layer1 = GraphConv(in_feats, h_feats)
        self.gcn_layer2 = GraphConv(h_feats, num_classes)
    
    def forward(self, graph, inputs):
        h = self.gcn_layer1(graph, inputs)
        h = F.relu(h)
        h = self.gcn_layer2(graph, h)
        return h
```
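
To make concrete what a graph convolution layer aggregates, here is a minimal pure-Python sketch of one propagation step, assuming the symmetric normalization from the GCN paper. It is not DGL's implementation: the learned weight matrix and bias are omitted, and `gcn_propagate` and the toy path graph are hypothetical names and data chosen for illustration.

```python
import math

def gcn_propagate(num_nodes, edges, feats):
    """Return out[v] = sum over edges (u, v) of feats[u] / sqrt(deg(u) * deg(v))."""
    # Assumes an undirected graph listed in both directions,
    # so counting outgoing edges gives each node's degree.
    deg = [0] * num_nodes
    for u, _ in edges:
        deg[u] += 1
    dim = len(feats[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for u, v in edges:
        w = 1.0 / math.sqrt(deg[u] * deg[v])
        for k in range(dim):
            out[v][k] += w * feats[u][k]
    return out

# Toy path graph 0 - 1 - 2 with one scalar feature per node.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
feats = [[1.0], [2.0], [3.0]]
out = gcn_propagate(3, edges, feats)
print([[round(x, 4) for x in row] for row in out])  # [[1.4142], [2.8284], [1.4142]]
```

A full layer would multiply the aggregated features by a learned weight matrix and add a bias; stacking two such steps with a ReLU in between gives the two-layer structure shown above.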

One level down, you may want to design your own module. DGL offers a succinct message-passing interface (see the tutorial [here](https://docs.dgl.ai/tutorials/basics/3_pagerank.html)). Here is how a Graph Attention Network (GAT) layer is implemented ([complete code](https://docs.dgl.ai/api/python/nn.pytorch.html#gatconv)). Of course, GAT is also available as the pre-built module [GATConv](https://docs.dgl.ai/api/python/nn.pytorch.html#gatconv):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define a GAT layer
class GATLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GATLayer, self).__init__()
        self.linear_func = nn.Linear(in_feats, out_feats, bias=False)
        self.attention_func = nn.Linear(2 * out_feats, 1, bias=False)
        
    def edge_attention(self, edges):
        concat_z = torch.cat([edges.src['z'], edges.dst['z']], dim=1)
        src_e = self.attention_func(concat_z)
        src_e = F.leaky_relu(src_e)
        return {'e': src_e}
    
    def message_func(self, edges):
        return {'z': edges.src['z'], 'e': edges.data['e']}
        
    def reduce_func(self, nodes):
        a = F.softmax(nodes.mailbox['e'], dim=1)
        h = torch.sum(a * nodes.mailbox['z'], dim=1)
        return {'h': h}
                               
    def forward(self, graph, h):
        z = self.linear_func(h)
        graph.ndata['z'] = z
        graph.apply_edges(self.edge_attention)
        graph.update_all(self.message_func, self.reduce_func)
        return graph.ndata.pop('h')
```
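
The reduce step above normalizes the incoming attention scores with a softmax and then takes a weighted sum of the neighbor messages. The following pure-Python sketch shows that computation for a single node; `softmax`, `attention_reduce`, and the two-message mailbox are hypothetical illustrations, not part of the DGL API.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_reduce(scores, messages):
    """Weighted sum of neighbor messages using softmax-normalized scores."""
    weights = softmax(scores)
    dim = len(messages[0])
    return [sum(w * msg[k] for w, msg in zip(weights, messages)) for k in range(dim)]

# Hypothetical mailbox of one node with two incoming edges and equal scores.
h = attention_reduce([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]])
print(h)  # [2.0, 3.0]
```

With equal scores the softmax assigns each neighbor a weight of 0.5, so the result is the plain average of the two messages.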
## Performance and Scalability

**Microbenchmark on speed and memory usage**: While leaving tensor and autograd functions to backend frameworks (e.g. PyTorch, MXNet, and TensorFlow), DGL aggressively optimizes storage and computation with its own kernels. Here is a comparison with another popular package, PyTorch Geometric (PyG). The short story is that raw speed is similar, but DGL has much better memory management.


| Dataset  |    Model     |                   Accuracy                   |                    Time <br> PyG &emsp;&emsp; DGL                    |           Memory <br> PyG &emsp;&emsp; DGL            |
| -------- |:------------:|:--------------------------------------------:|:--------------------------------------------------------------------:|:-----------------------------------------------------:|
| Cora     | GCN <br> GAT | 81.31 &plusmn; 0.88 <br> 83.98 &plusmn; 0.52 | <b>0.478</b> &emsp;&emsp; 0.666 <br> 1.608 &emsp;&emsp; <b>1.399</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.2 &emsp;&emsp; <b>1.1</b> |
| CiteSeer | GCN <br> GAT | 70.98 &plusmn; 0.68 <br> 69.96 &plusmn; 0.53 | <b>0.490</b> &emsp;&emsp; 0.674 <br> 1.606 &emsp;&emsp; <b>1.399</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.3 &emsp;&emsp; <b>1.1</b> |
| PubMed   | GCN <br> GAT | 79.00 &plusmn; 0.41 <br> 77.65 &plusmn; 0.32 | <b>0.491</b> &emsp;&emsp; 0.690 <br> 1.946 &emsp;&emsp; <b>1.393</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.6 &emsp;&emsp; <b>1.1</b> |
| Reddit   |     GCN      |             93.46 &plusmn; 0.06              |                    *OOM*&emsp;&emsp; <b>28.6</b>                     |            *OOM* &emsp;&emsp; <b>11.7</b>             |
| Reddit-S |     GCN      |                     N/A                      |                    29.12 &emsp;&emsp; <b>9.44</b>                    |             15.7 &emsp;&emsp; <b>3.6</b>              |

Table: Training time (in seconds) for 200 epochs and memory consumption (in GB)
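
As a quick worked example of reading the table, the snippet below computes the time and memory ratios for the Reddit-S row. The numbers are copied from the table; the variable names are only illustrative.

```python
# (PyG, DGL) values copied from the Reddit-S row of the table above.
time_pyg, time_dgl = 29.12, 9.44  # seconds for 200 epochs
mem_pyg, mem_dgl = 15.7, 3.6      # GB

speedup = time_pyg / time_dgl
mem_ratio = mem_pyg / mem_dgl
print(f"time ratio: {speedup:.2f}x, memory ratio: {mem_ratio:.2f}x")
# time ratio: 3.08x, memory ratio: 4.36x
```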

Here is another comparison of DGL on the TensorFlow backend with other TF-based GNN tools (training time in seconds for one epoch):

| Dataset | Model | DGL | GraphNet | tf_geometric |
| ------- | ----- | --- | -------- | ------------ |
| Cora | GCN | 0.0148 | 0.0152 | 0.0192 |
| Reddit | GCN | 0.1095 | OOM | OOM |
| PubMed | GCN | 0.0156 | 0.0553 | 0.0185 |
| PPI | GCN | 0.09 | 0.16 | 0.21 |
| Cora | GAT | 0.0442 | n/a | 0.058 |
| PPI | GAT | 0.398 | n/a | 0.752 |

High memory utilization allows DGL to push the limit of single-GPU performance, as seen in the images below.
| <img src="http://data.dgl.ai/asset/image/DGLvsPyG-time1.png" width="400"> | <img src="http://data.dgl.ai/asset/image/DGLvsPyG-time2.png" width="400"> |
| -------- | -------- |

**Scalability**: DGL leverages multiple GPUs, both on a single machine and across clusters, to increase training speed, and performs better than alternatives, as seen in the images below.

<p align="center">
  <img src="http://data.dgl.ai/asset/image/one-four-GPUs.png" width="600">
</p>

| <img src="http://data.dgl.ai/asset/image/one-four-GPUs-DGLvsGraphVite.png"> |  <img src="http://data.dgl.ai/asset/image/one-fourMachines.png"> | 
| :---------------------------------------: | -- |


**Further reading**: A detailed comparison of DGL and other graph learning alternatives can be found [here](https://arxiv.org/abs/1909.01315).

## DGL Models and Applications

### DGL for research
Overall, more than 30 models have been implemented with DGL:
- [PyTorch](https://github.com/dmlc/dgl/tree/master/examples/pytorch)
- [MXNet](https://github.com/dmlc/dgl/tree/master/examples/mxnet)
- [TensorFlow](https://github.com/dmlc/dgl/tree/master/examples/tensorflow)

### DGL for domain applications
- [DGL-LifeSci](https://github.com/awslabs/dgl-lifesci), previously DGL-Chem
- [DGL-KE](https://github.com/awslabs/dgl-ke)
- DGL-RecSys (coming soon)

### DGL for NLP/CV problems
- [TreeLSTM](https://github.com/dmlc/dgl/tree/master/examples/pytorch/tree_lstm)
- [GraphWriter](https://github.com/dmlc/dgl/tree/master/examples/pytorch/graphwriter)
- [Capsule Network](https://github.com/dmlc/dgl/tree/master/examples/pytorch/capsule)

We are currently in Beta stage.  More features and improvements are coming.


## Installation

DGL should work on

* all Linux distributions no earlier than Ubuntu 16.04
* macOS
* Windows 10

DGL requires Python 3.6 or later.

Right now, DGL works on [PyTorch](https://pytorch.org) 1.5.0+, [MXNet](https://mxnet.apache.org) 1.6+, and [TensorFlow](https://tensorflow.org) 2.3+.


### Using anaconda

```
conda install -c dglteam dgl           # cpu version
conda install -c dglteam dgl-cuda9.0   # CUDA 9.0
conda install -c dglteam dgl-cuda9.2   # CUDA 9.2
conda install -c dglteam dgl-cuda10.0  # CUDA 10.0
conda install -c dglteam dgl-cuda10.1  # CUDA 10.1
conda install -c dglteam dgl-cuda10.2  # CUDA 10.2
```

### Using pip


|           | Latest Nightly Build Version  | Stable Version          |
|-----------|-------------------------------|-------------------------|
| CPU       | `pip install --pre dgl`       | `pip install dgl`       |
| CUDA 9.0  | `pip install --pre dgl-cu90`  | `pip install dgl-cu90`  |
| CUDA 9.2  | `pip install --pre dgl-cu92`  | `pip install dgl-cu92`  |
| CUDA 10.0 | `pip install --pre dgl-cu100` | `pip install dgl-cu100` |
| CUDA 10.1 | `pip install --pre dgl-cu101` | `pip install dgl-cu101` |
| CUDA 10.2 | `pip install --pre dgl-cu102` | `pip install dgl-cu102` |
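
The package names in the table follow a regular pattern: `dgl` for CPU, and `dgl-cu` plus the CUDA version without the dot for GPU builds. The helper below is a hypothetical convenience sketch (not something DGL ships) that builds the matching command string for the versions listed in the table.

```python
def dgl_pip_command(cuda_version=None, nightly=False):
    """Build the pip command from the table above.

    cuda_version is a string such as "10.2", or None for the CPU build.
    """
    supported = {"9.0", "9.2", "10.0", "10.1", "10.2"}  # versions listed in the table
    if cuda_version is None:
        package = "dgl"
    elif cuda_version in supported:
        package = "dgl-cu" + cuda_version.replace(".", "")
    else:
        raise ValueError(f"no prebuilt package listed for CUDA {cuda_version}")
    pre = "--pre " if nightly else ""
    return f"pip install {pre}{package}"

print(dgl_pip_command("10.2"))              # pip install dgl-cu102
print(dgl_pip_command(None, nightly=True))  # pip install --pre dgl
```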

### Building from source

Refer to the guide [here](https://docs.dgl.ai/install/index.html#install-from-source).


## DGL Major Releases

| Releases  | Date   | Features |
|-----------|--------|-------------------------|
| v0.4.3    | 03/31/2020 | - TensorFlow support <br> - DGL-KE <br> - DGL-LifeSci <br> - Heterograph sampling APIs (experimental) |
| v0.4.2    | 01/24/2020 | - Heterograph support <br> - TensorFlow support (experimental) <br> - MXNet GNN modules |
| v0.3.1 | 08/23/2019 | - APIs for GNN modules <br> - Model zoo (DGL-Chem) <br> - New installation |
| v0.2 | 03/09/2019 | - Graph sampling APIs <br> - Speed improvement |
| v0.1 | 12/07/2018 | - Basic DGL APIs <br> - PyTorch and MXNet support <br> - GNN model examples and tutorials |

## New to Deep Learning and Graph Deep Learning?

Check out the open source book [*Dive into Deep Learning*](https://d2l.ai/).

For those who are new to graph neural networks, please see the [basics of DGL](https://docs.dgl.ai/tutorials/basics/index.html).

For readers looking for more advanced, realistic, and end-to-end examples, please see the [model tutorials](https://docs.dgl.ai/tutorials/models/index.html).


## Contributing

Please let us know if you encounter a bug or have any suggestions by [filing an issue](https://github.com/dmlc/dgl/issues).

We welcome all contributions from bug fixes to new features and extensions.

We expect all contributions to be discussed in the issue tracker and submitted as PRs. Please refer to our [contribution guide](https://docs.dgl.ai/contribute.html).

## Cite

If you use DGL in a scientific publication, we would appreciate citations to the following paper:
```
@article{wang2019dgl,
    title={Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks},
    author={Minjie Wang and Da Zheng and Zihao Ye and Quan Gan and Mufei Li and Xiang Song and Jinjing Zhou and Chao Ma and Lingfan Yu and Yu Gai and Tianjun Xiao and Tong He and George Karypis and Jinyang Li and Zheng Zhang},
    year={2019},
    journal={arXiv preprint arXiv:1909.01315}
}
```

## The Team

DGL is developed and maintained by [NYU, NYU Shanghai, AWS Shanghai AI Lab, and AWS MXNet Science Team](https://www.dgl.ai/pages/about.html).


## License

DGL is released under the Apache License 2.0.