"git@developer.sourcefind.cn:OpenDAS/dgl.git" did not exist on "e35e860ae3fbdecee8cdc24bfd700427a450e093"
Unverified commit bdf1bb53, authored by Minjie Wang, committed by GitHub
[Doc] New README (#1259)

* new readme

* try resize

* try center

* more imgs

* fix
# Deep Graph Library (DGL)

[![Build Status](http://ci.dgl.ai:80/buildStatus/icon?job=DGL/master)](http://ci.dgl.ai:80/job/DGL/job/master/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](./LICENSE)

[Documentation](https://docs.dgl.ai) | [DGL at a glance](https://docs.dgl.ai/tutorials/basics/1_first.html#sphx-glr-tutorials-basics-1-first-py) | [Model Tutorials](https://docs.dgl.ai/tutorials/models/index.html) | [Discussion Forum](https://discuss.dgl.ai)
DGL is an easy-to-use, high-performance and scalable Python package for deep learning on graphs. DGL is framework agnostic: if a deep graph model is a component of an end-to-end application, the rest of the logic can be implemented in any major framework, such as PyTorch, Apache MXNet or TensorFlow.
<p align="center">
  <img src="https://i.imgur.com/DwA1NbZ.png" alt="DGL v0.4 architecture" width="600">
  <br>
  <b>Figure</b>: DGL Overall Architecture
</p>
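To give a feel for the package, here is a minimal sketch (using the PyTorch backend; the graph and feature sizes are purely illustrative) of building a graph and attaching framework tensors to it:

```python
import dgl
import torch as th

g = dgl.DGLGraph()
g.add_nodes(5)                           # a toy graph with 5 nodes
g.add_edges([0, 0, 0, 0], [1, 2, 3, 4])  # 4 edges: 0->1, 0->2, 0->3, 0->4
g.ndata['h'] = th.randn(5, 3)            # one 3-dimensional vector per node
g.edata['h'] = th.randn(4, 4)            # one 4-dimensional vector per edge
```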
## Using DGL
**A data scientist** may want to apply a pre-trained model to their data right away. For this you can use DGL's [Application packages, formerly *Model Zoo*](https://github.com/dmlc/dgl/tree/master/apps). Application packages are developed for domain applications, as is the case for [DGL-LifeScience](https://github.com/dmlc/dgl/tree/master/apps/life_sci). We will soon add model zoos for knowledge graph embedding learning and recommender systems. Here is how you would use a pretrained model:

```python
from dgl.data.chem import Tox21, smiles_to_bigraph, CanonicalAtomFeaturizer
from dgl import model_zoo

dataset = Tox21(smiles_to_bigraph, CanonicalAtomFeaturizer())

model = model_zoo.chem.load_pretrained('GCN_Tox21')  # pretrained model loaded
model.eval()

smiles, g, label, mask = dataset[0]
feats = g.ndata.pop('h')
label_pred = model(g, feats)
```
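Here `label_pred` holds the model's raw predictions for the molecule across the Tox21 toxicity tasks, and `mask` marks which task labels are actually present for that sample.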
**Further reading**: DGL is released as a managed service on AWS SageMaker; see the Medium posts for an easy trip to DGL on SageMaker ([part 1](https://medium.com/@julsimon/a-primer-on-graph-neural-networks-with-amazon-neptune-and-the-deep-graph-library-5ce64984a276) and [part 2](https://medium.com/@julsimon/deep-graph-library-part-2-training-on-amazon-sagemaker-54d318dfc814)).
**Researchers** can start from the growing list of [models implemented in DGL](https://github.com/dmlc/dgl/tree/master/examples). Developing new models does not mean that you have to start from scratch. Instead, you can reuse many [pre-built modules](https://docs.dgl.ai/api/python/nn.html). Here is how to get a standard two-layer graph convolutional model with a pre-built GraphConv module:
```python
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn.pytorch import GraphConv

# Build a two-layer GCN with ReLU as the activation in between
class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.gcn_layer1 = GraphConv(in_feats, h_feats)
        self.gcn_layer2 = GraphConv(h_feats, num_classes)

    def forward(self, graph, inputs):
        h = self.gcn_layer1(graph, inputs)
        h = F.relu(h)
        h = self.gcn_layer2(graph, h)
        return h
```
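A quick usage sketch (the sizes are hypothetical and assume the toy graph `g` with 3-dimensional node features built in the construction example above):

```python
model = GCN(in_feats=3, h_feats=16, num_classes=2)
logits = model(g, g.ndata['h'])  # one class-score vector per node
```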
One level down, you may want to invent your own module. DGL offers a succinct message-passing interface (see the tutorial [here](https://docs.dgl.ai/tutorials/basics/3_pagerank.html)). Here is how Graph Attention Network (GAT) is implemented ([complete code](https://docs.dgl.ai/api/python/nn.pytorch.html#gatconv)). Of course, you can also use the pre-built module [GATConv](https://docs.dgl.ai/api/python/nn.pytorch.html#gatconv):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define a GAT layer
class GATLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GATLayer, self).__init__()
        self.linear_func = nn.Linear(in_feats, out_feats, bias=False)
        self.attention_func = nn.Linear(2 * out_feats, 1, bias=False)

    def edge_attention(self, edges):
        # unnormalized attention score from the concatenated endpoint features
        concat_z = torch.cat([edges.src['z'], edges.dst['z']], dim=1)
        src_e = self.attention_func(concat_z)
        src_e = F.leaky_relu(src_e)
        return {'e': src_e}

    def message_func(self, edges):
        return {'z': edges.src['z'], 'e': edges.data['e']}

    def reduce_func(self, nodes):
        # softmax-normalize the scores in each node's mailbox, then aggregate
        a = F.softmax(nodes.mailbox['e'], dim=1)
        h = torch.sum(a * nodes.mailbox['z'], dim=1)
        return {'h': h}

    def forward(self, graph, h):
        z = self.linear_func(h)
        graph.ndata['z'] = z
        graph.apply_edges(self.edge_attention)
        graph.update_all(self.message_func, self.reduce_func)
        return graph.ndata.pop('h')
```
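For common aggregations you do not even need custom message and reduce functions; DGL ships built-in ones in `dgl.function`. A minimal sketch of a GCN-style neighbor sum over a `DGLGraph` named `graph` with node feature `'h'` (the output field name is illustrative):

```python
import dgl.function as fn

# Copy each source node's 'h' onto its out-edges as message 'm',
# then sum incoming messages into each destination node's 'h_sum'.
graph.update_all(fn.copy_src(src='h', out='m'),
                 fn.sum(msg='m', out='h_sum'))
```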
## Performance and Scalability
**Microbenchmark on speed and memory usage**: While leaving tensor and autograd functions to backend frameworks (e.g. PyTorch, MXNet, and TensorFlow), DGL aggressively optimizes storage and computation with its own kernels. Here is a comparison to another popular package, PyG. The short story is that raw speed is similar, but DGL has much better memory management.
| Dataset | Model | Accuracy | Time <br> PyG &emsp;&emsp; DGL | Memory <br> PyG &emsp;&emsp; DGL |
| -------- |:------------:|:--------------------------------------------:|:--------------------------------------------------------------------:|:-----------------------------------------------------:|
| Cora | GCN <br> GAT | 81.31 &plusmn; 0.88 <br> 83.98 &plusmn; 0.52 | <b>0.478</b> &emsp;&emsp; 0.666 <br> 1.608 &emsp;&emsp; <b>1.399</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.2 &emsp;&emsp; <b>1.1</b> |
| CiteSeer | GCN <br> GAT | 70.98 &plusmn; 0.68 <br> 69.96 &plusmn; 0.53 | <b>0.490</b> &emsp;&emsp; 0.674 <br> 1.606 &emsp;&emsp; <b>1.399</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.3 &emsp;&emsp; <b>1.1</b> |
| PubMed | GCN <br> GAT | 79.00 &plusmn; 0.41 <br> 77.65 &plusmn; 0.32 | <b>0.491</b> &emsp;&emsp; 0.690 <br> 1.946 &emsp;&emsp; <b>1.393</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.6 &emsp;&emsp; <b>1.1</b> |
| Reddit | GCN | 93.46 &plusmn; 0.06 | *OOM*&emsp;&emsp; <b>28.6</b> | *OOM* &emsp;&emsp; <b>11.7</b> |
| Reddit-S | GCN | N/A | 29.12 &emsp;&emsp; <b>9.44</b> | 15.7 &emsp;&emsp; <b>3.6</b> |
Table: Training time (in seconds) for 200 epochs and memory consumption (GB)
Efficient memory usage allows DGL to push the limit of single-GPU performance, as shown in the images below.
| <img src="https://i.imgur.com/CvXc9Uu.png" width="400"> | <img src="https://i.imgur.com/HnCfJyU.png" width="400"> |
| -------- | -------- |
**Scalability**: DGL fully leverages multiple GPUs, both on a single machine and across clusters, to increase training speed, and achieves better performance than the alternatives, as shown in the images below.
<p align="center">
  <img src="https://i.imgur.com/IGERtVX.png" width="600">
</p>

| <img src="https://i.imgur.com/BugYro2.png"> | <img src="https://i.imgur.com/KQ4nVdX.png"> |
| :-----------------------------------------: | :-----------------------------------------: |
**Further reading**: A detailed comparison of DGL and other graph learning libraries can be found [here](https://arxiv.org/abs/1909.01315).
## DGL Models and Applications
### DGL for research
Overall, there are 30+ models implemented with DGL:
- [PyTorch](https://github.com/dmlc/dgl/tree/master/examples/pytorch)
- [MXNet](https://github.com/dmlc/dgl/tree/master/examples/mxnet)
- [TensorFlow](https://github.com/dmlc/dgl/tree/master/examples/tensorflow)
### DGL for domain applications
- [DGL-LifeSci](https://github.com/dmlc/dgl/tree/master/apps/life_sci), previously DGL-Chem
- [DGL-KE](https://github.com/dmlc/dgl/tree/master/apps/kg)
- DGL-RecSys (coming soon)
### DGL for NLP/CV problems
- [Tree-LSTM](https://github.com/dmlc/dgl/tree/master/examples/pytorch/tree_lstm)
- [GraphWriter](https://github.com/dmlc/dgl/tree/master/examples/pytorch/graphwriter)
- [Capsule Network](https://github.com/dmlc/dgl/tree/master/examples/pytorch/capsule)
We are currently in Beta stage. More features and improvements are coming.
## Installation
DGL should work on
* macOS X
* Windows 10
DGL requires Python 3.5 or later.
Right now, DGL works on [PyTorch](https://pytorch.org) 1.1.0+, [MXNet](https://mxnet.apache.org) nightly build, and [TensorFlow](https://tensorflow.org) 2.0+.
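DGL binds to one backend at import time. Assuming the standard `DGLBACKEND` environment variable (which must be set before `dgl` is imported), switching backends looks like this sketch:

```python
import os
os.environ['DGLBACKEND'] = 'mxnet'  # or 'pytorch' (default) / 'tensorflow'

import dgl  # dgl now runs on the MXNet backend
```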
### Using anaconda
DGL is available from the `dglteam` conda channel, for example:

```
conda install -c dglteam dgl-cuda10.1  # CUDA 10.1
```
### Using pip

|           | Latest nightly build          | Stable version          |
|-----------|-------------------------------|-------------------------|
| CUDA 10.0 | `pip install --pre dgl-cu100` | `pip install dgl-cu100` |
| CUDA 10.1 | `pip install --pre dgl-cu101` | `pip install dgl-cu101` |
### Building from source
Refer to the guide [here](https://docs.dgl.ai/install/index.html#install-from-source).
## DGL Major Releases

| Release | Date       | Features |
|---------|------------|----------|
| v0.4.2  | 01/24/2020 | - Heterograph support <br> - TensorFlow support (experimental) <br> - MXNet GNN modules |
| v0.3.1  | 08/23/2019 | - APIs for GNN modules <br> - Model zoo (DGL-Chem) <br> - New installation |
| v0.2    | 03/09/2019 | - Graph sampling APIs <br> - Speed improvement |
| v0.1    | 12/07/2018 | - Basic DGL APIs <br> - PyTorch and MXNet support <br> - GNN model examples and tutorials |
## New to Deep Learning and Graph Deep Learning?
Check out the open source book [*Dive into Deep Learning*](http://gluon.ai/).
For those who are new to graph neural networks, please see the [DGL basics](https://docs.dgl.ai/tutorials/basics/index.html).

If you are looking for more advanced, realistic, end-to-end examples, please see the [model tutorials](https://docs.dgl.ai/tutorials/models/index.html).
## Contributing
Please let us know if you encounter a bug or have any suggestions by [filing an issue](https://github.com/dmlc/dgl/issues).

We welcome all contributions, from bug fixes to new features and extensions.

We expect all contributions to be discussed in the issue tracker first and to go through PRs. Please refer to our [contribution guide](https://docs.dgl.ai/contribute.html).
## Cite