Performance Benchmarks
======================

Integrated Benchmarks
---------------------

DGL continuously evaluates the speed of its core APIs and kernels, as well as the training
speed of state-of-the-art GNN models. The benchmark code is available in
`the main repository <https://github.com/dmlc/dgl/tree/master/benchmarks>`_. The benchmarks
run on every nightly build, and the results are published to
`https://asv.dgl.ai/ <https://asv.dgl.ai>`_.
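
The dashboard is generated with `Airspeed Velocity (asv) <https://asv.readthedocs.io/>`_,
which discovers benchmark classes by naming convention: any method whose name starts with
``time_`` is timed. Below is a minimal sketch of what such a suite looks like; the class
name, graph sizes, and kernel choice are illustrative, not taken from the actual suites
in the repository.

.. code:: python

    import dgl
    import torch

    class SpMMBenchmark:
        """asv times every method whose name starts with ``time_``."""

        def setup(self):
            # Random graph with 10k nodes and 200k edges; sizes are illustrative.
            src = torch.randint(0, 10_000, (200_000,))
            dst = torch.randint(0, 10_000, (200_000,))
            self.g = dgl.graph((src, dst), num_nodes=10_000)
            self.feat = torch.randn(10_000, 128)

        def time_copy_u_sum(self):
            # One fused message-passing kernel: copy source features, sum by destination.
            dgl.ops.copy_u_sum(self.g, self.feat)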

v0.6 Benchmarks
---------------

To quantify the performance gains in DGL v0.6, we re-ran the v0.5 benchmarks, together
with new ones for graph classification tasks, against updated baselines. The results are
available in `a standalone repository <https://github.com/dglai/dgl-0.5-benchmark>`_.

v0.5 Benchmarks
---------------

Here is a summary of the system evaluation results from our paper `Deep Graph Library: 
A Graph-Centric, Highly-Performant Package for Graph Neural Networks <https://arxiv.org/abs/1909.01315>`_.

**Microbenchmark on speed and memory usage**:
While delegating tensor and autograd functions to the backend frameworks (e.g.,
PyTorch, MXNet, and TensorFlow), DGL aggressively optimizes storage and
computation with its own kernels. Here is a comparison with another popular
package, PyTorch Geometric (PyG). The short story: raw speed is similar, but
DGL manages memory far more efficiently.

+----------+--------------+-----------------+-------------------------+-------------------------+
| Dataset  |    Model     |  Accuracy (%)   |        Time (s)         |       Memory (GB)       |
|          |              |                 +------------+------------+------------+------------+
|          |              |                 |  PyG       |  DGL       |  PyG       |  DGL       |
+==========+==============+=================+============+============+============+============+
| Cora     | GCN          | 81.31 ± 0.88    | **0.478**  | 0.666      | 1.1        | 1.1        |
+          +--------------+-----------------+------------+------------+------------+------------+
|          | GAT          | 83.98 ± 0.52    | 1.608      | **1.399**  | 1.2        | **1.1**    |
+----------+--------------+-----------------+------------+------------+------------+------------+
| CiteSeer | GCN          | 70.98 ± 0.68    | **0.490**  | 0.674      | 1.1        | 1.1        |
+          +--------------+-----------------+------------+------------+------------+------------+
|          | GAT          | 69.96 ± 0.53    | 1.606      | **1.399**  | 1.3        | **1.1**    |
+----------+--------------+-----------------+------------+------------+------------+------------+
| PubMed   | GCN          | 79.00 ± 0.41    | **0.491**  | 0.690      | 1.1        | 1.1        |
+          +--------------+-----------------+------------+------------+------------+------------+
|          | GAT          | 77.65 ± 0.32    | 1.946      | **1.393**  | 1.6        | **1.1**    |
+----------+--------------+-----------------+------------+------------+------------+------------+
| Reddit   |     GCN      | 93.46 ± 0.06    | OOM        | **28.6**   | OOM        |  **11.7**  |
+----------+--------------+-----------------+------------+------------+------------+------------+
| Reddit-S |     GCN      | N/A             | 29.12      | **9.44**   | 15.7       |  **3.6**   |
+----------+--------------+-----------------+------------+------------+------------+------------+

Table: Training time (in seconds) for 200 epochs and memory consumption (in GB)
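
Measurements of this kind come down to timing a full-graph training loop. Here is a
minimal sketch with DGL's PyTorch backend (``CoraGraphDataset`` and ``GraphConv`` are
DGL's built-in dataset and layer; the hyperparameters are illustrative, not the exact
settings used in the paper):

.. code:: python

    import time
    import torch
    import torch.nn.functional as F
    import dgl
    from dgl.data import CoraGraphDataset
    from dgl.nn import GraphConv

    g = dgl.add_self_loop(CoraGraphDataset()[0])
    feat, label = g.ndata['feat'], g.ndata['label']
    train_mask = g.ndata['train_mask']

    class GCN(torch.nn.Module):
        def __init__(self, in_feats, hidden, n_classes):
            super().__init__()
            self.conv1 = GraphConv(in_feats, hidden)
            self.conv2 = GraphConv(hidden, n_classes)

        def forward(self, g, x):
            return self.conv2(g, F.relu(self.conv1(g, x)))

    model = GCN(feat.shape[1], 16, int(label.max()) + 1)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    start = time.time()
    for epoch in range(200):  # 200 epochs, matching the table above
        logits = model(g, feat)
        loss = F.cross_entropy(logits[train_mask], label[train_mask])
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f'200 epochs took {time.time() - start:.2f}s')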

Here is another comparison of DGL on the TensorFlow backend with other TF-based GNN tools (training time in seconds for one epoch):

+---------+-------+--------+----------+--------------+
| Dataset | Model | DGL    | GraphNet | tf_geometric |
+=========+=======+========+==========+==============+
| Cora    | GCN   | 0.0148 | 0.0152   | 0.0192       |
+---------+-------+--------+----------+--------------+
| Reddit  | GCN   | 0.1095 | OOM      | OOM          |
+---------+-------+--------+----------+--------------+
| PubMed  | GCN   | 0.0156 | 0.0553   | 0.0185       |
+---------+-------+--------+----------+--------------+
| PPI     | GCN   | 0.09   | 0.16     | 0.21         |
+---------+-------+--------+----------+--------------+
| Cora    | GAT   | 0.0442 | N/A      | 0.058        |
+---------+-------+--------+----------+--------------+
| PPI     | GAT   | 0.398  | N/A      | 0.752        |
+---------+-------+--------+----------+--------------+
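
DGL picks its tensor backend at import time from the ``DGLBACKEND`` environment variable,
so the same graph code can be benchmarked under TensorFlow. A minimal sketch (the layer
lives in DGL's TensorFlow module; the hidden size is arbitrary):

.. code:: python

    import os
    os.environ['DGLBACKEND'] = 'tensorflow'  # must be set before importing dgl

    import dgl
    from dgl.nn.tensorflow import GraphConv

    g = dgl.add_self_loop(dgl.data.CoraGraphDataset()[0])
    feat = g.ndata['feat']                   # a tf.Tensor under this backend
    conv = GraphConv(feat.shape[1], 16)
    h = conv(g, feat)                        # one message-passing layer in TF
    print(h.shape)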

DGL's efficient memory usage lets it push the limits of single-GPU performance, as shown in the images below.

.. image:: http://data.dgl.ai/asset/image/DGLvsPyG-time1.png

.. image:: http://data.dgl.ai/asset/image/DGLvsPyG-time2.png

**Scalability**:
DGL fully leverages multiple GPUs, both within a single machine and across a
cluster, to speed up training, and outperforms the alternatives, as shown in
the images below.

.. image:: http://data.dgl.ai/asset/image/one-four-GPUs.png

.. image:: http://data.dgl.ai/asset/image/one-four-GPUs-DGLvsGraphVite.png

.. image:: http://data.dgl.ai/asset/image/one-fourMachines.png

**Further reading**:
A detailed comparison of DGL with other alternatives can be found in
`the paper <https://arxiv.org/abs/1909.01315>`_.