Unverified Commit d1cc0969 authored by Minjie Wang, committed by GitHub

[Doc] Note for performance benchmarks (#3152)


Co-authored-by: Tong He <hetong007@gmail.com>
parent ee1c1a82
@@ -52,12 +52,13 @@ Welcome to Deep Graph Library Tutorials and Documentation

 .. toctree::
    :maxdepth: 1
-   :caption: Developer Notes
+   :caption: Notes
    :hidden:
    :glob:

    contribute
    developer/ffi
+   performance

 .. toctree::
    :maxdepth: 1
Performance Benchmarks
======================

Integrated Benchmarks
---------------------

DGL continuously evaluates the speed of its core APIs and kernels, as well as
the training speed of state-of-the-art GNN models. The benchmark code is
available at `the main repository <https://github.com/dmlc/dgl/tree/master/benchmarks>`_.
The benchmarks run against every nightly build, and the results are published
to `https://asv.dgl.ai/ <https://asv.dgl.ai/>`_.
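
The dashboard name suggests the suite runs on `Airspeed Velocity <https://asv.readthedocs.io/>`_
(asv), whose benchmarks are plain Python classes. The sketch below only
illustrates that convention; the class name, graph size, and choice of kernel
are assumptions for illustration, not taken from the actual suite.

.. code:: python

   import torch
   import dgl
   import dgl.function as fn

   class TimeUpdateAll:
       """Times DGL's fused message passing (copy_u + sum) on a random graph."""

       def setup(self):
           # Illustrative workload: a random graph with 10k nodes and 500k edges.
           src = torch.randint(0, 10000, (500000,))
           dst = torch.randint(0, 10000, (500000,))
           self.g = dgl.graph((src, dst), num_nodes=10000)
           self.g.ndata["h"] = torch.randn(10000, 64)

       def time_update_all(self):
           # asv times every method whose name starts with ``time_``.
           self.g.update_all(fn.copy_u("h", "m"), fn.sum("m", "out"))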

v0.6 Benchmarks
---------------

To understand the performance gains in DGL v0.6, we re-ran the v0.5 benchmarks,
plus new ones for graph classification tasks, against updated baselines. The
results are available in `a standalone repository <https://github.com/dglai/dgl-0.5-benchmark>`_.

v0.5 Benchmarks
---------------

Here is a summary of the system evaluation results from our paper `Deep Graph Library:
A Graph-Centric, Highly-Performant Package for Graph Neural Networks <https://arxiv.org/abs/1909.01315>`_.

**Microbenchmark on speed and memory usage**:
While leaving tensor and autograd functions to the backend frameworks (e.g.,
PyTorch, MXNet, and TensorFlow), DGL aggressively optimizes storage and
computation with its own kernels. Here is a comparison to another popular
package, PyTorch Geometric (PyG). The short story: raw speed is similar, but
DGL has much better memory management. (A sketch of the measured training loop
follows the table.)

+----------+--------------+-----------------+-------------------------+-------------------------+
| Dataset | Model | Accuracy | Time | Memory |
| | | +------------+------------+------------+------------+
| | | | PyG | DGL | PyG | DGL |
+==========+==============+=================+============+============+============+============+
| Cora | GCN | 81.31 ± 0.88 | **0.478** | 0.666 | 1.1 | 1.1 |
+ +--------------+-----------------+------------+------------+------------+------------+
| | GAT | 83.98 ± 0.52 | 1.608 | **1.399** | 1.2 | **1.1** |
+----------+--------------+-----------------+------------+------------+------------+------------+
| CiteSeer | GCN | 70.98 ± 0.68 | **0.490** | 0.674 | 1.1 | 1.1 |
+ +--------------+-----------------+------------+------------+------------+------------+
| | GAT | 69.96 ± 0.53 | 1.606 | **1.399** | 1.3 | **1.1** |
+----------+--------------+-----------------+------------+------------+------------+------------+
| PubMed | GCN | 79.00 ± 0.41 | **0.491** | 0.690 | 1.1 | 1.1 |
+ +--------------+-----------------+------------+------------+------------+------------+
| | GAT | 77.65 ± 0.32 | 1.946 | **1.393** | 1.6 | **1.1** |
+----------+--------------+-----------------+------------+------------+------------+------------+
| Reddit | GCN | 93.46 ± 0.06 | OOM | **28.6** | OOM | **11.7** |
+----------+--------------+-----------------+------------+------------+------------+------------+
| Reddit-S | GCN | N/A | 29.12 | **9.44** | 15.7 | **3.6** |
+----------+--------------+-----------------+------------+------------+------------+------------+

Table: Training time (in seconds) for 200 epochs and memory consumption (in GB).
The better numbers are in bold.

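For concreteness, the following minimal sketch reproduces the shape of the
Cora/GCN measurement: a two-layer GCN trained full-graph for 200 epochs and
timed end to end. The hidden size, learning rate, and self-loop preprocessing
are illustrative assumptions, not the paper's exact settings.

.. code:: python

   import time
   import torch
   import torch.nn.functional as F
   import dgl
   from dgl.nn import GraphConv

   class GCN(torch.nn.Module):
       def __init__(self, in_feats, n_hidden, n_classes):
           super().__init__()
           self.conv1 = GraphConv(in_feats, n_hidden)
           self.conv2 = GraphConv(n_hidden, n_classes)

       def forward(self, g, x):
           h = F.relu(self.conv1(g, x))
           return self.conv2(g, h)

   data = dgl.data.CoraGraphDataset()
   g = dgl.add_self_loop(data[0])  # avoid zero-in-degree nodes for GraphConv
   feat, label = g.ndata["feat"], g.ndata["label"]
   train_mask = g.ndata["train_mask"]

   model = GCN(feat.shape[1], 16, data.num_classes)  # hidden size 16 is an assumption
   opt = torch.optim.Adam(model.parameters(), lr=0.01)

   start = time.time()
   for epoch in range(200):  # the table reports total time for 200 epochs
       logits = model(g, feat)
       loss = F.cross_entropy(logits[train_mask], label[train_mask])
       opt.zero_grad()
       loss.backward()
       opt.step()
   print("200 epochs: {:.3f}s".format(time.time() - start))
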
Here is another comparison of DGL on the TensorFlow backend with other TF-based
GNN tools (training time in seconds for one epoch); a sketch of switching DGL
to the TensorFlow backend follows the table:

+---------+-------+--------+----------+--------------+
| Dataset | Model | DGL    | GraphNet | tf_geometric |
+=========+=======+========+==========+==============+
| Cora    | GCN   | 0.0148 | 0.0152   | 0.0192       |
+---------+-------+--------+----------+--------------+
| Reddit | GCN | 0.1095 | OOM | OOM |
+---------+-------+--------+----------+--------------+
| PubMed | GCN | 0.0156 | 0.0553 | 0.0185 |
+---------+-------+--------+----------+--------------+
| PPI | GCN | 0.09 | 0.16 | 0.21 |
+---------+-------+--------+----------+--------------+
| Cora | GAT | 0.0442 | n/a | 0.058 |
+---------+-------+--------+----------+--------------+
| PPI | GAT | 0.398 | n/a | 0.752 |
+---------+-------+--------+----------+--------------+
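
DGL selects its backend through the ``DGLBACKEND`` environment variable. Below
is a minimal sketch of running a single ``GraphConv`` layer under the
TensorFlow backend; the tiny three-node graph and layer sizes are illustrative
assumptions, not the benchmark configuration.

.. code:: python

   import os
   os.environ["DGLBACKEND"] = "tensorflow"  # must be set before importing dgl

   import tensorflow as tf
   import dgl
   from dgl.nn import GraphConv  # resolves to the TensorFlow implementation

   # A toy 3-node cycle graph with 8-dimensional node features (illustrative).
   g = dgl.graph(([0, 1, 2], [1, 2, 0]))
   feat = tf.random.normal((3, 8))

   conv = GraphConv(8, 4)
   out = conv(g, feat)
   print(out.shape)  # (3, 4)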

High memory utilization also lets DGL push the limits of single-GPU
performance, as shown in the images below.

.. image:: http://data.dgl.ai/asset/image/DGLvsPyG-time1.png

.. image:: http://data.dgl.ai/asset/image/DGLvsPyG-time2.png

**Scalability**:
DGL fully leverages multiple GPUs, both on a single machine and across a
cluster, to speed up training, and outperforms the alternatives, as shown in
the images below. (A minimal multi-GPU sketch follows the images.)

.. image:: http://data.dgl.ai/asset/image/one-four-GPUs.png

.. image:: http://data.dgl.ai/asset/image/one-four-GPUs-DGLvsGraphVite.png

.. image:: http://data.dgl.ai/asset/image/one-fourMachines.png
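
As an illustration of the single-machine multi-GPU pattern behind such numbers,
here is a minimal sketch that combines PyTorch ``DistributedDataParallel`` with
DGL's neighbor-sampling data loader. The dataset, fan-outs, and hyperparameters
are assumptions made for the sake of a runnable example, not the configuration
used in the evaluation.

.. code:: python

   import torch
   import torch.distributed as dist
   import torch.multiprocessing as mp
   import torch.nn.functional as F
   import dgl
   from dgl.nn import GraphConv

   class GCN(torch.nn.Module):
       def __init__(self, in_feats, n_hidden, n_classes):
           super().__init__()
           self.conv1 = GraphConv(in_feats, n_hidden, allow_zero_in_degree=True)
           self.conv2 = GraphConv(n_hidden, n_classes, allow_zero_in_degree=True)

       def forward(self, blocks, x):
           h = F.relu(self.conv1(blocks[0], x))
           return self.conv2(blocks[1], h)

   def run(rank, world_size):
       dist.init_process_group("nccl", init_method="tcp://127.0.0.1:29500",
                               rank=rank, world_size=world_size)
       torch.cuda.set_device(rank)

       data = dgl.data.CoraGraphDataset()  # illustrative dataset
       g = data[0]
       train_nids = g.ndata["train_mask"].nonzero(as_tuple=True)[0]
       local_nids = train_nids[rank::world_size]  # each rank gets its own slice

       sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 10])
       loader = dgl.dataloading.NodeDataLoader(
           g, local_nids, sampler, batch_size=32, shuffle=True)

       model = GCN(g.ndata["feat"].shape[1], 16, data.num_classes).to(rank)
       model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
       opt = torch.optim.Adam(model.parameters(), lr=0.01)

       for epoch in range(10):
           for input_nodes, output_nodes, blocks in loader:
               blocks = [b.to(rank) for b in blocks]
               x = blocks[0].srcdata["feat"]
               y = blocks[-1].dstdata["label"]
               loss = F.cross_entropy(model(blocks, x), y)
               opt.zero_grad()
               loss.backward()  # DDP averages gradients across GPUs here
               opt.step()

   if __name__ == "__main__":
       n_gpus = torch.cuda.device_count()
       mp.spawn(run, args=(n_gpus,), nprocs=n_gpus)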

**Further reading**:
A detailed comparison of DGL and other alternatives can be found
`here <https://arxiv.org/abs/1909.01315>`_.