"git@developer.sourcefind.cn:change/sglang.git" did not exist on "ed3157997153fdbbe142bf4ef995ecaaae62fc34"
Unverified commit b0a9d16f authored by zhjwy9343, committed by GitHub

[Doc] Chinese User Guide chapter 1 - 4 (#2351)



* [Feature] Add full graph training with dgl built-in dataset.

* [Bug] fix model to cuda.

* [Feature] Add test loss and accuracy

* [Fix] Add random

* [Bug] Fix batch norm error

* [Doc] Test with CN in Sphinx

* [Doc] Remove the test CN docs.

* [Feature] Add input embedding layer

* [Doc] fill readme with new performance results

* [Doc] Add Chinese User Guide, graph and 1.5

* Update README.md

* [Fix] Temporary remove compgcn

* [Doc] Add CN user guide chapter2

* [Test] Tuning format

* [Test] Section headers

* [Fix] Fix format errors

* [Doc] Add CN-EN EN-CN links

* [Doc] Copyedit chapter2

* [Doc] Remove EN in 2.1

* [Doc] Remove EN in chapter 2

* [Doc] Copyedit first 2 sections

* [Doc] copyedited chapter 2 CN

* [Doc] Add chapter 3 raw texts

* [Doc] Add chapter 3 preface and 3.1

* [Doc] Add chapter 3.2 and 3.3

* [Doc] Remove EN parts

* [Doc] Copyediting 3.1

* [Doc] Copyediting 3.2 and 3.3

* [Doc] Proofreading 3.1 and 3.2

* [Doc] Proofreading 3.2 and 3.3

* [Doc] Add chapter 4 CN raw text.

* [Clean] Remove codes in other branches

* [Doc] Start to copyedit chapter 4 preface

* [Doc] copyedit CN section 4.1

* [Doc] Remove EN in User Guide Chapter 4

* [Doc] Copyedit chapter 4.1

* [Doc] copyedit cn chapter 4.2, 4.3, 4.4, and 4.5.

* [Doc] Fix errors in EN user guide graph feature and heterograph

* [Doc] 2nd round copyediting with Murph's comments

* [Doc] 3rd round copyediting with Murph's comments

* [Sync] synchronize with the dgl master

* [Doc] edited after Minjie's comments, 1st round

* update cub
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
parent 9c08cd6b
......@@ -3,6 +3,8 @@
4.1 DGLDataset class
--------------------
:ref:`(中文版) <guide_cn-data-pipeline-dataset>`
:class:`~dgl.data.DGLDataset` is the base class for processing, loading and saving
graph datasets defined in :ref:`apidata`. It implements the basic pipeline
for processing graph data. The following flow chart shows how the
......
......@@ -3,6 +3,8 @@
4.2 Download raw data (optional)
--------------------------------
:ref:`(中文版) <guide_cn-data-pipeline-download>`
If a dataset is already on local disk, make sure it is in the directory
``raw_dir``. If one wants to run the code anywhere without bothering to
download and move data to the right directory, one can do it
......
......@@ -3,6 +3,8 @@
4.5 Loading OGB datasets using ``ogb`` package
----------------------------------------------
:ref:`(中文版) <guide_cn-data-pipeline-loadogb>`
`Open Graph Benchmark (OGB) <https://ogb.stanford.edu/docs/home/>`__ is
a collection of benchmark datasets. The official OGB package
`ogb <https://github.com/snap-stanford/ogb>`__ provides APIs for
......@@ -61,7 +63,7 @@ there is only one graph object in this kind of dataset.
valid_label = dataset.labels[split_idx['valid']]
test_label = dataset.labels[split_idx['test']]
*Link Property Prediction* datasets also contain one graph per dataset:
*Link Property Prediction* datasets also contain one graph per dataset.
.. code::
......
......@@ -3,6 +3,8 @@
4.3 Process data
----------------
:ref:`(中文版) <guide_cn-data-pipeline-process>`
One can implement the data processing code in function ``process()``, and it
assumes that the raw data is located in ``self.raw_dir`` already. There
are typically three types of tasks in machine learning on graphs: graph
......
......@@ -3,6 +3,8 @@
4.4 Save and load data
----------------------
:ref:`(中文版) <guide_cn-data-pipeline-savenload>`
DGL recommends implementing saving and loading functions to cache the
processed data in local disk. This saves a lot of data processing time
in most cases. DGL provides four functions to make things simple:
......@@ -44,9 +46,4 @@ dataset information.
Note that there are cases not suitable to save processed data. For
example, in the builtin dataset :class:`~dgl.data.GDELTDataset`,
the processed data is quite large, so it’s more effective to process
each data example in ``__getitem__(idx)``.
.. code::
print(split_edge['valid'].keys())
print(split_edge['test'].keys())
each data example in ``__getitem__(idx)``.
\ No newline at end of file
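The lazy-processing pattern mentioned in the note above (process each example in ``__getitem__(idx)`` instead of caching everything up front) can be sketched in plain Python. The class and method names below are illustrative, not DGL's actual ``GDELTDataset`` implementation:

```python
class LazyDataset:
    """Process each example on access instead of precomputing all of them."""

    def __init__(self, raw_items):
        self.raw_items = raw_items   # raw data kept as-is; nothing processed yet

    def _process_one(self, raw):
        return raw * 2               # stand-in for expensive per-example processing

    def __getitem__(self, idx):
        # Processing happens only for the requested example.
        return self._process_one(self.raw_items[idx])

    def __len__(self):
        return len(self.raw_items)

ds = LazyDataset([1, 2, 3])
print(ds[1])   # 4
print(len(ds)) # 3
```

This trades repeated per-access work for not having to store a large processed dataset on disk.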
......@@ -3,6 +3,8 @@
Chapter 4: Graph Data Pipeline
==============================
:ref:`(中文版) <guide_cn-data-pipeline>`
DGL implements many commonly used graph datasets in :ref:`apidata`. They
follow a standard pipeline defined in class :class:`dgl.data.DGLDataset`. DGL highly
recommends processing graph data into a :class:`dgl.data.DGLDataset` subclass, as the
......
......@@ -61,4 +61,5 @@ For weighted graphs, one can store the weights as an edge feature as below.
ndata_schemes={}
edata_schemes={'w' : Scheme(shape=(,), dtype=torch.float32)})
See APIs: :py:attr:`~dgl.DGLGraph.ndata`, :py:attr:`~dgl.DGLGraph.edata`.
......@@ -3,6 +3,8 @@
2.1 Built-in Functions and Message Passing APIs
-----------------------------------------------
:ref:`(中文版) <guide_cn-message-passing-api>`
In DGL, **message function** takes a single argument ``edges``,
which is an :class:`~dgl.udf.EdgeBatch` instance. During message passing,
DGL generates it internally to represent a batch of edges. It has three
......
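DGL's ``EdgeBatch`` exposes the source-node, destination-node, and edge features of a batch of edges through its members. As a rough sketch (the stand-in class below is hypothetical, not DGL's implementation), a user-defined message function over such a batch looks like:

```python
class EdgeBatchSketch:
    """Toy stand-in for dgl.udf.EdgeBatch: src, dst, and data feature dicts."""

    def __init__(self, src, dst, data):
        self.src = src    # features of source nodes
        self.dst = dst    # features of destination nodes
        self.data = data  # features of the edges themselves

def message_udf(edges):
    # Message = source feature 'h' plus edge feature 'w', elementwise;
    # the returned dict maps message names to per-edge values.
    return {'m': [h + w for h, w in zip(edges.src['h'], edges.data['w'])]}

edges = EdgeBatchSketch(src={'h': [1.0, 2.0]},
                        dst={'h': [0.0, 0.0]},
                        data={'w': [0.5, 0.5]})
print(message_udf(edges))  # {'m': [1.5, 2.5]}
```

In DGL such a UDF would be passed to ``update_all`` or ``apply_edges``; the built-in message functions are usually preferred because they avoid per-edge Python overhead.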
......@@ -3,6 +3,8 @@
2.4 Apply Edge Weight In Message Passing
----------------------------------------
:ref:`(中文版) <guide_cn-message-passing-edge>`
A commonly seen practice in GNN modeling is to apply edge weight on the
message before message aggregation, for examples, in
`GAT <https://arxiv.org/pdf/1710.10903.pdf>`__ and some `GCN
......
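The weighting idea described here — scale each message by its edge weight before aggregation — can be sketched in plain Python, independent of DGL (node features and edge tuples below are toy data):

```python
def weighted_sum_aggregate(feats, edges):
    """feats: dict node -> feature; edges: list of (src, dst, weight).

    For each edge, the message is weight * h_src; messages are summed
    at each destination node (the 'sum' reduce function).
    """
    out = {node: 0.0 for node in feats}
    for src, dst, w in edges:
        out[dst] += w * feats[src]
    return out

feats = {0: 1.0, 1: 2.0, 2: 3.0}
edges = [(0, 2, 0.5), (1, 2, 2.0)]
print(weighted_sum_aggregate(feats, edges))  # node 2 receives 0.5*1 + 2*2 = 4.5
```

In DGL the same pattern is typically expressed with built-in functions (multiply a source feature by an edge feature, then sum), which avoids materializing per-edge messages in Python.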
......@@ -3,6 +3,8 @@
2.2 Writing Efficient Message Passing Code
------------------------------------------
:ref:`(中文版) <guide_cn-message-passing-efficient>`
DGL optimizes memory consumption and computing speed for message
passing. The optimization includes:
......
......@@ -3,6 +3,8 @@
2.5 Message Passing on Heterogeneous Graph
------------------------------------------
:ref:`(中文版) <guide_cn-message-passing-heterograph>`
Heterogeneous graphs (:ref:`guide-graph-heterogeneous`), or
heterographs for short, are graphs that contain different types of nodes
and edges. The different types of nodes and edges tend to have different
......
......@@ -3,6 +3,8 @@
2.3 Apply Message Passing On Part Of The Graph
----------------------------------------------
:ref:`(中文版) <guide_cn-message-passing-part>`
If one only wants to update part of the nodes in the graph, the practice
is to create a subgraph by providing the IDs for the nodes to
include in the update, then call :meth:`~dgl.DGLGraph.update_all` on the
......
......@@ -3,6 +3,8 @@
Chapter 2: Message Passing
==========================
:ref:`(中文版) <guide_cn-message-passing>`
Message Passing Paradigm
------------------------
......
......@@ -3,6 +3,8 @@
3.1 DGL NN Module Construction Function
---------------------------------------
:ref:`(中文版) <guide_cn-nn-construction>`
The construction function performs the following steps:
1. Set options.
......
......@@ -3,6 +3,8 @@
3.2 DGL NN Module Forward Function
----------------------------------
:ref:`(中文版) <guide_cn-nn-forward>`
In NN module, ``forward()`` function does the actual message passing and
computation. Compared with PyTorch’s NN module which usually takes
tensors as the parameters, DGL NN module takes an additional parameter
......@@ -60,7 +62,7 @@ The math formulas for SAGEConv are:
One needs to specify the source node feature ``feat_src`` and destination
node feature ``feat_dst`` according to the graph type.
:meth:``~dgl.utils.expand_as_pair`` is a function that specifies the graph
:meth:`~dgl.utils.expand_as_pair` is a function that specifies the graph
type and expand ``feat`` into ``feat_src`` and ``feat_dst``.
The detail of this function is shown below.
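For the homogeneous-graph case, the behavior can be sketched in a few lines of pure Python (this is an illustration only; DGL's actual ``expand_as_pair`` also handles blocks and heterogeneous graphs):

```python
def expand_as_pair_sketch(feat):
    """If feat is already a (src, dst) pair, return it unchanged;
    otherwise use the single feature for both roles."""
    if isinstance(feat, tuple):
        return feat
    return feat, feat

# A single feature is duplicated for source and destination nodes.
feat_src, feat_dst = expand_as_pair_sketch([1.0, 2.0, 3.0])

# An explicit pair (e.g. for a bipartite graph) passes through untouched.
pair = expand_as_pair_sketch(([1.0], [2.0]))
```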
......@@ -95,9 +97,7 @@ element will be the destination node feature.
In mini-batch training, the computing is applied on a subgraph sampled
based on a bunch of destination nodes. The subgraph is called as
``block`` in DGL. After message passing, only those destination nodes
will be updated since they have the same neighborhood as the one they
have in the original full graph. In the block creation phase,
``block`` in DGL. In the block creation phase,
``dst nodes`` are in the front of the node list. One can find the
``feat_dst`` by the index ``[0:g.number_of_dst_nodes()]``.
......@@ -120,7 +120,7 @@ Message passing and reducing
elif self._aggre_type == 'gcn':
check_eq_shape(feat)
graph.srcdata['h'] = feat_src
graph.dstdata['h'] = feat_dst # same as above if homogeneous
graph.dstdata['h'] = feat_dst
graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'neigh'))
# divide in_degrees
degs = graph.in_degrees().to(feat_dst)
......
.. _guide-nn-heterograph:
3.3 Heterogeneous GraphConv Module
----------------------------------
------------------------------------
:ref:`(中文版) <guide_cn-nn-heterograph>`
:class:`~dgl.nn.pytorch.HeteroGraphConv`
is a module-level encapsulation to run DGL NN module on heterogeneous
graphs. The implementation logic is the same as message passing level API
:meth:`~dgl.DGLGraph.multi_update_all`:
:meth:`~dgl.DGLGraph.multi_update_all`, including:
- DGL NN module within each relation :math:`r`.
- Reduction that merges the results on the same node type from multiple
......
......@@ -3,6 +3,8 @@
Chapter 3: Building GNN Modules
===============================
:ref:`(中文版) <guide_cn-nn>`
DGL NN module consists of building blocks for GNN models. An NN module inherits
from `Pytorch’s NN Module <https://pytorch.org/docs/1.2.0/_modules/torch/nn/modules/module.html>`__, `MXNet Gluon’s NN Block <http://mxnet.incubator.apache.org/versions/1.6/api/python/docs/api/gluon/nn/index.html>`__ and `TensorFlow’s Keras
Layer <https://www.tensorflow.org/api_docs/python/tf/keras/layers>`__, depending on the DNN framework backend in use. In a DGL NN
......
.. _guide_cn-data-pipeline-dataset:
4.1 DGLDataset class
--------------------
:ref:`(English Version) <guide-data-pipeline-dataset>`
:class:`~dgl.data.DGLDataset` is the base class for processing, loading and saving
graph datasets defined in :ref:`apidata`. It implements the basic pipeline
for processing graph data. The following flow chart shows how the pipeline works.
.. figure:: https://data.dgl.ai/asset/image/userguide_data_flow.png
   :align: center

   Flow chart of the graph data processing pipeline defined in class DGLDataset.
To process a graph dataset located on a remote server or on local disk, the
following example defines a class called ``MyDataset``, which inherits from
:class:`dgl.data.DGLDataset`.
.. code::

    from dgl.data import DGLDataset

    class MyDataset(DGLDataset):
        """ Template for customizing graph datasets in DGL.

        Parameters
        ----------
        url : str
            URL from which to download the raw dataset.
        raw_dir : str
            Directory that stores the downloaded data, or where the
            already-downloaded data is stored. Default: ~/.dgl/
        save_dir : str
            Directory in which to save the processed dataset.
            Default: the value of raw_dir
        force_reload : bool
            Whether to reload the dataset. Default: False
        verbose : bool
            Whether to print progress information.
        """
        def __init__(self,
                     url=None,
                     raw_dir=None,
                     save_dir=None,
                     force_reload=False,
                     verbose=False):
            super(MyDataset, self).__init__(name='dataset_name',
                                            url=url,
                                            raw_dir=raw_dir,
                                            save_dir=save_dir,
                                            force_reload=force_reload,
                                            verbose=verbose)

        def download(self):
            # Download the raw data to local disk
            pass

        def process(self):
            # Process the raw data into graphs, labels, and dataset split masks
            pass

        def __getitem__(self, idx):
            # Get one example by index
            pass

        def __len__(self):
            # Number of data examples
            pass

        def save(self):
            # Save the processed data to `self.save_path`
            pass

        def load(self):
            # Load the processed data from `self.save_path`
            pass

        def has_cache(self):
            # Check whether there is processed data in `self.save_path`
            pass
:class:`~dgl.data.DGLDataset` has the abstract functions ``process()``,
``__getitem__(idx)`` and ``__len__()``, which subclasses must implement.
DGL also recommends implementing the saving and loading functions, since they
can save a large amount of time for processing large datasets, and several
APIs exist to make this easy (see :ref:`guide_cn-data-pipeline-savenload`).

Note that the purpose of :class:`~dgl.data.DGLDataset` is to provide a standard
and convenient way to load graph data. One can store the graphs, features,
labels and masks of a dataset, as well as basic information such as the number
of classes and labels. Operations such as sampling, splitting or feature
normalization are best done outside of a :class:`~dgl.data.DGLDataset` subclass.

The rest of this chapter shows the best practices for implementing these functions.
\ No newline at end of file
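The caching contract behind the recommended ``has_cache()``/``save()``/``load()`` functions can be sketched with plain pickle-based caching. The class below is a hypothetical minimal stand-in, not DGL's actual implementation (which uses its own graph serialization format):

```python
import os
import pickle
import tempfile

class CachedDataset:
    """Minimal sketch of the has_cache/save/load contract of DGLDataset."""

    def __init__(self, save_dir):
        self.save_path = os.path.join(save_dir, 'dataset_name.pkl')
        if self.has_cache():
            self.load()      # skip processing when a cache exists
        else:
            self.process()
            self.save()      # cache the processed data for next time

    def process(self):
        self.data = [1, 2, 3]   # stand-in for expensive graph processing

    def has_cache(self):
        return os.path.exists(self.save_path)

    def save(self):
        with open(self.save_path, 'wb') as f:
            pickle.dump(self.data, f)

    def load(self):
        with open(self.save_path, 'rb') as f:
            self.data = pickle.load(f)

with tempfile.TemporaryDirectory() as d:
    first = CachedDataset(d)    # no cache yet: processes and saves
    second = CachedDataset(d)   # cache present: loads instead of reprocessing
```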
.. _guide_cn-data-pipeline-download:
4.2 Download raw data (optional)
--------------------------------
:ref:`(English Version) <guide-data-pipeline-download>`
If a dataset is already on local disk, make sure it is in the directory ``raw_dir``.
If one wants to run the code anywhere without bothering to download the data and
move it to the right directory, one can do it automatically by implementing the
function ``download()``.

If the dataset is a zip file, one can directly inherit from the
:class:`dgl.data.DGLBuiltinDataset` class, which handles zip file extraction.
Otherwise one needs to implement ``download()``, as in the
:class:`~dgl.data.QM7bDataset` class:
.. code::

    import os
    from dgl.data.utils import download

    def download(self):
        # Path to store the downloaded file
        file_path = os.path.join(self.raw_dir, self.name + '.mat')
        # Download the file
        download(self.url, path=file_path)
The code above downloads a .mat file to the directory ``self.raw_dir``. If the
file is a .gz, .tar, .tar.gz or .tgz file, use the
:func:`~dgl.data.utils.extract_archive` function to extract it. The following
code shows how to download a .gz file in the
:class:`~dgl.data.BitcoinOTCDataset` class:
.. code::

    from dgl.data.utils import download, check_sha1

    def download(self):
        # Path to store the file; make sure to use the same suffix as the original file name
        gz_file_path = os.path.join(self.raw_dir, self.name + '.csv.gz')
        # Download the file
        download(self.url, path=gz_file_path)
        # Check SHA-1
        if not check_sha1(gz_file_path, self._sha1_str):
            raise UserWarning('File {} is downloaded but the content hash does not match. '
                              'The repo may be outdated or download may be incomplete. '
                              'Otherwise you can create an issue for it.'.format(self.name + '.csv.gz'))
        # Extract the file to the directory self.name under self.raw_dir
        self._extract_gz(gz_file_path, self.raw_path)
The code above extracts the file into the directory ``self.name`` under
``self.raw_dir``. If the class inherits from :class:`dgl.data.DGLBuiltinDataset`
to handle a zip file, it will likewise extract the file into the directory
``self.name``.

Optionally, one can check the SHA-1 string of the downloaded file as in the
example above, in case the author changed the file on the remote server.
\ No newline at end of file
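The SHA-1 verification step can be sketched with the standard library alone. The helper below is a hypothetical simplification: DGL's actual ``check_sha1`` reads the file from disk in chunks, while this version hashes an in-memory byte string:

```python
import hashlib

def sha1_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-1 digest of data equals the expected hex string."""
    return hashlib.sha1(data).hexdigest() == expected_hex

payload = b'downloaded file contents'
digest = hashlib.sha1(payload).hexdigest()

assert sha1_matches(payload, digest)              # intact download passes
assert not sha1_matches(b'corrupted bytes', digest)  # tampered content fails
```

A mismatch means the file on the remote server changed or the download was incomplete, which is exactly the condition the ``UserWarning`` above reports.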
.. _guide_cn-data-pipeline-loadogb:
4.5 Loading OGB datasets using ``ogb`` package
----------------------------------------------
:ref:`(English Version) <guide-data-pipeline-loadogb>`
`Open Graph Benchmark (OGB) <https://ogb.stanford.edu/docs/home/>`__ is a
collection of benchmark datasets for graph deep learning. The official
`ogb <https://github.com/snap-stanford/ogb>`__ package provides APIs for
downloading and processing OGB datasets into :class:`dgl.data.DGLGraph`
objects. This section introduces their basic usage.

First install the ogb package using pip:
.. code::

    pip install ogb
The following code shows how to load a dataset for the *Graph Property Prediction* task.
.. code::

    # Load the OGB Graph Property Prediction dataset
    import dgl
    import torch
    from ogb.graphproppred import DglGraphPropPredDataset
    from torch.utils.data import DataLoader

    def _collate_fn(batch):
        # A mini-batch is a list of (graph, label) tuples
        graphs = [e[0] for e in batch]
        g = dgl.batch(graphs)
        labels = [e[1] for e in batch]
        labels = torch.stack(labels, 0)
        return g, labels

    # Load the dataset
    dataset = DglGraphPropPredDataset(name='ogbg-molhiv')
    split_idx = dataset.get_idx_split()
    # dataloader
    train_loader = DataLoader(dataset[split_idx["train"]], batch_size=32, shuffle=True, collate_fn=_collate_fn)
    valid_loader = DataLoader(dataset[split_idx["valid"]], batch_size=32, shuffle=False, collate_fn=_collate_fn)
    test_loader = DataLoader(dataset[split_idx["test"]], batch_size=32, shuffle=False, collate_fn=_collate_fn)
Loading a *Node Property Prediction* dataset is similar, but note that there is only one graph object in this kind of dataset.
.. code::

    # Load the OGB Node Property Prediction dataset
    from ogb.nodeproppred import DglNodePropPredDataset

    dataset = DglNodePropPredDataset(name='ogbn-proteins')
    split_idx = dataset.get_idx_split()

    # There is only one graph in Node Property Prediction datasets
    g, labels = dataset[0]
    # Get the labels of the splits
    train_label = dataset.labels[split_idx['train']]
    valid_label = dataset.labels[split_idx['valid']]
    test_label = dataset.labels[split_idx['test']]
Each *Link Property Prediction* dataset also contains only one graph.
.. code::

    # Load the OGB Link Property Prediction dataset
    from ogb.linkproppred import DglLinkPropPredDataset

    dataset = DglLinkPropPredDataset(name='ogbl-ppa')
    split_edge = dataset.get_edge_split()

    graph = dataset[0]
    print(split_edge['train'].keys())
    print(split_edge['valid'].keys())
    print(split_edge['test'].keys())
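The ``_collate_fn`` pattern used for the Graph Property Prediction loader above (turn a list of (graph, label) pairs into one batched graph plus a stacked label tensor) can be sketched without DGL or PyTorch, using edge lists as stand-ins for graphs:

```python
def collate_sketch(batch):
    """batch: list of (graph, label) pairs; graphs here are plain edge lists."""
    graphs = [g for g, _ in batch]
    labels = [y for _, y in batch]
    # Stand-in for dgl.batch: merge the edge lists into one "batched graph"
    # (the real dgl.batch also relabels node IDs, omitted in this sketch).
    batched_graph = [e for g in graphs for e in g]
    return batched_graph, labels

batch = [([(0, 1)], 0), ([(0, 2), (1, 2)], 1)]
g, ys = collate_sketch(batch)
print(g)   # [(0, 1), (0, 2), (1, 2)]
print(ys)  # [0, 1]
```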