Unverified Commit 975eb8fc authored by xiang song(charlie.song), committed by GitHub

[Distributed] Distributed node embedding and sparse optimizer (#2733)



* Draft for sparse emb

* add some notes

* Fix

* Add sparse optim for dist pytorch

* Update test

* Fix

* upd

* upd

* Fix

* Fix

* Fix bug

* add transductive example

* Fix example

* Some fix

* Upd

* Fix lint

* lint

* lint

* lint

* upd

* Fix lint

* lint

* upd

* remove dead import

* update

* lint

* update unittest

* update example

* Add adam optimizer

* Add unittest and update data

* upd

* upd

* upd

* Fix docstring and fix some bug in example code

* Update rgcn readme
Co-authored-by: Ubuntu <ubuntu@ip-172-31-57-25.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-210.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-2-66.ec2.internal>
parent 2d372e35
@@ -25,14 +25,23 @@ Distributed Tensor

.. autoclass:: DistTensor
    :members: part_policy, shape, dtype, name

-Distributed Embedding
----------------------
+Distributed Node Embedding
+--------------------------
+
+.. currentmodule:: dgl.distributed.nn.pytorch

-.. autoclass:: DistEmbedding
+.. autoclass:: NodeEmbedding
+
+Distributed embedding optimizer
+-------------------------------
+
+.. currentmodule:: dgl.distributed.optim.pytorch

.. autoclass:: SparseAdagrad
    :members: step

+.. autoclass:: SparseAdam
+    :members: step

Distributed workload split
--------------------------
......
@@ -9,7 +9,7 @@ This section covers the distributed APIs used in the training script. DGL provid
data structures and various APIs for initialization, distributed sampling and workload split.
For distributed training/inference, DGL provides three distributed data structures:
:class:`~dgl.distributed.DistGraph` for distributed graphs, :class:`~dgl.distributed.DistTensor` for
-distributed tensors and :class:`~dgl.distributed.DistEmbedding` for distributed learnable embeddings.
+distributed tensors and :class:`~dgl.distributed.nn.NodeEmbedding` for distributed learnable embeddings.

Initialization of the DGL distributed module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -27,7 +27,7 @@ Typically, the initialization APIs should be invoked in the following order:
    th.distributed.init_process_group(backend='gloo')

**Note**: If the training script contains user-defined functions (UDFs) that have to be invoked on
-the servers (see the section of DistTensor and DistEmbedding for more details), these UDFs have to
+the servers (see the sections on DistTensor and NodeEmbedding for more details), these UDFs have to
be declared before :func:`~dgl.distributed.initialize`.
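For example, an initializer UDF used by :class:`~dgl.distributed.DistTensor` or
:class:`~dgl.distributed.nn.NodeEmbedding` runs on the server processes. A minimal sketch of the
required ordering (the ``initializer`` below is an illustrative UDF, mirroring the one used later
in this chapter):

.. code:: python

    import dgl
    import torch as th

    # A UDF that the servers may execute; it must be declared before initialize().
    def initializer(shape, dtype):
        arr = th.zeros(shape, dtype=dtype)
        arr.uniform_(-1, 1)
        return arr

    dgl.distributed.initialize('ip_config.txt')
    th.distributed.init_process_group(backend='gloo')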
Distributed graph

@@ -125,7 +125,7 @@ in the cluster even if the :class:`~dgl.distributed.DistTensor` object disappear
    tensor = dgl.distributed.DistTensor((g.number_of_nodes(), 10), th.float32, name='test')

**Note**: :class:`~dgl.distributed.DistTensor` creation is a synchronized operation. All trainers
have to invoke the creation and the creation succeeds only when all trainers call it.

A user can add a :class:`~dgl.distributed.DistTensor` to a :class:`~dgl.distributed.DistGraph`
object as one of the node data or edge data.
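A minimal sketch of this, assuming the ``tensor`` created above and a hypothetical data name
``'test'``:

.. code:: python

    # Attach the distributed tensor as node data of the DistGraph.
    g.ndata['test'] = tensor
    # It can then be read back through the usual node-data interface.
    print(g.ndata['test'][th.arange(3)])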
@@ -153,10 +153,10 @@ computation operators, such as sum and mean.
when a machine runs multiple servers. This may result in data corruption. One way to avoid concurrent
writes to the same row of data is to run one server process on a machine.

-Distributed Embedding
-~~~~~~~~~~~~~~~~~~~~~
+Distributed NodeEmbedding
+~~~~~~~~~~~~~~~~~~~~~~~~~

-DGL provides :class:`~dgl.distributed.DistEmbedding` to support transductive models that require
+DGL provides :class:`~dgl.distributed.nn.NodeEmbedding` to support transductive models that require
node embeddings. Creating distributed embeddings is very similar to creating distributed tensors.

.. code:: python

@@ -165,7 +165,7 @@ node embeddings. Creating distributed embeddings is very similar to creating dis
        arr = th.zeros(shape, dtype=dtype)
        arr.uniform_(-1, 1)
        return arr
-    emb = dgl.distributed.DistEmbedding(g.number_of_nodes(), 10, init_func=initializer)
+    emb = dgl.distributed.nn.NodeEmbedding(g.number_of_nodes(), 10, init_func=initializer)

Internally, distributed embeddings are built on top of distributed tensors and thus have
very similar behaviors to distributed tensors. For example, when embeddings are created, they

@@ -192,7 +192,7 @@ the other for dense model parameters, as shown in the code below:
    optimizer.step()
    sparse_optimizer.step()

-**Note**: :class:`~dgl.distributed.DistEmbedding` is not a PyTorch nn module, so we cannot
+**Note**: :class:`~dgl.distributed.nn.NodeEmbedding` is not a PyTorch nn module, so we cannot
get access to it from the parameters of a PyTorch nn module.
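Continuing the snippet above, a minimal sketch of the two-optimizer pattern with the APIs
introduced in this change (``model``, ``dataloader`` and ``compute_loss`` are placeholders for the
user's own objects):

.. code:: python

    # Sparse optimizer for the distributed node embeddings created above ...
    sparse_optimizer = dgl.distributed.optim.SparseAdam([emb], lr=0.06)
    # ... and a regular PyTorch optimizer for the dense model parameters.
    optimizer = th.optim.Adam(model.parameters(), lr=0.003)

    for blocks in dataloader:
        feats = emb(blocks[0].srcdata[dgl.NID])   # pull embeddings for the input nodes
        loss = compute_loss(model(blocks, feats))
        sparse_optimizer.zero_grad()
        optimizer.zero_grad()
        loss.backward()
        sparse_optimizer.step()
        optimizer.step()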
Distributed sampling

@@ -252,7 +252,7 @@ the same as single-process sampling.
    dataloader = dgl.sampling.NodeDataLoader(g, train_nid, sampler,
                                             batch_size=batch_size, shuffle=True)
    for batch in dataloader:
        ...

Split workloads
......
@@ -16,7 +16,7 @@ For the training script, DGL provides distributed APIs that are similar to the o
mini-batch training. This makes distributed training require only small code modifications
from mini-batch training on a single machine. Below shows an example of training GraphSage
in a distributed fashion. The only code modifications are located on line 4-7:
1) initialize DGL's distributed module, 2) create a distributed graph object, and
3) split the training set and calculate the nodes for the local process.
The rest of the code, including sampler creation, model definition, training loops
are the same as :ref:`mini-batch training <guide-minibatch>`.

@@ -35,7 +35,7 @@ are the same as :ref:`mini-batch training <guide-minibatch>`.
    # Create sampler
    sampler = NeighborSampler(g, [10,25],
                              dgl.distributed.sample_neighbors,
                              device)

    dataloader = DistDataLoader(

@@ -85,7 +85,7 @@ Specifically, DGL's distributed training has three types of interacting processe
  generate mini-batches for training.

* Trainers contain multiple classes to interact with servers. It has
  :class:`~dgl.distributed.DistGraph` to get access to partitioned graph data and has
-  :class:`~dgl.distributed.DistEmbedding` and :class:`~dgl.distributed.DistTensor` to access
+  :class:`~dgl.distributed.nn.NodeEmbedding` and :class:`~dgl.distributed.DistTensor` to access
  the node/edge features/embeddings. It has
  :class:`~dgl.distributed.dist_dataloader.DistDataLoader` to
  interact with samplers to get mini-batches.
......
@@ -8,7 +8,7 @@
This section describes the distributed APIs used in the training script. DGL provides three distributed data structures and various APIs for initialization, distributed sampling and workload split.
For distributed training/inference, DGL provides three distributed data structures: :class:`~dgl.distributed.DistGraph` for distributed graphs,
:class:`~dgl.distributed.DistTensor` for distributed tensors and, for distributed learnable embeddings,
-:class:`~dgl.distributed.DistEmbedding`.
+:class:`~dgl.distributed.nn.NodeEmbedding`.

Initialization of the DGL distributed module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -24,7 +24,7 @@ Initialization of the DGL distributed module
    dgl.distributed.initialize('ip_config.txt')
    th.distributed.init_process_group(backend='gloo')

-**Note**: If the training script contains user-defined functions (UDFs) that need to be invoked on the servers (see the DistTensor and DistEmbedding sections below for details),
+**Note**: If the training script contains user-defined functions (UDFs) that need to be invoked on the servers (see the DistTensor and NodeEmbedding sections below for details),
these UDFs must be declared before :func:`~dgl.distributed.initialize`.

Distributed graph

@@ -138,7 +138,7 @@ DGL provides an interface for distributed tensors similar to that of regular single-machine tensors for accessing ...
Distributed embeddings
~~~~~~~~~~~~~~~~~~~~~~

-DGL provides :class:`~dgl.distributed.DistEmbedding` to support transductive models that require node embeddings.
+DGL provides :class:`~dgl.distributed.nn.NodeEmbedding` to support transductive models that require node embeddings.
Creating distributed embeddings is very similar to creating distributed tensors.

.. code:: python

@@ -147,7 +147,7 @@ DGL provides :class:`~dgl.distributed.nn.NodeEmbedding` to support transductive models that require
        arr = th.zeros(shape, dtype=dtype)
        arr.uniform_(-1, 1)
        return arr
-    emb = dgl.distributed.DistEmbedding(g.number_of_nodes(), 10, init_func=initializer)
+    emb = dgl.distributed.nn.NodeEmbedding(g.number_of_nodes(), 10, init_func=initializer)

Internally, distributed embeddings are built on top of distributed tensors, so they behave very similarly to distributed tensors.
For example, when embeddings are created, DGL shards them and stores them on all machines in the cluster. They can be uniquely identified by name.

@@ -169,7 +169,7 @@ DGL provides a sparse Adagrad optimizer :class:`~dgl.distributed.SparseAdagrad` ...
    optimizer.step()
    sparse_optimizer.step()

-**Note**: :class:`~dgl.distributed.DistEmbedding` is not a PyTorch nn module, so users cannot access it through the parameters of an nn module.
+**Note**: :class:`~dgl.distributed.nn.NodeEmbedding` is not a PyTorch nn module, so users cannot access it through the parameters of an nn module.

Distributed sampling
~~~~~~~~~~~~~~~~~~~~

@@ -228,7 +228,7 @@ DGL provides two levels of APIs for sampling nodes and edges to generate mini-batches ...
    dataloader = dgl.sampling.NodeDataLoader(g, train_nid, sampler,
                                             batch_size=batch_size, shuffle=True)
    for batch in dataloader:
        ...

Splitting the dataset
......
@@ -28,7 +28,7 @@ DGL adopts a fully distributed approach that distributes both data and computation across a collection of ...
    # Create sampler
    sampler = NeighborSampler(g, [10,25],
                              dgl.distributed.sample_neighbors,
                              device)

    dataloader = DistDataLoader(

@@ -74,7 +74,7 @@ DGL implements several distributed components to support distributed training. The figure below shows ...
  These servers work together to serve the graph data to the trainers. Note that one machine may run multiple server processes at the same time to parallelize computation and network communication.
* *Sampler processes* interact with the servers and sample nodes and edges to generate the mini-batches for training.
* *Trainer processes* contain multiple classes for interacting with the servers. They use :class:`~dgl.distributed.DistGraph` to access the partitioned graph data, use
-  :class:`~dgl.distributed.DistEmbedding`
+  :class:`~dgl.distributed.nn.NodeEmbedding`
  and :class:`~dgl.distributed.DistTensor` to access node/edge features/embeddings, and use
  :class:`~dgl.distributed.dist_dataloader.DistDataLoader` to interact with the samplers to get mini-batches.
......
@@ -118,7 +118,7 @@ The command below launches one training process on each machine and each trainin
python3 ~/workspace/dgl/tools/launch.py \
--workspace ~/workspace/dgl/examples/pytorch/graphsage/experimental/ \
--num_trainers 1 \
---num_samplers 4 \
+--num_samplers 0 \
--num_servers 1 \
--part_config data/ogb-product.json \
--ip_config ip_config.txt \
@@ -131,7 +131,7 @@ To run unsupervised training:
python3 ~/workspace/dgl/tools/launch.py \
--workspace ~/workspace/dgl/examples/pytorch/graphsage/experimental/ \
--num_trainers 1 \
---num_samplers 4 \
+--num_samplers 0 \
--num_servers 1 \
--part_config data/ogb-product.json \
--ip_config ip_config.txt \
@@ -144,13 +144,59 @@ By default, this code will run on CPU. If you have GPU support, you can just add
python3 ~/workspace/dgl/tools/launch.py \
--workspace ~/workspace/dgl/examples/pytorch/graphsage/experimental/ \
--num_trainers 4 \
---num_samplers 4 \
+--num_samplers 0 \
--num_servers 1 \
--part_config data/ogb-product.json \
--ip_config ip_config.txt \
"python3 train_dist.py --graph_name ogb-product --ip_config ip_config.txt --num_epochs 30 --batch_size 1000 --num_gpus 4"
```
To run supervised training in the transductive setting (node features are initialized from a learnable node embedding):
```bash
python3 ~/workspace/dgl/tools/launch.py --workspace ~/workspace/dgl/examples/pytorch/graphsage/experimental/ \
   --num_trainers 4 \
   --num_samplers 0 \
   --num_servers 1 \
--part_config data/ogb-product.json \
--ip_config ip_config.txt \
"python3 train_dist_transductive.py --graph_name ogb-product --ip_config ip_config.txt --batch_size 1000 --num_gpu 4 --eval_every 5"
```
To run supervised training in the transductive setting using DGL's distributed NodeEmbedding:
```bash
python3 ~/workspace/dgl/tools/launch.py --workspace ~/workspace/dgl/examples/pytorch/graphsage/experimental/ \
   --num_trainers 4 \
   --num_samplers 0 \
   --num_servers 1 \
--part_config data/ogb-product.json \
--ip_config ip_config.txt \
"python3 train_dist_transductive.py --graph_name ogb-product --ip_config ip_config.txt --batch_size 1000 --num_gpu 4 --eval_every 5 --dgl_sparse"
```
To run unsupervised training in the transductive setting (node features are initialized from a learnable node embedding):
```bash
python3 ~/workspace/dgl/tools/launch.py --workspace ~/workspace/dgl/examples/pytorch/graphsage/experimental/ \
--num_trainers 4 \
--num_samplers 0 \
--num_servers 1 \
--part_config data/ogb-product.json \
--ip_config ip_config.txt \
"python3 train_dist_unsupervised_transductive.py --graph_name ogb-product --ip_config ip_config.txt --num_epochs 3 --batch_size 1000 --num_gpus 4"
```
To run unsupervised training in the transductive setting using DGL's distributed NodeEmbedding:
```bash
python3 ~/workspace/dgl/tools/launch.py --workspace ~/workspace/dgl/examples/pytorch/graphsage/experimental/ \
--num_trainers 4 \
--num_samplers 0 \
--num_servers 1 \
--part_config data/ogb-product.json \
--ip_config ip_config.txt \
"python3 train_dist_unsupervised_transductive.py --graph_name ogb-product --ip_config ip_config.txt --num_epochs 3 --batch_size 1000 --num_gpus 4 --dgl_sparse"
```
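**Note:** in the commands above, `--dgl_sparse` makes the scripts use DGL's distributed `NodeEmbedding` trained with `dgl.distributed.optim.SparseAdam`; without the flag they fall back to PyTorch's `torch.nn.Embedding(sparse=True)` trained with `torch.optim.SparseAdam`.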
**Note:** if you are using conda or other virtual environments on the remote machines, you need to replace `python3` in the command string (i.e. the last argument) with the path to the Python interpreter in that environment.

## Distributed code runs in the standalone mode
......
-172.31.19.1
-172.31.23.205
+172.31.2.66
+172.31.1.191
+172.31.29.175
+172.31.16.98
\ No newline at end of file
@@ -21,20 +21,21 @@ import torch.optim as optim
import torch.multiprocessing as mp
from torch.utils.data import DataLoader

-def load_subtensor(g, seeds, input_nodes, device):
+def load_subtensor(g, seeds, input_nodes, device, load_feat=True):
    """
    Copies features and labels of a set of nodes onto GPU.
    """
-    batch_inputs = g.ndata['features'][input_nodes].to(device)
+    batch_inputs = g.ndata['features'][input_nodes].to(device) if load_feat else None
    batch_labels = g.ndata['labels'][seeds].to(device)
    return batch_inputs, batch_labels

class NeighborSampler(object):
-    def __init__(self, g, fanouts, sample_neighbors, device):
+    def __init__(self, g, fanouts, sample_neighbors, device, load_feat=True):
        self.g = g
        self.fanouts = fanouts
        self.sample_neighbors = sample_neighbors
        self.device = device
+        self.load_feat = load_feat

    def sample_blocks(self, seeds):
        seeds = th.LongTensor(np.asarray(seeds))
@@ -51,8 +52,9 @@ class NeighborSampler(object):
        input_nodes = blocks[0].srcdata[dgl.NID]
        seeds = blocks[-1].dstdata[dgl.NID]

-        batch_inputs, batch_labels = load_subtensor(self.g, seeds, input_nodes, "cpu")
-        blocks[0].srcdata['features'] = batch_inputs
+        batch_inputs, batch_labels = load_subtensor(self.g, seeds, input_nodes, "cpu", self.load_feat)
+        if self.load_feat:
+            blocks[0].srcdata['features'] = batch_inputs
        blocks[-1].dstdata['labels'] = batch_labels
        return blocks

@@ -289,7 +291,7 @@ if __name__ == '__main__':
    parser.add_argument('--part_config', type=str, help='The path to the partition config file')
    parser.add_argument('--num_clients', type=int, help='The number of clients')
    parser.add_argument('--n_classes', type=int, help='the number of classes')
    parser.add_argument('--num_gpus', type=int, default=-1,
                        help="the number of GPU device. Use -1 for CPU training")
    parser.add_argument('--num_epochs', type=int, default=20)
    parser.add_argument('--num_hidden', type=int, default=16)
......
import os
os.environ['DGLBACKEND']='pytorch'
from multiprocessing import Process
import argparse, time, math
import numpy as np
from functools import wraps
import tqdm
import dgl
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
from dgl.data.utils import load_graphs
import dgl.function as fn
import dgl.nn.pytorch as dglnn
from dgl.distributed import DistDataLoader
from dgl.distributed.nn import NodeEmbedding
import torch as th
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.multiprocessing as mp
from torch.utils.data import DataLoader
from train_dist import DistSAGE, NeighborSampler, compute_acc
class TransDistSAGE(DistSAGE):
def __init__(self, in_feats, n_hidden, n_classes, n_layers,
activation, dropout):
super(TransDistSAGE, self).__init__(in_feats, n_hidden, n_classes, n_layers, activation, dropout)
def inference(self, standalone, g, x, batch_size, device):
"""
Inference with the GraphSAGE model on full neighbors (i.e. without neighbor sampling).
g : the entire graph.
x : the input of entire node set.
The inference code is written in a fashion that it could handle any number of nodes and
layers.
"""
# During inference with sampling, multi-layer blocks are very inefficient because
# lots of computations in the first few layers are repeated.
# Therefore, we compute the representation of all nodes layer by layer. The nodes
# on each layer are of course splitted in batches.
# TODO: can we standardize this?
nodes = dgl.distributed.node_split(np.arange(g.number_of_nodes()),
g.get_partition_book(), force_even=True)
y = dgl.distributed.DistTensor((g.number_of_nodes(), self.n_hidden), th.float32, 'h',
persistent=True)
for l, layer in enumerate(self.layers):
if l == len(self.layers) - 1:
y = dgl.distributed.DistTensor((g.number_of_nodes(), self.n_classes),
th.float32, 'h_last', persistent=True)
sampler = NeighborSampler(g, [-1], dgl.distributed.sample_neighbors, device, load_feat=False)
print('|V|={}, eval batch size: {}'.format(g.number_of_nodes(), batch_size))
# Create PyTorch DataLoader for constructing blocks
dataloader = DistDataLoader(
dataset=nodes,
batch_size=batch_size,
collate_fn=sampler.sample_blocks,
shuffle=False,
drop_last=False)
for blocks in tqdm.tqdm(dataloader):
block = blocks[0].to(device)
input_nodes = block.srcdata[dgl.NID]
output_nodes = block.dstdata[dgl.NID]
h = x[input_nodes].to(device)
h_dst = h[:block.number_of_dst_nodes()]
h = layer(block, (h, h_dst))
if l != len(self.layers) - 1:
h = self.activation(h)
h = self.dropout(h)
y[output_nodes] = h.cpu()
x = y
g.barrier()
return y
def initializer(shape, dtype):
arr = th.zeros(shape, dtype=dtype)
arr.uniform_(-1, 1)
return arr
class DistEmb(nn.Module):
def __init__(self, num_nodes, emb_size, dgl_sparse_emb=False, dev_id='cpu'):
super().__init__()
self.dev_id = dev_id
self.emb_size = emb_size
self.dgl_sparse_emb = dgl_sparse_emb
if dgl_sparse_emb:
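# DGL's distributed NodeEmbedding shards the embedding table across the machines in the
# cluster; it is trained with one of the dgl.distributed.optim sparse optimizers below.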
self.sparse_emb = NodeEmbedding(num_nodes, emb_size, name='sage', init_func=initializer)
else:
self.sparse_emb = th.nn.Embedding(num_nodes, emb_size, sparse=True)
nn.init.uniform_(self.sparse_emb.weight, -1.0, 1.0)
def forward(self, idx):
# embeddings are stored in cpu
idx = idx.cpu()
if self.dgl_sparse_emb:
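# A NodeEmbedding lookup pulls the requested rows from the distributed store onto dev_id.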
return self.sparse_emb(idx, device=self.dev_id)
else:
return self.sparse_emb(idx).to(self.dev_id)
def load_embs(standalone, emb_layer, g):
nodes = dgl.distributed.node_split(np.arange(g.number_of_nodes()),
g.get_partition_book(), force_even=True)
x = dgl.distributed.DistTensor(
(g.number_of_nodes(),
emb_layer.module.emb_size \
if isinstance(emb_layer, th.nn.parallel.DistributedDataParallel) \
else emb_layer.emb_size),
th.float32, 'eval_embs',
persistent=True)
num_nodes = nodes.shape[0]
for i in range((num_nodes + 1023) // 1024):
idx = nodes[i * 1024: (i+1) * 1024 \
if (i+1) * 1024 < num_nodes \
else num_nodes]
embeds = emb_layer(idx).cpu()
x[idx] = embeds
if not standalone:
g.barrier()
return x
def evaluate(standalone, model, emb_layer, g, labels, val_nid, test_nid, batch_size, device):
"""
Evaluate the model on the validation set specified by ``val_nid``.
g : The entire graph.
inputs : The features of all the nodes.
labels : The labels of all the nodes.
val_nid : the node Ids for validation.
batch_size : Number of nodes to compute at the same time.
device : The GPU device to evaluate on.
"""
model.eval()
emb_layer.eval()
with th.no_grad():
inputs = load_embs(standalone, emb_layer, g)
pred = model.inference(standalone, g, inputs, batch_size, device)
model.train()
emb_layer.train()
return compute_acc(pred[val_nid], labels[val_nid]), compute_acc(pred[test_nid], labels[test_nid])
def run(args, device, data):
# Unpack data
train_nid, val_nid, test_nid, n_classes, g = data
# Create sampler
sampler = NeighborSampler(g, [int(fanout) for fanout in args.fan_out.split(',')],
dgl.distributed.sample_neighbors, device, load_feat=False)
# Create DataLoader for constructing blocks
dataloader = DistDataLoader(
dataset=train_nid.numpy(),
batch_size=args.batch_size,
collate_fn=sampler.sample_blocks,
shuffle=True,
drop_last=False)
# Define model and optimizer
emb_layer = DistEmb(g.num_nodes(), args.num_hidden, dgl_sparse_emb=args.dgl_sparse, dev_id=device)
model = TransDistSAGE(args.num_hidden, args.num_hidden, n_classes, args.num_layers, F.relu, args.dropout)
model = model.to(device)
if not args.standalone:
if args.num_gpus == -1:
model = th.nn.parallel.DistributedDataParallel(model)
else:
dev_id = g.rank() % args.num_gpus
model = th.nn.parallel.DistributedDataParallel(model, device_ids=[dev_id], output_device=dev_id)
if not args.dgl_sparse:
emb_layer = th.nn.parallel.DistributedDataParallel(emb_layer)
loss_fcn = nn.CrossEntropyLoss()
loss_fcn = loss_fcn.to(device)
optimizer = optim.Adam(model.parameters(), lr=args.lr)
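# The embedding table gets its own sparse optimizer: DGL's distributed SparseAdam when
# --dgl_sparse is set, otherwise PyTorch's SparseAdam over the local torch.nn.Embedding.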
if args.dgl_sparse:
emb_optimizer = dgl.distributed.optim.SparseAdam([emb_layer.sparse_emb], lr=args.sparse_lr)
print('optimize DGL sparse embedding:', emb_layer.sparse_emb)
elif args.standalone:
emb_optimizer = th.optim.SparseAdam(list(emb_layer.sparse_emb.parameters()), lr=args.sparse_lr)
print('optimize Pytorch sparse embedding:', emb_layer.sparse_emb)
else:
emb_optimizer = th.optim.SparseAdam(list(emb_layer.module.sparse_emb.parameters()), lr=args.sparse_lr)
print('optimize Pytorch sparse embedding:', emb_layer.module.sparse_emb)
train_size = th.sum(g.ndata['train_mask'][0:g.number_of_nodes()])
# Training loop
iter_tput = []
epoch = 0
for epoch in range(args.num_epochs):
tic = time.time()
sample_time = 0
forward_time = 0
backward_time = 0
update_time = 0
num_seeds = 0
num_inputs = 0
start = time.time()
# Loop over the dataloader to sample the computation dependency graph as a list of
# blocks.
step_time = []
for step, blocks in enumerate(dataloader):
tic_step = time.time()
sample_time += tic_step - start
# The nodes for input lies at the LHS side of the first block.
# The nodes for output lies at the RHS side of the last block.
batch_inputs = blocks[0].srcdata[dgl.NID]
batch_labels = blocks[-1].dstdata['labels']
batch_labels = batch_labels.long()
num_seeds += len(blocks[-1].dstdata[dgl.NID])
num_inputs += len(blocks[0].srcdata[dgl.NID])
blocks = [block.to(device) for block in blocks]
batch_labels = batch_labels.to(device)
# Compute loss and prediction
start = time.time()
batch_inputs = emb_layer(batch_inputs)
batch_pred = model(blocks, batch_inputs)
loss = loss_fcn(batch_pred, batch_labels)
forward_end = time.time()
emb_optimizer.zero_grad()
optimizer.zero_grad()
loss.backward()
compute_end = time.time()
forward_time += forward_end - start
backward_time += compute_end - forward_end
emb_optimizer.step()
optimizer.step()
update_time += time.time() - compute_end
step_t = time.time() - tic_step
step_time.append(step_t)
iter_tput.append(len(blocks[-1].dstdata[dgl.NID]) / step_t)
if step % args.log_every == 0:
acc = compute_acc(batch_pred, batch_labels)
gpu_mem_alloc = th.cuda.max_memory_allocated() / 1000000 if th.cuda.is_available() else 0
print('Part {} | Epoch {:05d} | Step {:05d} | Loss {:.4f} | Train Acc {:.4f} | Speed (samples/sec) {:.4f} | GPU {:.1f} MB | time {:.3f} s'.format(
g.rank(), epoch, step, loss.item(), acc.item(), np.mean(iter_tput[3:]), gpu_mem_alloc, np.sum(step_time[-args.log_every:])))
start = time.time()
toc = time.time()
print('Part {}, Epoch Time(s): {:.4f}, sample+data_copy: {:.4f}, forward: {:.4f}, backward: {:.4f}, update: {:.4f}, #seeds: {}, #inputs: {}'.format(
g.rank(), toc - tic, sample_time, forward_time, backward_time, update_time, num_seeds, num_inputs))
epoch += 1
if epoch % args.eval_every == 0 and epoch != 0:
start = time.time()
val_acc, test_acc = evaluate(args.standalone, model.module, emb_layer, g,
g.ndata['labels'], val_nid, test_nid, args.batch_size_eval, device)
print('Part {}, Val Acc {:.4f}, Test Acc {:.4f}, time: {:.4f}'.format(g.rank(), val_acc, test_acc, time.time()-start))
def main(args):
dgl.distributed.initialize(args.ip_config)
if not args.standalone:
th.distributed.init_process_group(backend='gloo')
g = dgl.distributed.DistGraph(args.graph_name, part_config=args.part_config)
print('rank:', g.rank())
pb = g.get_partition_book()
train_nid = dgl.distributed.node_split(g.ndata['train_mask'], pb, force_even=True)
val_nid = dgl.distributed.node_split(g.ndata['val_mask'], pb, force_even=True)
test_nid = dgl.distributed.node_split(g.ndata['test_mask'], pb, force_even=True)
local_nid = pb.partid2nids(pb.partid).detach().numpy()
print('part {}, train: {} (local: {}), val: {} (local: {}), test: {} (local: {})'.format(
g.rank(), len(train_nid), len(np.intersect1d(train_nid.numpy(), local_nid)),
len(val_nid), len(np.intersect1d(val_nid.numpy(), local_nid)),
len(test_nid), len(np.intersect1d(test_nid.numpy(), local_nid))))
if args.num_gpus == -1:
device = th.device('cpu')
else:
device = th.device('cuda:'+str(g.rank() % args.num_gpus))
labels = g.ndata['labels'][np.arange(g.number_of_nodes())]
n_classes = len(th.unique(labels[th.logical_not(th.isnan(labels))]))
print('#labels:', n_classes)
# Pack data
data = train_nid, val_nid, test_nid, n_classes, g
run(args, device, data)
print("parent ends")
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
register_data_args(parser)
parser.add_argument('--graph_name', type=str, help='graph name')
parser.add_argument('--id', type=int, help='the partition id')
parser.add_argument('--ip_config', type=str, help='The file for IP configuration')
parser.add_argument('--part_config', type=str, help='The path to the partition config file')
parser.add_argument('--num_clients', type=int, help='The number of clients')
parser.add_argument('--n_classes', type=int, help='the number of classes')
parser.add_argument('--num_gpus', type=int, default=-1,
help="the number of GPU device. Use -1 for CPU training")
parser.add_argument('--num_epochs', type=int, default=20)
parser.add_argument('--num_hidden', type=int, default=16)
parser.add_argument('--num_layers', type=int, default=2)
parser.add_argument('--fan_out', type=str, default='10,25')
parser.add_argument('--batch_size', type=int, default=1000)
parser.add_argument('--batch_size_eval', type=int, default=100000)
parser.add_argument('--log_every', type=int, default=20)
parser.add_argument('--eval_every', type=int, default=5)
parser.add_argument('--lr', type=float, default=0.003)
parser.add_argument('--dropout', type=float, default=0.5)
parser.add_argument('--local_rank', type=int, help='get rank of the process')
parser.add_argument('--standalone', action='store_true', help='run in the standalone mode')
parser.add_argument("--dgl_sparse", action='store_true',
help='Whether to use DGL sparse embedding')
parser.add_argument("--sparse_lr", type=float, default=1e-2,
help="sparse lr rate")
args = parser.parse_args()
print(args)
main(args)
@@ -68,19 +68,18 @@ class SAGE(nn.Module):
        for l, layer in enumerate(self.layers):
            y = th.zeros(g.number_of_nodes(), self.n_hidden if l != len(self.layers) - 1 else self.n_classes)

-            sampler = dgl.sampling.MultiLayerNeighborSampler([None])
-            dataloader = dgl.sampling.NodeDataLoader(
+            sampler = dgl.dataloading.MultiLayerNeighborSampler([None])
+            dataloader = dgl.dataloading.NodeDataLoader(
                g,
                th.arange(g.number_of_nodes()),
                sampler,
-                batch_size=args.batch_size,
+                batch_size=batch_size,
                shuffle=True,
                drop_last=False,
-                num_workers=args.num_workers)
+                num_workers=0)

            for input_nodes, output_nodes, blocks in tqdm.tqdm(dataloader):
                block = blocks[0]
                block = block.int().to(device)
                h = x[input_nodes].to(device)
                h = layer(block, h)
@@ -93,7 +92,6 @@ class SAGE(nn.Module):
            x = y
        return y

class NegativeSampler(object):
    def __init__(self, g, neg_nseeds):
        self.neg_nseeds = neg_nseeds
@@ -270,7 +268,7 @@ def generate_emb(model, g, inputs, batch_size, device):
def compute_acc(emb, labels, train_nids, val_nids, test_nids):
    """
    Compute the accuracy of prediction given the labels.

    We will first train a LogisticRegression model using the trained embeddings;
    the training set, validation set and test set are provided as the arguments.
@@ -459,7 +457,7 @@ if __name__ == '__main__':
    parser.add_argument('--ip_config', type=str, help='The file for IP configuration')
    parser.add_argument('--part_config', type=str, help='The path to the partition config file')
    parser.add_argument('--n_classes', type=int, help='the number of classes')
    parser.add_argument('--num_gpus', type=int, default=-1,
                        help="the number of GPU device. Use -1 for CPU training")
    parser.add_argument('--num_epochs', type=int, default=20)
    parser.add_argument('--num_hidden', type=int, default=16)
@@ -479,6 +477,5 @@ if __name__ == '__main__':
    parser.add_argument('--remove_edge', default=False, action='store_true',
                        help="whether to remove edges during sampling")
    args = parser.parse_args()
    print(args)
    main(args)
import os
os.environ['DGLBACKEND']='pytorch'
from multiprocessing import Process
import argparse, time, math
import numpy as np
from functools import wraps
import tqdm
import sklearn.linear_model as lm
import sklearn.metrics as skm
import dgl
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
from dgl.data.utils import load_graphs
import dgl.function as fn
import dgl.nn.pytorch as dglnn
import torch as th
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.multiprocessing as mp
from dgl.distributed import DistDataLoader
from dgl.distributed.optim import SparseAdagrad
from train_dist_unsupervised import SAGE, NeighborSampler, PosNeighborSampler, CrossEntropyLoss, compute_acc
from train_dist_transductive import DistEmb, load_embs
def generate_emb(standalone, model, emb_layer, g, batch_size, device):
"""
Generate embeddings for each node
emb_layer : Embedding layer
g : The entire graph.
inputs : The features of all the nodes.
batch_size : Number of nodes to compute at the same time.
device : The GPU device to evaluate on.
"""
model.eval()
emb_layer.eval()
with th.no_grad():
inputs = load_embs(standalone, emb_layer, g)
pred = model.inference(g, inputs, batch_size, device)
g.barrier()
return pred
def run(args, device, data):
# Unpack data
train_eids, train_nids, g, global_train_nid, global_valid_nid, global_test_nid, labels = data
# Create sampler
sampler = NeighborSampler(g, [int(fanout) for fanout in args.fan_out.split(',')], train_nids,
dgl.distributed.sample_neighbors, args.num_negs, args.remove_edge)
# Create PyTorch DataLoader for constructing blocks
dataloader = dgl.distributed.DistDataLoader(
dataset=train_eids.numpy(),
batch_size=args.batch_size,
collate_fn=sampler.sample_blocks,
shuffle=True,
drop_last=False)
# Define model and optimizer
emb_layer = DistEmb(g.num_nodes(), args.num_hidden, dgl_sparse_emb=args.dgl_sparse, dev_id=device)
model = SAGE(args.num_hidden, args.num_hidden, args.num_hidden, args.num_layers, F.relu, args.dropout)
model = model.to(device)
if not args.standalone:
if args.num_gpus == -1:
model = th.nn.parallel.DistributedDataParallel(model)
else:
dev_id = g.rank() % args.num_gpus
model = th.nn.parallel.DistributedDataParallel(model, device_ids=[dev_id], output_device=dev_id)
if not args.dgl_sparse:
emb_layer = th.nn.parallel.DistributedDataParallel(emb_layer)
loss_fcn = CrossEntropyLoss()
loss_fcn = loss_fcn.to(device)
optimizer = optim.Adam(model.parameters(), lr=args.lr)
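# The embedding table gets its own sparse optimizer: DGL's distributed SparseAdam when
# --dgl_sparse is set, otherwise PyTorch's SparseAdam over the local torch.nn.Embedding.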
if args.dgl_sparse:
emb_optimizer = dgl.distributed.optim.SparseAdam([emb_layer.sparse_emb], lr=args.sparse_lr)
print('optimize DGL sparse embedding:', emb_layer.sparse_emb)
elif args.standalone:
emb_optimizer = th.optim.SparseAdam(list(emb_layer.sparse_emb.parameters()), lr=args.sparse_lr)
print('optimize Pytorch sparse embedding:', emb_layer.sparse_emb)
else:
emb_optimizer = th.optim.SparseAdam(list(emb_layer.module.sparse_emb.parameters()), lr=args.sparse_lr)
print('optimize Pytorch sparse embedding:', emb_layer.module.sparse_emb)
# Training loop
epoch = 0
for epoch in range(args.num_epochs):
sample_time = 0
copy_time = 0
forward_time = 0
backward_time = 0
update_time = 0
num_seeds = 0
num_inputs = 0
step_time = []
iter_t = []
sample_t = []
feat_copy_t = []
forward_t = []
backward_t = []
update_t = []
iter_tput = []
start = time.time()
# Loop over the dataloader to sample the computation dependency graph as a list of
# blocks.
for step, (pos_graph, neg_graph, blocks) in enumerate(dataloader):
tic_step = time.time()
sample_t.append(tic_step - start)
pos_graph = pos_graph.to(device)
neg_graph = neg_graph.to(device)
blocks = [block.to(device) for block in blocks]
# The nodes for input lies at the LHS side of the first block.
# The nodes for output lies at the RHS side of the last block.
# Load the input features as well as output labels
batch_inputs = blocks[0].srcdata[dgl.NID]
copy_time = time.time()
feat_copy_t.append(copy_time - tic_step)
# Compute loss and prediction
batch_inputs = emb_layer(batch_inputs)
batch_pred = model(blocks, batch_inputs)
loss = loss_fcn(batch_pred, pos_graph, neg_graph)
forward_end = time.time()
emb_optimizer.zero_grad()
optimizer.zero_grad()
loss.backward()
compute_end = time.time()
forward_t.append(forward_end - copy_time)
backward_t.append(compute_end - forward_end)
# Aggregate gradients in multiple nodes.
emb_optimizer.step()
optimizer.step()
update_t.append(time.time() - compute_end)
pos_edges = pos_graph.number_of_edges()
neg_edges = neg_graph.number_of_edges()
step_t = time.time() - start
step_time.append(step_t)
iter_tput.append(pos_edges / step_t)
num_seeds += pos_edges
if step % args.log_every == 0:
print('[{}] Epoch {:05d} | Step {:05d} | Loss {:.4f} | Speed (samples/sec) {:.4f} | time {:.3f} s' \
'| sample {:.3f} | copy {:.3f} | forward {:.3f} | backward {:.3f} | update {:.3f}'.format(
g.rank(), epoch, step, loss.item(), np.mean(iter_tput[3:]), np.sum(step_time[-args.log_every:]),
np.sum(sample_t[-args.log_every:]), np.sum(feat_copy_t[-args.log_every:]), np.sum(forward_t[-args.log_every:]),
np.sum(backward_t[-args.log_every:]), np.sum(update_t[-args.log_every:])))
start = time.time()
print('[{}]Epoch Time(s): {:.4f}, sample: {:.4f}, data copy: {:.4f}, forward: {:.4f}, backward: {:.4f}, update: {:.4f}, #seeds: {}, #inputs: {}'.format(
g.rank(), np.sum(step_time), np.sum(sample_t), np.sum(feat_copy_t), np.sum(forward_t), np.sum(backward_t), np.sum(update_t), num_seeds, num_inputs))
epoch += 1
# evaluate the embedding using LogisticRegression
if args.standalone:
pred = generate_emb(True, model, emb_layer, g, args.batch_size_eval, device)
else:
pred = generate_emb(False, model.module, emb_layer, g, args.batch_size_eval, device)
if g.rank() == 0:
eval_acc, test_acc = compute_acc(pred, labels, global_train_nid, global_valid_nid, global_test_nid)
print('eval acc {:.4f}; test acc {:.4f}'.format(eval_acc, test_acc))
# sync for eval and test
if not args.standalone:
th.distributed.barrier()
if not args.standalone:
g._client.barrier()
# save features into file
if g.rank() == 0:
th.save(pred, 'emb.pt')
else:
feat = g.ndata['features']
th.save(pred, 'emb.pt')
def main(args):
dgl.distributed.initialize(args.ip_config)
if not args.standalone:
th.distributed.init_process_group(backend='gloo')
g = dgl.distributed.DistGraph(args.graph_name, part_config=args.part_config)
print('rank:', g.rank())
print('number of edges', g.number_of_edges())
train_eids = dgl.distributed.edge_split(th.ones((g.number_of_edges(),), dtype=th.bool), g.get_partition_book(), force_even=True)
train_nids = dgl.distributed.node_split(th.ones((g.number_of_nodes(),), dtype=th.bool), g.get_partition_book())
global_train_nid = th.LongTensor(np.nonzero(g.ndata['train_mask'][np.arange(g.number_of_nodes())]))
global_valid_nid = th.LongTensor(np.nonzero(g.ndata['val_mask'][np.arange(g.number_of_nodes())]))
global_test_nid = th.LongTensor(np.nonzero(g.ndata['test_mask'][np.arange(g.number_of_nodes())]))
labels = g.ndata['labels'][np.arange(g.number_of_nodes())]
if args.num_gpus == -1:
device = th.device('cpu')
else:
device = th.device('cuda:'+str(g.rank() % args.num_gpus))
# Pack data
global_train_nid = global_train_nid.squeeze()
global_valid_nid = global_valid_nid.squeeze()
global_test_nid = global_test_nid.squeeze()
print("number of train {}".format(global_train_nid.shape[0]))
print("number of valid {}".format(global_valid_nid.shape[0]))
print("number of test {}".format(global_test_nid.shape[0]))
data = train_eids, train_nids, g, global_train_nid, global_valid_nid, global_test_nid, labels
run(args, device, data)
print("parent ends")
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
register_data_args(parser)
parser.add_argument('--graph_name', type=str, help='graph name')
parser.add_argument('--id', type=int, help='the partition id')
parser.add_argument('--ip_config', type=str, help='The file for IP configuration')
parser.add_argument('--part_config', type=str, help='The path to the partition config file')
parser.add_argument('--n_classes', type=int, help='the number of classes')
parser.add_argument('--num_gpus', type=int, default=-1,
help="the number of GPU device. Use -1 for CPU training")
parser.add_argument('--num_epochs', type=int, default=5)
parser.add_argument('--num_hidden', type=int, default=16)
parser.add_argument('--num-layers', type=int, default=2)
parser.add_argument('--fan_out', type=str, default='10,25')
parser.add_argument('--batch_size', type=int, default=1000)
parser.add_argument('--batch_size_eval', type=int, default=100000)
parser.add_argument('--log_every', type=int, default=20)
parser.add_argument('--eval_every', type=int, default=5)
parser.add_argument('--lr', type=float, default=0.003)
parser.add_argument('--dropout', type=float, default=0.5)
parser.add_argument('--local_rank', type=int, help='get rank of the process')
parser.add_argument('--standalone', action='store_true', help='run in the standalone mode')
parser.add_argument('--num_negs', type=int, default=1)
parser.add_argument('--neg_share', default=False, action='store_true',
help="sharing neg nodes for positive nodes")
parser.add_argument('--remove_edge', default=False, action='store_true',
help="whether to remove edges during sampling")
parser.add_argument("--dgl_sparse", action='store_true',
help='Whether to use DGL sparse embedding')
parser.add_argument("--sparse_lr", type=float, default=1e-2,
help="sparse lr rate")
args = parser.parse_args()
print(args)
main(args)
@@ -10,35 +10,35 @@ pip3 install ogb pyarrow
Training RGCN takes four steps:

### Step 0: Setup a Distributed File System
* You may skip this step if your cluster already has folder(s) synchronized across machines.

To perform distributed training, files and codes need to be accessed across multiple machines. A distributed file system would perfectly handle the job (i.e., NFS, Ceph).

#### Server side setup
Here is an example of how to set up NFS. First, install essential libs on the storage server

```bash
sudo apt-get install nfs-kernel-server
```

Below we assume the user account is `ubuntu` and we create a directory of `workspace` in the home directory.

```bash
mkdir -p /home/ubuntu/workspace
```

We assume that all the servers are under a subnet with ip range `192.168.0.0` to `192.168.255.255`. The exports configuration needs to be modified to

```bash
sudo vim /etc/exports
# add the following line
/home/ubuntu/workspace 192.168.0.0/16(rw,sync,no_subtree_check)
```

The server's internal ip can be checked via `ifconfig` or `ip`. If the ip does not begin with `192.168`, then you may use

```bash
# for ip range 10.0.0.0 – 10.255.255.255
/home/ubuntu/workspace 10.0.0.0/8(rw,sync,no_subtree_check)
# for ip range 172.16.0.0 – 172.31.255.255
/home/ubuntu/workspace 172.16.0.0/12(rw,sync,no_subtree_check)
```

@@ -51,22 +51,22 @@ sudo systemctl restart nfs-kernel-server
For configuration details, please refer to [NFS ArchWiki](https://wiki.archlinux.org/index.php/NFS).

#### Client side setup
To use NFS, clients also need to install the essential packages

```
sudo apt-get install nfs-common
```

You can either mount the NFS manually

```
mkdir -p /home/ubuntu/workspace
sudo mount -t nfs <nfs-server-ip>:/home/ubuntu/workspace /home/ubuntu/workspace
```

or edit the fstab so the folder will be mounted automatically

```
# vim /etc/fstab
@@ -74,7 +74,7 @@ or edit the fstab so the folder will be mounted automatically
<nfs-server-ip>:/home/ubuntu/workspace /home/ubuntu/workspace nfs defaults 0 0
```

Then run `mount -a`.

Now go to `/home/ubuntu/workspace` and clone the DGL Github repository.

@@ -126,6 +126,23 @@ We can get the performance score at the second epoch:
Val Acc 0.4323, Test Acc 0.4255, time: 128.0379
```
The command below launches the same distributed training job using DGL's distributed NodeEmbedding
```bash
python3 ~/workspace/dgl/tools/launch.py \
--workspace ~/workspace/dgl/examples/pytorch/rgcn/experimental/ \
--num_trainers 1 \
--num_servers 1 \
--num_samplers 4 \
--part_config data/ogbn-mag.json \
--ip_config ip_config.txt \
"python3 entity_classify_dist.py --graph-name ogbn-mag --dataset ogbn-mag --fanout='25,25' --batch-size 1024 --n-hidden 64 --lr 0.01 --eval-batch-size 1024 --low-mem --dropout 0.5 --use-self-loop --n-bases 2 --n-epochs 3 --layer-norm --ip-config ip_config.txt --sparse-embedding --sparse-lr 0.06 --num_gpus 1"
```
We can get the performance score at the second epoch:
```
Val Acc 0.4410, Test Acc 0.4282, time: 32.5274
```
**Note:** if you are using conda or other virtual environments on the remote machines, you need to replace `python3` in the command string (i.e. the last argument) with the path to the Python interpreter in that environment.

## Partition a graph with ParMETIS

@@ -186,7 +203,7 @@ python3 get_mag_data.py
### Step 5: Verify the partition result (Optional)
```bash
python3 verify_mag_partitions.py
```

## Distributed code runs in the standalone mode
......
@@ -162,7 +162,7 @@ class DistEmbedLayer(nn.Module):
            # We only create embeddings for nodes without node features.
            if feat_name not in g.nodes[ntype].data:
                part_policy = g.get_node_partition_policy(ntype)
-                self.node_embeds[ntype] = dgl.distributed.DistEmbedding(g.number_of_nodes(ntype),
+                self.node_embeds[ntype] = dgl.distributed.nn.NodeEmbedding(g.number_of_nodes(ntype),
                                                                          self.embed_size,
                                                                          embed_name + '_' + ntype,
                                                                          init_emb,
@@ -389,10 +389,10 @@ def run(args, device, data):
    if args.sparse_embedding:
        if args.dgl_sparse and args.standalone:
-            emb_optimizer = dgl.distributed.SparseAdagrad(list(embed_layer.node_embeds.values()), lr=args.sparse_lr)
+            emb_optimizer = dgl.distributed.optim.SparseAdam(list(embed_layer.node_embeds.values()), lr=args.sparse_lr)
            print('optimize DGL sparse embedding:', embed_layer.node_embeds.keys())
        elif args.dgl_sparse:
-            emb_optimizer = dgl.distributed.SparseAdagrad(list(embed_layer.module.node_embeds.values()), lr=args.sparse_lr)
+            emb_optimizer = dgl.distributed.optim.SparseAdam(list(embed_layer.module.node_embeds.values()), lr=args.sparse_lr)
            print('optimize DGL sparse embedding:', embed_layer.module.node_embeds.keys())
        elif args.standalone:
            emb_optimizer = th.optim.SparseAdam(list(embed_layer.node_embeds.parameters()), lr=args.sparse_lr)
@@ -534,7 +534,7 @@ if __name__ == '__main__':
    parser.add_argument('--conf-path', type=str, help='The path to the partition config file')

    # rgcn related
    parser.add_argument('--num_gpus', type=int, default=-1,
                        help="the number of GPU device. Use -1 for CPU training")
    parser.add_argument("--dropout", type=float, default=0,
                        help="dropout probability")
......
...@@ -33,7 +33,7 @@ def read_ip_config(filename): ...@@ -33,7 +33,7 @@ def read_ip_config(filename):
172.31.47.147 30050 2 172.31.47.147 30050 2
172.31.30.180 30050 2 172.31.30.180 30050 2
Note that, DGL KVStore supports multiple servers that can share data with each other Note that, DGL KVStore supports multiple servers that can share data with each other
on the same machine via shared-tensor. So the server_count should be >= 1. on the same machine via shared-tensor. So the server_count should be >= 1.
Parameters Parameters
...@@ -103,11 +103,11 @@ def get_type_str(dtype): ...@@ -103,11 +103,11 @@ def get_type_str(dtype):
class KVServer(object): class KVServer(object):
"""KVServer is a lightweight key-value store service for DGL distributed training. """KVServer is a lightweight key-value store service for DGL distributed training.
In practice, developers can use KVServer to hold large-scale graph features or In practice, developers can use KVServer to hold large-scale graph features or
graph embeddings across machines in a distributed setting. Also, users can rewrite the _push_handler() graph embeddings across machines in a distributed setting. Also, users can rewrite the _push_handler()
and _pull_handler() APIs to support flexible algorithms. and _pull_handler() APIs to support flexible algorithms.
DGL kvstore supports multiple servers on a single machine. That means we can launch many servers on the same machine and all of DGL kvstore supports multiple servers on a single machine. That means we can launch many servers on the same machine and all of
these servers will share the same shared-memory tensor for load balancing. these servers will share the same shared-memory tensor for load balancing.
Note that you should NOT use KVServer from multiple threads in Python because this behavior is undefined. Note that you should NOT use KVServer from multiple threads in Python because this behavior is undefined.
...@@ -119,7 +119,7 @@ class KVServer(object): ...@@ -119,7 +119,7 @@ class KVServer(object):
server_id : int server_id : int
KVServer's ID (start from 0). KVServer's ID (start from 0).
server_namebook: dict server_namebook: dict
IP address namebook of KVServer, where key is the KVServer's ID IP address namebook of KVServer, where key is the KVServer's ID
(start from 0) and value is the server's machine_id, IP address and port, e.g., (start from 0) and value is the server's machine_id, IP address and port, e.g.,
{0:'[0, 172.31.40.143, 30050], {0:'[0, 172.31.40.143, 30050],
...@@ -196,7 +196,7 @@ class KVServer(object): ...@@ -196,7 +196,7 @@ class KVServer(object):
name : str name : str
data name data name
global2local : list or tensor (mx.ndarray or torch.tensor) global2local : list or tensor (mx.ndarray or torch.tensor)
A data mapping of global ID to local ID. KVStore will use global ID by default A data mapping of global ID to local ID. KVStore will use global ID by default
if global2local has not been set. if global2local has not been set.
Note that if global2local is None, KVServer will read the shared-tensor. Note that if global2local is None, KVServer will read the shared-tensor.
...@@ -260,7 +260,7 @@ class KVServer(object): ...@@ -260,7 +260,7 @@ class KVServer(object):
time.sleep(2) # wait for writing to finish time.sleep(2) # wait for writing to finish
break break
else: else:
time.sleep(2) # wait until the file has been created time.sleep(2) # wait until the file has been created
data_shape, data_type = self._read_data_shape_type(name+'-part-shape-'+str(self._machine_id)) data_shape, data_type = self._read_data_shape_type(name+'-part-shape-'+str(self._machine_id))
assert data_type == 'int64' assert data_type == 'int64'
shared_data = empty_shared_mem(name+'-part-', False, data_shape, 'int64') shared_data = empty_shared_mem(name+'-part-', False, data_shape, 'int64')
...@@ -526,8 +526,8 @@ class KVServer(object): ...@@ -526,8 +526,8 @@ class KVServer(object):
c_ptr=None) c_ptr=None)
for client_id in range(self._client_count): for client_id in range(self._client_count):
_send_kv_msg(self._sender, back_msg, client_id) _send_kv_msg(self._sender, back_msg, client_id)
self._barrier_count = 0 self._barrier_count = 0
# Final message # Final message
elif msg.type == KVMsgType.FINAL: elif msg.type == KVMsgType.FINAL:
print("Exit KVStore service %d, solved message count: %d" % (self.get_id(), self.get_message_count())) print("Exit KVStore service %d, solved message count: %d" % (self.get_id(), self.get_message_count()))
break # exit loop break # exit loop
...@@ -639,7 +639,7 @@ class KVServer(object): ...@@ -639,7 +639,7 @@ class KVServer(object):
def _default_push_handler(self, name, ID, data, target): def _default_push_handler(self, name, ID, data, target):
"""Default handler for PUSH message. """Default handler for PUSH message.
By default, _push_handler performs an update operation on the tensor. By default, _push_handler performs an update operation on the tensor.
...@@ -680,7 +680,7 @@ class KVServer(object): ...@@ -680,7 +680,7 @@ class KVServer(object):
class KVClient(object): class KVClient(object):
"""KVClient is used to push/pull tensors to/from KVServer. If the server node and client node are on the """KVClient is used to push/pull tensors to/from KVServer. If the server node and client node are on the
same machine, they can communicate with each other using a local shared-memory tensor instead of TCP/IP connections. same machine, they can communicate with each other using a local shared-memory tensor instead of TCP/IP connections.
Note that you should NOT use KVClient from multiple threads in Python because this behavior is undefined. Note that you should NOT use KVClient from multiple threads in Python because this behavior is undefined.
...@@ -690,7 +690,7 @@ class KVClient(object): ...@@ -690,7 +690,7 @@ class KVClient(object):
Parameters Parameters
---------- ----------
server_namebook: dict server_namebook: dict
IP address namebook of KVServer, where key is the KVServer's ID IP address namebook of KVServer, where key is the KVServer's ID
(start from 0) and value is the server's machine_id, IP address and port, and group_count, e.g., (start from 0) and value is the server's machine_id, IP address and port, and group_count, e.g.,
{0:'[0, 172.31.40.143, 30050, 2], {0:'[0, 172.31.40.143, 30050, 2],
...@@ -807,7 +807,7 @@ class KVClient(object): ...@@ -807,7 +807,7 @@ class KVClient(object):
if (os.path.exists(tensor_name+'shape-'+str(self._machine_id))): if (os.path.exists(tensor_name+'shape-'+str(self._machine_id))):
break break
else: else:
time.sleep(1) # wait until the file has been created time.sleep(1) # wait until the file has been created
shape, data_type = self._read_data_shape_type(tensor_name+'shape-'+str(self._machine_id)) shape, data_type = self._read_data_shape_type(tensor_name+'shape-'+str(self._machine_id))
assert data_type == dtype assert data_type == dtype
shared_data = empty_shared_mem(tensor_name, False, shape, dtype) shared_data = empty_shared_mem(tensor_name, False, shape, dtype)
...@@ -825,7 +825,7 @@ class KVClient(object): ...@@ -825,7 +825,7 @@ class KVClient(object):
type=KVMsgType.GET_SHAPE, type=KVMsgType.GET_SHAPE,
rank=self._client_id, rank=self._client_id,
name=name, name=name,
id=None, id=None,
data=None, data=None,
shape=None, shape=None,
c_ptr=None) c_ptr=None)
...@@ -844,12 +844,12 @@ class KVClient(object): ...@@ -844,12 +844,12 @@ class KVClient(object):
def init_data(self, name, shape, dtype, target_name): def init_data(self, name, shape, dtype, target_name):
"""Send message to kvserver to initialize new data and """Send message to kvserver to initialize new data and
get corresponded shared-tensor (e.g., partition_book, g2l) on kvclient. get corresponded shared-tensor (e.g., partition_book, g2l) on kvclient.
The new data will be initialized to zeros. The new data will be initialized to zeros.
Note that, this API must be invoked after the conenct() API. Note that, this API must be invoked after the conenct() API.
Parameters Parameters
---------- ----------
...@@ -1034,10 +1034,10 @@ class KVClient(object): ...@@ -1034,10 +1034,10 @@ class KVClient(object):
local_data = partial_data local_data = partial_data
else: # push data to remote server else: # push data to remote server
msg = KVStoreMsg( msg = KVStoreMsg(
type=KVMsgType.PUSH, type=KVMsgType.PUSH,
rank=self._client_id, rank=self._client_id,
name=name, name=name,
id=partial_id, id=partial_id,
data=partial_data, data=partial_data,
shape=None, shape=None,
c_ptr=None) c_ptr=None)
...@@ -1052,7 +1052,7 @@ class KVClient(object): ...@@ -1052,7 +1052,7 @@ class KVClient(object):
self._udf_push_handler(name+'-data-', local_id, local_data, self._data_store, self._udf_push_param) self._udf_push_handler(name+'-data-', local_id, local_data, self._data_store, self._udf_push_param)
else: else:
self._default_push_handler(name+'-data-', local_id, local_data, self._data_store) self._default_push_handler(name+'-data-', local_id, local_data, self._data_store)
def pull(self, name, id_tensor): def pull(self, name, id_tensor):
"""Pull message from KVServer. """Pull message from KVServer.
...@@ -1081,8 +1081,8 @@ class KVClient(object): ...@@ -1081,8 +1081,8 @@ class KVClient(object):
self._group_count, self._group_count,
self._machine_id, self._machine_id,
self._client_id, self._client_id,
self._data_store[name+'-part-'], self._data_store[name+'-part-'],
g2l, g2l,
self._data_store[name+'-data-'], self._data_store[name+'-data-'],
self._sender, self._sender,
self._receiver) self._receiver)
...@@ -1116,9 +1116,9 @@ class KVClient(object): ...@@ -1116,9 +1116,9 @@ class KVClient(object):
local_id = partial_id local_id = partial_id
else: # pull data from remote server else: # pull data from remote server
msg = KVStoreMsg( msg = KVStoreMsg(
type=KVMsgType.PULL, type=KVMsgType.PULL,
rank=self._client_id, rank=self._client_id,
name=name, name=name,
id=partial_id, id=partial_id,
data=None, data=None,
shape=None, shape=None,
...@@ -1128,16 +1128,16 @@ class KVClient(object): ...@@ -1128,16 +1128,16 @@ class KVClient(object):
_send_kv_msg(self._sender, msg, s_id) _send_kv_msg(self._sender, msg, s_id)
pull_count += 1 pull_count += 1
start += count[idx] start += count[idx]
msg_list = [] msg_list = []
if local_id is not None: # local pull if local_id is not None: # local pull
local_data = self._udf_pull_handler(name+'-data-', local_id, self._data_store) local_data = self._udf_pull_handler(name+'-data-', local_id, self._data_store)
s_id = random.randint(self._machine_id*self._group_count, (self._machine_id+1)*self._group_count-1) s_id = random.randint(self._machine_id*self._group_count, (self._machine_id+1)*self._group_count-1)
local_msg = KVStoreMsg( local_msg = KVStoreMsg(
type=KVMsgType.PULL_BACK, type=KVMsgType.PULL_BACK,
rank=s_id, rank=s_id,
name=name, name=name,
id=None, id=None,
data=local_data, data=local_data,
shape=None, shape=None,
...@@ -1157,13 +1157,13 @@ class KVClient(object): ...@@ -1157,13 +1157,13 @@ class KVClient(object):
return data_tensor[back_sorted_id] # return data with original index order return data_tensor[back_sorted_id] # return data with original index order
def barrier(self): def barrier(self):
"""Barrier for all client nodes """Barrier for all client nodes
This API blocks until all the clients have called it. This API blocks until all the clients have called it.
""" """
msg = KVStoreMsg( msg = KVStoreMsg(
type=KVMsgType.BARRIER, type=KVMsgType.BARRIER,
rank=self._client_id, rank=self._client_id,
name=None, name=None,
...@@ -1215,7 +1215,7 @@ class KVClient(object): ...@@ -1215,7 +1215,7 @@ class KVClient(object):
IP = '127.0.0.1' IP = '127.0.0.1'
finally: finally:
s.close() s.close()
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("",0)) s.bind(("",0))
s.listen(1) s.listen(1)
...@@ -1365,7 +1365,7 @@ class KVClient(object): ...@@ -1365,7 +1365,7 @@ class KVClient(object):
def _default_push_handler(self, name, ID, data, target): def _default_push_handler(self, name, ID, data, target):
"""Default handler for PUSH message. """Default handler for PUSH message.
By default, _push_handler performs an update operation on the tensor. By default, _push_handler performs an update operation on the tensor.
...@@ -1381,4 +1381,4 @@ class KVClient(object): ...@@ -1381,4 +1381,4 @@ class KVClient(object):
self._data_store self._data_store
""" """
target[name][ID] = data target[name][ID] = data
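The KVServer docstring above notes that `_push_handler()` and `_pull_handler()` can be rewritten to support flexible algorithms. Based only on the handler signature shown in this file (`name, ID, data, target`) and the default behaviour `target[name][ID] = data`, a hypothetical custom push handler that accumulates instead of overwriting could look like the sketch below; how such a handler is registered with the KVServer/KVClient is not shown in this diff.

```python
def accumulate_push_handler(name, ID, data, target):
    """Hypothetical push handler: sum pushed rows into the stored tensor.

    name   : str, data name (the '-data-' suffixed key used internally)
    ID     : tensor of local row indices to update
    data   : tensor of rows pushed by a client
    target : dict mapping data names to shared-memory tensors
    """
    # The default handler overwrites rows (target[name][ID] = data); summing is the
    # typical choice when the pushed values are gradients or partial aggregates.
    target[name][ID] += data
```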
...@@ -19,6 +19,8 @@ from .dist_tensor import DistTensor ...@@ -19,6 +19,8 @@ from .dist_tensor import DistTensor
from .partition import partition_graph, load_partition, load_partition_book from .partition import partition_graph, load_partition, load_partition_book
from .graph_partition_book import GraphPartitionBook, PartitionPolicy from .graph_partition_book import GraphPartitionBook, PartitionPolicy
from .sparse_emb import SparseAdagrad, DistEmbedding from .sparse_emb import SparseAdagrad, DistEmbedding
from . import nn
from . import optim
from .rpc import * from .rpc import *
from .rpc_server import start_server from .rpc_server import start_server
......
...@@ -78,6 +78,8 @@ class DistTensor: ...@@ -78,6 +78,8 @@ class DistTensor:
The system determines the right partition policy automatically. The system determines the right partition policy automatically.
persistent : bool persistent : bool
Whether the created tensor lives after the ``DistTensor`` object is destroyed. Whether the created tensor lives after the ``DistTensor`` object is destroyed.
is_gdata : bool
Whether the created tensor is graph ndata/edata or not.
Examples Examples
-------- --------
...@@ -100,7 +102,7 @@ class DistTensor: ...@@ -100,7 +102,7 @@ class DistTensor:
do the same. do the same.
''' '''
def __init__(self, shape, dtype, name=None, init_func=None, part_policy=None, def __init__(self, shape, dtype, name=None, init_func=None, part_policy=None,
persistent=False): persistent=False, is_gdata=True):
self.kvstore = get_kvstore() self.kvstore = get_kvstore()
assert self.kvstore is not None, \ assert self.kvstore is not None, \
'Distributed module is not initialized. Please call dgl.distributed.initialize.' 'Distributed module is not initialized. Please call dgl.distributed.initialize.'
...@@ -126,6 +128,7 @@ class DistTensor: ...@@ -126,6 +128,7 @@ class DistTensor:
+ 'its first dimension does not match the number of nodes or edges ' \ + 'its first dimension does not match the number of nodes or edges ' \
+ 'of a distributed graph or there does not exist a distributed graph.' + 'of a distributed graph or there does not exist a distributed graph.'
self._tensor_name = name
self._part_policy = part_policy self._part_policy = part_policy
assert part_policy.get_size() == shape[0], \ assert part_policy.get_size() == shape[0], \
'The partition policy does not match the input shape.' 'The partition policy does not match the input shape.'
...@@ -147,7 +150,7 @@ class DistTensor: ...@@ -147,7 +150,7 @@ class DistTensor:
self._name = str(data_name) self._name = str(data_name)
self._persistent = persistent self._persistent = persistent
if self._name not in exist_names: if self._name not in exist_names:
self.kvstore.init_data(self._name, shape, dtype, part_policy, init_func) self.kvstore.init_data(self._name, shape, dtype, part_policy, init_func, is_gdata)
self._owner = True self._owner = True
else: else:
self._owner = False self._owner = False
...@@ -218,3 +221,14 @@ class DistTensor: ...@@ -218,3 +221,14 @@ class DistTensor:
The name of the tensor. The name of the tensor.
''' '''
return self._name return self._name
@property
def tensor_name(self):
'''Return the tensor name
Returns
-------
str
The name of the tensor.
'''
return self._tensor_name
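The new `is_gdata` flag lets callers create a distributed tensor that is not registered as graph ndata/edata, which is what the sparse optimizers need for their per-row state. A minimal sketch, assuming `dgl.distributed.initialize()` has been called, a `DistGraph` `g` is loaded (so the first dimension matches the number of nodes), and using a hypothetical tensor name:

```python
import torch as th
from dgl.distributed import DistTensor

def zero_init(shape, dtype):
    """Initializer UDF: optimizer state starts at zero."""
    return th.zeros(shape, dtype=dtype)

num_nodes = g.number_of_nodes()   # `g` is an assumed, already-created DistGraph
embed_size = 128

# is_gdata=False keeps this tensor out of the graph-data name list, so it will not
# appear alongside real node/edge features (e.g., in KVClient.data_name_list()).
adam_m = DistTensor((num_nodes, embed_size), th.float32,
                    name='emb_adam_m',       # hypothetical name for Adam's first moment
                    init_func=zero_init,
                    is_gdata=False)
```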
...@@ -825,6 +825,8 @@ class KVClient(object): ...@@ -825,6 +825,8 @@ class KVClient(object):
self._full_data_shape = {} self._full_data_shape = {}
# Store all the data name # Store all the data name
self._data_name_list = set() self._data_name_list = set()
# Store all graph data name
self._gdata_name_list = set()
# Basic information # Basic information
self._server_namebook = rpc.read_ip_config(ip_config, num_servers) self._server_namebook = rpc.read_ip_config(ip_config, num_servers)
self._server_count = len(self._server_namebook) self._server_count = len(self._server_namebook)
...@@ -940,7 +942,7 @@ class KVClient(object): ...@@ -940,7 +942,7 @@ class KVClient(object):
self._pull_handlers[name] = func self._pull_handlers[name] = func
self.barrier() self.barrier()
def init_data(self, name, shape, dtype, part_policy, init_func): def init_data(self, name, shape, dtype, part_policy, init_func, is_gdata=True):
"""Send message to kvserver to initialize new data tensor and mapping this """Send message to kvserver to initialize new data tensor and mapping this
data from server side to client side. data from server side to client side.
...@@ -956,6 +958,8 @@ class KVClient(object): ...@@ -956,6 +958,8 @@ class KVClient(object):
partition policy. partition policy.
init_func : func init_func : func
UDF init function UDF init function
is_gdata : bool
Whether the created tensor is graph ndata/edata or not.
""" """
assert len(name) > 0, 'name cannot be empty.' assert len(name) > 0, 'name cannot be empty.'
assert len(shape) > 0, 'shape cannot be empty' assert len(shape) > 0, 'shape cannot be empty'
...@@ -997,6 +1001,8 @@ class KVClient(object): ...@@ -997,6 +1001,8 @@ class KVClient(object):
dlpack = shared_data.to_dlpack() dlpack = shared_data.to_dlpack()
self._data_store[name] = F.zerocopy_from_dlpack(dlpack) self._data_store[name] = F.zerocopy_from_dlpack(dlpack)
self._data_name_list.add(name) self._data_name_list.add(name)
if is_gdata:
self._gdata_name_list.add(name)
self._full_data_shape[name] = tuple(shape) self._full_data_shape[name] = tuple(shape)
self._pull_handlers[name] = default_pull_handler self._pull_handlers[name] = default_pull_handler
self._push_handlers[name] = default_push_handler self._push_handlers[name] = default_push_handler
...@@ -1040,6 +1046,8 @@ class KVClient(object): ...@@ -1040,6 +1046,8 @@ class KVClient(object):
self.barrier() self.barrier()
self._data_name_list.remove(name) self._data_name_list.remove(name)
if name in self._gdata_name_list:
self._gdata_name_list.remove(name)
# TODO(chao) : remove the delete log print # TODO(chao) : remove the delete log print
del self._data_store[name] del self._data_store[name]
del self._full_data_shape[name] del self._full_data_shape[name]
...@@ -1110,11 +1118,14 @@ class KVClient(object): ...@@ -1110,11 +1118,14 @@ class KVClient(object):
response = rpc.recv_response() response = rpc.recv_response()
assert response.msg == SEND_META_TO_BACKUP_MSG assert response.msg == SEND_META_TO_BACKUP_MSG
self._data_name_list.add(name) self._data_name_list.add(name)
# map_shared_data happens only at DistGraph initialization
# TODO(xiangsx): We assume there is no non-graph data initialized at this time
self._gdata_name_list.add(name)
self.barrier() self.barrier()
def data_name_list(self): def data_name_list(self):
"""Get all the data name""" """Get all the data name"""
return list(self._data_name_list) return list(self._gdata_name_list)
def get_data_meta(self, name): def get_data_meta(self, name):
"""Get meta data (data_type, data_shape, partition_policy) """Get meta data (data_type, data_shape, partition_policy)
...@@ -1125,6 +1136,25 @@ class KVClient(object): ...@@ -1125,6 +1136,25 @@ class KVClient(object):
part_policy = self._part_policy[name] part_policy = self._part_policy[name]
return (data_type, data_shape, part_policy) return (data_type, data_shape, part_policy)
def get_partid(self, name, id_tensor):
"""
Parameters
----------
name : str
data name
id_tensor : tensor
a vector storing the global data ID
"""
assert len(name) > 0, 'name cannot be empty.'
id_tensor = utils.toindex(id_tensor)
id_tensor = id_tensor.tousertensor()
assert F.ndim(id_tensor) == 1, 'ID must be a vector.'
# partition data
machine_id = self._part_policy[name].to_partid(id_tensor)
return machine_id
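`get_partid()` maps each global data ID to the partition (machine) that owns it via the data's partition policy. A small hedged sketch of using it to see how a batch of IDs is spread across partitions, assuming `kv` is an already-connected KVClient and `'node_feat'` is an existing data name:

```python
import torch as th

node_ids = th.tensor([0, 15, 1024, 99999])
part_ids = kv.get_partid('node_feat', node_ids)   # one partition ID per input ID

# Group IDs by owning partition, e.g. to inspect load balance before a push/pull.
for pid in th.unique(part_ids):
    print('partition', int(pid), 'owns', node_ids[part_ids == pid].tolist())
```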
def push(self, name, id_tensor, data_tensor): def push(self, name, id_tensor, data_tensor):
"""Push data to KVServer. """Push data to KVServer.
......
"""dgl distributed.optims."""
import importlib
import sys
import os
from ...backend import backend_name
from ...utils import expand_as_pair
def _load_backend(mod_name):
mod = importlib.import_module('.%s' % mod_name, __name__)
thismod = sys.modules[__name__]
for api, obj in mod.__dict__.items():
setattr(thismod, api, obj)
_load_backend(backend_name)
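The `_load_backend` helper above re-exports every symbol of the backend-specific submodule (e.g. `pytorch`) at the package level, so users can write `dgl.distributed.optim.SparseAdam` without naming the backend. The toy snippet below (not DGL code) illustrates the same re-export pattern with stand-in modules:

```python
import types

# Stand-in for dgl/distributed/optim/pytorch: a module defining the real classes.
pytorch_impl = types.ModuleType("pytorch_impl")
pytorch_impl.SparseAdam = type("SparseAdam", (), {})

# Stand-in for dgl/distributed/optim/__init__: copy the backend module's symbols
# onto the package module, which is exactly what the loop in _load_backend does.
package_mod = types.ModuleType("optim")
for api, obj in pytorch_impl.__dict__.items():
    setattr(package_mod, api, obj)

assert package_mod.SparseAdam is pytorch_impl.SparseAdam   # symbol is now re-exported
```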
"""dgl distributed sparse optimizer for pytorch."""
from .sparse_emb import NodeEmbedding
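Putting the pieces of this commit together, a rough training-step sketch of the intended usage: embeddings come from `dgl.distributed.nn.NodeEmbedding` (wrapped in an embedding layer such as `DistEmbedLayer` above) and are updated by `dgl.distributed.optim.SparseAdam`, while dense model parameters keep a regular PyTorch optimizer. The data loader, labels, learning rates, and the exact forward call of the embedding layer are assumptions, and the `step()` signature should be checked against the API reference.

```python
import torch as th
import torch.nn.functional as F
import dgl

# Assumed to exist: `embed_layer` holding NodeEmbedding objects in `node_embeds`,
# a GNN `model`, a `labels` tensor, and a distributed sampling `dataloader`.
emb_optimizer = dgl.distributed.optim.SparseAdam(
    list(embed_layer.node_embeds.values()), lr=0.06)
optimizer = th.optim.Adam(model.parameters(), lr=0.01)

for input_nodes, seeds, blocks in dataloader:
    feats = embed_layer(input_nodes)          # look up learnable input features
    logits = model(blocks, feats)
    loss = F.cross_entropy(logits, labels[seeds])

    optimizer.zero_grad()
    loss.backward()
    emb_optimizer.step()   # sparse update of only the embedding rows touched above
    optimizer.step()
```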