Commit 58d212ad authored by fengzch-das

update readme

parent be65e922
Pipeline #2863 failed with stages
in 0 seconds
<h1 align="center">PyGAS: Auto-Scaling GNNs in PyG</h1>
<img width="100%" src="https://raw.githubusercontent.com/rusty1s/pyg_autoscale/master/figures/overview.png?token=ABU7ZAXZ7WT3RIOSYHIDIVDAEI3SY" />
--------------------------------------------------------------------------------
*PyGAS* is the practical realization of our *<ins>G</ins>NN<ins>A</ins>uto<ins>S</ins>cale* (GAS) framework, which scales arbitrary message-passing GNNs to large graphs, as described in our paper:
Matthias Fey, Jan E. Lenssen, Frank Weichert, Jure Leskovec: **[GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings](http://arxiv.org/abs/2106.05609)** *(ICML 2021)*
GAS prunes entire sub-trees of the computation graph by utilizing historical embeddings from prior training iterations, leading to constant GPU memory consumption with respect to input mini-batch size, and maximal expressivity.
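To make the idea of historical embeddings concrete, here is a minimal pure-Python sketch. It is not PyGAS code: the `History` class and `mean_aggregate` helper are hypothetical names invented for illustration, and plain lists stand in for tensors. It only shows the pattern of pushing fresh in-batch embeddings into a store and pulling stored (possibly stale) embeddings for out-of-batch neighbours instead of recomputing them.

```python
class History:
    """Toy per-layer store of node embeddings (hypothetical, not the PyGAS class)."""

    def __init__(self, num_nodes, dim):
        # One embedding slot per node of the full graph, initialised to zeros.
        self.emb = [[0.0] * dim for _ in range(num_nodes)]

    def push(self, node_ids, values):
        # Store freshly computed embeddings of in-batch nodes.
        for n, v in zip(node_ids, values):
            self.emb[n] = list(v)

    def pull(self, node_ids):
        # Read stored (possibly stale) embeddings of out-of-batch nodes.
        return [list(self.emb[n]) for n in node_ids]


def mean_aggregate(batch, neighbours, fresh, history):
    """Average the neighbours of each in-batch node, taking out-of-batch
    neighbours from the history instead of recomputing them."""
    fresh_by_id = dict(zip(batch, fresh))
    out = []
    for n in batch:
        rows = [fresh_by_id[m] if m in fresh_by_id else history.pull([m])[0]
                for m in neighbours[n]]
        out.append([sum(col) / len(rows) for col in zip(*rows)])
    return out


history = History(num_nodes=4, dim=2)
# An earlier iteration processed nodes 2 and 3 and pushed their embeddings.
history.push([2, 3], [[1.0, 1.0], [3.0, 5.0]])

# The current mini-batch holds nodes 0 and 1; node 1 also neighbours 2 and 3.
neighbours = {0: [1], 1: [0, 2, 3]}
fresh = [[2.0, 0.0], [4.0, 2.0]]
result = mean_aggregate([0, 1], neighbours, fresh, history)
history.push([0, 1], fresh)  # make the new embeddings available to later batches
print(result)
```

Only the in-batch nodes need fresh computation here, which is why the working set stays bounded by the mini-batch plus its direct out-of-batch neighbours.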
*PyGAS* is implemented in [PyTorch](https://pytorch.org/) and utilizes the [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric) (PyG) library.
It provides an easy-to-use interface to convert a common or custom GNN from PyG into its scalable variant:
```python
from torch.nn import ModuleList
from torch_geometric.nn import SAGEConv
from torch_geometric_autoscale import ScalableGNN
from torch_geometric_autoscale import metis, permute, SubgraphLoader

class GNN(ScalableGNN):
    def __init__(self, num_nodes, in_channels, hidden_channels,
                 out_channels, num_layers):
        # * pool_size determines the number of pinned CPU buffers
        # * buffer_size determines the size of pinned CPU buffers,
        #   i.e. the maximum number of out-of-mini-batch nodes
        super().__init__(num_nodes, hidden_channels, num_layers,
                         pool_size=2, buffer_size=5000)

        self.convs = ModuleList()
        self.convs.append(SAGEConv(in_channels, hidden_channels))
        for _ in range(num_layers - 2):
            self.convs.append(SAGEConv(hidden_channels, hidden_channels))
        self.convs.append(SAGEConv(hidden_channels, out_channels))

    def forward(self, x, adj_t, *args):
        for conv, history in zip(self.convs[:-1], self.histories):
            x = conv(x, adj_t).relu_()
            x = self.push_and_pull(history, x, *args)
        return self.convs[-1](x, adj_t)

# `data` is a PyG data object that provides `adj_t`
perm, ptr = metis(data.adj_t, num_parts=40, log=True)
data = permute(data, perm, log=True)
loader = SubgraphLoader(data, ptr, batch_size=10, shuffle=True)

model = GNN(...)
for batch, *args in loader:
    out = model(batch.x, batch.adj_t, *args)
```
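## Introduction
PyGAS is the practical realization of our GNNAutoScale (GAS) framework, which scales arbitrary message-passing graph neural networks (GNNs) to large graphs. GAS prunes entire sub-trees of the computation graph by reusing embeddings stored during earlier training iterations, which yields two main benefits:
- Constant GPU memory consumption: memory usage is independent of the input mini-batch size
- Maximal expressivity: the full topological information is retained without sacrificing model performance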
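## Installation
Supported component combinations
| PyTorch version | fastpt version | PyGAS version | DTK version | Python version | Recommended build method |
| --------------- | -------------- | -------------- | ----------- | --------------- | -------------------------- |
| 2.5.1 | 2.1.0 | master-be65e92 | >= 25.04 | 3.8, 3.10, 3.11 | fastpt without transcoding |
| 2.4.1 | 2.0.1 | master-be65e92 | >= 25.04 | 3.8, 3.10, 3.11 | fastpt without transcoding |
| Other | Other | Other | Other | 3.8, 3.10, 3.11 | HIP transcoding |
+ For PyTorch 2.4.1 or later with DTK 25.04 or later, building with fastpt without transcoding is recommended.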
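The compatibility table above can be read as a simple lookup. The following sketch encodes it in Python so a setup script could choose the build method automatically. The function and constant names are hypothetical, and treating every combination that is not listed as the "Other" row (HIP transcoding) is an assumption about how the table is meant to be read.

```python
def parse_version(text):
    """Turn a dotted version such as '2.5.1' or '25.04' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Rows copied from the compatibility table: PyTorch version -> fastpt version.
FASTPT_FOR_TORCH = {"2.5.1": "2.1.0", "2.4.1": "2.0.1"}
MIN_DTK = parse_version("25.04")


def recommended_build(torch_version, dtk_version):
    """Return (build method, fastpt version or None) according to the table."""
    fastpt = FASTPT_FOR_TORCH.get(torch_version)
    if fastpt is not None and parse_version(dtk_version) >= MIN_DTK:
        return ("fastpt without transcoding", fastpt)
    # Assumption: anything not listed falls under the "Other" row.
    return ("HIP transcoding", None)


print(recommended_build("2.5.1", "25.04"))
print(recommended_build("2.3.0", "25.04"))
```

Comparing versions as integer tuples avoids the pitfalls of string comparison, where `"25.10"` would sort before `"25.4"`.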
### 1. Install with pip
PyGAS wheels can be downloaded from [光合开发者社区](https://download.sourcefind.cn:65024/4/main). Choose the PyGAS wheel that matches your PyTorch and Python versions.
```shell
pip install torch*  # the downloaded torch wheel
pip install fastpt* --no-deps  # the downloaded fastpt wheel
source /usr/local/bin/fastpt -E
pip install torch_geometric_autoscale*  # the downloaded PyGAS wheel
```
### 2. Build and install from source
#### Preparing the build environment
A build based on fastpt without transcoding is provided:
1. Using the 光源 PyTorch base image: download the image from [光合开发者社区](https://sourcefind.cn/#/image/dcu/pytorch), choosing the version that matches your PyTorch, Python, DTK, and operating system.
2. Using an existing Python environment: install PyTorch and fastpt. Wheels are available from [光合开发者社区](https://sourcefind.cn/#/image/dcu/pytorch); download the PyTorch wheel that matches your Python and DTK versions, then install as follows:
```shell
pip install torch*  # the downloaded torch wheel
pip install fastpt* --no-deps  # the downloaded fastpt wheel; install torch first, then fastpt
pip install pytest
pip install wheel
```
#### Build and install from source
- Download the code
```shell
git clone http://developer.sourcefind.cn/codes/OpenDAS/pyg_autoscale.git # switch branches as needed for your build
```
- Two ways to build from source are provided (run inside the pyg_autoscale directory):
```
# 1. Set the environment variables for building without transcoding
source /usr/local/bin/fastpt -C
# 2. Build a wheel and install it
cp -r eigen-master/* eigen/
python3 setup.py -v bdist_wheel
pip install dist/torch_geometric_autoscale* --no-deps
# 3. Build and install from source
export FORCE_CUDA=1
python3 setup.py install --no-deps
```
#### Notes
+ If `pip install` downloads are slow, add the Tsinghua PyPI mirror: `-i https://pypi.tuna.tsinghua.edu.cn/simple/`
+ `ROCM_PATH` is the DTK path; it defaults to `/opt/dtk`
+ Building against PyTorch 2.5.1 requires C++17 support: open `setup.py` and change `-std=c++14` to `-std=c++17`
## Project Structure
* **`torch_geometric_autoscale/`** contains the source code of *PyGAS*
* **`examples/`** contains examples to demonstrate how to apply GAS in practice
* **`small_benchmark/`** includes experiments to evaluate GAS performance on *small-scale* graphs
* **`large_benchmark/`** includes experiments to evaluate GAS performance on *large-scale* graphs
We use [**Hydra**](https://hydra.cc/) to manage hyperparameter configurations.
## Known Issue
-
## Cite
Please cite [our paper](http://arxiv.org/abs/2106.05609) if you use this code in your own work:
```
@inproceedings{Fey/etal/2021,
title={{GNNAutoScale}: Scalable and Expressive Graph Neural Networks via Historical Embeddings},
author={Fey, M. and Lenssen, J. E. and Weichert, F. and Leskovec, J.},
booktitle={International Conference on Machine Learning (ICML)},
year={2021},
}
```
## References
- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/rusty1s/pyg_autoscale.git](https://github.com/rusty1s/pyg_autoscale.git)
<h1 align="center">PyGAS: Auto-Scaling GNNs in PyG</h1>
<img width="100%" src="https://raw.githubusercontent.com/rusty1s/pyg_autoscale/master/figures/overview.png?token=ABU7ZAXZ7WT3RIOSYHIDIVDAEI3SY" />
--------------------------------------------------------------------------------
*PyGAS* is the practical realization of our *<ins>G</ins>NN<ins>A</ins>uto<ins>S</ins>cale* (GAS) framework, which scales arbitrary message-passing GNNs to large graphs, as described in our paper:
Matthias Fey, Jan E. Lenssen, Frank Weichert, Jure Leskovec: **[GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings](http://arxiv.org/abs/2106.05609)** *(ICML 2021)*
GAS prunes entire sub-trees of the computation graph by utilizing historical embeddings from prior training iterations, leading to constant GPU memory consumption with respect to input mini-batch size, and maximal expressivity.
*PyGAS* is implemented in [PyTorch](https://pytorch.org/) and utilizes the [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric) (PyG) library.
It provides an easy-to-use interface to convert a common or custom GNN from PyG into its scalable variant:
```python
from torch.nn import ModuleList
from torch_geometric.nn import SAGEConv
from torch_geometric_autoscale import ScalableGNN
from torch_geometric_autoscale import metis, permute, SubgraphLoader

class GNN(ScalableGNN):
    def __init__(self, num_nodes, in_channels, hidden_channels,
                 out_channels, num_layers):
        # * pool_size determines the number of pinned CPU buffers
        # * buffer_size determines the size of pinned CPU buffers,
        #   i.e. the maximum number of out-of-mini-batch nodes
        super().__init__(num_nodes, hidden_channels, num_layers,
                         pool_size=2, buffer_size=5000)

        self.convs = ModuleList()
        self.convs.append(SAGEConv(in_channels, hidden_channels))
        for _ in range(num_layers - 2):
            self.convs.append(SAGEConv(hidden_channels, hidden_channels))
        self.convs.append(SAGEConv(hidden_channels, out_channels))

    def forward(self, x, adj_t, *args):
        for conv, history in zip(self.convs[:-1], self.histories):
            x = conv(x, adj_t).relu_()
            x = self.push_and_pull(history, x, *args)
        return self.convs[-1](x, adj_t)

# `data` is a PyG data object that provides `adj_t`
perm, ptr = metis(data.adj_t, num_parts=40, log=True)
data = permute(data, perm, log=True)
loader = SubgraphLoader(data, ptr, batch_size=10, shuffle=True)

model = GNN(...)
for batch, *args in loader:
    out = model(batch.x, batch.adj_t, *args)
```
A detailed description of `ScalableGNN` can be found [in its implementation](https://github.com/rusty1s/pyg_autoscale/blob/master/torch_geometric_autoscale/models/base.py#L13).
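The snippet above partitions the graph into 40 parts and loads 10 parts per batch. As a rough illustration of what that grouping means, the sketch below works on plain Python lists. It assumes that, after permutation, `ptr[i]:ptr[i + 1]` delimits the node range of partition `i` and that `batch_size` counts partitions rather than nodes; `partition_batches` is a hypothetical helper, not part of PyGAS.

```python
import random


def partition_batches(ptr, batch_size, shuffle=False, seed=None):
    """Group partitions into mini-batches and return the node ids of each batch.

    Assumes ptr[i]:ptr[i + 1] is the contiguous node range of partition i.
    """
    parts = [list(range(ptr[i], ptr[i + 1])) for i in range(len(ptr) - 1)]
    if shuffle:
        # Shuffle whole partitions, never individual nodes.
        random.Random(seed).shuffle(parts)
    batches = []
    for start in range(0, len(parts), batch_size):
        group = parts[start:start + batch_size]
        batches.append([n for part in group for n in part])
    return batches


# Four partitions of sizes 3, 2, 4 and 1 over ten nodes.
ptr = [0, 3, 5, 9, 10]
batches = partition_batches(ptr, batch_size=2)
print(batches)
```

Under these assumptions, 40 partitions with `batch_size=10` would yield 4 mini-batches per epoch.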
## Requirements
* Install [**PyTorch >= 1.7.0**](https://pytorch.org/get-started/locally/)
* Install [**PyTorch Geometric >= 1.7.0**](https://github.com/rusty1s/pytorch_geometric#installation):
```
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric
```
where `${TORCH}` should be replaced by either `1.7.0` or `1.8.0`, and `${CUDA}` should be replaced by either `cpu`, `cu92`, `cu101`, `cu102`, `cu110` or `cu111`, depending on your PyTorch installation.
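As a small illustration of the placeholder substitution described above, the following sketch builds the wheel index URL from a PyTorch version and a CUDA tag and rejects values outside the listed ones. The function name `wheel_index_url` is hypothetical; the URL pattern and the allowed values are taken from the instructions above.

```python
SUPPORTED_TORCH = ("1.7.0", "1.8.0")
SUPPORTED_CUDA = ("cpu", "cu92", "cu101", "cu102", "cu110", "cu111")


def wheel_index_url(torch_version, cuda_tag):
    """Fill in ${TORCH} and ${CUDA} in the wheel index URL."""
    if torch_version not in SUPPORTED_TORCH:
        raise ValueError(f"unsupported PyTorch version: {torch_version}")
    if cuda_tag not in SUPPORTED_CUDA:
        raise ValueError(f"unsupported CUDA tag: {cuda_tag}")
    return f"https://pytorch-geometric.com/whl/torch-{torch_version}+{cuda_tag}.html"


url = wheel_index_url("1.8.0", "cu111")
print(f"pip install torch-scatter -f {url}")
print(f"pip install torch-sparse -f {url}")
```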
## Installation
```
pip install git+https://github.com/rusty1s/pyg_autoscale.git
```
or
```
python setup.py install
```
## Project Structure
* **`torch_geometric_autoscale/`** contains the source code of *PyGAS*
* **`examples/`** contains examples to demonstrate how to apply GAS in practice
* **`small_benchmark/`** includes experiments to evaluate GAS performance on *small-scale* graphs
* **`large_benchmark/`** includes experiments to evaluate GAS performance on *large-scale* graphs
We use [**Hydra**](https://hydra.cc/) to manage hyperparameter configurations.
## Cite
Please cite [our paper](http://arxiv.org/abs/2106.05609) if you use this code in your own work:
```
@inproceedings{Fey/etal/2021,
title={{GNNAutoScale}: Scalable and Expressive Graph Neural Networks via Historical Embeddings},
author={Fey, M. and Lenssen, J. E. and Weichert, F. and Leskovec, J.},
booktitle={International Conference on Machine Learning (ICML)},
year={2021},
}
```