Unverified Commit 5ce8b83d authored by Rhett Ying, committed by GitHub

[GraphBolt] disable full neighbor sampling in reference stage for hetero rgcn example (#6396)

parent 5ba2620f
@@ -15,21 +15,21 @@ python3 hetero_rgcn.py --dataset ogbn-mag --num_gpus 1
 ```
 ### Resource usage and time cost
-Below results are roughly collected from an AWS EC2 **g4dn.metal**, 384GB RAM, 96 vCPUs(Cascade Lake P-8259L), 8 NVIDIA T4 GPUs(16GB RAM). CPU RAM usage is the peak value of `used` and `buff/cache` field of `free` command which are a bit rough. Please refer to `RSS`/`USS`/`PSS` which are more accurate. GPU RAM usage is the peak value recorded by `nvidia-smi` command.
+Below results are roughly collected from an AWS EC2 **g4dn.metal**, 384GB RAM, 96 vCPUs(Cascade Lake P-8259L), 8 NVIDIA T4 GPUs(16GB RAM). CPU RAM usage is the peak value of the `used` field of the `free` command, which is a bit rough. Please refer to `RSS`/`USS`/`PSS`, which are more accurate. GPU RAM usage is the peak value recorded by the `nvidia-smi` command.
 | Dataset Size | CPU RAM Usage | Num of GPUs | GPU RAM Usage | Time Per Epoch(Training) | Time Per Epoch(Inference: train/val/test set) |
 | ------------ | ------------- | ----------- | ---------- | --------- | --------------------------- |
-| ~1.1GB | ~5GB | 0 | 0GB | ~4min5s | ~2min7s + ~0min12s + ~0min8s |
-| ~1.1GB | ~4.3GB | 1 | 4.7GB | ~1min18s | ~1min54s + ~0min12s + ~0min8s |
+| ~1.1GB | ~4.5GB | 0 | 0GB | ~4min14s(615it, 2.41it/s) | ~0min28s(154it, 5.46it/s) + ~0min2s(16it, 5.48it/s) + ~0min2s(11it, 5.44it/s) |
+| ~1.1GB | ~2GB | 1 | 4.4GB | ~1min15s(615it, 8.11it/s) | ~0min27s(154it, 5.63it/s) + ~0min2s(16it, 5.90it/s) + ~0min1s(11it, 5.82it/s) |
 ### Accuracies
 ```
 Final performance:
 All runs:
-Highest Train: 49.29 ± 0.85
-Highest Valid: 34.69 ± 0.49
-Final Train: 48.14 ± 1.09
-Final Test: 33.65 ± 0.63
+Highest Train: 64.66 ± 0.74
+Highest Valid: 41.31 ± 0.12
+Final Train: 64.66 ± 0.74
+Final Test: 40.07 ± 0.02
 ```
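As a rough cross-check of the updated timing columns, each quoted stage time can be recomputed from its (iterations, it/s) pair. A minimal stdlib-only sketch, not part of this commit; `epoch_time` is a hypothetical helper:

```python
# Rough sanity check of the throughput numbers in the table above:
# an (iterations, it/s) pair should reproduce the quoted stage time.
def epoch_time(iterations, it_per_s):
    """Return (minutes, seconds) taken for `iterations` at `it_per_s`."""
    secs = iterations / it_per_s
    return int(secs // 60), round(secs % 60)

# 615 iterations at 2.41 it/s -> about 4min15s (table quotes ~4min14s)
print(epoch_time(615, 2.41))
# 154 iterations at 5.46 it/s -> 0min28s, matching the ~0min28s quoted
print(epoch_time(154, 5.46))
```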
 ## Run on `ogb-lsc-mag240m` dataset
@@ -45,24 +45,19 @@ python3 hetero_rgcn.py --dataset ogb-lsc-mag240m --num_gpus 1
 ```
 ### Resource usage and time cost
-Below results are roughly collected from an AWS EC2 **g4dn.metal**, 384GB RAM, 96 vCPUs(Cascade Lake P-8259L), 8 NVIDIA T4 GPUs(16GB RAM). CPU RAM usage is the peak value of `used` and `buff/cache` field of `free` command which are a bit rough. Please refer to `RSS`/`USS`/`PSS` which are more accurate. GPU RAM usage is the peak value recorded by `nvidia-smi` command.
-Infer with full neighbors on GPU is out of memory on `T4(16GB RAM)`. GPUs with larger memory are required, such as `A100(40GB RAM)`.
-```
-Tried to allocate 21.72 GiB (GPU 0; 14.75 GiB total capacity; 12.30 GiB already allocated; 2.02 GiB free; 12.60 GiB reserved in total by PyTorch)
-```
-| Dataset Size | CPU RAM Usage(used + buff/cache) | Num of GPUs | GPU RAM Usage | Time Per Epoch(Training) | Time Per Epoch(Inference: train/val/test set) |
+Below results are roughly collected from an AWS EC2 **g4dn.metal**, 384GB RAM, 96 vCPUs(Cascade Lake P-8259L), 8 NVIDIA T4 GPUs(16GB RAM). CPU RAM usage is the peak value of the `used` field of the `free` command, which is a bit rough. Please refer to `RSS`/`USS`/`PSS`, which are more accurate. GPU RAM usage is the peak value recorded by the `nvidia-smi` command.
+| Dataset Size | CPU RAM Usage | Num of GPUs | GPU RAM Usage | Time Per Epoch(Training) | Time Per Epoch(Inference: train/val/test set) |
 | ------------ | ------------- | ----------- | ---------- | --------- | --------------------------- |
-| ~404GB | ~110GB + ~250GB | 0 | 0GB | ~5min22s(1087it, 3.37it/s) | ~35min29s(272it, 7.83s/it) + ~6min9s(34it, 10.87s/it) + ~3min32s(22it, 9.66s/it) |
-| ~404GB | ~110GB + ~250GB | 1 | 2.7GB | ~2min45s(1087it, 6.56it/s) | ~OOM + ~OOM + ~OOM |
+| ~404GB | ~55GB | 0 | 0GB | ~3min51s(1087it, 4.70it/s) | ~2min21s(272it, 1.93it/s) + ~0min22s(34it, 1.48it/s) + ~0min14s(22it, 1.51it/s) |
+| ~404GB | ~55GB | 1 | 7GB | ~2min41s(1087it, 6.73it/s) | ~1min52s(272it, 2.41it/s) + ~0min17s(34it, 1.93it/s) + ~0min11s(22it, 1.99it/s) |
 ### Accuracies
 ```
 Final performance:
 All runs:
-Highest Train: 54.75 ± 0.29
-Highest Valid: 52.08 ± 0.09
-Final Train: 54.75 ± 0.29
+Highest Train: 54.43 ± 0.39
+Highest Valid: 51.78 ± 0.68
+Final Train: 54.35 ± 0.51
 Final Test: 0.00 ± 0.00
 ```
\ No newline at end of file
@@ -443,8 +443,9 @@ def extract_node_features(name, block, data, node_embed, device):
     }
     # Original feature data are stored in float16 while model weights are
     # float32, so we need to convert the features to float32.
-    # [TODO] Enable mixed precision training on GPU.
-    node_features = {k: v.float() for k, v in node_features.items()}
+    node_features = {
+        k: v.to(device).float() for k, v in node_features.items()
+    }
     return node_features
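The hunk above upcasts float16 feature tensors to float32 to match the model weights. A torch-free sketch of why the dtypes differ in the first place: IEEE 754 half precision cannot represent 0.1 exactly, so features stored in float16 already carry rounding error (the `'e'` struct format is Python's half-precision codec):

```python
import struct

# Round-trip 0.1 through IEEE 754 half precision (float16).
x = 0.1
half = struct.unpack("e", struct.pack("e", x))[0]
print(half)                  # 0.0999755859375
print(abs(half - x) > 1e-6)  # True: float16 storage loses precision
```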
@@ -469,18 +470,6 @@ def evaluate(
     else:
         evaluator = MAG240MEvaluator()
-    # Initialize a neighbor sampler that samples all neighbors. The model
-    # has 2 GNN layers, so we create a sampler of 2 layers.
-    ######################################################################
-    # [Why we need to sample all neighbors?]
-    # During the testing phase, we use a `MultiLayerFullNeighborSampler` to
-    # sample all neighbors for each node. This is done to achieve the most
-    # accurate evaluation of the model's performance, despite the increased
-    # computational cost. This contrasts with the training phase where we
-    # prefer a balance between computational efficiency and model accuracy,
-    # hence only a subset of neighbors is sampled.
-    ######################################################################
     data_loader = create_dataloader(
         name,
         g,
@@ -488,7 +477,7 @@ def evaluate(
         item_set,
         device,
         batch_size=4096,
-        fanouts=[-1, -1],
+        fanouts=[25, 10],
         shuffle=False,
         num_workers=num_workers,
     )
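For intuition about the `fanouts=[-1, -1]` to `fanouts=[25, 10]` change: a per-layer fanout caps how many neighbors are sampled per node, and `-1` keeps them all, which is the full-neighbor evaluation this commit disables. A simplified, homogeneous-graph stand-in (GraphBolt's actual sampler differs; `sample_layer` is hypothetical):

```python
import random

def sample_layer(adj, seeds, fanout):
    """Sample up to `fanout` neighbors per seed node for one GNN layer.

    fanout == -1 keeps every neighbor (full-neighbor sampling).
    """
    sampled = {}
    for s in seeds:
        nbrs = adj.get(s, [])
        if fanout == -1 or len(nbrs) <= fanout:
            sampled[s] = list(nbrs)                   # take all neighbors
        else:
            sampled[s] = random.sample(nbrs, fanout)  # cap at fanout
    return sampled

adj = {0: [1, 2, 3, 4], 1: [0]}
print(sample_layer(adj, [0, 1], -1))  # full: {0: [1, 2, 3, 4], 1: [0]}
print(sample_layer(adj, [0, 1], 2))   # node 0 capped to 2 neighbors
```

With two layers, a list like `[25, 10]` applies one fanout per hop, trading a little evaluation accuracy for bounded memory and time, which is what avoids the GPU OOM that full-neighbor inference hit on this dataset.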