---
hide:
- toc
---
# Installation
We use the MLCommons CM automation framework to run MLPerf inference benchmarks.
CM needs `git`, `python3-pip` and `python3-venv` installed on your system. If any of these are absent, please follow the [official CM installation page](https://docs.mlcommons.org/ck/install) to install them. Once the dependencies are installed, follow the steps below.
## Activate a Virtual ENV for CM
This step is not mandatory, as CM can use a separate virtual environment for MLPerf inference. However, recent `pip` versions require it; otherwise you will need the `--break-system-packages` flag while installing `cm4mlops`.
```bash
python3 -m venv cm
source cm/bin/activate
```
## Install CM and pull any needed repositories
=== "Use the default fork of CM MLOps repository"
```bash
pip install cm4mlops
```
=== "Use custom fork/branch of the CM MLOps repository"
```bash
pip install cmind && cm init --quiet --repo=mlcommons@cm4mlops --branch=mlperf-inference
```
Here, `repo` is in the format `githubUsername@githubRepo`.
Now, you are ready to use the `cm` commands to run MLPerf inference as given on the [benchmarks](../index.md) page.
mkdocs-material
swagger-markdown
mkdocs-macros-plugin
ruamel.yaml
mkdocs-redirects
mkdocs-site-urls
---
hide:
- toc
---
Click [here](https://docs.google.com/presentation/d/1cmbpZUpVr78EIrhzyMBnnWnjJrD-mZ2vmSb-yETkTA8/edit?usp=sharing) to view the proposal slide for Common Automation for MLPerf Inference Submission Generation through CM.
=== "Custom automation based MLPerf results"
    If you have not followed the `cm run` commands under the individual model pages in the [benchmarks](../index.md) directory, please make sure that the results directory is structured in the following way. Real examples of the expected folder structure can be seen [here](https://github.com/mlcommons/inference/tree/submission-generation-examples).
```
    └── System description ID (SUT Name)
        ├── system_meta.json
        └── Benchmark
            └── Scenario
                ├── Performance
                |   └── run_1 # 1 run for all scenarios
                |       ├── mlperf_log_summary.txt
                |       └── mlperf_log_detail.txt
                ├── Accuracy
                |   ├── mlperf_log_summary.txt
                |   ├── mlperf_log_detail.txt
                |   ├── mlperf_log_accuracy.json
                |   └── accuracy.txt
                ├── Compliance_Test_ID
                |   ├── Performance
                |   |   └── run_x # 1 run for all scenarios
                |   |       ├── mlperf_log_summary.txt
                |   |       └── mlperf_log_detail.txt
                |   ├── Accuracy # for TEST01 only
                |   |   ├── baseline_accuracy.txt (if test fails in deterministic mode)
                |   |   ├── compliance_accuracy.txt (if test fails in deterministic mode)
                |   |   ├── mlperf_log_accuracy.json
                |   |   └── accuracy.txt
                |   ├── verify_performance.txt
                |   └── verify_accuracy.txt # for TEST01 only
                ├── user.conf
                └── measurements.json
```
<details>
<summary>Click here if you are submitting in open division</summary>
    * The `model_mapping.json` file should be included inside the SUT folder; it is used to map the full custom model name to the official model name. The format of the JSON file is:
```
{
"custom_model_name_for_model1":"official_model_name_for_model1",
"custom_model_name_for_model2":"official_model_name_for_model2",
}
```
</details>
=== "CM automation based results"
    If you have followed the `cm run` commands under the individual model pages in the [benchmarks](../index.md) directory, all the valid results are aggregated in the `cm cache` folder. The following command can be used to browse the structure of the inference results folder generated by CM.
### Get results folder structure
```bash
cm find cache --tags=get,mlperf,inference,results,dir | xargs tree
```
Once the results for all the models are ready, you can follow the section below to generate a valid submission tree compliant with the [MLPerf requirements](https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#inference-1).
## Generate submission folder
The submission generation flow is shown in the diagram below.
```mermaid
flowchart LR
subgraph Generation [Submission Generation SUT1]
direction TB
A[populate system details] --> B[generate submission structure]
B --> C[truncate-accuracy-logs]
C --> D{Infer low talency results <br>and/or<br> filter out invalid results}
D --> yes --> E[preprocess-mlperf-inference-submission]
D --> no --> F[run-mlperf-inference-submission-checker]
E --> F
end
Input((Results SUT1)) --> Generation
Generation --> Output((Submission Folder <br> SUT1))
```
### Command to generate submission folder
```bash
cm run script --tags=generate,inference,submission \
--clean \
--preprocess_submission=yes \
--run-checker=yes \
--submitter=MLCommons \
--division=closed \
--env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
--quiet
```
!!! tip
* Use `--hw_name="My system name"` to give a meaningful system name. Examples can be seen [here](https://github.com/mlcommons/inference_results_v3.0/tree/main/open/cTuning/systems)
* Use `--submitter=<Your name>` if your organization is an official MLCommons member and would like to submit under your organization
* Use `--hw_notes_extra` option to add additional notes like `--hw_notes_extra="Result taken by NAME" `
* Use `--results_dir` option to specify the results folder. It is automatically taken from CM cache for MLPerf automation based runs
    * Use `--submission_dir` option to specify the submission folder. (You can omit this if you are pushing to GitHub or running only a single SUT; CM will then use its cache folder.)
    * Use `--division=open` for open division submission
    * Use `--category` option to specify the category for which the submission is generated (datacenter/edge). By default, the category is taken from the `system_meta.json` file located in the SUT root directory.
    * Use `--submission_base_dir` to specify the directory to which the outputs from the preprocess-submission script and the final submission are added. There is no need to provide `--submission_dir` along with this. For `docker run`, use `--submission_base_dir` instead of `--submission_dir`.
If there are multiple systems where MLPerf results are collected, the same process needs to be repeated on each of them. Once we have submission folders on all the SUTs, we need to sync them into a single submission folder.
=== "Sync Locally"
    If you have results on multiple systems, you need to merge them onto one system. You can use `rsync` for this. For example, the command below, run on SUT1, pulls the submission folder from SUT2 (reachable as `host2`) into the submission folder on SUT1.
```
    rsync -avz username@host2:<path_to_submission_folder2>/ <path_to_submission_folder1>/
```
    The same needs to be repeated for all the other SUTs so that SUT1 ends up with the full set of submissions.
```mermaid
flowchart LR
subgraph SUT1 [Submission Generation SUT1]
A[Submission Folder SUT1]
end
subgraph SUT2 [Submission Generation SUT2]
B[Submission Folder SUT2]
end
subgraph SUT3 [Submission Generation SUT3]
C[Submission Folder SUT3]
end
subgraph SUTN [Submission Generation SUTN]
D[Submission Folder SUTN]
end
SUT2 --> SUT1
SUT3 --> SUT1
SUTN --> SUT1
```
=== "Sync via a Github repo"
    If you are collecting results across multiple systems, you can generate a submission on each of them, aggregate all of them in a GitHub repository (which can be private), and use it to generate a single tarball that can be uploaded to the [MLCommons Submission UI](https://submissions-ui.mlcommons.org/submission).
Run the following command after **replacing `--repo_url` with your GitHub repository URL**.
```bash
cm run script --tags=push,github,mlperf,inference,submission \
--repo_url=https://github.com/mlcommons/mlperf_inference_submissions_v5.0 \
--commit_message="Results on <HW name> added by <Name>" \
--quiet
```
```mermaid
flowchart LR
subgraph SUT1 [Submission Generation SUT1]
A[Submission Folder SUT1]
end
subgraph SUT2 [Submission Generation SUT2]
B[Submission Folder SUT2]
end
subgraph SUT3 [Submission Generation SUT3]
C[Submission Folder SUT3]
end
subgraph SUTN [Submission Generation SUTN]
D[Submission Folder SUTN]
end
SUT2 -- git sync and push --> G[Github Repo]
SUT3 -- git sync and push --> G[Github Repo]
SUTN -- git sync and push --> G[Github Repo]
SUT1 -- git sync and push --> G[Github Repo]
```
## Upload the final submission
!!! warning
    If you are using GitHub for consolidating your results, make sure that you have run the [`push-to-github` command](#__tabbed_2_2) on the same system, so that the results are synced as-is to the GitHub repository.
Once you have all the results on the system, you can upload them to the MLCommons submission server as follows:
=== "via CLI"
    The following command runs the submission checker and uploads the results to the MLCommons submission server:
```
cm run script --tags=run,submission,checker \
--submitter_id=<> \
--submission_dir=<Path to the submission folder>
```
=== "via Browser"
    The following command generates the final submission tar file, which you can then upload to the [MLCommons Submission UI](https://submissions-ui.mlcommons.org/submission):
```
cm run script --tags=run,submission,checker \
--submission_dir=<Path to the submission folder> \
--tar=yes \
--submission_tar_file=mysubmission.tar.gz
```
```mermaid
flowchart LR
subgraph SUT [Combined Submissions]
A[Combined Submission Folder in SUT1]
end
SUT --> B[Run submission checker]
B --> C[Upload to MLC Submission server]
C --> D[Receive validation email]
```
<!--Click [here](https://youtu.be/eI1Hoecc3ho) to view the recording of the workshop: Streamlining your MLPerf Inference results using CM.-->
# All memory requirements in GB
resnet:
reference:
fp32:
system_memory: 8
accelerator_memory: 4
disk_storage: 25
nvidia:
int8:
system_memory: 8
accelerator_memory: 4
disk_storage: 100
intel:
int8:
system_memory: 8
accelerator_memory: 0
disk_storage: 50
qualcomm:
int8:
system_memory: 8
accelerator_memory: 8
disk_storage: 50
retinanet:
reference:
fp32:
system_memory: 8
accelerator_memory: 8
disk_storage: 200
nvidia:
int8:
system_memory: 8
accelerator_memory: 8
disk_storage: 200
intel:
int8:
system_memory: 8
accelerator_memory: 0
disk_storage: 200
qualcomm:
int8:
system_memory: 8
accelerator_memory: 8
disk_storage: 200
rgat:
reference:
fp32:
system_memory: 768
accelerator_memory: 8
disk_storage: 2300
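The entries above can be looked up programmatically when checking whether a host meets the recommended resources. Below is a minimal sketch (illustrative only), assuming the block above is saved as `system_requirements.yml` and that `ruamel.yaml` (listed in the docs requirements earlier) is available:
```python
from ruamel.yaml import YAML

# Load the recommended-resource table and look up one benchmark entry.
yaml = YAML(typ="safe")
with open("system_requirements.yml") as f:
    requirements = yaml.load(f)

# All values are in GB, as the header comment above notes.
rgat_fp32 = requirements["rgat"]["reference"]["fp32"]
print(rgat_fp32["system_memory"])  # 768
print(rgat_fp32["disk_storage"])   # 2300
```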
---
hide:
- toc
---
# Using CM for MLPerf Inference
# MLPerf™ Inference Benchmark for Graph Neural Network
This is the reference implementation for the MLPerf Inference Graph Neural Network benchmark. The reference implementation currently uses the Deep Graph Library (DGL) and PyTorch as the backbone of the model.
**Hardware requirements:** The minimum requirements to run this benchmark are ~600 GB of RAM and ~2.3 TB of disk. Meeting them requires creating a memory map for the graph features instead of loading them all into memory at once.
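In practice this means the node-feature arrays are memory-mapped rather than read fully. A minimal sketch of the idea (illustrative only; the path follows the IGBH on-disk layout `<dataset-path>/full/processed/paper/node_feat.npy` used by the utilities later in this repository):
```python
import numpy as np
import torch

# Memory-map the paper node features instead of loading them into RAM;
# rows are read from disk lazily, only when a batch actually indexes them.
paper_feat = np.load("igbh/full/processed/paper/node_feat.npy", mmap_mode="r")

batch_ids = [0, 42, 1000]  # hypothetical node ids for one batch
batch_features = torch.from_numpy(np.asarray(paper_feat[batch_ids]))
print(batch_features.shape)  # (3, 1024): IGBH features are 1024-dimensional
```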
## Supported Models
| model | accuracy | dataset | model source | precision | notes |
| ---- | ---- | ---- | ---- | ---- | ---- |
| RGAT | 0.7286 | IGBH | [Illinois Graph Benchmark](https://github.com/IllinoisGraphBenchmark/IGB-Datasets/) | fp32 | - |
## Dataset
| Data | Description | Task |
| ---- | ---- | ---- |
| IGBH | Illinois Graph Benchmark Heterogeneous is a graph dataset consisting of one heterogeneous graph with 547,306,935 nodes and 5,812,005,639 edges. Node types: Author, Conference, FoS, Institute, Journal, Paper. A subset of 1% of the paper nodes is randomly chosen as the validation dataset using the [split seeds script](tools/split_seeds.py). The validation dataset is used as the input queries for the SUT; however, the whole dataset is needed to run the benchmark, since all the graph connections are required to achieve the quality target. | Node Classification |
| IGBH (calibration) | We sampled 5000 nodes from the training paper nodes of the IGBH for the calibration dataset. We provide the [Node ids](../../calibration/IGBH/calibration.txt) and the [script](tools/split_seeds.py) to generate them (using the `--calibration` flag). | Node Classification |
## Automated command to run the benchmark via MLCommons CM
Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/graph/rgat/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.
You can also do `pip install cm4mlops` and then use the `cm` commands given in the later sections to download the model and datasets.
## Setup
Set the following helper variables
```bash
export ROOT_INFERENCE=$PWD/inference
export GRAPH_FOLDER=$PWD/inference/graph/R-GAT/
export LOADGEN_FOLDER=$PWD/inference/loadgen
export MODEL_PATH=$PWD/inference/graph/R-GAT/model/
```
### Clone the repository
```bash
git clone --recurse-submodules https://github.com/mlcommons/inference.git --depth 1
```
### Install pytorch
**For NVIDIA GPU based runs:**
```bash
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
```
**For CPU based runs:**
```bash
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cpu
```
### Install requirements (only for running without using docker)
Install requirements:
```bash
cd $GRAPH_FOLDER
pip install -r requirements.txt
```
Install loadgen:
```bash
cd $LOADGEN_FOLDER
CFLAGS="-std=c++14" python setup.py install
```
### Install pytorch geometric
```bash
export TORCH_VERSION=$(python -c "import torch; print(torch.__version__)")
pip install torch-geometric torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-${TORCH_VERSION}.html
```
### Install DGL
**For NVIDIA GPU based runs:**
```bash
pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu121/repo.html
```
**For CPU based runs:**
```bash
pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/repo.html
```
### Download model through CM (Collective Mind)
```
cm run script --tags=get,ml-model,rgat --outdirname=<path_to_download>
```
### Download model using Rclone
To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
To install Rclone on Linux/macOS/BSD systems, run:
```
sudo -v ; curl https://rclone.org/install.sh | sudo bash
```
Once Rclone is installed, run the following command to authenticate with the bucket:
```
rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
```
You can then navigate in the terminal to your desired download directory and run the following commands to download the checkpoints:
**`fp32`**
```
rclone copy mlc-inference:mlcommons-inference-wg-public/R-GAT/RGAT.pt $MODEL_PATH -P
```
### Download and set up dataset
#### Debug Dataset
**CM Command**
```
cm run script --tags=get,dataset,igbh,_debug --outdirname=<path to download>
```
**Download Dataset**
```bash
cd $GRAPH_FOLDER
python3 tools/download_igbh_test.py
```
**Split Seeds**
```bash
cd $GRAPH_FOLDER
python3 tools/split_seeds.py --path igbh --dataset_size tiny
```
#### Full Dataset
**Warning:** This script will download 2.2 TB of data.
**CM Command**
```
cm run script --tags=get,dataset,igbh,_full --outdirname=<path to download>
```
```bash
cd $GRAPH_FOLDER
./tools/download_igbh_full.sh igbh/
```
**Split Seeds**
```bash
cd $GRAPH_FOLDER
python3 tools/split_seeds.py --path igbh --dataset_size full
```
#### Calibration dataset
The calibration dataset contains 5000 nodes from the training paper nodes of the IGBH dataset. We provide the [Node ids](../../calibration/IGBH/calibration.txt) and the [script](tools/split_seeds.py) to generate them (using the `--calibration` flag).
**CM Command**
```
cm run script --tags=get,dataset,igbh,_full,_calibration --outdirname=<path to download>
```
### Run the benchmark
#### Debug Run
```bash
# Go to the benchmark folder
cd $GRAPH_FOLDER
# Run the benchmark DGL
python3 main.py --dataset igbh-dgl-tiny --dataset-path igbh/ --profile debug-dgl [--model-path <path_to_ckpt>] [--device <cpu or gpu>] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>]
```
#### Local run
```bash
# Go to the benchmark folder
cd $GRAPH_FOLDER
# Run the benchmark DGL
python3 main.py --dataset igbh-dgl --dataset-path igbh/ --profile rgat-dgl-full [--model-path <path_to_ckpt>] [--device <cpu or gpu>] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>]
```
### Evaluate the accuracy
```bash
cm run script --tags=process,mlperf,accuracy,_igbh --result_dir=<Path to directory where files are generated after the benchmark run>
```
Please click [here](https://github.com/mlcommons/inference/blob/dev/graph/R-GAT/tools/accuracy_igbh.py) to view the Python script for evaluating accuracy for the IGBH dataset.
#### Run using docker
Not implemented yet
#### Accuracy run
Add the `--accuracy` flag to the command to run the benchmark in accuracy mode:
```bash
python3 main.py --dataset igbh --dataset-path igbh/ --accuracy --model-path model/ [--device <cpu or gpu>] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>] [--layout <COO, CSC or CSR>]
```
**NOTE:** For official submissions you should submit the results of the accuracy run in a file called `accuracy.txt` with the following format:
```
accuracy=<accuracy>%, good=<number_of_good_samples>, total=<number_of_total_samples>
hash=<hash>
```
### Docker run
**CPU:**
Build docker image
```bash
docker build . -f dockerfile.cpu -t rgat-cpu
```
Run docker container:
```bash
docker run --rm -it -v $(pwd):/root rgat-cpu
```
Run the benchmark inside the docker container:
```bash
python3 main.py --dataset igbh-dgl --dataset-path igbh/ --profile rgat-dgl-full --device cpu [--model-path <path_to_ckpt>] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>]
```
**GPU:**
Build docker image
```bash
docker build . -f dockerfile.gpu -t rgat-gpu
```
Run docker container:
```bash
docker run --rm -it -v $(pwd):/workspace/root --gpus all rgat-gpu
```
Go inside the root folder and run the benchmark inside the docker container:
```bash
cd root
python3 main.py --dataset igbh-dgl --dataset-path igbh/ --profile rgat-dgl-full --device gpu [--model-path <path_to_ckpt>] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>]
```
**NOTE:** For official submissions, this benchmark is required to run in equal issue mode. Please make sure that the flag `rgat.*.sample_concatenate_permutation` is set to 1 (i.e., the line `rgat.*.sample_concatenate_permutation = 1` is present) in the [mlperf.conf](../../loadgen/mlperf.conf) file when loadgen is built.
"""
abstract backend class
"""
class Backend:
def __init__(self):
self.inputs = []
self.outputs = []
def version(self):
raise NotImplementedError("Backend:version")
def name(self):
raise NotImplementedError("Backend:name")
def load(self, model_path, inputs=None, outputs=None):
raise NotImplementedError("Backend:load")
def predict(self, feed):
raise NotImplementedError("Backend:predict")
from typing import Optional, List, Union, Any
from dgl_utilities.feature_fetching import IGBHeteroGraphStructure, Features, IGBH
from dgl_utilities.components import build_graph, get_loader, RGAT
from dgl_utilities.pyg_sampler import PyGSampler
import os
import torch
import logging
import backend
from typing import Literal
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("backend-dgl")
class BackendDGL(backend.Backend):
def __init__(
self,
model_type="rgat",
type: Literal["fp16", "fp32"] = "fp16",
device: Literal["cpu", "gpu"] = "gpu",
ckpt_path: str = None,
igbh: IGBH = None,
batch_size: int = 1,
layout: Literal["CSC", "CSR", "COO"] = "COO",
edge_dir: str = "in",
):
super(BackendDGL, self).__init__()
# Set device and type
if device == "gpu":
self.device = torch.device("cuda")
else:
self.device = torch.device("cpu")
if type == "fp32":
self.type = torch.float32
else:
self.type = torch.float16
# Create Node and neighbor loader
self.fan_out = [5, 10, 15]
self.igbh_graph_structure = igbh.igbh_dataset
self.feature_store = Features(
self.igbh_graph_structure.dir,
self.igbh_graph_structure.dataset_size,
self.igbh_graph_structure.in_memory,
use_fp16=self.igbh_graph_structure.use_fp16,
)
self.feature_store.build_features(use_journal_conference=True)
self.graph = build_graph(
self.igbh_graph_structure,
"dgl",
features=self.feature_store)
self.neighbor_loader = PyGSampler([5, 10, 15])
        # Load model architecture
self.model = RGAT(
backend="dgl",
device=device,
graph=self.graph,
in_feats=1024,
h_feats=512,
num_classes=2983,
num_layers=len(self.fan_out),
n_heads=4
).to(self.type).to(self.device)
self.model.eval()
# Load model checkpoint
ckpt = None
if ckpt_path is not None:
try:
ckpt = torch.load(ckpt_path, map_location=self.device)
except FileNotFoundError as e:
print(f"Checkpoint file not found: {e}")
return -1
if ckpt is not None:
self.model.load_state_dict(ckpt["model_state_dict"])
def version(self):
return torch.__version__
def name(self):
return "pytorch-SUT"
def image_format(self):
return "NCHW"
def load(self):
return self
def predict(self, inputs: torch.Tensor):
with torch.no_grad():
input_size = inputs.shape[0]
# Get batch
batch = self.neighbor_loader.sample(self.graph, {"paper": inputs})
batch_preds, batch_labels = self.model(
batch, self.device, self.feature_store)
return batch_preds
#### **1. Applicable Categories**
- Datacenter
---
#### **2. Applicable Scenarios for Each Category**
- Offline
---
#### **3. Applicable Compliance Tests**
- TEST01
---
#### **4. Latency Threshold for Server Scenarios**
- Not applicable
---
#### **5. Validation Dataset: Unique Samples**
Number of **unique samples** in the validation dataset and the QSL size specified in
- [X] [inference policies benchmark section](https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc#41-benchmarks)
- [X] [mlperf.conf](https://github.com/mlcommons/inference/blob/master/loadgen/mlperf.conf)
- [X] [Inference benchmark docs](https://github.com/mlcommons/inference/blob/docs/docs/index.md)
*(Ensure QSL size overflows the system cache if possible.)*
---
#### **6. Equal Issue Mode Applicability**
Documented whether **Equal Issue Mode** is applicable in
- [X] [mlperf.conf](https://github.com/mlcommons/inference/blob/master/loadgen/mlperf.conf#L42)
- [X] [Inference benchmark docs](https://github.com/mlcommons/inference/blob/docs/docs/index.md)
*(Relevant if sample processing times are inconsistent across inputs.)*
---
#### **7. Expected Accuracy and `accuracy.txt` Contents**
- [X] Expected accuracy updated in the [inference policies](https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc#41-benchmarks)
- [X] `accuracy.txt` file is generated by the reference accuracy script from the MLPerf accuracy log and is validated by the submission checker.
---
#### **8. Reference Model Details**
- [X] Reference model details updated in [Inference benchmark docs](https://github.com/mlcommons/inference/blob/docs/docs/index.md)
---
#### **9. Reference Implementation Dataset Coverage**
- [X] Reference implementation successfully processes the entire validation dataset during:
- [X] Performance runs
- [X] Accuracy runs
- [X] Compliance runs
- [X] Valid log files passing the submission checker are generated for all runs - [link](https://github.com/mlcommons/mlperf_inference_unofficial_submissions_v5.0/tree/main/closed/MLCommons/results/mlc-server-reference-gpu-pytorch_v2.4.0-cu124/rgat/offline/performance/run_1).
---
#### **10. Test Runs with Smaller Input Sets**
- [X] Verified the reference implementation can perform test runs with a smaller subset of inputs for:
- [X] Performance runs
- [X] Accuracy runs
---
#### **11. Dataset and Reference Model Instructions**
- [X] Clear instructions provided for:
- [X] Downloading the dataset and reference model.
- [X] Using the dataset and model for the benchmark.
---
#### **12. Documentation of Recommended System Requirements to run the reference implementation**
- [X] Added [here](https://github.com/mlcommons/inference/blob/docs/docs/system_requirements.yml#L44)
---
#### **13. Submission Checker Modifications**
- [X] All necessary changes made to the **submission checker** to validate the benchmark.
---
#### **14. Sample Log Files**
- [X] Include sample logs for all the applicable scenario runs:
- [X] Offline
- [X] [`mlperf_log_summary.txt`](https://github.com/mlcommons/mlperf_inference_unofficial_submissions_v5.0/blob/main/closed/MLCommons/results/mlc-server-reference-gpu-pytorch_v2.4.0-cu124/rgat/offline/performance/run_1/mlperf_log_summary.txt)
- [X] [`mlperf_log_detail.txt`](https://github.com/mlcommons/mlperf_inference_unofficial_submissions_v5.0/blob/main/closed/MLCommons/results/mlc-server-reference-gpu-pytorch_v2.4.0-cu124/rgat/offline/performance/run_1/mlperf_log_detail.txt)
- [X] Ensure sample logs successfully pass the submission checker and applicable compliance runs. [Link](https://htmlpreview.github.io/?https://github.com/mlcommons/mlperf_inference_unofficial_submissions_v5.0/blob/refs/heads/auto-update/closed/MLCommons/results/mlc-server-reference-gpu-pytorch_v2.4.0-cu124/summary.html)
"""
dataset related classes and methods
"""
# pylint: disable=unused-argument,missing-docstring
import logging
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dataset")
class Dataset:
def __init__(self):
pass
def preprocess(self, use_cache=True):
raise NotImplementedError("Dataset:preprocess")
def get_item_count(self):
        raise NotImplementedError("Dataset:get_item_count")
def get_list(self):
raise NotImplementedError("Dataset:get_list")
def load_query_samples(self, sample_list):
pass
def unload_query_samples(self, sample_list):
pass
def get_samples(self, id_list):
pass
def get_item(self, id):
raise NotImplementedError("Dataset:get_item")
def preprocess(id):
return id
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
from dgl_utilities.pyg_sampler import PyGSampler
DGL_AVAILABLE = True
try:
import dgl
except ModuleNotFoundError:
DGL_AVAILABLE = False
dgl = None
def check_dgl_available():
assert DGL_AVAILABLE, "DGL Not available in the container"
def build_graph(graph_structure, backend, features=None):
assert graph_structure.separate_sampling_aggregation or (features is not None), \
"Either we need a feature to build the graph, or \
we should specify to separate sampling from aggregation"
if backend.lower() == "dgl":
check_dgl_available()
graph = dgl.heterograph(graph_structure.edge_dict)
graph.predict = "paper"
if features is not None:
for node, node_feature in features.feature.items():
if graph.num_nodes(ntype=node) < node_feature.shape[0]:
graph.add_nodes(
node_feature.shape[0] -
graph.num_nodes(
ntype=node),
ntype=node)
else:
assert graph.num_nodes(ntype=node) == node_feature.shape[0], f"\
Graph has more {node} nodes ({graph.num_nodes(ntype=node)}) \
than feature shape ({node_feature.shape[0]})"
if not graph_structure.separate_sampling_aggregation:
for node, node_feature in features.feature.items():
graph.nodes[node].data['feat'] = node_feature
setattr(
graph,
f"num_{node}_nodes",
node_feature.shape[0])
graph = dgl.remove_self_loop(graph, etype="cites")
graph = dgl.add_self_loop(graph, etype="cites")
graph.nodes['paper'].data['label'] = graph_structure.label
return graph
else:
assert False, "Unrecognized backend " + backend
def get_sampler(use_pyg_sampler=False):
if use_pyg_sampler:
return PyGSampler
else:
return dgl.dataloading.MultiLayerNeighborSampler
def get_loader(graph, index, fanouts, backend, use_pyg_sampler=True, **kwargs):
if backend.lower() == "dgl":
check_dgl_available()
fanouts = [int(fanout) for fanout in fanouts.split(",")]
return dgl.dataloading.DataLoader(
graph, {"paper": index},
get_sampler(use_pyg_sampler=use_pyg_sampler)(fanouts),
**kwargs
)
else:
assert False, "Unrecognized backend " + backend
def glorot(value):
if isinstance(value, torch.Tensor):
stdv = math.sqrt(6.0 / (value.size(-2) + value.size(-1)))
value.data.uniform_(-stdv, stdv)
else:
for v in value.parameters() if hasattr(value, 'parameters') else []:
glorot(v)
for v in value.buffers() if hasattr(value, 'buffers') else []:
glorot(v)
class GATPatched(dgl.nn.pytorch.GATConv):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def reset_parameters(self):
if hasattr(self, 'fc'):
glorot(self.fc.weight)
else:
glorot(self.fc_src.weight)
glorot(self.fc_dst.weight)
glorot(self.attn_l)
glorot(self.attn_r)
if self.bias is not None:
nn.init.constant_(self.bias, 0)
if isinstance(self.res_fc, nn.Linear):
glorot(self.res_fc.weight)
class RGAT_DGL(nn.Module):
def __init__(
self,
etypes,
in_feats, h_feats, num_classes,
num_layers=2, n_heads=4, dropout=0.2,
with_trim=None):
super().__init__()
self.layers = nn.ModuleList()
# does not support other models since they are not used
self.layers.append(dgl.nn.pytorch.HeteroGraphConv({
etype: GATPatched(in_feats, h_feats // n_heads, n_heads)
for etype in etypes}))
for _ in range(num_layers - 2):
self.layers.append(dgl.nn.pytorch.HeteroGraphConv({
etype: GATPatched(h_feats, h_feats // n_heads, n_heads)
for etype in etypes}))
self.layers.append(dgl.nn.pytorch.HeteroGraphConv({
etype: GATPatched(h_feats, h_feats // n_heads, n_heads)
for etype in etypes}))
self.dropout = nn.Dropout(dropout)
self.linear = nn.Linear(h_feats, num_classes)
def forward(self, blocks, x):
h = x
for l, (layer, block) in enumerate(zip(self.layers, blocks)):
h = layer(block, h)
h = dgl.apply_each(
h, lambda x: x.view(
x.shape[0], x.shape[1] * x.shape[2]))
if l != len(self.layers) - 1:
h = dgl.apply_each(h, F.leaky_relu)
h = dgl.apply_each(h, self.dropout)
return self.linear(h['paper'])
def extract_graph_structure(self, batch, device):
# moves all blocks to device
return [block.to(device) for block in batch[-1]]
def extract_inputs_and_outputs(self, sampled_subgraph, device, features):
# input to the batch argument would be a list of blocks
        # the sampled subgraph is already moved to device in
# extract_graph_structure
# in case if the input feature is not stored on the graph,
# but rather in shared memory: (separate_sampling_aggregation)
# we use this method to extract them based on the blocks
if features is None or features.feature == {}:
batch_inputs = {
key: value.to(torch.float32)
for key, value in sampled_subgraph[0].srcdata['feat'].items()
}
else:
batch_inputs = features.get_input_features(
sampled_subgraph[0].srcdata[dgl.NID],
device
)
batch_labels = sampled_subgraph[-1].dstdata['label']['paper']
return batch_inputs, batch_labels
class RGAT(torch.nn.Module):
def __init__(self, backend, device, graph, **model_kwargs):
super().__init__()
self.backend = backend.lower()
if backend.lower() == "dgl":
check_dgl_available()
etypes = graph.etypes
self.model = RGAT_DGL(etypes=etypes, **model_kwargs)
else:
assert False, "Unrecognized backend " + backend
self.device = device
self.layers = self.model.layers
def forward(self, batch, device, features):
# a general method to get the batches and move them to the
# corresponding device
batch = self.model.extract_graph_structure(batch, device)
# a general method to fetch the features given the sampled blocks
# and move them to corresponding device
batch_inputs, batch_labels = self.model.extract_inputs_and_outputs(
sampled_subgraph=batch,
device=device,
features=features,
)
return self.model.forward(batch, batch_inputs), batch_labels
import torch
import os
import concurrent.futures
import os.path as osp
import numpy as np
from typing import Literal
def float2half(base_path, dataset_size):
paper_nodes_num = {
"tiny": 100000,
"small": 1000000,
"medium": 10000000,
"large": 100000000,
"full": 269346174,
}
author_nodes_num = {
"tiny": 357041,
"small": 1926066,
"medium": 15544654,
"large": 116959896,
"full": 277220883,
}
# paper node
paper_feat_path = os.path.join(base_path, "paper", "node_feat.npy")
paper_fp16_feat_path = os.path.join(
base_path, "paper", "node_feat_fp16.pt")
if not os.path.exists(paper_fp16_feat_path):
if dataset_size in ["large", "full"]:
num_paper_nodes = paper_nodes_num[dataset_size]
paper_node_features = torch.from_numpy(
np.memmap(
paper_feat_path,
dtype="float32",
mode="r",
shape=(num_paper_nodes, 1024),
)
)
else:
paper_node_features = torch.from_numpy(
np.load(paper_feat_path, mmap_mode="r")
)
paper_node_features = paper_node_features.half()
torch.save(paper_node_features, paper_fp16_feat_path)
# author node
author_feat_path = os.path.join(base_path, "author", "node_feat.npy")
author_fp16_feat_path = os.path.join(
base_path, "author", "node_feat_fp16.pt")
if not os.path.exists(author_fp16_feat_path):
if dataset_size in ["large", "full"]:
num_author_nodes = author_nodes_num[dataset_size]
author_node_features = torch.from_numpy(
np.memmap(
author_feat_path,
dtype="float32",
mode="r",
shape=(num_author_nodes, 1024),
)
)
else:
author_node_features = torch.from_numpy(
np.load(author_feat_path, mmap_mode="r")
)
author_node_features = author_node_features.half()
torch.save(author_node_features, author_fp16_feat_path)
# institute node
institute_feat_path = os.path.join(base_path, "institute", "node_feat.npy")
institute_fp16_feat_path = os.path.join(
base_path, "institute", "node_feat_fp16.pt")
if not os.path.exists(institute_fp16_feat_path):
institute_node_features = torch.from_numpy(
np.load(institute_feat_path, mmap_mode="r")
)
institute_node_features = institute_node_features.half()
torch.save(institute_node_features, institute_fp16_feat_path)
# fos node
fos_feat_path = os.path.join(base_path, "fos", "node_feat.npy")
fos_fp16_feat_path = os.path.join(base_path, "fos", "node_feat_fp16.pt")
if not os.path.exists(fos_fp16_feat_path):
fos_node_features = torch.from_numpy(
np.load(fos_feat_path, mmap_mode="r"))
fos_node_features = fos_node_features.half()
torch.save(fos_node_features, fos_fp16_feat_path)
# conference node
conference_feat_path = os.path.join(
base_path, "conference", "node_feat.npy")
conference_fp16_feat_path = os.path.join(
base_path, "conference", "node_feat_fp16.pt"
)
if not os.path.exists(conference_fp16_feat_path):
conference_node_features = torch.from_numpy(
np.load(conference_feat_path, mmap_mode="r")
)
conference_node_features = conference_node_features.half()
torch.save(conference_node_features, conference_fp16_feat_path)
# journal node
journal_feat_path = os.path.join(base_path, "journal", "node_feat.npy")
journal_fp16_feat_path = os.path.join(
base_path, "journal", "node_feat_fp16.pt")
if not os.path.exists(journal_fp16_feat_path):
journal_node_features = torch.from_numpy(
np.load(journal_feat_path, mmap_mode="r")
)
journal_node_features = journal_node_features.half()
torch.save(journal_node_features, journal_fp16_feat_path)
class IGBH:
def __init__(
self,
data_path,
name="igbh",
dataset_size="full",
use_label_2K=True,
in_memory=False,
layout: Literal["CSC", "CSR", "COO"] = "COO",
type: Literal["fp16", "fp32"] = "fp16",
device="cpu",
edge_dir="in",
**kwargs,
):
super().__init__()
self.data_path = data_path
self.name = name
self.size = dataset_size
self.igbh_dataset = IGBHeteroGraphStructure(
data_path,
dataset_size=dataset_size,
in_memory=in_memory,
use_label_2K=use_label_2K,
layout=layout,
use_fp16=(type == "fp16")
)
self.num_samples = len(self.igbh_dataset.val_idx)
def get_samples(self, id_list):
return self.igbh_dataset.val_idx[id_list]
def get_labels(self, id_list):
return self.igbh_dataset.label[self.get_samples(id_list)]
def get_item_count(self):
return len(self.igbh_dataset.val_idx)
def load_query_samples(self, id):
pass
def unload_query_samples(self, sample_list):
pass
class IGBHeteroGraphStructure:
"""
Synchronously (optionally parallelly) loads the edge relations for IGBH.
Current IGBH edge relations are not yet converted to torch tensor.
"""
def __init__(
self,
data_path,
dataset_size="full",
use_label_2K=True,
in_memory=False,
use_fp16=True,
# in-memory and memory-related optimizations
separate_sampling_aggregation=False,
# perf related
multithreading=True,
**kwargs,
):
self.dir = data_path
self.dataset_size = dataset_size
self.use_fp16 = use_fp16
self.in_memory = in_memory
self.use_label_2K = use_label_2K
self.num_classes = 2983 if not self.use_label_2K else 19
self.label_file = "node_label_19.npy" if not self.use_label_2K else "node_label_2K.npy"
self.num_nodes = {
"full": {'paper': 269346174, 'author': 277220883, 'institute': 26918, 'fos': 712960, 'journal': 49052, 'conference': 4547},
"small": {'paper': 1000000, 'author': 1926066, 'institute': 14751, 'fos': 190449, 'journal': 15277, 'conference': 1215},
"medium": {'paper': 10000000, 'author': 15544654, 'institute': 23256, 'fos': 415054, 'journal': 37565, 'conference': 4189},
"large": {'paper': 100000000, 'author': 116959896, 'institute': 26524, 'fos': 649707, 'journal': 48820, 'conference': 4490},
"tiny": {'paper': 100000, 'author': 357041, 'institute': 8738, 'fos': 84220, 'journal': 8101, 'conference': 398}
}[self.dataset_size]
self.use_journal_conference = True
self.separate_sampling_aggregation = separate_sampling_aggregation
self.torch_tensor_input_dir = data_path
self.torch_tensor_input = self.torch_tensor_input_dir != ""
self.multithreading = multithreading
# This class only stores the edge data, labels, and the train/val
# indices
self.edge_dict = self.load_edge_dict()
self.label = self.load_labels()
self.full_num_trainable_nodes = (
227130858 if self.num_classes != 2983 else 157675969)
self.train_idx, self.val_idx = self.get_train_val_test_indices()
if self.use_fp16:
float2half(
os.path.join(
self.dir,
self.dataset_size,
"processed"),
self.dataset_size)
def load_edge_dict(self):
mmap_mode = None if self.in_memory else "r"
edges = [
"paper__cites__paper",
"paper__written_by__author",
"author__affiliated_to__institute",
"paper__topic__fos"]
if self.use_journal_conference:
edges += ["paper__published__journal", "paper__venue__conference"]
loaded_edges = None
def load_edge(edge, mmap=mmap_mode, parent_path=osp.join(
self.dir, self.dataset_size, "processed")):
return edge, torch.from_numpy(
np.load(osp.join(parent_path, edge, "edge_index.npy"), mmap_mode=mmap))
if self.multithreading:
with concurrent.futures.ThreadPoolExecutor() as executor:
loaded_edges = executor.map(load_edge, edges)
loaded_edges = {
tuple(edge.split("__")): (edge_index[:, 0], edge_index[:, 1]) for edge, edge_index in loaded_edges
}
else:
loaded_edges = {
tuple(edge.split("__")): (edge_index[:, 0], edge_index[:, 1])
for edge, edge_index in map(load_edge, edges)
}
return self.augment_edges(loaded_edges)
def load_labels(self):
if self.dataset_size not in ['full', 'large']:
return torch.from_numpy(
np.load(
osp.join(
self.dir,
self.dataset_size,
'processed',
'paper',
self.label_file)
)
).to(torch.long)
else:
return torch.from_numpy(
np.memmap(
osp.join(
self.dir,
self.dataset_size,
'processed',
'paper',
self.label_file
),
dtype='float32',
mode='r',
shape=(
(269346174 if self.dataset_size == "full" else 100000000)
)
)
).to(torch.long)
def augment_edges(self, edge_dict):
# Adds reverse edge connections to the graph
# add rev_{edge} to every edge except paper-cites-paper
edge_dict.update(
{
(dst, f"rev_{edge}", src): (dst_idx, src_idx)
for (src, edge, dst), (src_idx, dst_idx) in edge_dict.items()
if src != dst
}
)
paper_cites_paper = edge_dict[("paper", 'cites', 'paper')]
self_loop = torch.arange(self.num_nodes['paper'])
mask = paper_cites_paper[0] != paper_cites_paper[1]
paper_cites_paper = (
torch.cat((paper_cites_paper[0][mask], self_loop.clone())),
torch.cat((paper_cites_paper[1][mask], self_loop.clone()))
)
edge_dict[("paper", 'cites', 'paper')] = (
torch.cat((paper_cites_paper[0], paper_cites_paper[1])),
torch.cat((paper_cites_paper[1], paper_cites_paper[0]))
)
return edge_dict
def get_train_val_test_indices(self):
base_dir = osp.join(self.dir, self.dataset_size, "processed")
assert osp.exists(osp.join(base_dir, "train_idx.pt")) and osp.exists(osp.join(base_dir, "val_idx.pt")), \
"Train and validation indices not found. Please run GLT's split_seeds.py first."
return (
torch.load(
osp.join(
self.dir,
self.dataset_size,
"processed",
"train_idx.pt")),
torch.load(
osp.join(
self.dir,
self.dataset_size,
"processed",
"val_idx.pt"))
)
class Features:
"""
Lazily initializes the features for IGBH.
Features will be initialized only when *build_features* is called.
Features will be placed into shared memory when *share_features* is called
or if the features are built (either mmap-ed or loaded in memory)
and *torch.multiprocessing.spawn* is called
"""
def __init__(self, path, dataset_size, in_memory=True, use_fp16=True):
self.path = path
self.dataset_size = dataset_size
self.in_memory = in_memory
self.use_fp16 = use_fp16
if self.use_fp16:
self.dtype = torch.float16
else:
self.dtype = torch.float32
self.feature = {}
def build_features(self, use_journal_conference=False,
multithreading=False):
node_types = ['paper', 'author', 'institute', 'fos']
if use_journal_conference or self.dataset_size in ['large', 'full']:
node_types += ['conference', 'journal']
if multithreading:
def load_feature(feature_store, feature_name):
return feature_store.load(feature_name), feature_name
with concurrent.futures.ThreadPoolExecutor() as executor:
                loaded_features = executor.map(
                    load_feature, [self] * len(node_types), node_types)
self.feature = {
node_type: feature_value for feature_value, node_type in loaded_features
}
else:
for node_type in node_types:
self.feature[node_type] = self.load(node_type)
def share_features(self):
for node_type in self.feature:
self.feature[node_type] = self.feature[node_type].share_memory_()
def load_from_tensor(self, node):
return torch.load(osp.join(self.path, self.dataset_size,
"processed", node, "node_feat_fp16.pt"))
def load_in_memory_numpy(self, node):
return torch.from_numpy(np.load(
osp.join(self.path, self.dataset_size, 'processed', node, 'node_feat.npy')))
def load_mmap_numpy(self, node):
"""
Loads a given numpy array through mmap_mode="r"
"""
return torch.from_numpy(np.load(osp.join(
self.path, self.dataset_size, "processed", node, "node_feat.npy"), mmap_mode="r"))
def memmap_mmap_numpy(self, node):
"""
Loads a given NumPy array through memory-mapping np.memmap.
This is the same code as the one provided in IGB codebase.
"""
shape = [None, 1024]
if self.dataset_size == "full":
if node == "paper":
shape[0] = 269346174
elif node == "author":
shape[0] = 277220883
elif self.dataset_size == "large":
if node == "paper":
shape[0] = 100000000
elif node == "author":
shape[0] = 116959896
assert shape[0] is not None
return torch.from_numpy(np.memmap(osp.join(self.path, self.dataset_size,
"processed", node, "node_feat.npy"), dtype="float32", mode='r', shape=tuple(shape)))
def load(self, node):
if self.in_memory:
if self.use_fp16:
return self.load_from_tensor(node)
else:
if self.dataset_size in [
'large', 'full'] and node in ['paper', 'author']:
return self.memmap_mmap_numpy(node)
else:
return self.load_in_memory_numpy(node)
else:
if self.dataset_size in [
'large', 'full'] and node in ['paper', 'author']:
return self.memmap_mmap_numpy(node)
else:
return self.load_mmap_numpy(node)
def get_input_features(self, input_dict, device):
# fetches the batch inputs
# moving it here so so that future modifications could be easier
return {
key: self.feature[key][value.to(torch.device("cpu")), :].to(
device).to(self.dtype)
for key, value in input_dict.items()
}
import dgl
import torch
class PyGSampler(dgl.dataloading.Sampler):
r"""
An example DGL sampler implementation that matches PyG/GLT sampler behavior.
The following differences need to be addressed:
1. PyG/GLT applies conv_i to edges in layer_i, and all subsequent layers, while DGL only applies conv_i to edges in layer_i.
For instance, consider a path a->b->c. At layer 0,
DGL updates only node b's embedding with a->b, but
PyG/GLT updates both node b and c's embeddings.
Therefore, if we use h_i(x) to denote the hidden representation of node x at layer i, then the output h_2(c) is:
DGL: h_2(c) = conv_2(h_1(c), h_1(b)) = conv_2(h_0(c), conv_1(h_0(b), h_0(a)))
PyG/GLT: h_2(c) = conv_2(h_1(c), h_1(b)) = conv_2(conv_1(h_0(c), h_0(b)), conv_1(h_0(b), h_0(a)))
2. When creating blocks for layer i-1, DGL not only uses the destination nodes from layer i,
but also includes all subsequent i+1 ... n layers' destination nodes as seed nodes.
More discussions and examples can be found here: https://github.com/alibaba/graphlearn-for-pytorch/issues/79.
"""
def __init__(self, fanouts, num_threads=1):
super().__init__()
self.fanouts = fanouts
self.num_threads = num_threads
def sample(self, g, seed_nodes):
if self.num_threads != 1:
old_num_threads = torch.get_num_threads()
torch.set_num_threads(self.num_threads)
output_nodes = seed_nodes
subgs = []
previous_edges = {}
previous_seed_nodes = seed_nodes
input_nodes = seed_nodes
device = None
for key in seed_nodes:
device = seed_nodes[key].device
not_sampled = {
ntype: torch.ones([g.num_nodes(ntype)], dtype=torch.bool, device=device) for ntype in g.ntypes
}
for fanout in reversed(self.fanouts):
for node_type in seed_nodes:
not_sampled[node_type][seed_nodes[node_type]] = 0
# Sample a fixed number of neighbors of the current seed nodes.
sg = g.sample_neighbors(seed_nodes, fanout)
# Before we add the edges, we need to first record the source nodes (of the current seed nodes)
# so that other edges' source nodes will not be included as next
# layer's seed nodes.
temp = dgl.to_block(sg, previous_seed_nodes,
include_dst_in_src=False)
seed_nodes = temp.srcdata[dgl.NID]
# GLT/PyG does not sample again on previously-sampled nodes
# we mimic this behavior here
for node_type in g.ntypes:
seed_nodes[node_type] = seed_nodes[node_type][not_sampled[node_type]
[seed_nodes[node_type]]]
# We add all previously accumulated edges to this subgraph
for etype in previous_edges:
sg.add_edges(*previous_edges[etype], etype=etype)
# This subgraph now contains all its new edges
# and previously accumulated edges
# so we add them
previous_edges = {}
for etype in sg.etypes:
previous_edges[etype] = sg.edges(etype=etype)
# Convert this subgraph to a message flow graph.
# we need to turn on the include_dst_in_src
# so that we get compatibility with DGL's OOTB GATConv.
sg = dgl.to_block(sg, previous_seed_nodes, include_dst_in_src=True)
# for this layers seed nodes -
# they will be our next layers' destination nodes
# so we add them to the collection of previous seed nodes.
previous_seed_nodes = sg.srcdata[dgl.NID]
# we insert the block to our list of blocks
subgs.insert(0, sg)
input_nodes = seed_nodes
if self.num_threads != 1:
torch.set_num_threads(old_num_threads)
return input_nodes, output_nodes, subgs
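# Illustrative usage (not part of the sampler itself; names below are hypothetical):
# the DGL backend above constructs this sampler with the benchmark fan-outs and
# calls it once per batch of validation "paper" node ids, e.g.
#
#     sampler = PyGSampler(fanouts=[5, 10, 15])
#     input_nodes, output_nodes, blocks = sampler.sample(graph, {"paper": seed_ids})
#
# `blocks` is the list of message flow graphs that RGAT.forward consumes.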
FROM ubuntu:22.04
ENV PYTHON_VERSION=3.10
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PATH /opt/anaconda3/bin:$PATH
WORKDIR /root
ENV HOME /root
RUN apt-get update
RUN apt-get install -y --no-install-recommends \
git \
build-essential \
software-properties-common \
ca-certificates \
wget \
curl \
htop \
zip \
unzip
# Install conda
RUN arch=$(uname -m) && \
if [ "$arch" = "x86_64" ]; then \
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py310_24.9.2-0-Linux-x86_64.sh"; \
elif [ "$arch" = "aarch64" ]; then \
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py310_24.9.2-0-Linux-aarch64.sh"; \
else \
echo "Unsupported architecture: $arch"; \
exit 1; \
fi && \
cd /opt && \
wget --quiet $MINICONDA_URL -O miniconda.sh && \
bash ./miniconda.sh -b -p /opt/anaconda3 && \
rm miniconda.sh && \
/opt/anaconda3/bin/conda clean -a && \
ln -s /opt/anaconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc && \
conda config --set always_yes yes --set changeps1 no
# Install requirements
RUN conda install -c conda-forge libstdcxx-ng
RUN pip install --upgrade pip
RUN pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cpu
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
RUN cd /tmp && \
git clone --recursive https://github.com/mlcommons/inference && \
cd inference/loadgen && \
pip install pybind11 && \
CFLAGS="-std=c++14" python3 setup.py install
RUN TORCH_VERSION=$(python -c "import torch; print(torch.__version__)") && \
    pip install torch-geometric torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-${TORCH_VERSION}.html
RUN pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/repo.html
# Clean up
RUN rm -rf mlperf && \
    rm requirements.txt
ENTRYPOINT ["/bin/bash"]
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:23.04-py3
FROM ${FROM_IMAGE_NAME}
SHELL ["/bin/bash", "-c"]
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
ENV TZ=US/Pacific
ENV DEBIAN_FRONTEND=noninteractive
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN rm -rf /var/lib/apt/lists/* && rm -rf /etc/apt/sources.list.d/* \
&& apt update \
&& apt install -y --no-install-recommends build-essential autoconf \
libtool git ccache curl wget pkg-config sudo ca-certificates \
automake libssl-dev bc python3-dev python3-pip google-perftools \
gdb libglib2.0-dev clang sshfs libre2-dev libboost-dev \
libnuma-dev numactl sysstat sshpass ntpdate less iputils-ping \
&& apt -y autoremove \
&& apt remove -y cmake \
&& apt install -y --no-install-recommends pkg-config zip g++ zlib1g-dev \
unzip libarchive-dev
RUN apt install -y --no-install-recommends rsync
# Upgrade pip
RUN python3 -m pip install --upgrade pip
RUN pip install torch-geometric torch-scatter torch-sparse -f https://pytorch-geometric.com/whl/torch-2.1.0+cu121.html
RUN pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu121/repo.html
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
RUN cd /tmp && \
git clone --recursive https://github.com/mlcommons/inference && \
cd inference/loadgen && \
pip install pybind11 && \
CFLAGS="-std=c++14" python3 setup.py install
# Clean up
RUN rm -rf mlperf && \
    rm requirements.txt
"""
implementation of the IGBH dataset
"""
# pylint: disable=unused-argument,missing-docstring
# Parts of this script were taken from:
# https://github.com/mlcommons/training/blob/master/graph_neural_network/dataset.py
# Specifically the float2half function and the IGBH class are
# slightly modified copies.
from typing import Literal
from torch_geometric.utils import add_self_loops, remove_self_loops
import torch
import os
import logging
import argparse
import dataset
import numpy as np
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("igbh")
def float2half(base_path, dataset_size):
paper_nodes_num = {
"tiny": 100000,
"small": 1000000,
"medium": 10000000,
"large": 100000000,
"full": 269346174,
}
author_nodes_num = {
"tiny": 357041,
"small": 1926066,
"medium": 15544654,
"large": 116959896,
"full": 277220883,
}
# paper node
paper_feat_path = os.path.join(base_path, "paper", "node_feat.npy")
paper_fp16_feat_path = os.path.join(
base_path, "paper", "node_feat_fp16.pt")
if not os.path.exists(paper_fp16_feat_path):
if dataset_size in ["large", "full"]:
num_paper_nodes = paper_nodes_num[dataset_size]
paper_node_features = torch.from_numpy(
np.memmap(
paper_feat_path,
dtype="float32",
mode="r",
shape=(num_paper_nodes, 1024),
)
)
else:
paper_node_features = torch.from_numpy(
np.load(paper_feat_path, mmap_mode="r")
)
paper_node_features = paper_node_features.half()
torch.save(paper_node_features, paper_fp16_feat_path)
# author node
author_feat_path = os.path.join(base_path, "author", "node_feat.npy")
author_fp16_feat_path = os.path.join(
base_path, "author", "node_feat_fp16.pt")
if not os.path.exists(author_fp16_feat_path):
if dataset_size in ["large", "full"]:
num_author_nodes = author_nodes_num[dataset_size]
author_node_features = torch.from_numpy(
np.memmap(
author_feat_path,
dtype="float32",
mode="r",
shape=(num_author_nodes, 1024),
)
)
else:
author_node_features = torch.from_numpy(
np.load(author_feat_path, mmap_mode="r")
)
author_node_features = author_node_features.half()
torch.save(author_node_features, author_fp16_feat_path)
# institute node
institute_feat_path = os.path.join(base_path, "institute", "node_feat.npy")
institute_fp16_feat_path = os.path.join(
base_path, "institute", "node_feat_fp16.pt")
if not os.path.exists(institute_fp16_feat_path):
institute_node_features = torch.from_numpy(
np.load(institute_feat_path, mmap_mode="r")
)
institute_node_features = institute_node_features.half()
torch.save(institute_node_features, institute_fp16_feat_path)
# fos node
fos_feat_path = os.path.join(base_path, "fos", "node_feat.npy")
fos_fp16_feat_path = os.path.join(base_path, "fos", "node_feat_fp16.pt")
if not os.path.exists(fos_fp16_feat_path):
fos_node_features = torch.from_numpy(
np.load(fos_feat_path, mmap_mode="r"))
fos_node_features = fos_node_features.half()
torch.save(fos_node_features, fos_fp16_feat_path)
# conference node
conference_feat_path = os.path.join(
base_path, "conference", "node_feat.npy")
conference_fp16_feat_path = os.path.join(
base_path, "conference", "node_feat_fp16.pt"
)
if not os.path.exists(conference_fp16_feat_path):
conference_node_features = torch.from_numpy(
np.load(conference_feat_path, mmap_mode="r")
)
conference_node_features = conference_node_features.half()
torch.save(conference_node_features, conference_fp16_feat_path)
# journal node
journal_feat_path = os.path.join(base_path, "journal", "node_feat.npy")
journal_fp16_feat_path = os.path.join(
base_path, "journal", "node_feat_fp16.pt")
if not os.path.exists(journal_fp16_feat_path):
journal_node_features = torch.from_numpy(
np.load(journal_feat_path, mmap_mode="r")
)
journal_node_features = journal_node_features.half()
torch.save(journal_node_features, journal_fp16_feat_path)
class IGBHeteroDataset(object):
def __init__(
self,
path,
dataset_size="tiny",
in_memory=False,
use_label_2K=False,
with_edges=True,
layout: Literal["CSC", "CSR", "COO"] = "COO",
use_fp16=False,
):
self.dir = path
self.dataset_size = dataset_size
self.in_memory = in_memory
self.use_label_2K = use_label_2K
self.with_edges = with_edges
self.layout = layout
self.use_fp16 = use_fp16
self.ntypes = [
"paper",
"author",
"institute",
"fos",
"journal",
"conference"]
self.etypes = None
self.edge_dict = {}
self.feat_dict = {}
self.paper_nodes_num = {
"tiny": 100000,
"small": 1000000,
"medium": 10000000,
"large": 100000000,
"full": 269346174,
}
self.author_nodes_num = {
"tiny": 357041,
"small": 1926066,
"medium": 15544654,
"large": 116959896,
"full": 277220883,
}
# 'paper' nodes.
self.label = None
self.train_idx = None
self.val_idx = None
self.test_idx = None
self.base_path = os.path.join(path, self.dataset_size, "processed")
if self.use_fp16:
float2half(self.base_path, self.dataset_size)
self.process()
def process(self):
# load edges
if self.with_edges:
if self.layout == "COO":
if self.in_memory:
paper_paper_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path, "paper__cites__paper", "edge_index.npy"
)
)
).t()
author_paper_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__written_by__author",
"edge_index.npy",
)
)
).t()
affiliation_author_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"author__affiliated_to__institute",
"edge_index.npy",
)
)
).t()
paper_fos_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path, "paper__topic__fos", "edge_index.npy"
)
)
).t()
paper_published_journal = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__published__journal",
"edge_index.npy",
)
)
).t()
paper_venue_conference = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__venue__conference",
"edge_index.npy",
)
)
).t()
else:
paper_paper_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path, "paper__cites__paper", "edge_index.npy"
),
mmap_mode="r",
)
).t()
author_paper_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__written_by__author",
"edge_index.npy",
),
mmap_mode="r",
)
).t()
affiliation_author_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"author__affiliated_to__institute",
"edge_index.npy",
),
mmap_mode="r",
)
).t()
paper_fos_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path, "paper__topic__fos", "edge_index.npy"
),
mmap_mode="r",
)
).t()
paper_published_journal = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__published__journal",
"edge_index.npy",
),
mmap_mode="r",
)
).t()
paper_venue_conference = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__venue__conference",
"edge_index.npy",
),
mmap_mode="r",
)
).t()
cites_edge = add_self_loops(
remove_self_loops(paper_paper_edges)[0])[0]
self.edge_dict = {
("paper", "cites", "paper"): (
torch.cat([cites_edge[1, :], cites_edge[0, :]]),
torch.cat([cites_edge[0, :], cites_edge[1, :]]),
),
("paper", "written_by", "author"): author_paper_edges,
("author", "affiliated_to", "institute"): affiliation_author_edges,
("paper", "topic", "fos"): paper_fos_edges,
("author", "rev_written_by", "paper"): (
author_paper_edges[1, :],
author_paper_edges[0, :],
),
("institute", "rev_affiliated_to", "author"): (
affiliation_author_edges[1, :],
affiliation_author_edges[0, :],
),
("fos", "rev_topic", "paper"): (
paper_fos_edges[1, :],
paper_fos_edges[0, :],
),
}
self.edge_dict[("paper", "published", "journal")] = (
paper_published_journal
)
self.edge_dict[("paper", "venue", "conference")] = (
paper_venue_conference
)
self.edge_dict[("journal", "rev_published", "paper")] = (
paper_published_journal[1, :],
paper_published_journal[0, :],
)
self.edge_dict[("conference", "rev_venue", "paper")] = (
paper_venue_conference[1, :],
paper_venue_conference[0, :],
)
            # Directly load CSC- or CSR-layout files, which can be generated
            # using compress_graph.py
else:
compress_edge_dict = {}
compress_edge_dict[("paper", "cites", "paper")
] = "paper__cites__paper"
compress_edge_dict[("paper", "written_by", "author")] = (
"paper__written_by__author"
)
compress_edge_dict[("author", "affiliated_to", "institute")] = (
"author__affiliated_to__institute"
)
compress_edge_dict[("paper", "topic", "fos")
] = "paper__topic__fos"
compress_edge_dict[("author", "rev_written_by", "paper")] = (
"author__rev_written_by__paper"
)
compress_edge_dict[("institute", "rev_affiliated_to", "author")] = (
"institute__rev_affiliated_to__author"
)
compress_edge_dict[("fos", "rev_topic", "paper")] = (
"fos__rev_topic__paper"
)
compress_edge_dict[("paper", "published", "journal")] = (
"paper__published__journal"
)
compress_edge_dict[("paper", "venue", "conference")] = (
"paper__venue__conference"
)
compress_edge_dict[("journal", "rev_published", "paper")] = (
"journal__rev_published__paper"
)
compress_edge_dict[("conference", "rev_venue", "paper")] = (
"conference__rev_venue__paper"
)
for etype in compress_edge_dict.keys():
edge_path = os.path.join(
self.base_path, self.layout, compress_edge_dict[etype]
)
                try:
                    indptr = torch.load(
                        os.path.join(edge_path, "indptr.pt"))
                    indices = torch.load(
                        os.path.join(edge_path, "indices.pt"))
if self.layout == "CSC":
self.edge_dict[etype] = (indices, indptr)
else:
self.edge_dict[etype] = (indptr, indices)
except FileNotFoundError as e:
print(f"FileNotFound: {e}")
exit()
except Exception as e:
print(f"Exception: {e}")
exit()
self.etypes = list(self.edge_dict.keys())
# load features and labels
label_file = (
"node_label_19.npy" if not self.use_label_2K else "node_label_2K.npy"
)
paper_feat_path = os.path.join(
self.base_path, "paper", "node_feat.npy")
paper_lbl_path = os.path.join(self.base_path, "paper", label_file)
num_paper_nodes = self.paper_nodes_num[self.dataset_size]
if self.in_memory:
if self.use_fp16:
paper_node_features = torch.load(
os.path.join(self.base_path, "paper", "node_feat_fp16.pt")
)
else:
paper_node_features = torch.from_numpy(
np.load(paper_feat_path))
else:
if self.dataset_size in ["large", "full"]:
paper_node_features = torch.from_numpy(
np.memmap(
paper_feat_path,
dtype="float32",
mode="r",
shape=(num_paper_nodes, 1024),
)
)
else:
paper_node_features = torch.from_numpy(
np.load(paper_feat_path, mmap_mode="r")
)
if self.dataset_size in ["large", "full"]:
paper_node_labels = torch.from_numpy(
np.memmap(
                    paper_lbl_path, dtype="float32", mode="r", shape=(num_paper_nodes,)
)
).to(torch.long)
else:
paper_node_labels = torch.from_numpy(
np.load(paper_lbl_path)).to(
torch.long)
self.feat_dict["paper"] = paper_node_features
self.label = paper_node_labels
num_author_nodes = self.author_nodes_num[self.dataset_size]
author_feat_path = os.path.join(
self.base_path, "author", "node_feat.npy")
if self.in_memory:
if self.use_fp16:
author_node_features = torch.load(
os.path.join(self.base_path, "author", "node_feat_fp16.pt")
)
else:
author_node_features = torch.from_numpy(
np.load(author_feat_path))
else:
if self.dataset_size in ["large", "full"]:
author_node_features = torch.from_numpy(
np.memmap(
author_feat_path,
dtype="float32",
mode="r",
shape=(num_author_nodes, 1024),
)
)
else:
author_node_features = torch.from_numpy(
np.load(author_feat_path, mmap_mode="r")
)
self.feat_dict["author"] = author_node_features
if self.in_memory:
if self.use_fp16:
institute_node_features = torch.load(
os.path.join(
self.base_path,
"institute",
"node_feat_fp16.pt")
)
else:
institute_node_features = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"institute",
"node_feat.npy"))
)
else:
institute_node_features = torch.from_numpy(
np.load(
os.path.join(self.base_path, "institute", "node_feat.npy"),
mmap_mode="r",
)
)
self.feat_dict["institute"] = institute_node_features
if self.in_memory:
if self.use_fp16:
fos_node_features = torch.load(
os.path.join(self.base_path, "fos", "node_feat_fp16.pt")
)
else:
fos_node_features = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"fos",
"node_feat.npy"))
)
else:
fos_node_features = torch.from_numpy(
np.load(
os.path.join(self.base_path, "fos", "node_feat.npy"), mmap_mode="r"
)
)
self.feat_dict["fos"] = fos_node_features
if self.in_memory:
if self.use_fp16:
conference_node_features = torch.load(
os.path.join(
self.base_path,
"conference",
"node_feat_fp16.pt")
)
else:
conference_node_features = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"conference",
"node_feat.npy"))
)
else:
conference_node_features = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"conference",
"node_feat.npy"),
mmap_mode="r",
)
)
self.feat_dict["conference"] = conference_node_features
if self.in_memory:
if self.use_fp16:
journal_node_features = torch.load(
os.path.join(
self.base_path,
"journal",
"node_feat_fp16.pt")
)
else:
journal_node_features = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"journal",
"node_feat.npy"))
)
else:
journal_node_features = torch.from_numpy(
np.load(
os.path.join(self.base_path, "journal", "node_feat.npy"),
mmap_mode="r",
)
)
self.feat_dict["journal"] = journal_node_features
# Please ensure that train_idx and val_idx have been generated using
# split_seeds.py
try:
self.train_idx = torch.load(
os.path.join(
self.base_path,
"train_idx.pt"))
self.val_idx = torch.load(
os.path.join(
self.base_path,
"val_idx.pt"))
except FileNotFoundError as e:
print(
f"FileNotFound: {e}, please ensure that train_idx and val_idx have been generated using split_seeds.py"
)
exit()
except Exception as e:
print(f"Exception: {e}")
exit()
class IGBH(dataset.Dataset):
def __init__(
self,
data_path,
name="igbh",
dataset_size="full",
use_label_2K=True,
in_memory=False,
layout: Literal["CSC", "CSR", "COO"] = "COO",
type: Literal["fp16", "fp32"] = "fp16",
device="cpu",
edge_dir="in",
**kwargs,
):
super().__init__()
self.data_path = data_path
self.name = name
self.size = dataset_size
self.igbh_dataset = IGBHeteroDataset(
data_path,
dataset_size=dataset_size,
in_memory=in_memory,
use_label_2K=use_label_2K,
layout=layout,
use_fp16=(type == "fp16"),
)
self.num_samples = len(self.igbh_dataset.val_idx)
def get_samples(self, id_list):
return self.igbh_dataset.val_idx[id_list]
def get_labels(self, id_list):
return self.igbh_dataset.label[self.get_samples(id_list)]
def get_item_count(self):
return len(self.igbh_dataset.val_idx)
def load_query_samples(self, id):
pass
def unload_query_samples(self, sample_list):
return super().unload_query_samples(sample_list)
class PostProcessIGBH:
    def __init__(
        self,
        device="cpu",
        dtype="uint8",
    ):
self.results = []
self.content_ids = []
self.samples_ids = []
def add_results(self, results):
self.results.extend(results)
def __call__(self, results, ids, sample_ids, result_dict=None):
self.content_ids.extend(ids)
self.samples_ids.extend(sample_ids)
return results.argmax(1).cpu().numpy()
def start(self):
self.results = []
def finalize(self, result_dict, ds=None, output_dir=None):
labels = ds.get_labels(self.content_ids)
total = len(self.results)
good = 0
for l, r in zip(labels, self.results):
if l == r:
good += 1
result_dict["accuracy"] = good / total
return result_dict
"""
mlperf inference benchmarking tool
"""
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import array
import collections
import json
import logging
import os
import sys
import threading
import time
from queue import Queue
import mlperf_loadgen as lg
import numpy as np
import torch
import dataset
import igbh
import dgl_utilities.feature_fetching as dgl_igbh
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("main")
NANO_SEC = 1e9
MILLI_SEC = 1000
SUPPORTED_DATASETS = {
"igbh-dgl-tiny": (
dgl_igbh.IGBH,
dataset.preprocess,
igbh.PostProcessIGBH(),
{"dataset_size": "tiny", "use_label_2K": True},
),
"igbh-dgl-small": (
dgl_igbh.IGBH,
dataset.preprocess,
igbh.PostProcessIGBH(),
{"dataset_size": "small", "use_label_2K": True},
),
"igbh-dgl-medium": (
dgl_igbh.IGBH,
dataset.preprocess,
igbh.PostProcessIGBH(),
{"dataset_size": "medium", "use_label_2K": True},
),
"igbh-dgl-large": (
dgl_igbh.IGBH,
dataset.preprocess,
igbh.PostProcessIGBH(),
{"dataset_size": "large", "use_label_2K": True},
),
"igbh-dgl": (
dgl_igbh.IGBH,
dataset.preprocess,
igbh.PostProcessIGBH(),
{"dataset_size": "full", "use_label_2K": True},
),
}
SUPPORTED_PROFILES = {
"defaults": {
"dataset": "igbh-dgl-tiny",
"backend": "dgl",
"model-name": "rgat",
},
"debug-dgl": {
"dataset": "igbh-dgl-tiny",
"backend": "dgl",
"model-name": "rgat",
},
"rgat-dgl-small": {
"dataset": "igbh-dgl-small",
"backend": "dgl",
"model-name": "rgat",
},
"rgat-dgl-medium": {
"dataset": "igbh-dgl-medium",
"backend": "dgl",
"model-name": "rgat",
},
"rgat-dgl-large": {
"dataset": "igbh-dgl-large",
"backend": "dgl",
"model-name": "rgat",
},
"rgat-dgl-full": {
"dataset": "igbh-dgl",
"backend": "dgl",
"model-name": "rgat",
},
}
SCENARIO_MAP = {
"SingleStream": lg.TestScenario.SingleStream,
"MultiStream": lg.TestScenario.MultiStream,
"Server": lg.TestScenario.Server,
"Offline": lg.TestScenario.Offline,
}
def get_args():
parser = argparse.ArgumentParser()
# Dataset arguments
parser.add_argument(
"--dataset",
choices=SUPPORTED_DATASETS.keys(),
help="dataset")
parser.add_argument(
"--dataset-path",
required=True,
help="path to the dataset")
parser.add_argument(
"--layout",
default="COO",
choices=["CSC", "CSR", "COO"],
help="layout of the dataset",
)
parser.add_argument(
"--profile", choices=SUPPORTED_PROFILES.keys(), help="standard profiles"
)
parser.add_argument(
"--scenario",
default="SingleStream",
help="mlperf benchmark scenario, one of " +
str(list(SCENARIO_MAP.keys())),
)
parser.add_argument(
"--max-batchsize",
type=int,
default=1,
help="max batch size in a single inference",
)
parser.add_argument("--threads", default=1, type=int, help="threads")
parser.add_argument(
"--accuracy",
action="store_true",
help="enable accuracy pass")
parser.add_argument(
"--find-peak-performance",
action="store_true",
help="enable finding peak performance pass",
)
# Backend Arguments
parser.add_argument("--backend", help="Name of the backend")
parser.add_argument("--model-name", help="Name of the model")
parser.add_argument("--output", default="output", help="test results")
parser.add_argument("--qps", type=int, help="target qps")
parser.add_argument("--model-path", help="Path to model weights")
parser.add_argument(
"--dtype",
default="fp32",
choices=["fp32", "fp16"],
help="dtype of the model",
)
parser.add_argument(
"--device",
default="gpu",
choices=["gpu", "cpu"],
help="device to run the benchmark",
)
# file for user LoadGen settings such as target QPS
parser.add_argument(
"--user_conf",
default="user.conf",
help="user config for user LoadGen settings such as target QPS",
)
# file for LoadGen audit settings
parser.add_argument(
"--audit_conf", default="audit.config", help="config for LoadGen audit settings"
)
# below will override mlperf rules compliant settings - don't use for
# official submission
parser.add_argument("--time", type=int, help="time to scan in seconds")
parser.add_argument("--count", type=int, help="dataset items to use")
parser.add_argument("--debug", action="store_true", help="debug")
parser.add_argument(
"--performance-sample-count",
type=int,
help="performance sample count",
default=5000,
)
parser.add_argument(
"--max-latency", type=float, help="mlperf max latency in pct tile"
)
parser.add_argument(
"--samples-per-query",
default=8,
type=int,
help="mlperf multi-stream samples per query",
)
args = parser.parse_args()
    # Don't use defaults in argparse. Instead we default to a dict, override
    # that with a profile, and take these values as defaults unless the
    # command line gives an explicit value.
defaults = SUPPORTED_PROFILES["defaults"]
if args.profile:
profile = SUPPORTED_PROFILES[args.profile]
defaults.update(profile)
for k, v in defaults.items():
kc = k.replace("-", "_")
if getattr(args, kc) is None:
setattr(args, kc, v)
if args.scenario not in SCENARIO_MAP:
parser.error("valid scanarios:" + str(list(SCENARIO_MAP.keys())))
return args
def get_backend(backend, **kwargs):
if backend == "dgl":
from backend_dgl import BackendDGL
backend = BackendDGL(**kwargs)
else:
raise ValueError("unknown backend: " + backend)
return backend
class Item:
"""An item that we queue for processing by the thread pool."""
def __init__(self, query_id, content_id, samples):
self.query_id = query_id
self.content_id = content_id
self.samples = samples
self.start = time.time()
class RunnerBase:
def __init__(self, model, ds, threads, post_proc=None, max_batchsize=128):
self.take_accuracy = False
self.ds = ds
self.model = model
self.post_process = post_proc
self.threads = threads
self.take_accuracy = False
self.max_batchsize = max_batchsize
self.result_timing = []
def handle_tasks(self, tasks_queue):
pass
def start_run(self, result_dict, take_accuracy):
self.result_dict = result_dict
self.result_timing = []
self.take_accuracy = take_accuracy
self.post_process.start()
def run_one_item(self, qitem: Item):
# run the prediction
processed_results = []
try:
results = self.model.predict(qitem.samples)
processed_results = self.post_process(
results, qitem.content_id, qitem.samples, self.result_dict
)
if self.take_accuracy:
self.post_process.add_results(processed_results)
self.result_timing.append(time.time() - qitem.start)
except Exception as ex: # pylint: disable=broad-except
src = [i for i in qitem.content_id]
log.error("thread: failed on contentid=%s, %s", src, ex)
# since post_process will not run, fake empty responses
processed_results = [[]] * len(qitem.query_id)
finally:
response_array_refs = []
response = []
for idx, query_id in enumerate(qitem.query_id):
response_array = array.array(
"B", np.array(processed_results[idx], np.uint8).tobytes()
)
response_array_refs.append(response_array)
bi = response_array.buffer_info()
response.append(lg.QuerySampleResponse(query_id, bi[0], bi[1]))
lg.QuerySamplesComplete(response)
def enqueue(self, query_samples):
idx = [q.index for q in query_samples]
query_id = [q.id for q in query_samples]
if len(query_samples) < self.max_batchsize:
samples = self.ds.get_samples(idx)
self.run_one_item(Item(query_id, idx, samples))
else:
bs = self.max_batchsize
for i in range(0, len(idx), bs):
samples = self.ds.get_samples(idx[i: i + bs])
self.run_one_item(
Item(query_id[i: i + bs], idx[i: i + bs], samples))
def finish(self):
pass
class QueueRunner(RunnerBase):
def __init__(self, model, ds, threads, post_proc=None, max_batchsize=128):
super().__init__(model, ds, threads, post_proc, max_batchsize)
self.tasks = Queue(maxsize=threads * 4)
self.workers = []
self.result_dict = {}
for _ in range(self.threads):
worker = threading.Thread(
target=self.handle_tasks, args=(
self.tasks,))
worker.daemon = True
self.workers.append(worker)
worker.start()
def handle_tasks(self, tasks_queue):
"""Worker thread."""
while True:
qitem = tasks_queue.get()
if qitem is None:
                # None in the queue indicates the parent wants us to exit
tasks_queue.task_done()
break
self.run_one_item(qitem)
tasks_queue.task_done()
def enqueue(self, query_samples):
idx = [q.index for q in query_samples]
query_id = [q.id for q in query_samples]
if len(query_samples) < self.max_batchsize:
samples = self.ds.get_samples(idx)
self.tasks.put(Item(query_id, idx, samples))
else:
bs = self.max_batchsize
for i in range(0, len(idx), bs):
ie = i + bs
samples = self.ds.get_samples(idx[i:ie])
self.tasks.put(Item(query_id[i:ie], idx[i:ie], samples))
def finish(self):
# exit all threads
for _ in self.workers:
self.tasks.put(None)
for worker in self.workers:
worker.join()
def main():
args = get_args()
log.info(args)
# dataset to use
dataset_class, pre_proc, post_proc, kwargs = SUPPORTED_DATASETS[args.dataset]
ds = dataset_class(
data_path=args.dataset_path,
name=args.dataset,
layout=args.layout,
type=args.dtype,
**kwargs,
)
# find backend
backend = get_backend(
args.backend,
type=args.dtype,
device=args.device,
ckpt_path=args.model_path,
batch_size=args.max_batchsize,
igbh=ds,
layout=args.layout,
)
    # --count applies to accuracy mode only and can be used to limit the
    # number of samples used for testing.
count_override = False
count = args.count
if count:
count_override = True
# load model to backend
model = backend.load()
final_results = {
"runtime": model.name(),
"version": model.version(),
"time": int(time.time()),
"args": vars(args),
"cmdline": str(args),
}
user_conf = os.path.abspath(args.user_conf)
if not os.path.exists(user_conf):
log.error("{} not found".format(user_conf))
sys.exit(1)
audit_config = os.path.abspath(args.audit_conf)
if args.output:
output_dir = os.path.abspath(args.output)
os.makedirs(output_dir, exist_ok=True)
os.chdir(output_dir)
    #
    # make one pass over the dataset to validate accuracy
    #
    if not count_override:
        count = ds.get_item_count()
# warmup
warmup_samples = torch.Tensor([0]).to(torch.int64)
for i in range(5):
_ = backend.predict(warmup_samples)
scenario = SCENARIO_MAP[args.scenario]
runner_map = {
lg.TestScenario.SingleStream: RunnerBase,
lg.TestScenario.MultiStream: QueueRunner,
lg.TestScenario.Server: QueueRunner,
lg.TestScenario.Offline: QueueRunner,
}
runner = runner_map[scenario](
model, ds, args.threads, post_proc=post_proc, max_batchsize=args.max_batchsize
)
def issue_queries(query_samples):
runner.enqueue(query_samples)
def flush_queries():
pass
log_output_settings = lg.LogOutputSettings()
log_output_settings.outdir = output_dir
log_output_settings.copy_summary_to_stdout = False
log_settings = lg.LogSettings()
log_settings.enable_trace = args.debug
log_settings.log_output = log_output_settings
settings = lg.TestSettings()
settings.FromConfig(user_conf, args.model_name, args.scenario)
settings.scenario = scenario
settings.mode = lg.TestMode.PerformanceOnly
if args.accuracy:
settings.mode = lg.TestMode.AccuracyOnly
if args.find_peak_performance:
settings.mode = lg.TestMode.FindPeakPerformance
if args.time:
# override the time we want to run
settings.min_duration_ms = args.time * MILLI_SEC
settings.max_duration_ms = args.time * MILLI_SEC
if args.qps:
qps = float(args.qps)
settings.server_target_qps = qps
settings.offline_expected_qps = qps
if count_override:
settings.min_query_count = count
settings.max_query_count = count
if args.samples_per_query:
settings.multi_stream_samples_per_query = args.samples_per_query
if args.max_latency:
settings.server_target_latency_ns = int(args.max_latency * NANO_SEC)
settings.multi_stream_expected_latency_ns = int(
args.max_latency * NANO_SEC)
performance_sample_count = (
args.performance_sample_count
if args.performance_sample_count
else min(count, 500)
)
sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(
count, performance_sample_count, ds.load_query_samples, ds.unload_query_samples
)
log.info("starting {}".format(scenario))
result_dict = {"scenario": str(scenario)}
runner.start_run(result_dict, args.accuracy)
lg.StartTestWithLogSettings(sut, qsl, settings, log_settings, audit_config)
if args.accuracy:
post_proc.finalize(result_dict, ds, output_dir=args.output)
final_results["accuracy_results"] = result_dict
runner.finish()
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
#
# write final results
#
if args.output:
with open("results.json", "w") as f:
json.dump(final_results, f, sort_keys=True, indent=4)
if __name__ == "__main__":
main()
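# Example invocation (illustrative only; the script file name, dataset path,
# and checkpoint path below are assumptions, not values taken from this
# repository):
#
#   python3 main.py --dataset igbh-dgl-tiny --dataset-path ./igbh \
#       --backend dgl --model-name rgat --model-path ./model/RGAT.pt \
#       --scenario Offline --dtype fp16 --device cpu --accuracy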
colorama==0.4.6
tqdm==4.66.4
requests==2.32.2
torchdata==0.7.0
pybind11==2.12.0
PyYAML==6.0.1
pydantic==2.7.1
git+https://github.com/IllinoisGraphBenchmark/IGB-Datasets.git
# This script was taken from:
# https://github.com/mlcommons/training/blob/master/graph_neural_network/dataset.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import HeteroConv, GATConv, GCNConv, SAGEConv
from torch_geometric.utils import trim_to_layer
class RGNN(torch.nn.Module):
r"""[Relational GNN model](https://arxiv.org/abs/1703.06103).
Args:
etypes: edge types.
in_dim: input size.
h_dim: Dimension of hidden layer.
out_dim: Output dimension.
num_layers: Number of conv layers.
dropout: Dropout probability for hidden layers.
model: "rsage" or "rgat".
heads: Number of multi-head-attentions for GAT.
node_type: The predict node type for node classification.
"""
def __init__(
self,
etypes,
in_dim,
h_dim,
out_dim,
num_layers=2,
dropout=0.2,
model="rgat",
heads=4,
node_type=None,
with_trim=False,
):
super().__init__()
self.node_type = node_type
if node_type is not None:
self.lin = torch.nn.Linear(h_dim, out_dim)
self.convs = torch.nn.ModuleList()
for i in range(num_layers):
in_dim = in_dim if i == 0 else h_dim
            h_dim = (
                out_dim
                if (i == num_layers - 1 and node_type is None)
                else h_dim
            )
if model == "rsage":
self.convs.append(
HeteroConv(
{
etype: SAGEConv(in_dim, h_dim, root_weight=False)
for etype in etypes
}
)
)
elif model == "rgat":
self.convs.append(
HeteroConv(
{
etype: GATConv(
in_dim,
h_dim // heads,
heads=heads,
add_self_loops=False,
)
for etype in etypes
}
)
)
self.dropout = torch.nn.Dropout(dropout)
self.with_trim = with_trim
def forward(
self,
x_dict,
edge_index_dict,
num_sampled_edges_dict=None,
num_sampled_nodes_dict=None,
):
for i, conv in enumerate(self.convs):
if self.with_trim:
x_dict, edge_index_dict, _ = trim_to_layer(
layer=i,
num_sampled_nodes_per_hop=num_sampled_nodes_dict,
num_sampled_edges_per_hop=num_sampled_edges_dict,
x=x_dict,
edge_index=edge_index_dict,
)
for key in list(edge_index_dict.keys()):
if key[0] not in x_dict or key[-1] not in x_dict:
del edge_index_dict[key]
x_dict = conv(x_dict, edge_index_dict)
if i != len(self.convs) - 1:
x_dict = {key: F.leaky_relu(x) for key, x in x_dict.items()}
x_dict = {key: self.dropout(x) for key, x in x_dict.items()}
if hasattr(self, "lin"): # for node classification
return self.lin(x_dict[self.node_type])
else:
return x_dict
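# Minimal usage sketch for RGNN (illustrative only: the edge types, feature
# dimension, hidden size, and class count below are placeholders, not the
# benchmark's real configuration, which is built from the IGBH dataset):
if __name__ == "__main__":
    example_etypes = [("paper", "cites", "paper")]
    example_model = RGNN(
        example_etypes,
        in_dim=1024,
        h_dim=512,
        out_dim=2983,
        num_layers=2,
        model="rgat",
        heads=4,
        node_type="paper",
    )
    # Tiny random hetero-graph: 8 "paper" nodes and 3 citation edges.
    example_x = {"paper": torch.randn(8, 1024)}
    example_edges = {
        ("paper", "cites", "paper"): torch.tensor([[0, 1, 2], [1, 2, 3]])
    }
    # Because node_type is set, the output is [num_paper_nodes, out_dim].
    print(example_model(example_x, example_edges).shape)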
import argparse
import numpy as np
import torch
import json
import os
def get_args():
"""Parse commandline."""
parser = argparse.ArgumentParser()
parser.add_argument(
"--mlperf-accuracy-file", required=True, help="path to mlperf_log_accuracy.json"
)
parser.add_argument(
"--dataset-path",
default="igbh",
help="Path to IHGB dataset",
)
parser.add_argument(
"--dataset-size",
default="full",
choices=["tiny", "small", "medium", "large", "full"]
)
parser.add_argument(
"--verbose",
action="store_true",
help="verbose messages")
parser.add_argument(
"--output-file", default="results.json", help="path to output file"
)
parser.add_argument(
"--dtype",
default="uint8",
choices=["uint8", "float32", "int32", "int64"],
help="data type of the label",
)
args = parser.parse_args()
return args
def load_labels(base_path, dataset_size, use_label_2K=True):
# load labels
paper_nodes_num = {
"tiny": 100000,
"small": 1000000,
"medium": 10000000,
"large": 100000000,
"full": 269346174,
}
label_file = (
"node_label_19.npy" if not use_label_2K else "node_label_2K.npy"
)
paper_lbl_path = os.path.join(
base_path,
dataset_size,
"processed",
"paper",
label_file)
if dataset_size in ["large", "full"]:
paper_node_labels = torch.from_numpy(
np.memmap(
                paper_lbl_path, dtype="float32", mode="r", shape=(paper_nodes_num[dataset_size],)
)
).to(torch.long)
else:
paper_node_labels = torch.from_numpy(
np.load(paper_lbl_path)).to(
torch.long)
labels = paper_node_labels
val_idx = torch.load(
os.path.join(
base_path,
dataset_size,
"processed",
"val_idx.pt"))
return labels, val_idx
def get_labels(labels, val_idx, id_list):
return labels[val_idx[id_list]]
if __name__ == "__main__":
args = get_args()
dtype_map = {
"uint8": np.uint8,
"float32": np.float32,
"int32": np.int32,
"int64": np.int64}
with open(args.mlperf_accuracy_file, "r") as f:
mlperf_results = json.load(f)
labels, val_idx = load_labels(args.dataset_path, args.dataset_size)
results = {}
seen = set()
good = 0
total = 0
for result in mlperf_results:
idx = result["qsl_idx"]
if idx in seen:
continue
seen.add(idx)
# get ground truth
label = get_labels(labels, val_idx, idx)
# get prediction
data = int(np.frombuffer(bytes.fromhex(
result["data"]), dtype_map[args.dtype])[0])
if label == data:
good += 1
total += 1
results["accuracy"] = good / total
results["model"] = "rgat"
results["number_correct_samples"] = good
results["performance_sample_count"] = total
    with open(args.output_file, "w") as fp:
        fp.write(
            "accuracy={:.3f}%, good={}, total={}".format(
                100.0 * results["accuracy"],
                results["number_correct_samples"],
                results["performance_sample_count"],
            )
        )
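# Example invocation (illustrative only; the script file name and paths are
# assumptions):
#
#   python3 accuracy_igbh.py \
#       --mlperf-accuracy-file ./output/mlperf_log_accuracy.json \
#       --dataset-path ./igbh --dataset-size tiny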