---
hide:
- toc
---
# Installation
We use the MLCommons CM automation framework to run MLPerf inference benchmarks.
CM needs `git`, `python3-pip` and `python3-venv` installed on your system. If any of these are absent, please follow the [official CM installation page](https://docs.mlcommons.org/ck/install) to install them. Once the dependencies are installed, follow the steps below.
## Activate a Virtual ENV for CM
This step is not mandatory, as CM can use a separate virtual environment for MLPerf inference. However, recent `pip` versions require it; otherwise you will need the `--break-system-packages` flag while installing `cm4mlops`.
```bash
python3 -m venv cm
source cm/bin/activate
```
## Install CM and pull any needed repositories
=== "Use the default fork of CM MLOps repository"
```bash
pip install cm4mlops
```
=== "Use custom fork/branch of the CM MLOps repository"
```bash
pip install cmind && cm init --quiet --repo=mlcommons@cm4mlops --branch=mlperf-inference
```
Here, `repo` is in the format `githubUsername@githubRepo`.
Now, you are ready to use the `cm` commands to run MLPerf inference as given on the [benchmarks](../index.md) page.
mkdocs-material
swagger-markdown
mkdocs-macros-plugin
ruamel.yaml
mkdocs-redirects
mkdocs-site-urls
---
hide:
- toc
---
Click [here](https://docs.google.com/presentation/d/1cmbpZUpVr78EIrhzyMBnnWnjJrD-mZ2vmSb-yETkTA8/edit?usp=sharing) to view the proposal slide for Common Automation for MLPerf Inference Submission Generation through CM.
=== "Custom automation based MLPerf results"
    If you have not followed the `cm run` commands under the individual model pages in the [benchmarks](../index.md) directory, please make sure that the results directory is structured in the following way. Real examples of the expected folder structure can be seen [here](https://github.com/mlcommons/inference/tree/submission-generation-examples).
```
    └── System description ID (SUT Name)
        ├── system_meta.json
        └── Benchmark
            └── Scenario
                ├── Performance
                |   └── run_1 # 1 run for all scenarios
                |       ├── mlperf_log_summary.txt
                |       └── mlperf_log_detail.txt
                ├── Accuracy
                |   ├── mlperf_log_summary.txt
                |   ├── mlperf_log_detail.txt
                |   ├── mlperf_log_accuracy.json
                |   └── accuracy.txt
                ├── Compliance_Test_ID
                |   ├── Performance
                |   |   └── run_x # 1 run for all scenarios
                |   |       ├── mlperf_log_summary.txt
                |   |       └── mlperf_log_detail.txt
                |   ├── Accuracy # for TEST01 only
                |   |   ├── baseline_accuracy.txt (if test fails in deterministic mode)
                |   |   ├── compliance_accuracy.txt (if test fails in deterministic mode)
                |   |   ├── mlperf_log_accuracy.json
                |   |   └── accuracy.txt
                |   ├── verify_performance.txt
                |   └── verify_accuracy.txt # for TEST01 only
                ├── user.conf
                └── measurements.json
```
<details>
<summary>Click here if you are submitting in open division</summary>
    * The `model_mapping.json` file should be included inside the SUT folder; it is used to map the full custom model name to the official model name. The format of the JSON file is:
```
{
"custom_model_name_for_model1":"official_model_name_for_model1",
"custom_model_name_for_model2":"official_model_name_for_model2",
}
```
</details>
=== "CM automation based results"
    If you have followed the `cm run` commands under the individual model pages in the [benchmarks](../index.md) directory, all the valid results are aggregated in the `cm cache` folder. The following command can be used to browse the structure of the inference results folder generated by CM.
### Get results folder structure
```bash
cm find cache --tags=get,mlperf,inference,results,dir | xargs tree
```
Once the results for all the models are ready, you can follow the section below to generate a valid submission tree compliant with the [MLPerf requirements](https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#inference-1).
## Generate submission folder
The submission generation flow is shown in the diagram below.
```mermaid
flowchart LR
subgraph Generation [Submission Generation SUT1]
direction TB
A[populate system details] --> B[generate submission structure]
B --> C[truncate-accuracy-logs]
C --> D{Infer low talency results <br>and/or<br> filter out invalid results}
D --> yes --> E[preprocess-mlperf-inference-submission]
D --> no --> F[run-mlperf-inference-submission-checker]
E --> F
end
Input((Results SUT1)) --> Generation
Generation --> Output((Submission Folder <br> SUT1))
```
### Command to generate submission folder
```bash
cm run script --tags=generate,inference,submission \
--clean \
--preprocess_submission=yes \
--run-checker=yes \
--submitter=MLCommons \
--division=closed \
--env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
--quiet
```
!!! tip
* Use `--hw_name="My system name"` to give a meaningful system name. Examples can be seen [here](https://github.com/mlcommons/inference_results_v3.0/tree/main/open/cTuning/systems)
* Use `--submitter=<Your name>` if your organization is an official MLCommons member and would like to submit under your organization
* Use `--hw_notes_extra` option to add additional notes like `--hw_notes_extra="Result taken by NAME" `
* Use `--results_dir` option to specify the results folder. It is automatically taken from CM cache for MLPerf automation based runs
    * Use `--submission_dir` option to specify the submission folder. (You can omit this if you are pushing to GitHub or running only a single SUT; CM will then use its cache folder.)
    * Use `--division=open` for open division submission
    * Use `--category` option to specify the category for which the submission is generated (datacenter/edge). By default, the category is taken from the `system_meta.json` file located in the SUT root directory.
    * Use `--submission_base_dir` to specify the directory to which the outputs from the preprocess-submission script and the final submission are added. There is no need to provide `--submission_dir` along with this. For `docker run`, use `--submission_base_dir` instead of `--submission_dir`.
If there are multiple systems where MLPerf results are collected, the same process needs to be repeated on each of them. Once we have submission folders on all the SUTs, we need to sync them into a single submission folder.
=== "Sync Locally"
    If you have results on multiple systems, you need to merge them onto one system. You can use `rsync` for this. For example, the command below, run on SUT1, pulls the submission folder from SUT2 (reachable as `host2`) into the submission folder on SUT1.
```
    rsync -avz username@host2:<path_to_submission_folder2>/ <path_to_submission_folder1>/
```
    The same needs to be repeated for all the other SUTs so that SUT1 ends up with the full set of submissions.
```mermaid
flowchart LR
subgraph SUT1 [Submission Generation SUT1]
A[Submission Folder SUT1]
end
subgraph SUT2 [Submission Generation SUT2]
B[Submission Folder SUT2]
end
subgraph SUT3 [Submission Generation SUT3]
C[Submission Folder SUT3]
end
subgraph SUTN [Submission Generation SUTN]
D[Submission Folder SUTN]
end
SUT2 --> SUT1
SUT3 --> SUT1
SUTN --> SUT1
```
=== "Sync via a Github repo"
    If you are collecting results across multiple systems, you can generate a submission on each of them, aggregate all of them in a GitHub repository (which can be private), and use it to generate a single tarball that can be uploaded to the [MLCommons Submission UI](https://submissions-ui.mlcommons.org/submission).
Run the following command after **replacing `--repo_url` with your GitHub repository URL**.
```bash
cm run script --tags=push,github,mlperf,inference,submission \
--repo_url=https://github.com/mlcommons/mlperf_inference_submissions_v5.0 \
--commit_message="Results on <HW name> added by <Name>" \
--quiet
```
```mermaid
flowchart LR
subgraph SUT1 [Submission Generation SUT1]
A[Submission Folder SUT1]
end
subgraph SUT2 [Submission Generation SUT2]
B[Submission Folder SUT2]
end
subgraph SUT3 [Submission Generation SUT3]
C[Submission Folder SUT3]
end
subgraph SUTN [Submission Generation SUTN]
D[Submission Folder SUTN]
end
SUT2 -- git sync and push --> G[Github Repo]
SUT3 -- git sync and push --> G[Github Repo]
SUTN -- git sync and push --> G[Github Repo]
SUT1 -- git sync and push --> G[Github Repo]
```
## Upload the final submission
!!! warning
    If you are using GitHub for consolidating your results, make sure that you have run the [`push-to-github` command](#__tabbed_2_2) on the same system, so that the results are synced as-is to the GitHub repository.
Once you have all the results on the system, you can upload them to the MLCommons submission server as follows:
=== "via CLI"
    The following command runs the submission checker and uploads the results to the MLCommons submission server:
```
cm run script --tags=run,submission,checker \
--submitter_id=<> \
--submission_dir=<Path to the submission folder>
```
=== "via Browser"
    The following command generates the final submission tar file, which you can then upload to the [MLCommons Submission UI](https://submissions-ui.mlcommons.org/submission):
```
cm run script --tags=run,submission,checker \
--submission_dir=<Path to the submission folder> \
--tar=yes \
--submission_tar_file=mysubmission.tar.gz
```
```mermaid
flowchart LR
subgraph SUT [Combined Submissions]
A[Combined Submission Folder in SUT1]
end
SUT --> B[Run submission checker]
B --> C[Upload to MLC Submission server]
C --> D[Receive validation email]
```
<!--Click [here](https://youtu.be/eI1Hoecc3ho) to view the recording of the workshop: Streamlining your MLPerf Inference results using CM.-->
# All memory requirements in GB
resnet:
reference:
fp32:
system_memory: 8
accelerator_memory: 4
disk_storage: 25
nvidia:
int8:
system_memory: 8
accelerator_memory: 4
disk_storage: 100
intel:
int8:
system_memory: 8
accelerator_memory: 0
disk_storage: 50
qualcomm:
int8:
system_memory: 8
accelerator_memory: 8
disk_storage: 50
retinanet:
reference:
fp32:
system_memory: 8
accelerator_memory: 8
disk_storage: 200
nvidia:
int8:
system_memory: 8
accelerator_memory: 8
disk_storage: 200
intel:
int8:
system_memory: 8
accelerator_memory: 0
disk_storage: 200
qualcomm:
int8:
system_memory: 8
accelerator_memory: 8
disk_storage: 200
rgat:
reference:
fp32:
system_memory: 768
accelerator_memory: 8
disk_storage: 2300
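The entries above can be looked up programmatically when checking whether a host meets the recommended resources. Below is a minimal sketch (illustrative only), assuming the block above is saved as `system_requirements.yml` and that `ruamel.yaml` (listed in the docs requirements earlier) is available:
```python
from ruamel.yaml import YAML

# Load the recommended-resource table and look up one benchmark entry.
yaml = YAML(typ="safe")
with open("system_requirements.yml") as f:
    requirements = yaml.load(f)

# All values are in GB, as the header comment above notes.
rgat_fp32 = requirements["rgat"]["reference"]["fp32"]
print(rgat_fp32["system_memory"])  # 768
print(rgat_fp32["disk_storage"])   # 2300
```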
---
hide:
- toc
---
# Using CM for MLPerf Inference
# MLPerf™ Inference Benchmark for Graph Neural Network
This is the reference implementation for the MLPerf Inference Graph Neural Network benchmark. The reference implementation currently uses the Deep Graph Library (DGL) and PyTorch as the backbone of the model.
**Hardware requirements:** The minimum requirements to run this benchmark are ~600 GB of RAM and ~2.3 TB of disk. Meeting them requires creating a memory map for the graph features instead of loading them all into memory at once.
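In practice this means the node-feature arrays are memory-mapped rather than read fully. A minimal sketch of the idea (illustrative only; the path follows the IGBH on-disk layout `<dataset-path>/full/processed/paper/node_feat.npy` used by the utilities later in this repository):
```python
import numpy as np
import torch

# Memory-map the paper node features instead of loading them into RAM;
# rows are read from disk lazily, only when a batch actually indexes them.
paper_feat = np.load("igbh/full/processed/paper/node_feat.npy", mmap_mode="r")

batch_ids = [0, 42, 1000]  # hypothetical node ids for one batch
batch_features = torch.from_numpy(np.asarray(paper_feat[batch_ids]))
print(batch_features.shape)  # (3, 1024): IGBH features are 1024-dimensional
```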
## Supported Models
| model | accuracy | dataset | model source | precision | notes |
| ---- | ---- | ---- | ---- | ---- | ---- |
| RGAT | 0.7286 | IGBH | [Illinois Graph Benchmark](https://github.com/IllinoisGraphBenchmark/IGB-Datasets/) | fp32 | - |
## Dataset
| Data | Description | Task |
| ---- | ---- | ---- |
| IGBH | Illinois Graph Benchmark Heterogeneous is a graph dataset consisting of one heterogeneous graph with 547,306,935 nodes and 5,812,005,639 edges. Node types: Author, Conference, FoS, Institute, Journal, Paper. A subset of 1% of the paper nodes is randomly chosen as the validation dataset using the [split seeds script](tools/split_seeds.py). The validation dataset is used as the input queries for the SUT; however, the whole dataset is needed to run the benchmark, since all the graph connections are required to achieve the quality target. | Node Classification |
| IGBH (calibration) | We sampled 5000 nodes from the training paper nodes of the IGBH for the calibration dataset. We provide the [Node ids](../../calibration/IGBH/calibration.txt) and the [script](tools/split_seeds.py) to generate them (using the `--calibration` flag). | Node Classification |
## Automated command to run the benchmark via MLCommons CM
Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/graph/rgat/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.
You can also do `pip install cm4mlops` and then use the `cm` commands given in the later sections to download the model and datasets.
## Setup
Set the following helper variables
```bash
export ROOT_INFERENCE=$PWD/inference
export GRAPH_FOLDER=$PWD/inference/graph/R-GAT/
export LOADGEN_FOLDER=$PWD/inference/loadgen
export MODEL_PATH=$PWD/inference/graph/R-GAT/model/
```
### Clone the repository
```bash
git clone --recurse-submodules https://github.com/mlcommons/inference.git --depth 1
```
### Install pytorch
**For NVIDIA GPU based runs:**
```bash
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
```
**For CPU based runs:**
```bash
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cpu
```
### Install requirements (only for running without using docker)
Install requirements:
```bash
cd $GRAPH_FOLDER
pip install -r requirements.txt
```
Install loadgen:
```bash
cd $LOADGEN_FOLDER
CFLAGS="-std=c++14" python setup.py install
```
### Install pytorch geometric
```bash
export TORCH_VERSION=$(python -c "import torch; print(torch.__version__)")
pip install torch-geometric torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-${TORCH_VERSION}.html
```
### Install DGL
**For NVIDIA GPU based runs:**
```bash
pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu121/repo.html
```
**For CPU based runs:**
```bash
pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/repo.html
```
### Download model through CM (Collective Mind)
```
cm run script --tags=get,ml-model,rgat --outdirname=<path_to_download>
```
### Download model using Rclone
To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
To install Rclone on Linux/macOS/BSD systems, run:
```
sudo -v ; curl https://rclone.org/install.sh | sudo bash
```
Once Rclone is installed, run the following command to authenticate with the bucket:
```
rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
```
You can then navigate in the terminal to your desired download directory and run the following commands to download the checkpoints:
**`fp32`**
```
rclone copy mlc-inference:mlcommons-inference-wg-public/R-GAT/RGAT.pt $MODEL_PATH -P
```
### Download and set up dataset
#### Debug Dataset
**CM Command**
```
cm run script --tags=get,dataset,igbh,_debug --outdirname=<path to download>
```
**Download Dataset**
```bash
cd $GRAPH_FOLDER
python3 tools/download_igbh_test.py
```
**Split Seeds**
```bash
cd $GRAPH_FOLDER
python3 tools/split_seeds.py --path igbh --dataset_size tiny
```
#### Full Dataset
**Warning:** This script will download 2.2 TB of data.
**CM Command**
```
cm run script --tags=get,dataset,igbh,_full --outdirname=<path to download>
```
```bash
cd $GRAPH_FOLDER
./tools/download_igbh_full.sh igbh/
```
**Split Seeds**
```bash
cd $GRAPH_FOLDER
python3 tools/split_seeds.py --path igbh --dataset_size full
```
#### Calibration dataset
The calibration dataset contains 5000 nodes from the training paper nodes of the IGBH dataset. We provide the [Node ids](../../calibration/IGBH/calibration.txt) and the [script](tools/split_seeds.py) to generate them (using the `--calibration` flag).
**CM Command**
```
cm run script --tags=get,dataset,igbh,_full,_calibration --outdirname=<path to download>
```
### Run the benchmark
#### Debug Run
```bash
# Go to the benchmark folder
cd $GRAPH_FOLDER
# Run the benchmark DGL
python3 main.py --dataset igbh-dgl-tiny --dataset-path igbh/ --profile debug-dgl [--model-path <path_to_ckpt>] [--device <cpu or gpu>] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>]
```
#### Local run
```bash
# Go to the benchmark folder
cd $GRAPH_FOLDER
# Run the benchmark DGL
python3 main.py --dataset igbh-dgl --dataset-path igbh/ --profile rgat-dgl-full [--model-path <path_to_ckpt>] [--device <cpu or gpu>] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>]
```
### Evaluate the accuracy
```bash
cm run script --tags=process,mlperf,accuracy,_igbh --result_dir=<Path to directory where files are generated after the benchmark run>
```
Please click [here](https://github.com/mlcommons/inference/blob/dev/graph/R-GAT/tools/accuracy_igbh.py) to view the Python script for evaluating accuracy for the IGBH dataset.
#### Run using docker
Not implemented yet
#### Accuracy run
Add the `--accuracy` flag to the command to run the benchmark in accuracy mode:
```bash
python3 main.py --dataset igbh --dataset-path igbh/ --accuracy --model-path model/ [--device <cpu or gpu>] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>] [--layout <COO, CSC or CSR>]
```
**NOTE:** For official submissions you should submit the results of the accuracy run in a file called `accuracy.txt` with the following format:
```
accuracy=<accuracy>%, good=<number_of_good_samples>, total=<number_of_total_samples>
hash=<hash>
```
### Docker run
**CPU:**
Build docker image
```bash
docker build . -f dockerfile.cpu -t rgat-cpu
```
Run docker container:
```bash
docker run --rm -it -v $(pwd):/root rgat-cpu
```
Run the benchmark inside the docker container:
```bash
python3 main.py --dataset igbh-dgl --dataset-path igbh/ --profile rgat-dgl-full --device cpu [--model-path <path_to_ckpt>] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>]
```
**GPU:**
Build docker image
```bash
docker build . -f dockerfile.gpu -t rgat-gpu
```
Run docker container:
```bash
docker run --rm -it -v $(pwd):/workspace/root --gpus all rgat-gpu
```
Go inside the root folder and run the benchmark inside the docker container:
```bash
cd root
python3 main.py --dataset igbh-dgl --dataset-path igbh/ --profile rgat-dgl-full --device gpu [--model-path <path_to_ckpt>] [--dtype <fp16 or fp32>] [--scenario <SingleStream, MultiStream, Server or Offline>]
```
**NOTE:** For official submissions, this benchmark is required to run in equal issue mode. Please make sure that the flag `rgat.*.sample_concatenate_permutation` is set to 1 (i.e., the line `rgat.*.sample_concatenate_permutation = 1` is present) in the [mlperf.conf](../../loadgen/mlperf.conf) file when loadgen is built.
"""
abstract backend class
"""
class Backend:
def __init__(self):
self.inputs = []
self.outputs = []
def version(self):
raise NotImplementedError("Backend:version")
def name(self):
raise NotImplementedError("Backend:name")
def load(self, model_path, inputs=None, outputs=None):
raise NotImplementedError("Backend:load")
def predict(self, feed):
raise NotImplementedError("Backend:predict")
from typing import Optional, List, Union, Any
from dgl_utilities.feature_fetching import IGBHeteroGraphStructure, Features, IGBH
from dgl_utilities.components import build_graph, get_loader, RGAT
from dgl_utilities.pyg_sampler import PyGSampler
import os
import torch
import logging
import backend
from typing import Literal
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("backend-dgl")
class BackendDGL(backend.Backend):
def __init__(
self,
model_type="rgat",
type: Literal["fp16", "fp32"] = "fp16",
device: Literal["cpu", "gpu"] = "gpu",
ckpt_path: str = None,
igbh: IGBH = None,
batch_size: int = 1,
layout: Literal["CSC", "CSR", "COO"] = "COO",
edge_dir: str = "in",
):
super(BackendDGL, self).__init__()
# Set device and type
if device == "gpu":
self.device = torch.device("cuda")
else:
self.device = torch.device("cpu")
if type == "fp32":
self.type = torch.float32
else:
self.type = torch.float16
# Create Node and neighbor loader
self.fan_out = [5, 10, 15]
self.igbh_graph_structure = igbh.igbh_dataset
self.feature_store = Features(
self.igbh_graph_structure.dir,
self.igbh_graph_structure.dataset_size,
self.igbh_graph_structure.in_memory,
use_fp16=self.igbh_graph_structure.use_fp16,
)
self.feature_store.build_features(use_journal_conference=True)
self.graph = build_graph(
self.igbh_graph_structure,
"dgl",
features=self.feature_store)
self.neighbor_loader = PyGSampler([5, 10, 15])
        # Load model architecture
self.model = RGAT(
backend="dgl",
device=device,
graph=self.graph,
in_feats=1024,
h_feats=512,
num_classes=2983,
num_layers=len(self.fan_out),
n_heads=4
).to(self.type).to(self.device)
self.model.eval()
# Load model checkpoint
ckpt = None
if ckpt_path is not None:
try:
ckpt = torch.load(ckpt_path, map_location=self.device)
except FileNotFoundError as e:
print(f"Checkpoint file not found: {e}")
return -1
if ckpt is not None:
self.model.load_state_dict(ckpt["model_state_dict"])
def version(self):
return torch.__version__
def name(self):
return "pytorch-SUT"
def image_format(self):
return "NCHW"
def load(self):
return self
def predict(self, inputs: torch.Tensor):
with torch.no_grad():
input_size = inputs.shape[0]
# Get batch
batch = self.neighbor_loader.sample(self.graph, {"paper": inputs})
batch_preds, batch_labels = self.model(
batch, self.device, self.feature_store)
return batch_preds
#### **1. Applicable Categories**
- Datacenter
---
#### **2. Applicable Scenarios for Each Category**
- Offline
---
#### **3. Applicable Compliance Tests**
- TEST01
---
#### **4. Latency Threshold for Server Scenarios**
- Not applicable
---
#### **5. Validation Dataset: Unique Samples**
Number of **unique samples** in the validation dataset and the QSL size specified in
- [X] [inference policies benchmark section](https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc#41-benchmarks)
- [X] [mlperf.conf](https://github.com/mlcommons/inference/blob/master/loadgen/mlperf.conf)
- [X] [Inference benchmark docs](https://github.com/mlcommons/inference/blob/docs/docs/index.md)
*(Ensure QSL size overflows the system cache if possible.)*
---
#### **6. Equal Issue Mode Applicability**
Documented whether **Equal Issue Mode** is applicable in
- [X] [mlperf.conf](https://github.com/mlcommons/inference/blob/master/loadgen/mlperf.conf#L42)
- [X] [Inference benchmark docs](https://github.com/mlcommons/inference/blob/docs/docs/index.md)
*(Relevant if sample processing times are inconsistent across inputs.)*
---
#### **7. Expected Accuracy and `accuracy.txt` Contents**
- [X] Expected accuracy updated in the [inference policies](https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc#41-benchmarks)
- [X] `accuracy.txt` file is generated by the reference accuracy script from the MLPerf accuracy log and is validated by the submission checker.
---
#### **8. Reference Model Details**
- [X] Reference model details updated in [Inference benchmark docs](https://github.com/mlcommons/inference/blob/docs/docs/index.md)
---
#### **9. Reference Implementation Dataset Coverage**
- [X] Reference implementation successfully processes the entire validation dataset during:
- [X] Performance runs
- [X] Accuracy runs
- [X] Compliance runs
- [X] Valid log files passing the submission checker are generated for all runs - [link](https://github.com/mlcommons/mlperf_inference_unofficial_submissions_v5.0/tree/main/closed/MLCommons/results/mlc-server-reference-gpu-pytorch_v2.4.0-cu124/rgat/offline/performance/run_1).
---
#### **10. Test Runs with Smaller Input Sets**
- [X] Verified the reference implementation can perform test runs with a smaller subset of inputs for:
- [X] Performance runs
- [X] Accuracy runs
---
#### **11. Dataset and Reference Model Instructions**
- [X] Clear instructions provided for:
- [X] Downloading the dataset and reference model.
- [X] Using the dataset and model for the benchmark.
---
#### **12. Documentation of Recommended System Requirements to run the reference implementation**
- [X] Added [here](https://github.com/mlcommons/inference/blob/docs/docs/system_requirements.yml#L44)
---
#### **13. Submission Checker Modifications**
- [X] All necessary changes made to the **submission checker** to validate the benchmark.
---
#### **14. Sample Log Files**
- [X] Include sample logs for all the applicable scenario runs:
- [X] Offline
- [X] [`mlperf_log_summary.txt`](https://github.com/mlcommons/mlperf_inference_unofficial_submissions_v5.0/blob/main/closed/MLCommons/results/mlc-server-reference-gpu-pytorch_v2.4.0-cu124/rgat/offline/performance/run_1/mlperf_log_summary.txt)
- [X] [`mlperf_log_detail.txt`](https://github.com/mlcommons/mlperf_inference_unofficial_submissions_v5.0/blob/main/closed/MLCommons/results/mlc-server-reference-gpu-pytorch_v2.4.0-cu124/rgat/offline/performance/run_1/mlperf_log_detail.txt)
- [X] Ensure sample logs successfully pass the submission checker and applicable compliance runs. [Link](https://htmlpreview.github.io/?https://github.com/mlcommons/mlperf_inference_unofficial_submissions_v5.0/blob/refs/heads/auto-update/closed/MLCommons/results/mlc-server-reference-gpu-pytorch_v2.4.0-cu124/summary.html)
"""
dataset related classes and methods
"""
# pylint: disable=unused-argument,missing-docstring
import logging
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dataset")
class Dataset:
def __init__(self):
pass
def preprocess(self, use_cache=True):
raise NotImplementedError("Dataset:preprocess")
def get_item_count(self):
        raise NotImplementedError("Dataset:get_item_count")
def get_list(self):
raise NotImplementedError("Dataset:get_list")
def load_query_samples(self, sample_list):
pass
def unload_query_samples(self, sample_list):
pass
def get_samples(self, id_list):
pass
def get_item(self, id):
raise NotImplementedError("Dataset:get_item")
def preprocess(id):
return id
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
from dgl_utilities.pyg_sampler import PyGSampler
DGL_AVAILABLE = True
try:
import dgl
except ModuleNotFoundError:
DGL_AVAILABLE = False
dgl = None
def check_dgl_available():
assert DGL_AVAILABLE, "DGL Not available in the container"
def build_graph(graph_structure, backend, features=None):
assert graph_structure.separate_sampling_aggregation or (features is not None), \
"Either we need a feature to build the graph, or \
we should specify to separate sampling from aggregation"
if backend.lower() == "dgl":
check_dgl_available()
graph = dgl.heterograph(graph_structure.edge_dict)
graph.predict = "paper"
if features is not None:
for node, node_feature in features.feature.items():
if graph.num_nodes(ntype=node) < node_feature.shape[0]:
graph.add_nodes(
node_feature.shape[0] -
graph.num_nodes(
ntype=node),
ntype=node)
else:
assert graph.num_nodes(ntype=node) == node_feature.shape[0], f"\
Graph has more {node} nodes ({graph.num_nodes(ntype=node)}) \
than feature shape ({node_feature.shape[0]})"
if not graph_structure.separate_sampling_aggregation:
for node, node_feature in features.feature.items():
graph.nodes[node].data['feat'] = node_feature
setattr(
graph,
f"num_{node}_nodes",
node_feature.shape[0])
graph = dgl.remove_self_loop(graph, etype="cites")
graph = dgl.add_self_loop(graph, etype="cites")
graph.nodes['paper'].data['label'] = graph_structure.label
return graph
else:
assert False, "Unrecognized backend " + backend
def get_sampler(use_pyg_sampler=False):
if use_pyg_sampler:
return PyGSampler
else:
return dgl.dataloading.MultiLayerNeighborSampler
def get_loader(graph, index, fanouts, backend, use_pyg_sampler=True, **kwargs):
if backend.lower() == "dgl":
check_dgl_available()
fanouts = [int(fanout) for fanout in fanouts.split(",")]
return dgl.dataloading.DataLoader(
graph, {"paper": index},
get_sampler(use_pyg_sampler=use_pyg_sampler)(fanouts),
**kwargs
)
else:
assert False, "Unrecognized backend " + backend
def glorot(value):
if isinstance(value, torch.Tensor):
stdv = math.sqrt(6.0 / (value.size(-2) + value.size(-1)))
value.data.uniform_(-stdv, stdv)
else:
for v in value.parameters() if hasattr(value, 'parameters') else []:
glorot(v)
for v in value.buffers() if hasattr(value, 'buffers') else []:
glorot(v)
class GATPatched(dgl.nn.pytorch.GATConv):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def reset_parameters(self):
if hasattr(self, 'fc'):
glorot(self.fc.weight)
else:
glorot(self.fc_src.weight)
glorot(self.fc_dst.weight)
glorot(self.attn_l)
glorot(self.attn_r)
if self.bias is not None:
nn.init.constant_(self.bias, 0)
if isinstance(self.res_fc, nn.Linear):
glorot(self.res_fc.weight)
class RGAT_DGL(nn.Module):
def __init__(
self,
etypes,
in_feats, h_feats, num_classes,
num_layers=2, n_heads=4, dropout=0.2,
with_trim=None):
super().__init__()
self.layers = nn.ModuleList()
# does not support other models since they are not used
self.layers.append(dgl.nn.pytorch.HeteroGraphConv({
etype: GATPatched(in_feats, h_feats // n_heads, n_heads)
for etype in etypes}))
for _ in range(num_layers - 2):
self.layers.append(dgl.nn.pytorch.HeteroGraphConv({
etype: GATPatched(h_feats, h_feats // n_heads, n_heads)
for etype in etypes}))
self.layers.append(dgl.nn.pytorch.HeteroGraphConv({
etype: GATPatched(h_feats, h_feats // n_heads, n_heads)
for etype in etypes}))
self.dropout = nn.Dropout(dropout)
self.linear = nn.Linear(h_feats, num_classes)
def forward(self, blocks, x):
h = x
for l, (layer, block) in enumerate(zip(self.layers, blocks)):
h = layer(block, h)
h = dgl.apply_each(
h, lambda x: x.view(
x.shape[0], x.shape[1] * x.shape[2]))
if l != len(self.layers) - 1:
h = dgl.apply_each(h, F.leaky_relu)
h = dgl.apply_each(h, self.dropout)
return self.linear(h['paper'])
def extract_graph_structure(self, batch, device):
# moves all blocks to device
return [block.to(device) for block in batch[-1]]
def extract_inputs_and_outputs(self, sampled_subgraph, device, features):
# input to the batch argument would be a list of blocks
        # the sampled subgraph is already moved to device in
# extract_graph_structure
# in case if the input feature is not stored on the graph,
# but rather in shared memory: (separate_sampling_aggregation)
# we use this method to extract them based on the blocks
if features is None or features.feature == {}:
batch_inputs = {
key: value.to(torch.float32)
for key, value in sampled_subgraph[0].srcdata['feat'].items()
}
else:
batch_inputs = features.get_input_features(
sampled_subgraph[0].srcdata[dgl.NID],
device
)
batch_labels = sampled_subgraph[-1].dstdata['label']['paper']
return batch_inputs, batch_labels
class RGAT(torch.nn.Module):
def __init__(self, backend, device, graph, **model_kwargs):
super().__init__()
self.backend = backend.lower()
if backend.lower() == "dgl":
check_dgl_available()
etypes = graph.etypes
self.model = RGAT_DGL(etypes=etypes, **model_kwargs)
else:
assert False, "Unrecognized backend " + backend
self.device = device
self.layers = self.model.layers
def forward(self, batch, device, features):
# a general method to get the batches and move them to the
# corresponding device
batch = self.model.extract_graph_structure(batch, device)
# a general method to fetch the features given the sampled blocks
# and move them to corresponding device
batch_inputs, batch_labels = self.model.extract_inputs_and_outputs(
sampled_subgraph=batch,
device=device,
features=features,
)
return self.model.forward(batch, batch_inputs), batch_labels
import torch
import os
import concurrent.futures
import os.path as osp
import numpy as np
from typing import Literal
def float2half(base_path, dataset_size):
paper_nodes_num = {
"tiny": 100000,
"small": 1000000,
"medium": 10000000,
"large": 100000000,
"full": 269346174,
}
author_nodes_num = {
"tiny": 357041,
"small": 1926066,
"medium": 15544654,
"large": 116959896,
"full": 277220883,
}
# paper node
paper_feat_path = os.path.join(base_path, "paper", "node_feat.npy")
paper_fp16_feat_path = os.path.join(
base_path, "paper", "node_feat_fp16.pt")
if not os.path.exists(paper_fp16_feat_path):
if dataset_size in ["large", "full"]:
num_paper_nodes = paper_nodes_num[dataset_size]
paper_node_features = torch.from_numpy(
np.memmap(
paper_feat_path,
dtype="float32",
mode="r",
shape=(num_paper_nodes, 1024),
)
)
else:
paper_node_features = torch.from_numpy(
np.load(paper_feat_path, mmap_mode="r")
)
paper_node_features = paper_node_features.half()
torch.save(paper_node_features, paper_fp16_feat_path)
# author node
author_feat_path = os.path.join(base_path, "author", "node_feat.npy")
author_fp16_feat_path = os.path.join(
base_path, "author", "node_feat_fp16.pt")
if not os.path.exists(author_fp16_feat_path):
if dataset_size in ["large", "full"]:
num_author_nodes = author_nodes_num[dataset_size]
author_node_features = torch.from_numpy(
np.memmap(
author_feat_path,
dtype="float32",
mode="r",
shape=(num_author_nodes, 1024),
)
)
else:
author_node_features = torch.from_numpy(
np.load(author_feat_path, mmap_mode="r")
)
author_node_features = author_node_features.half()
torch.save(author_node_features, author_fp16_feat_path)
# institute node
institute_feat_path = os.path.join(base_path, "institute", "node_feat.npy")
institute_fp16_feat_path = os.path.join(
base_path, "institute", "node_feat_fp16.pt")
if not os.path.exists(institute_fp16_feat_path):
institute_node_features = torch.from_numpy(
np.load(institute_feat_path, mmap_mode="r")
)
institute_node_features = institute_node_features.half()
torch.save(institute_node_features, institute_fp16_feat_path)
# fos node
fos_feat_path = os.path.join(base_path, "fos", "node_feat.npy")
fos_fp16_feat_path = os.path.join(base_path, "fos", "node_feat_fp16.pt")
if not os.path.exists(fos_fp16_feat_path):
fos_node_features = torch.from_numpy(
np.load(fos_feat_path, mmap_mode="r"))
fos_node_features = fos_node_features.half()
torch.save(fos_node_features, fos_fp16_feat_path)
# conference node
conference_feat_path = os.path.join(
base_path, "conference", "node_feat.npy")
conference_fp16_feat_path = os.path.join(
base_path, "conference", "node_feat_fp16.pt"
)
if not os.path.exists(conference_fp16_feat_path):
conference_node_features = torch.from_numpy(
np.load(conference_feat_path, mmap_mode="r")
)
conference_node_features = conference_node_features.half()
torch.save(conference_node_features, conference_fp16_feat_path)
# journal node
journal_feat_path = os.path.join(base_path, "journal", "node_feat.npy")
journal_fp16_feat_path = os.path.join(
base_path, "journal", "node_feat_fp16.pt")
if not os.path.exists(journal_fp16_feat_path):
journal_node_features = torch.from_numpy(
np.load(journal_feat_path, mmap_mode="r")
)
journal_node_features = journal_node_features.half()
torch.save(journal_node_features, journal_fp16_feat_path)
class IGBH:
def __init__(
self,
data_path,
name="igbh",
dataset_size="full",
use_label_2K=True,
in_memory=False,
layout: Literal["CSC", "CSR", "COO"] = "COO",
type: Literal["fp16", "fp32"] = "fp16",
device="cpu",
edge_dir="in",
**kwargs,
):
super().__init__()
self.data_path = data_path
self.name = name
self.size = dataset_size
self.igbh_dataset = IGBHeteroGraphStructure(
data_path,
dataset_size=dataset_size,
in_memory=in_memory,
use_label_2K=use_label_2K,
layout=layout,
use_fp16=(type == "fp16")
)
self.num_samples = len(self.igbh_dataset.val_idx)
def get_samples(self, id_list):
return self.igbh_dataset.val_idx[id_list]
def get_labels(self, id_list):
return self.igbh_dataset.label[self.get_samples(id_list)]
def get_item_count(self):
return len(self.igbh_dataset.val_idx)
def load_query_samples(self, id):
pass
def unload_query_samples(self, sample_list):
pass
class IGBHeteroGraphStructure:
"""
Synchronously (optionally parallelly) loads the edge relations for IGBH.
Current IGBH edge relations are not yet converted to torch tensor.
"""
def __init__(
self,
data_path,
dataset_size="full",
use_label_2K=True,
in_memory=False,
use_fp16=True,
# in-memory and memory-related optimizations
separate_sampling_aggregation=False,
# perf related
multithreading=True,
**kwargs,
):
self.dir = data_path
self.dataset_size = dataset_size
self.use_fp16 = use_fp16
self.in_memory = in_memory
self.use_label_2K = use_label_2K
self.num_classes = 2983 if not self.use_label_2K else 19
self.label_file = "node_label_19.npy" if not self.use_label_2K else "node_label_2K.npy"
self.num_nodes = {
"full": {'paper': 269346174, 'author': 277220883, 'institute': 26918, 'fos': 712960, 'journal': 49052, 'conference': 4547},
"small": {'paper': 1000000, 'author': 1926066, 'institute': 14751, 'fos': 190449, 'journal': 15277, 'conference': 1215},
"medium": {'paper': 10000000, 'author': 15544654, 'institute': 23256, 'fos': 415054, 'journal': 37565, 'conference': 4189},
"large": {'paper': 100000000, 'author': 116959896, 'institute': 26524, 'fos': 649707, 'journal': 48820, 'conference': 4490},
"tiny": {'paper': 100000, 'author': 357041, 'institute': 8738, 'fos': 84220, 'journal': 8101, 'conference': 398}
}[self.dataset_size]
self.use_journal_conference = True
self.separate_sampling_aggregation = separate_sampling_aggregation
self.torch_tensor_input_dir = data_path
self.torch_tensor_input = self.torch_tensor_input_dir != ""
self.multithreading = multithreading
# This class only stores the edge data, labels, and the train/val
# indices
self.edge_dict = self.load_edge_dict()
self.label = self.load_labels()
self.full_num_trainable_nodes = (
227130858 if self.num_classes != 2983 else 157675969)
self.train_idx, self.val_idx = self.get_train_val_test_indices()
if self.use_fp16:
float2half(
os.path.join(
self.dir,
self.dataset_size,
"processed"),
self.dataset_size)
def load_edge_dict(self):
mmap_mode = None if self.in_memory else "r"
edges = [
"paper__cites__paper",
"paper__written_by__author",
"author__affiliated_to__institute",
"paper__topic__fos"]
if self.use_journal_conference:
edges += ["paper__published__journal", "paper__venue__conference"]
loaded_edges = None
def load_edge(edge, mmap=mmap_mode, parent_path=osp.join(
self.dir, self.dataset_size, "processed")):
return edge, torch.from_numpy(
np.load(osp.join(parent_path, edge, "edge_index.npy"), mmap_mode=mmap))
if self.multithreading:
with concurrent.futures.ThreadPoolExecutor() as executor:
loaded_edges = executor.map(load_edge, edges)
loaded_edges = {
tuple(edge.split("__")): (edge_index[:, 0], edge_index[:, 1]) for edge, edge_index in loaded_edges
}
else:
loaded_edges = {
tuple(edge.split("__")): (edge_index[:, 0], edge_index[:, 1])
for edge, edge_index in map(load_edge, edges)
}
return self.augment_edges(loaded_edges)
def load_labels(self):
if self.dataset_size not in ['full', 'large']:
return torch.from_numpy(
np.load(
osp.join(
self.dir,
self.dataset_size,
'processed',
'paper',
self.label_file)
)
).to(torch.long)
else:
return torch.from_numpy(
np.memmap(
osp.join(
self.dir,
self.dataset_size,
'processed',
'paper',
self.label_file
),
dtype='float32',
mode='r',
shape=(
(269346174 if self.dataset_size == "full" else 100000000)
)
)
).to(torch.long)
def augment_edges(self, edge_dict):
# Adds reverse edge connections to the graph
# add rev_{edge} to every edge except paper-cites-paper
edge_dict.update(
{
(dst, f"rev_{edge}", src): (dst_idx, src_idx)
for (src, edge, dst), (src_idx, dst_idx) in edge_dict.items()
if src != dst
}
)
paper_cites_paper = edge_dict[("paper", 'cites', 'paper')]
self_loop = torch.arange(self.num_nodes['paper'])
mask = paper_cites_paper[0] != paper_cites_paper[1]
paper_cites_paper = (
torch.cat((paper_cites_paper[0][mask], self_loop.clone())),
torch.cat((paper_cites_paper[1][mask], self_loop.clone()))
)
edge_dict[("paper", 'cites', 'paper')] = (
torch.cat((paper_cites_paper[0], paper_cites_paper[1])),
torch.cat((paper_cites_paper[1], paper_cites_paper[0]))
)
return edge_dict
def get_train_val_test_indices(self):
base_dir = osp.join(self.dir, self.dataset_size, "processed")
assert osp.exists(osp.join(base_dir, "train_idx.pt")) and osp.exists(osp.join(base_dir, "val_idx.pt")), \
"Train and validation indices not found. Please run GLT's split_seeds.py first."
return (
torch.load(
osp.join(
self.dir,
self.dataset_size,
"processed",
"train_idx.pt")),
torch.load(
osp.join(
self.dir,
self.dataset_size,
"processed",
"val_idx.pt"))
)
class Features:
"""
Lazily initializes the features for IGBH.
Features will be initialized only when *build_features* is called.
Features will be placed into shared memory when *share_features* is called
or if the features are built (either mmap-ed or loaded in memory)
and *torch.multiprocessing.spawn* is called
"""
def __init__(self, path, dataset_size, in_memory=True, use_fp16=True):
self.path = path
self.dataset_size = dataset_size
self.in_memory = in_memory
self.use_fp16 = use_fp16
if self.use_fp16:
self.dtype = torch.float16
else:
self.dtype = torch.float32
self.feature = {}
def build_features(self, use_journal_conference=False,
multithreading=False):
node_types = ['paper', 'author', 'institute', 'fos']
if use_journal_conference or self.dataset_size in ['large', 'full']:
node_types += ['conference', 'journal']
if multithreading:
def load_feature(feature_store, feature_name):
return feature_store.load(feature_name), feature_name
with concurrent.futures.ThreadPoolExecutor() as executor:
                loaded_features = executor.map(
                    load_feature, [self] * len(node_types), node_types)
self.feature = {
node_type: feature_value for feature_value, node_type in loaded_features
}
else:
for node_type in node_types:
self.feature[node_type] = self.load(node_type)
def share_features(self):
for node_type in self.feature:
self.feature[node_type] = self.feature[node_type].share_memory_()
def load_from_tensor(self, node):
return torch.load(osp.join(self.path, self.dataset_size,
"processed", node, "node_feat_fp16.pt"))
def load_in_memory_numpy(self, node):
return torch.from_numpy(np.load(
osp.join(self.path, self.dataset_size, 'processed', node, 'node_feat.npy')))
def load_mmap_numpy(self, node):
"""
Loads a given numpy array through mmap_mode="r"
"""
return torch.from_numpy(np.load(osp.join(
self.path, self.dataset_size, "processed", node, "node_feat.npy"), mmap_mode="r"))
def memmap_mmap_numpy(self, node):
"""
Loads a given NumPy array through memory-mapping np.memmap.
This is the same code as the one provided in IGB codebase.
"""
shape = [None, 1024]
if self.dataset_size == "full":
if node == "paper":
shape[0] = 269346174
elif node == "author":
shape[0] = 277220883
elif self.dataset_size == "large":
if node == "paper":
shape[0] = 100000000
elif node == "author":
shape[0] = 116959896
assert shape[0] is not None
return torch.from_numpy(np.memmap(osp.join(self.path, self.dataset_size,
"processed", node, "node_feat.npy"), dtype="float32", mode='r', shape=tuple(shape)))
def load(self, node):
if self.in_memory:
if self.use_fp16:
return self.load_from_tensor(node)
else:
if self.dataset_size in [
'large', 'full'] and node in ['paper', 'author']:
return self.memmap_mmap_numpy(node)
else:
return self.load_in_memory_numpy(node)
else:
if self.dataset_size in [
'large', 'full'] and node in ['paper', 'author']:
return self.memmap_mmap_numpy(node)
else:
return self.load_mmap_numpy(node)
def get_input_features(self, input_dict, device):
# fetches the batch inputs
# moving it here so so that future modifications could be easier
return {
key: self.feature[key][value.to(torch.device("cpu")), :].to(
device).to(self.dtype)
for key, value in input_dict.items()
}
import dgl
import torch
class PyGSampler(dgl.dataloading.Sampler):
r"""
An example DGL sampler implementation that matches PyG/GLT sampler behavior.
The following differences need to be addressed:
1. PyG/GLT applies conv_i to edges in layer_i, and all subsequent layers, while DGL only applies conv_i to edges in layer_i.
For instance, consider a path a->b->c. At layer 0,
DGL updates only node b's embedding with a->b, but
PyG/GLT updates both node b and c's embeddings.
Therefore, if we use h_i(x) to denote the hidden representation of node x at layer i, then the output h_2(c) is:
DGL: h_2(c) = conv_2(h_1(c), h_1(b)) = conv_2(h_0(c), conv_1(h_0(b), h_0(a)))
PyG/GLT: h_2(c) = conv_2(h_1(c), h_1(b)) = conv_2(conv_1(h_0(c), h_0(b)), conv_1(h_0(b), h_0(a)))
2. When creating blocks for layer i-1, DGL not only uses the destination nodes from layer i,
but also includes all subsequent i+1 ... n layers' destination nodes as seed nodes.
More discussions and examples can be found here: https://github.com/alibaba/graphlearn-for-pytorch/issues/79.
"""
def __init__(self, fanouts, num_threads=1):
super().__init__()
self.fanouts = fanouts
self.num_threads = num_threads
def sample(self, g, seed_nodes):
if self.num_threads != 1:
old_num_threads = torch.get_num_threads()
torch.set_num_threads(self.num_threads)
output_nodes = seed_nodes
subgs = []
previous_edges = {}
previous_seed_nodes = seed_nodes
input_nodes = seed_nodes
device = None
for key in seed_nodes:
device = seed_nodes[key].device
not_sampled = {
ntype: torch.ones([g.num_nodes(ntype)], dtype=torch.bool, device=device) for ntype in g.ntypes
}
for fanout in reversed(self.fanouts):
for node_type in seed_nodes:
not_sampled[node_type][seed_nodes[node_type]] = 0
# Sample a fixed number of neighbors of the current seed nodes.
sg = g.sample_neighbors(seed_nodes, fanout)
# Before we add the edges, we need to first record the source nodes (of the current seed nodes)
# so that other edges' source nodes will not be included as next
# layer's seed nodes.
temp = dgl.to_block(sg, previous_seed_nodes,
include_dst_in_src=False)
seed_nodes = temp.srcdata[dgl.NID]
# GLT/PyG does not sample again on previously-sampled nodes
# we mimic this behavior here
for node_type in g.ntypes:
seed_nodes[node_type] = seed_nodes[node_type][not_sampled[node_type]
[seed_nodes[node_type]]]
# We add all previously accumulated edges to this subgraph
for etype in previous_edges:
sg.add_edges(*previous_edges[etype], etype=etype)
# This subgraph now contains all its new edges
# and previously accumulated edges
# so we add them
previous_edges = {}
for etype in sg.etypes:
previous_edges[etype] = sg.edges(etype=etype)
# Convert this subgraph to a message flow graph.
# we need to turn on the include_dst_in_src
# so that we get compatibility with DGL's OOTB GATConv.
sg = dgl.to_block(sg, previous_seed_nodes, include_dst_in_src=True)
# for this layers seed nodes -
# they will be our next layers' destination nodes
# so we add them to the collection of previous seed nodes.
previous_seed_nodes = sg.srcdata[dgl.NID]
# we insert the block to our list of blocks
subgs.insert(0, sg)
input_nodes = seed_nodes
if self.num_threads != 1:
torch.set_num_threads(old_num_threads)
return input_nodes, output_nodes, subgs
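# Illustrative usage (not part of the sampler itself; names below are hypothetical):
# the DGL backend above constructs this sampler with the benchmark fan-outs and
# calls it once per batch of validation "paper" node ids, e.g.
#
#     sampler = PyGSampler(fanouts=[5, 10, 15])
#     input_nodes, output_nodes, blocks = sampler.sample(graph, {"paper": seed_ids})
#
# `blocks` is the list of message flow graphs that RGAT.forward consumes.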
FROM ubuntu:22.04
ENV PYTHON_VERSION=3.10
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PATH /opt/anaconda3/bin:$PATH
WORKDIR /root
ENV HOME /root
RUN apt-get update
RUN apt-get install -y --no-install-recommends \
git \
build-essential \
software-properties-common \
ca-certificates \
wget \
curl \
htop \
zip \
unzip
# Install conda
RUN arch=$(uname -m) && \
if [ "$arch" = "x86_64" ]; then \
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py310_24.9.2-0-Linux-x86_64.sh"; \
elif [ "$arch" = "aarch64" ]; then \
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py310_24.9.2-0-Linux-aarch64.sh"; \
else \
echo "Unsupported architecture: $arch"; \
exit 1; \
fi && \
cd /opt && \
wget --quiet $MINICONDA_URL -O miniconda.sh && \
bash ./miniconda.sh -b -p /opt/anaconda3 && \
rm miniconda.sh && \
/opt/anaconda3/bin/conda clean -a && \
ln -s /opt/anaconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc && \
conda config --set always_yes yes --set changeps1 no
# Install requirements
RUN conda install -c conda-forge libstdcxx-ng
RUN pip install --upgrade pip
RUN pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cpu
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
RUN cd /tmp && \
git clone --recursive https://github.com/mlcommons/inference && \
cd inference/loadgen && \
pip install pybind11 && \
CFLAGS="-std=c++14" python3 setup.py install
RUN TORCH_VERSION=$(python -c "import torch; print(torch.__version__)") && \
    pip install torch-geometric torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-${TORCH_VERSION}.html
RUN pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/repo.html
# Clean up
RUN rm -rf mlperf && \
    rm requirements.txt
ENTRYPOINT ["/bin/bash"]
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:23.04-py3
FROM ${FROM_IMAGE_NAME}
SHELL ["/bin/bash", "-c"]
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
ENV TZ=US/Pacific
ENV DEBIAN_FRONTEND=noninteractive
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN rm -rf /var/lib/apt/lists/* && rm -rf /etc/apt/sources.list.d/* \
&& apt update \
&& apt install -y --no-install-recommends build-essential autoconf \
libtool git ccache curl wget pkg-config sudo ca-certificates \
automake libssl-dev bc python3-dev python3-pip google-perftools \
gdb libglib2.0-dev clang sshfs libre2-dev libboost-dev \
libnuma-dev numactl sysstat sshpass ntpdate less iputils-ping \
&& apt -y autoremove \
&& apt remove -y cmake \
&& apt install -y --no-install-recommends pkg-config zip g++ zlib1g-dev \
unzip libarchive-dev
RUN apt install -y --no-install-recommends rsync
# Upgrade pip
RUN python3 -m pip install --upgrade pip
RUN pip install torch-geometric torch-scatter torch-sparse -f https://pytorch-geometric.com/whl/torch-2.1.0+cu121.html
RUN pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu121/repo.html
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
RUN cd /tmp && \
git clone --recursive https://github.com/mlcommons/inference && \
cd inference/loadgen && \
pip install pybind11 && \
CFLAGS="-std=c++14" python3 setup.py install
# Clean up
RUN rm -rf mlperf && \
    rm requirements.txt
"""
implementation of the IGBH dataset
"""
# pylint: disable=unused-argument,missing-docstring
# Parts of this script were taken from:
# https://github.com/mlcommons/training/blob/master/graph_neural_network/dataset.py
# Specifically the float2half function and the IGBH class are
# slightly modified copies.
from typing import Literal
from torch_geometric.utils import add_self_loops, remove_self_loops
import torch
import os
import logging
import argparse
import dataset
import numpy as np
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("igbh")
def float2half(base_path, dataset_size):
paper_nodes_num = {
"tiny": 100000,
"small": 1000000,
"medium": 10000000,
"large": 100000000,
"full": 269346174,
}
author_nodes_num = {
"tiny": 357041,
"small": 1926066,
"medium": 15544654,
"large": 116959896,
"full": 277220883,
}
# paper node
paper_feat_path = os.path.join(base_path, "paper", "node_feat.npy")
paper_fp16_feat_path = os.path.join(
base_path, "paper", "node_feat_fp16.pt")
if not os.path.exists(paper_fp16_feat_path):
if dataset_size in ["large", "full"]:
num_paper_nodes = paper_nodes_num[dataset_size]
paper_node_features = torch.from_numpy(
np.memmap(
paper_feat_path,
dtype="float32",
mode="r",
shape=(num_paper_nodes, 1024),
)
)
else:
paper_node_features = torch.from_numpy(
np.load(paper_feat_path, mmap_mode="r")
)
paper_node_features = paper_node_features.half()
torch.save(paper_node_features, paper_fp16_feat_path)
# author node
author_feat_path = os.path.join(base_path, "author", "node_feat.npy")
author_fp16_feat_path = os.path.join(
base_path, "author", "node_feat_fp16.pt")
if not os.path.exists(author_fp16_feat_path):
if dataset_size in ["large", "full"]:
num_author_nodes = author_nodes_num[dataset_size]
author_node_features = torch.from_numpy(
np.memmap(
author_feat_path,
dtype="float32",
mode="r",
shape=(num_author_nodes, 1024),
)
)
else:
author_node_features = torch.from_numpy(
np.load(author_feat_path, mmap_mode="r")
)
author_node_features = author_node_features.half()
torch.save(author_node_features, author_fp16_feat_path)
# institute node
institute_feat_path = os.path.join(base_path, "institute", "node_feat.npy")
institute_fp16_feat_path = os.path.join(
base_path, "institute", "node_feat_fp16.pt")
if not os.path.exists(institute_fp16_feat_path):
institute_node_features = torch.from_numpy(
np.load(institute_feat_path, mmap_mode="r")
)
institute_node_features = institute_node_features.half()
torch.save(institute_node_features, institute_fp16_feat_path)
# fos node
fos_feat_path = os.path.join(base_path, "fos", "node_feat.npy")
fos_fp16_feat_path = os.path.join(base_path, "fos", "node_feat_fp16.pt")
if not os.path.exists(fos_fp16_feat_path):
fos_node_features = torch.from_numpy(
np.load(fos_feat_path, mmap_mode="r"))
fos_node_features = fos_node_features.half()
torch.save(fos_node_features, fos_fp16_feat_path)
# conference node
conference_feat_path = os.path.join(
base_path, "conference", "node_feat.npy")
conference_fp16_feat_path = os.path.join(
base_path, "conference", "node_feat_fp16.pt"
)
if not os.path.exists(conference_fp16_feat_path):
conference_node_features = torch.from_numpy(
np.load(conference_feat_path, mmap_mode="r")
)
conference_node_features = conference_node_features.half()
torch.save(conference_node_features, conference_fp16_feat_path)
# journal node
journal_feat_path = os.path.join(base_path, "journal", "node_feat.npy")
journal_fp16_feat_path = os.path.join(
base_path, "journal", "node_feat_fp16.pt")
if not os.path.exists(journal_fp16_feat_path):
journal_node_features = torch.from_numpy(
np.load(journal_feat_path, mmap_mode="r")
)
journal_node_features = journal_node_features.half()
torch.save(journal_node_features, journal_fp16_feat_path)
class IGBHeteroDataset(object):
def __init__(
self,
path,
dataset_size="tiny",
in_memory=False,
use_label_2K=False,
with_edges=True,
layout: Literal["CSC", "CSR", "COO"] = "COO",
use_fp16=False,
):
self.dir = path
self.dataset_size = dataset_size
self.in_memory = in_memory
self.use_label_2K = use_label_2K
self.with_edges = with_edges
self.layout = layout
self.use_fp16 = use_fp16
self.ntypes = [
"paper",
"author",
"institute",
"fos",
"journal",
"conference"]
self.etypes = None
self.edge_dict = {}
self.feat_dict = {}
self.paper_nodes_num = {
"tiny": 100000,
"small": 1000000,
"medium": 10000000,
"large": 100000000,
"full": 269346174,
}
self.author_nodes_num = {
"tiny": 357041,
"small": 1926066,
"medium": 15544654,
"large": 116959896,
"full": 277220883,
}
# 'paper' nodes.
self.label = None
self.train_idx = None
self.val_idx = None
self.test_idx = None
self.base_path = os.path.join(path, self.dataset_size, "processed")
if self.use_fp16:
float2half(self.base_path, self.dataset_size)
self.process()
def process(self):
# load edges
if self.with_edges:
if self.layout == "COO":
if self.in_memory:
paper_paper_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path, "paper__cites__paper", "edge_index.npy"
)
)
).t()
author_paper_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__written_by__author",
"edge_index.npy",
)
)
).t()
affiliation_author_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"author__affiliated_to__institute",
"edge_index.npy",
)
)
).t()
paper_fos_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path, "paper__topic__fos", "edge_index.npy"
)
)
).t()
paper_published_journal = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__published__journal",
"edge_index.npy",
)
)
).t()
paper_venue_conference = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__venue__conference",
"edge_index.npy",
)
)
).t()
else:
paper_paper_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path, "paper__cites__paper", "edge_index.npy"
),
mmap_mode="r",
)
).t()
author_paper_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__written_by__author",
"edge_index.npy",
),
mmap_mode="r",
)
).t()
affiliation_author_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"author__affiliated_to__institute",
"edge_index.npy",
),
mmap_mode="r",
)
).t()
paper_fos_edges = torch.from_numpy(
np.load(
os.path.join(
self.base_path, "paper__topic__fos", "edge_index.npy"
),
mmap_mode="r",
)
).t()
paper_published_journal = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__published__journal",
"edge_index.npy",
),
mmap_mode="r",
)
).t()
paper_venue_conference = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"paper__venue__conference",
"edge_index.npy",
),
mmap_mode="r",
)
).t()
cites_edge = add_self_loops(
remove_self_loops(paper_paper_edges)[0])[0]
self.edge_dict = {
("paper", "cites", "paper"): (
torch.cat([cites_edge[1, :], cites_edge[0, :]]),
torch.cat([cites_edge[0, :], cites_edge[1, :]]),
),
("paper", "written_by", "author"): author_paper_edges,
("author", "affiliated_to", "institute"): affiliation_author_edges,
("paper", "topic", "fos"): paper_fos_edges,
("author", "rev_written_by", "paper"): (
author_paper_edges[1, :],
author_paper_edges[0, :],
),
("institute", "rev_affiliated_to", "author"): (
affiliation_author_edges[1, :],
affiliation_author_edges[0, :],
),
("fos", "rev_topic", "paper"): (
paper_fos_edges[1, :],
paper_fos_edges[0, :],
),
}
self.edge_dict[("paper", "published", "journal")] = (
paper_published_journal
)
self.edge_dict[("paper", "venue", "conference")] = (
paper_venue_conference
)
self.edge_dict[("journal", "rev_published", "paper")] = (
paper_published_journal[1, :],
paper_published_journal[0, :],
)
self.edge_dict[("conference", "rev_venue", "paper")] = (
paper_venue_conference[1, :],
paper_venue_conference[0, :],
)
            # Directly load CSC- or CSR-layout files, which can be generated
            # using compress_graph.py
else:
compress_edge_dict = {}
compress_edge_dict[("paper", "cites", "paper")
] = "paper__cites__paper"
compress_edge_dict[("paper", "written_by", "author")] = (
"paper__written_by__author"
)
compress_edge_dict[("author", "affiliated_to", "institute")] = (
"author__affiliated_to__institute"
)
compress_edge_dict[("paper", "topic", "fos")
] = "paper__topic__fos"
compress_edge_dict[("author", "rev_written_by", "paper")] = (
"author__rev_written_by__paper"
)
compress_edge_dict[("institute", "rev_affiliated_to", "author")] = (
"institute__rev_affiliated_to__author"
)
compress_edge_dict[("fos", "rev_topic", "paper")] = (
"fos__rev_topic__paper"
)
compress_edge_dict[("paper", "published", "journal")] = (
"paper__published__journal"
)
compress_edge_dict[("paper", "venue", "conference")] = (
"paper__venue__conference"
)
compress_edge_dict[("journal", "rev_published", "paper")] = (
"journal__rev_published__paper"
)
compress_edge_dict[("conference", "rev_venue", "paper")] = (
"conference__rev_venue__paper"
)
for etype in compress_edge_dict.keys():
edge_path = os.path.join(
self.base_path, self.layout, compress_edge_dict[etype]
)
                try:
                    indptr = torch.load(
                        os.path.join(edge_path, "indptr.pt"))
                    indices = torch.load(
                        os.path.join(edge_path, "indices.pt"))
if self.layout == "CSC":
self.edge_dict[etype] = (indices, indptr)
else:
self.edge_dict[etype] = (indptr, indices)
except FileNotFoundError as e:
print(f"FileNotFound: {e}")
exit()
except Exception as e:
print(f"Exception: {e}")
exit()
self.etypes = list(self.edge_dict.keys())
# load features and labels
label_file = (
"node_label_19.npy" if not self.use_label_2K else "node_label_2K.npy"
)
paper_feat_path = os.path.join(
self.base_path, "paper", "node_feat.npy")
paper_lbl_path = os.path.join(self.base_path, "paper", label_file)
num_paper_nodes = self.paper_nodes_num[self.dataset_size]
if self.in_memory:
if self.use_fp16:
paper_node_features = torch.load(
os.path.join(self.base_path, "paper", "node_feat_fp16.pt")
)
else:
paper_node_features = torch.from_numpy(
np.load(paper_feat_path))
else:
if self.dataset_size in ["large", "full"]:
paper_node_features = torch.from_numpy(
np.memmap(
paper_feat_path,
dtype="float32",
mode="r",
shape=(num_paper_nodes, 1024),
)
)
else:
paper_node_features = torch.from_numpy(
np.load(paper_feat_path, mmap_mode="r")
)
if self.dataset_size in ["large", "full"]:
paper_node_labels = torch.from_numpy(
np.memmap(
                    paper_lbl_path, dtype="float32", mode="r", shape=(num_paper_nodes,)
)
).to(torch.long)
else:
paper_node_labels = torch.from_numpy(
np.load(paper_lbl_path)).to(
torch.long)
self.feat_dict["paper"] = paper_node_features
self.label = paper_node_labels
num_author_nodes = self.author_nodes_num[self.dataset_size]
author_feat_path = os.path.join(
self.base_path, "author", "node_feat.npy")
if self.in_memory:
if self.use_fp16:
author_node_features = torch.load(
os.path.join(self.base_path, "author", "node_feat_fp16.pt")
)
else:
author_node_features = torch.from_numpy(
np.load(author_feat_path))
else:
if self.dataset_size in ["large", "full"]:
author_node_features = torch.from_numpy(
np.memmap(
author_feat_path,
dtype="float32",
mode="r",
shape=(num_author_nodes, 1024),
)
)
else:
author_node_features = torch.from_numpy(
np.load(author_feat_path, mmap_mode="r")
)
self.feat_dict["author"] = author_node_features
if self.in_memory:
if self.use_fp16:
institute_node_features = torch.load(
os.path.join(
self.base_path,
"institute",
"node_feat_fp16.pt")
)
else:
institute_node_features = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"institute",
"node_feat.npy"))
)
else:
institute_node_features = torch.from_numpy(
np.load(
os.path.join(self.base_path, "institute", "node_feat.npy"),
mmap_mode="r",
)
)
self.feat_dict["institute"] = institute_node_features
if self.in_memory:
if self.use_fp16:
fos_node_features = torch.load(
os.path.join(self.base_path, "fos", "node_feat_fp16.pt")
)
else:
fos_node_features = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"fos",
"node_feat.npy"))
)
else:
fos_node_features = torch.from_numpy(
np.load(
os.path.join(self.base_path, "fos", "node_feat.npy"), mmap_mode="r"
)
)
self.feat_dict["fos"] = fos_node_features
if self.in_memory:
if self.use_fp16:
conference_node_features = torch.load(
os.path.join(
self.base_path,
"conference",
"node_feat_fp16.pt")
)
else:
conference_node_features = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"conference",
"node_feat.npy"))
)
else:
conference_node_features = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"conference",
"node_feat.npy"),
mmap_mode="r",
)
)
self.feat_dict["conference"] = conference_node_features
if self.in_memory:
if self.use_fp16:
journal_node_features = torch.load(
os.path.join(
self.base_path,
"journal",
"node_feat_fp16.pt")
)
else:
journal_node_features = torch.from_numpy(
np.load(
os.path.join(
self.base_path,
"journal",
"node_feat.npy"))
)
else:
journal_node_features = torch.from_numpy(
np.load(
os.path.join(self.base_path, "journal", "node_feat.npy"),
mmap_mode="r",
)
)
self.feat_dict["journal"] = journal_node_features
# Please ensure that train_idx and val_idx have been generated using
# split_seeds.py
try:
self.train_idx = torch.load(
os.path.join(
self.base_path,
"train_idx.pt"))
self.val_idx = torch.load(
os.path.join(
self.base_path,
"val_idx.pt"))
except FileNotFoundError as e:
print(
f"FileNotFound: {e}, please ensure that train_idx and val_idx have been generated using split_seeds.py"
)
exit()
except Exception as e:
print(f"Exception: {e}")
exit()
class IGBH(dataset.Dataset):
def __init__(
self,
data_path,
name="igbh",
dataset_size="full",
use_label_2K=True,
in_memory=False,
layout: Literal["CSC", "CSR", "COO"] = "COO",
type: Literal["fp16", "fp32"] = "fp16",
device="cpu",
edge_dir="in",
**kwargs,
):
super().__init__()
self.data_path = data_path
self.name = name
self.size = dataset_size
self.igbh_dataset = IGBHeteroDataset(
data_path,
dataset_size=dataset_size,
in_memory=in_memory,
use_label_2K=use_label_2K,
layout=layout,
use_fp16=(type == "fp16"),
)
self.num_samples = len(self.igbh_dataset.val_idx)
def get_samples(self, id_list):
return self.igbh_dataset.val_idx[id_list]
def get_labels(self, id_list):
return self.igbh_dataset.label[self.get_samples(id_list)]
def get_item_count(self):
return len(self.igbh_dataset.val_idx)
def load_query_samples(self, id):
pass
def unload_query_samples(self, sample_list):
return super().unload_query_samples(sample_list)
class PostProcessIGBH:
    def __init__(
        self,
        device="cpu",
        dtype="uint8",
    ):
self.results = []
self.content_ids = []
self.samples_ids = []
def add_results(self, results):
self.results.extend(results)
def __call__(self, results, ids, sample_ids, result_dict=None):
self.content_ids.extend(ids)
self.samples_ids.extend(sample_ids)
return results.argmax(1).cpu().numpy()
def start(self):
self.results = []
def finalize(self, result_dict, ds=None, output_dir=None):
labels = ds.get_labels(self.content_ids)
total = len(self.results)
good = 0
for l, r in zip(labels, self.results):
if l == r:
good += 1
result_dict["accuracy"] = good / total
return result_dict
"""
mlperf inference benchmarking tool
"""
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import array
import collections
import json
import logging
import os
import sys
import threading
import time
from queue import Queue
import mlperf_loadgen as lg
import numpy as np
import torch
import dataset
import igbh
import dgl_utilities.feature_fetching as dgl_igbh
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("main")
NANO_SEC = 1e9
MILLI_SEC = 1000
SUPPORTED_DATASETS = {
"igbh-dgl-tiny": (
dgl_igbh.IGBH,
dataset.preprocess,
igbh.PostProcessIGBH(),
{"dataset_size": "tiny", "use_label_2K": True},
),
"igbh-dgl-small": (
dgl_igbh.IGBH,
dataset.preprocess,
igbh.PostProcessIGBH(),
{"dataset_size": "small", "use_label_2K": True},
),
"igbh-dgl-medium": (
dgl_igbh.IGBH,
dataset.preprocess,
igbh.PostProcessIGBH(),
{"dataset_size": "medium", "use_label_2K": True},
),
"igbh-dgl-large": (
dgl_igbh.IGBH,
dataset.preprocess,
igbh.PostProcessIGBH(),
{"dataset_size": "large", "use_label_2K": True},
),
"igbh-dgl": (
dgl_igbh.IGBH,
dataset.preprocess,
igbh.PostProcessIGBH(),
{"dataset_size": "full", "use_label_2K": True},
),
}
SUPPORTED_PROFILES = {
"defaults": {
"dataset": "igbh-dgl-tiny",
"backend": "dgl",
"model-name": "rgat",
},
"debug-dgl": {
"dataset": "igbh-dgl-tiny",
"backend": "dgl",
"model-name": "rgat",
},
"rgat-dgl-small": {
"dataset": "igbh-dgl-small",
"backend": "dgl",
"model-name": "rgat",
},
"rgat-dgl-medium": {
"dataset": "igbh-dgl-medium",
"backend": "dgl",
"model-name": "rgat",
},
"rgat-dgl-large": {
"dataset": "igbh-dgl-large",
"backend": "dgl",
"model-name": "rgat",
},
"rgat-dgl-full": {
"dataset": "igbh-dgl",
"backend": "dgl",
"model-name": "rgat",
},
}
SCENARIO_MAP = {
"SingleStream": lg.TestScenario.SingleStream,
"MultiStream": lg.TestScenario.MultiStream,
"Server": lg.TestScenario.Server,
"Offline": lg.TestScenario.Offline,
}
def get_args():
parser = argparse.ArgumentParser()
# Dataset arguments
parser.add_argument(
"--dataset",
choices=SUPPORTED_DATASETS.keys(),
help="dataset")
parser.add_argument(
"--dataset-path",
required=True,
help="path to the dataset")
parser.add_argument(
"--layout",
default="COO",
choices=["CSC", "CSR", "COO"],
help="layout of the dataset",
)
parser.add_argument(
"--profile", choices=SUPPORTED_PROFILES.keys(), help="standard profiles"
)
parser.add_argument(
"--scenario",
default="SingleStream",
help="mlperf benchmark scenario, one of " +
str(list(SCENARIO_MAP.keys())),
)
parser.add_argument(
"--max-batchsize",
type=int,
default=1,
help="max batch size in a single inference",
)
parser.add_argument("--threads", default=1, type=int, help="threads")
parser.add_argument(
"--accuracy",
action="store_true",
help="enable accuracy pass")
parser.add_argument(
"--find-peak-performance",
action="store_true",
help="enable finding peak performance pass",
)
# Backend Arguments
parser.add_argument("--backend", help="Name of the backend")
parser.add_argument("--model-name", help="Name of the model")
parser.add_argument("--output", default="output", help="test results")
parser.add_argument("--qps", type=int, help="target qps")
parser.add_argument("--model-path", help="Path to model weights")
parser.add_argument(
"--dtype",
default="fp32",
choices=["fp32", "fp16"],
help="dtype of the model",
)
parser.add_argument(
"--device",
default="gpu",
choices=["gpu", "cpu"],
help="device to run the benchmark",
)
# file for user LoadGen settings such as target QPS
parser.add_argument(
"--user_conf",
default="user.conf",
help="user config for user LoadGen settings such as target QPS",
)
# file for LoadGen audit settings
parser.add_argument(
"--audit_conf", default="audit.config", help="config for LoadGen audit settings"
)
# below will override mlperf rules compliant settings - don't use for
# official submission
parser.add_argument("--time", type=int, help="time to scan in seconds")
parser.add_argument("--count", type=int, help="dataset items to use")
parser.add_argument("--debug", action="store_true", help="debug")
parser.add_argument(
"--performance-sample-count",
type=int,
help="performance sample count",
default=5000,
)
parser.add_argument(
"--max-latency", type=float, help="mlperf max latency in pct tile"
)
parser.add_argument(
"--samples-per-query",
default=8,
type=int,
help="mlperf multi-stream samples per query",
)
args = parser.parse_args()
    # Don't use defaults in argparse. Instead we default to a dict, override
    # that with a profile, and take these values as defaults unless the
    # command line gives an explicit value.
defaults = SUPPORTED_PROFILES["defaults"]
if args.profile:
profile = SUPPORTED_PROFILES[args.profile]
defaults.update(profile)
for k, v in defaults.items():
kc = k.replace("-", "_")
if getattr(args, kc) is None:
setattr(args, kc, v)
if args.scenario not in SCENARIO_MAP:
parser.error("valid scanarios:" + str(list(SCENARIO_MAP.keys())))
return args
def get_backend(backend, **kwargs):
if backend == "dgl":
from backend_dgl import BackendDGL
backend = BackendDGL(**kwargs)
else:
raise ValueError("unknown backend: " + backend)
return backend
class Item:
"""An item that we queue for processing by the thread pool."""
def __init__(self, query_id, content_id, samples):
self.query_id = query_id
self.content_id = content_id
self.samples = samples
self.start = time.time()
class RunnerBase:
def __init__(self, model, ds, threads, post_proc=None, max_batchsize=128):
self.take_accuracy = False
self.ds = ds
self.model = model
self.post_process = post_proc
self.threads = threads
self.take_accuracy = False
self.max_batchsize = max_batchsize
self.result_timing = []
def handle_tasks(self, tasks_queue):
pass
def start_run(self, result_dict, take_accuracy):
self.result_dict = result_dict
self.result_timing = []
self.take_accuracy = take_accuracy
self.post_process.start()
def run_one_item(self, qitem: Item):
# run the prediction
processed_results = []
try:
results = self.model.predict(qitem.samples)
processed_results = self.post_process(
results, qitem.content_id, qitem.samples, self.result_dict
)
if self.take_accuracy:
self.post_process.add_results(processed_results)
self.result_timing.append(time.time() - qitem.start)
except Exception as ex: # pylint: disable=broad-except
src = [i for i in qitem.content_id]
log.error("thread: failed on contentid=%s, %s", src, ex)
# since post_process will not run, fake empty responses
processed_results = [[]] * len(qitem.query_id)
finally:
response_array_refs = []
response = []
for idx, query_id in enumerate(qitem.query_id):
response_array = array.array(
"B", np.array(processed_results[idx], np.uint8).tobytes()
)
response_array_refs.append(response_array)
bi = response_array.buffer_info()
response.append(lg.QuerySampleResponse(query_id, bi[0], bi[1]))
lg.QuerySamplesComplete(response)
def enqueue(self, query_samples):
idx = [q.index for q in query_samples]
query_id = [q.id for q in query_samples]
if len(query_samples) < self.max_batchsize:
samples = self.ds.get_samples(idx)
self.run_one_item(Item(query_id, idx, samples))
else:
bs = self.max_batchsize
for i in range(0, len(idx), bs):
samples = self.ds.get_samples(idx[i: i + bs])
self.run_one_item(
Item(query_id[i: i + bs], idx[i: i + bs], samples))
def finish(self):
pass
class QueueRunner(RunnerBase):
def __init__(self, model, ds, threads, post_proc=None, max_batchsize=128):
super().__init__(model, ds, threads, post_proc, max_batchsize)
self.tasks = Queue(maxsize=threads * 4)
self.workers = []
self.result_dict = {}
for _ in range(self.threads):
worker = threading.Thread(
target=self.handle_tasks, args=(
self.tasks,))
worker.daemon = True
self.workers.append(worker)
worker.start()
def handle_tasks(self, tasks_queue):
"""Worker thread."""
while True:
qitem = tasks_queue.get()
if qitem is None:
                # None in the queue indicates the parent wants us to exit
tasks_queue.task_done()
break
self.run_one_item(qitem)
tasks_queue.task_done()
def enqueue(self, query_samples):
idx = [q.index for q in query_samples]
query_id = [q.id for q in query_samples]
if len(query_samples) < self.max_batchsize:
samples = self.ds.get_samples(idx)
self.tasks.put(Item(query_id, idx, samples))
else:
bs = self.max_batchsize
for i in range(0, len(idx), bs):
ie = i + bs
samples = self.ds.get_samples(idx[i:ie])
self.tasks.put(Item(query_id[i:ie], idx[i:ie], samples))
def finish(self):
# exit all threads
for _ in self.workers:
self.tasks.put(None)
for worker in self.workers:
worker.join()
def main():
args = get_args()
log.info(args)
# dataset to use
dataset_class, pre_proc, post_proc, kwargs = SUPPORTED_DATASETS[args.dataset]
ds = dataset_class(
data_path=args.dataset_path,
name=args.dataset,
layout=args.layout,
type=args.dtype,
**kwargs,
)
# find backend
backend = get_backend(
args.backend,
type=args.dtype,
device=args.device,
ckpt_path=args.model_path,
batch_size=args.max_batchsize,
igbh=ds,
layout=args.layout,
)
    # --count applies to accuracy mode only and can be used to limit the
    # number of samples used for testing.
count_override = False
count = args.count
if count:
count_override = True
# load model to backend
model = backend.load()
final_results = {
"runtime": model.name(),
"version": model.version(),
"time": int(time.time()),
"args": vars(args),
"cmdline": str(args),
}
user_conf = os.path.abspath(args.user_conf)
if not os.path.exists(user_conf):
log.error("{} not found".format(user_conf))
sys.exit(1)
audit_config = os.path.abspath(args.audit_conf)
if args.output:
output_dir = os.path.abspath(args.output)
os.makedirs(output_dir, exist_ok=True)
os.chdir(output_dir)
    #
    # make one pass over the dataset to validate accuracy
    #
    if not count_override:
        count = ds.get_item_count()
# warmup
warmup_samples = torch.Tensor([0]).to(torch.int64)
for i in range(5):
_ = backend.predict(warmup_samples)
scenario = SCENARIO_MAP[args.scenario]
runner_map = {
lg.TestScenario.SingleStream: RunnerBase,
lg.TestScenario.MultiStream: QueueRunner,
lg.TestScenario.Server: QueueRunner,
lg.TestScenario.Offline: QueueRunner,
}
runner = runner_map[scenario](
model, ds, args.threads, post_proc=post_proc, max_batchsize=args.max_batchsize
)
def issue_queries(query_samples):
runner.enqueue(query_samples)
def flush_queries():
pass
log_output_settings = lg.LogOutputSettings()
log_output_settings.outdir = output_dir
log_output_settings.copy_summary_to_stdout = False
log_settings = lg.LogSettings()
log_settings.enable_trace = args.debug
log_settings.log_output = log_output_settings
settings = lg.TestSettings()
settings.FromConfig(user_conf, args.model_name, args.scenario)
settings.scenario = scenario
settings.mode = lg.TestMode.PerformanceOnly
if args.accuracy:
settings.mode = lg.TestMode.AccuracyOnly
if args.find_peak_performance:
settings.mode = lg.TestMode.FindPeakPerformance
if args.time:
# override the time we want to run
settings.min_duration_ms = args.time * MILLI_SEC
settings.max_duration_ms = args.time * MILLI_SEC
if args.qps:
qps = float(args.qps)
settings.server_target_qps = qps
settings.offline_expected_qps = qps
if count_override:
settings.min_query_count = count
settings.max_query_count = count
if args.samples_per_query:
settings.multi_stream_samples_per_query = args.samples_per_query
if args.max_latency:
settings.server_target_latency_ns = int(args.max_latency * NANO_SEC)
settings.multi_stream_expected_latency_ns = int(
args.max_latency * NANO_SEC)
performance_sample_count = (
args.performance_sample_count
if args.performance_sample_count
else min(count, 500)
)
sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(
count, performance_sample_count, ds.load_query_samples, ds.unload_query_samples
)
log.info("starting {}".format(scenario))
result_dict = {"scenario": str(scenario)}
runner.start_run(result_dict, args.accuracy)
lg.StartTestWithLogSettings(sut, qsl, settings, log_settings, audit_config)
if args.accuracy:
post_proc.finalize(result_dict, ds, output_dir=args.output)
final_results["accuracy_results"] = result_dict
runner.finish()
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
#
# write final results
#
if args.output:
with open("results.json", "w") as f:
json.dump(final_results, f, sort_keys=True, indent=4)
if __name__ == "__main__":
main()
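# Example invocation (illustrative only; the script file name, dataset path,
# and checkpoint path below are assumptions, not values taken from this
# repository):
#
#   python3 main.py --dataset igbh-dgl-tiny --dataset-path ./igbh \
#       --backend dgl --model-name rgat --model-path ./model/RGAT.pt \
#       --scenario Offline --dtype fp16 --device cpu --accuracy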
colorama==0.4.6
tqdm==4.66.4
requests==2.32.2
torchdata==0.7.0
pybind11==2.12.0
PyYAML==6.0.1
pydantic==2.7.1
git+https://github.com/IllinoisGraphBenchmark/IGB-Datasets.git
# This script was taken from:
# https://github.com/mlcommons/training/blob/master/graph_neural_network/dataset.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import HeteroConv, GATConv, GCNConv, SAGEConv
from torch_geometric.utils import trim_to_layer
class RGNN(torch.nn.Module):
r"""[Relational GNN model](https://arxiv.org/abs/1703.06103).
Args:
etypes: edge types.
in_dim: input size.
h_dim: Dimension of hidden layer.
out_dim: Output dimension.
num_layers: Number of conv layers.
dropout: Dropout probability for hidden layers.
model: "rsage" or "rgat".
heads: Number of multi-head-attentions for GAT.
node_type: The predict node type for node classification.
"""
def __init__(
self,
etypes,
in_dim,
h_dim,
out_dim,
num_layers=2,
dropout=0.2,
model="rgat",
heads=4,
node_type=None,
with_trim=False,
):
super().__init__()
self.node_type = node_type
if node_type is not None:
self.lin = torch.nn.Linear(h_dim, out_dim)
self.convs = torch.nn.ModuleList()
for i in range(num_layers):
in_dim = in_dim if i == 0 else h_dim
            h_dim = (
                out_dim
                if (i == num_layers - 1 and node_type is None)
                else h_dim
            )
if model == "rsage":
self.convs.append(
HeteroConv(
{
etype: SAGEConv(in_dim, h_dim, root_weight=False)
for etype in etypes
}
)
)
elif model == "rgat":
self.convs.append(
HeteroConv(
{
etype: GATConv(
in_dim,
h_dim // heads,
heads=heads,
add_self_loops=False,
)
for etype in etypes
}
)
)
self.dropout = torch.nn.Dropout(dropout)
self.with_trim = with_trim
def forward(
self,
x_dict,
edge_index_dict,
num_sampled_edges_dict=None,
num_sampled_nodes_dict=None,
):
for i, conv in enumerate(self.convs):
if self.with_trim:
x_dict, edge_index_dict, _ = trim_to_layer(
layer=i,
num_sampled_nodes_per_hop=num_sampled_nodes_dict,
num_sampled_edges_per_hop=num_sampled_edges_dict,
x=x_dict,
edge_index=edge_index_dict,
)
for key in list(edge_index_dict.keys()):
if key[0] not in x_dict or key[-1] not in x_dict:
del edge_index_dict[key]
x_dict = conv(x_dict, edge_index_dict)
if i != len(self.convs) - 1:
x_dict = {key: F.leaky_relu(x) for key, x in x_dict.items()}
x_dict = {key: self.dropout(x) for key, x in x_dict.items()}
if hasattr(self, "lin"): # for node classification
return self.lin(x_dict[self.node_type])
else:
return x_dict
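# Minimal usage sketch for RGNN (illustrative only: the edge types, feature
# dimension, hidden size, and class count below are placeholders, not the
# benchmark's real configuration, which is built from the IGBH dataset):
if __name__ == "__main__":
    example_etypes = [("paper", "cites", "paper")]
    example_model = RGNN(
        example_etypes,
        in_dim=1024,
        h_dim=512,
        out_dim=2983,
        num_layers=2,
        model="rgat",
        heads=4,
        node_type="paper",
    )
    # Tiny random hetero-graph: 8 "paper" nodes and 3 citation edges.
    example_x = {"paper": torch.randn(8, 1024)}
    example_edges = {
        ("paper", "cites", "paper"): torch.tensor([[0, 1, 2], [1, 2, 3]])
    }
    # Because node_type is set, the output is [num_paper_nodes, out_dim].
    print(example_model(example_x, example_edges).shape)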
import argparse
import numpy as np
import torch
import json
import os
def get_args():
"""Parse commandline."""
parser = argparse.ArgumentParser()
parser.add_argument(
"--mlperf-accuracy-file", required=True, help="path to mlperf_log_accuracy.json"
)
parser.add_argument(
"--dataset-path",
default="igbh",
help="Path to IHGB dataset",
)
parser.add_argument(
"--dataset-size",
default="full",
choices=["tiny", "small", "medium", "large", "full"]
)
parser.add_argument(
"--verbose",
action="store_true",
help="verbose messages")
parser.add_argument(
"--output-file", default="results.json", help="path to output file"
)
parser.add_argument(
"--dtype",
default="uint8",
choices=["uint8", "float32", "int32", "int64"],
help="data type of the label",
)
args = parser.parse_args()
return args
def load_labels(base_path, dataset_size, use_label_2K=True):
# load labels
paper_nodes_num = {
"tiny": 100000,
"small": 1000000,
"medium": 10000000,
"large": 100000000,
"full": 269346174,
}
label_file = (
"node_label_19.npy" if not use_label_2K else "node_label_2K.npy"
)
paper_lbl_path = os.path.join(
base_path,
dataset_size,
"processed",
"paper",
label_file)
if dataset_size in ["large", "full"]:
paper_node_labels = torch.from_numpy(
np.memmap(
                paper_lbl_path, dtype="float32", mode="r", shape=(paper_nodes_num[dataset_size],)
)
).to(torch.long)
else:
paper_node_labels = torch.from_numpy(
np.load(paper_lbl_path)).to(
torch.long)
labels = paper_node_labels
val_idx = torch.load(
os.path.join(
base_path,
dataset_size,
"processed",
"val_idx.pt"))
return labels, val_idx
def get_labels(labels, val_idx, id_list):
return labels[val_idx[id_list]]
if __name__ == "__main__":
args = get_args()
dtype_map = {
"uint8": np.uint8,
"float32": np.float32,
"int32": np.int32,
"int64": np.int64}
with open(args.mlperf_accuracy_file, "r") as f:
mlperf_results = json.load(f)
labels, val_idx = load_labels(args.dataset_path, args.dataset_size)
results = {}
seen = set()
good = 0
total = 0
for result in mlperf_results:
idx = result["qsl_idx"]
if idx in seen:
continue
seen.add(idx)
# get ground truth
label = get_labels(labels, val_idx, idx)
# get prediction
data = int(np.frombuffer(bytes.fromhex(
result["data"]), dtype_map[args.dtype])[0])
if label == data:
good += 1
total += 1
results["accuracy"] = good / total
results["model"] = "rgat"
results["number_correct_samples"] = good
results["performance_sample_count"] = total
    with open(args.output_file, "w") as fp:
        fp.write(
            "accuracy={:.3f}%, good={}, total={}".format(
                100.0 * results["accuracy"],
                results["number_correct_samples"],
                results["performance_sample_count"],
            )
        )
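# Example invocation (illustrative only; the script file name and paths are
# assumptions):
#
#   python3 accuracy_igbh.py \
#       --mlperf-accuracy-file ./output/mlperf_log_accuracy.json \
#       --dataset-path ./igbh --dataset-size tiny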