Commit 07ddd262 authored by Fei Sun, committed by Facebook GitHub Bot

Add NUMA binding

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/472

Add NUMA binding to d2go. It distributes the GPUs evenly across the CPU sockets so that CPU traffic and GPU-to-CPU traffic are balanced. It helps diffusion model training, but it is a general technique that can be applied to all models. We still want to enable it manually in each case until we are confident that it improves performance, at which point we can make it the default.

NUMA binding is based on jspark1105's work D42827082. Full credit goes to him.

This diff does not enable the feature.
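For reference, a minimal sketch of how the flag could be turned on manually. The runner and config entry points shown here are assumptions for illustration; only the NUMA_BINDING key itself comes from this diff.

    # Hypothetical opt-in; NUMA_BINDING defaults to False after this diff.
    from d2go.runner import Detectron2GoRunner

    runner = Detectron2GoRunner()
    cfg = runner.get_default_cfg()  # assumed entry point for the default config
    cfg.NUMA_BINDING = True         # enable NUMA binding explicitly for this run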

Reviewed By: newstzpz

Differential Revision: D43036817

fbshipit-source-id: fe67fd656ed3980f04bc81909cae7ba2527346fd
parent 8bb24bb0
@@ -108,6 +108,9 @@ def _add_detectron2go_runner_default_cfg(_C: CN) -> None:
    # Add FB specific configs
    _add_detectron2go_runner_default_fb_cfg(_C)

    # Specify whether to perform NUMA binding
    _C.NUMA_BINDING = False


def _add_rcnn_default_config(_C: CN) -> None:
    _C.EXPORT_CAFFE2 = CN()
@@ -475,6 +475,17 @@ class Detectron2GoRunner(D2GoDataAPIMixIn, BaseRunner):
        # if a model has input-dependent logic
        attach_profilers(cfg, model)

        if cfg.NUMA_BINDING is True:
            import numa

            num_gpus_per_node = comm.get_local_size()
            num_sockets = numa.get_max_node() + 1
            socket_id = torch.cuda.current_device() // (
                max(num_gpus_per_node // num_sockets, 1)
            )
            node_mask = set([socket_id])
            numa.bind(node_mask)

        optimizer = self.build_optimizer(cfg, model)
        scheduler = self.build_lr_scheduler(cfg, optimizer)
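As a sanity check of the GPU-to-socket mapping above, here is a standalone sketch with made-up counts (8 GPUs per node, 2 NUMA sockets); the arithmetic mirrors the diff:

    # Each GPU rank is assigned to the socket that "owns" its slice of GPUs.
    num_gpus_per_node = 8
    num_sockets = 2
    gpus_per_socket = max(num_gpus_per_node // num_sockets, 1)  # 4

    for gpu_id in range(num_gpus_per_node):
        socket_id = gpu_id // gpus_per_socket
        print(f"GPU {gpu_id} -> socket {socket_id}")
    # GPUs 0-3 map to socket 0, GPUs 4-7 map to socket 1.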