Commit aa7716be authored by Ayushi Dalmia, committed by Facebook GitHub Bot

Make lightning reproducible

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/661

X-link: https://github.com/fairinternal/detectron2/pull/603

X-link: https://github.com/facebookresearch/detectron2/pull/5273

This diff makes the changes needed to control reproducibility in d2go:

- update setup.py to enforce deterministic behavior when it is enabled via config
- set the following Lightning Trainer parameters when deterministic mode is enabled:

```
{
    "sync_batchnorm": True,
    "deterministic": True,
    "replace_sampler_ddp": False,
}
```
- allow passing prefetch_factor, pin_memory, and persistent_workers as arguments to the batch data loader (see the sketch after this list)
- a minor fix in the training sampler
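
For illustration, a minimal sketch (not d2go's actual batch data loader builder) of how these three arguments map onto `torch.utils.data.DataLoader`:

```
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    dataset = TensorDataset(torch.arange(100, dtype=torch.float32).unsqueeze(1))
    loader = DataLoader(
        dataset,
        batch_size=16,
        num_workers=2,            # prefetch_factor/persistent_workers need workers > 0
        prefetch_factor=2,        # batches each worker pre-loads ahead of time
        pin_memory=True,          # page-locked host memory for faster host-to-GPU copies
        persistent_workers=True,  # keep worker processes alive across epochs
    )
    for (batch,) in loader:
        pass  # a training step would consume `batch` here
```

Note that `DataLoader` rejects `prefetch_factor` and `persistent_workers=True` when `num_workers == 0`.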

Differential Revision: D55767128

fbshipit-source-id: eeab50c95969a91c58f1773473b6fc666494cc16
parent 05b33018
```
@@ -336,17 +336,16 @@ def setup_after_launch(
         torch.backends.cuda.matmul.allow_tf32 = False
         torch.backends.cudnn.allow_tf32 = False
-        # seed
-        seed(cfg.SEED, deterministic=2)
         # pytorch deterministic
+        torch.set_deterministic_debug_mode(2)
         torch.backends.cudnn.deterministic = True
         torch.backends.cudnn.benchmark = False
         torch.use_deterministic_algorithms(True)
+        torch.utils.deterministic.fill_uninitialized_memory = True
         # reference: https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility
         os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
-    elif cfg.SEED > 0:
+    if cfg.SEED > 0:
         seed_all_rng(cfg.SEED)
     return runner
```
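
To make the effect of the hunk above concrete, here is a self-contained sketch of the same determinism knobs; `enable_determinism` is a hypothetical helper, not d2go's API:

```
import os
import random

import numpy as np
import torch


def enable_determinism(seed: int) -> None:
    # Disable TF32 so matmul/conv results are bit-stable across runs.
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    # Debug mode 2 raises an error whenever a non-deterministic op runs.
    torch.set_deterministic_debug_mode(2)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True)
    # Fill uninitialized tensors deterministically (PyTorch 2.1+).
    torch.utils.deterministic.fill_uninitialized_memory = True
    # Required by cuBLAS for reproducible GEMMs on CUDA GPUs.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Seed every RNG the training loop may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


enable_determinism(42)
a = torch.randn(8, 8)
assert torch.equal(a @ a, a @ a)  # same inputs + same flags -> same bits
```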
```
@@ -108,6 +108,15 @@ def get_trainer_params(cfg: CfgNode) -> Dict[str, Any]:
         }
     )
+    if hasattr(cfg, "SOLVER.DETERMINISTIC"):
+        params.update(
+            {
+                "sync_batchnorm": True,
+                "deterministic": True,
+                "replace_sampler_ddp": False,
+            }
+        )
     return params
```
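
For context, a hedged sketch of how the returned params would be consumed, assuming pytorch_lightning 1.x; the three extra keys come from the hunk above, everything else is illustrative:

```
import pytorch_lightning as pl

params = {
    "max_epochs": 1,               # placeholder; real values come from cfg
    "sync_batchnorm": True,        # keep BatchNorm stats identical across GPUs
    "deterministic": True,         # Lightning toggles torch determinism itself
    "replace_sampler_ddp": False,  # keep d2go's own seeded sampler under DDP
}
trainer = pl.Trainer(**params)
```

`replace_sampler_ddp` is a Lightning 1.x argument; Lightning 2.0 renamed it to `use_distributed_sampler`.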