Benchmark: Model benchmark - deterministic training support (#731) (#2)
Adds opt-in deterministic training mode to SuperBench's PyTorch model benchmarks. When enabled --enable-determinism. PyTorch deterministic algorithms are enforced, and per-step numerical fingerprints (loss, activation means) are recorded as metrics. These can be compared across runs using the existing sb result diagnosis pipeline to verify bit-exact reproducibility — useful for hardware validation and platform comparison. Flags added - --enable-determinism --check-frequency: Number of steps after which you want the metrics to be recorded --deterministic-seed Changes - Updated pytorch_base.py to handle deterministic settings, logging. Added a new example script: pytorch_deterministic_example.py Added a test file: test_pytorch_determinism_all.py to verify everything works as expected. Usage - Step 1: Run 1 - Run with --enable-determinism and the necessary metrics will be recorded in the results-summary.jsonl file Step 2: Generate the baseline file from the Run 1 results using - sb result generate-baseline Step 3: Run 2 - Run with --enable-determinism and the necessary metrics will be recorded in the results-summary.jsonl file on a different machine (or the same machine) Step 4: Run diagnosis on the results generated from the 2 runs using the - sb result diagnosis command Note - 1. Make sure all the parameters are constant between the 2 runs 2. Running the diagnosis command requires the rules.yaml file --------- Co-authored-by:Aishwarya Tonpe <aishwarya.tonpe25@gmail.com> Co-authored-by:
Ubuntu <rdadmin@HPCPLTNODE0.n3kgq4m0lhoednrx3hxtad2nha.cdmx.internal.cloudapp.net>
Showing
Please register or sign in to comment