- 18 Apr, 2026 1 commit
-
-
one authored
Adds opt-in deterministic training mode to SuperBench's PyTorch model benchmarks. When enabled --enable-determinism. PyTorch deterministic algorithms are enforced, and per-step numerical fingerprints (loss, activation means) are recorded as metrics. These can be compared across runs using the existing sb result diagnosis pipeline to verify bit-exact reproducibility — useful for hardware validation and platform comparison. Flags added - --enable-determinism --check-frequency: Number of steps after which you want the metrics to be recorded --deterministic-seed Changes - Updated pytorch_base.py to handle deterministic settings, logging. Added a new example script: pytorch_deterministic_example.py Added a test file: test_pytorch_determinism_all.py to verify everything works as expected. Usage - Step 1: Run 1 - Run with --enable-determinism and the necessary metrics will be recorded in the results-summary.jsonl file Step 2: Generate the baseline file from the Run 1 results using - sb result generate-baseline Step 3: Run 2 - Run with --enable-determinism and the necessary metrics will be recorded in the results-summary.jsonl file on a different machine (or the same machine) Step 4: Run diagnosis on the results generated from the 2 runs using the - sb result diagnosis command Note - 1. Make sure all the parameters are constant between the 2 runs 2. Running the diagnosis command requires the rules.yaml file --------- Co-authored-by:
Aishwarya Tonpe <aishwarya.tonpe25@gmail.com> Co-authored-by:
Ubuntu <rdadmin@HPCPLTNODE0.n3kgq4m0lhoednrx3hxtad2nha.cdmx.internal.cloudapp.net>
-
- 30 Jun, 2025 1 commit
-
-
pdr authored
Added MoE model using MixtralConfig. 1. Added 8x7b and 8x22b variants 2. Requires high VRAM as all experts are loaded in memory. Thus, disabled training due to memory constraint on test worker. --------- Co-authored-by:
Hongtao Zhang <garyworkzht@gmail.com> Co-authored-by:
Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com>
-
- 28 Nov, 2024 1 commit
-
-
pdr authored
Added llama benchmark - training and inference in accordance with the existing pytorch models implementation like gpt2, lstm etc. - added llama fp8 unit test for better code coverage, to reduce memory required - updated transformers version >= 4.28.0 for LLamaConfig - set tokenizers version <= 0.20.3 to avoid 0.20.4 version [issues](https://github.com/huggingface/tokenizers/issues/1691 ) with py3.8 - added llama2 to tensorrt - llama2 tests not added to test_tensorrt_inference_performance.py due to large memory requirement for worker gpu. tests validated separately on gh200 --------- Co-authored-by:
dpatlolla <dpatlolla@microsoft.com>
-
- 07 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
-
- 28 Jan, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.7.0 to main. **Major Revisions** * Benchmarks - Fix missing include in FP8 benchmark (#460) * Fix bug in TE BERT model (#461) * Doc - Update benchmark doc (#465) * Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#464) * Support `sb deploy` without pulling image (#466) * Docs - Upgrade version and release note (#467) Co-authored-by:
Russell J. Hewett <russell.j.hewett@gmail.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 25 Jan, 2022 1 commit
-
-
Yifan Xiong authored
__Description__ Update benchmark naming to support annotations. __Major Revisions__ - Update name for `create_benchmark_context` in executor. - Backward compatibility for model benchmarks using "_models" suffix. - Update documents.
-
- 19 Jan, 2022 1 commit
-
-
guoshzhao authored
**Description** Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.
-
- 09 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Unify metric names of benchmarks.
-
- 27 Oct, 2021 1 commit
-
-
Yifan Xiong authored
Add introduction and metrics for micro-benchmarks and model-benchmarks document.
-
- 12 Oct, 2021 1 commit
-
-
Yifan Xiong authored
__Major Revisions__ * Refine document structure for user tutorial. __Minor Revisions__ * Add AMD part in installation. * Change default config file to latest link.
-
- 30 Jun, 2021 1 commit
-
-
TobeyQin authored
* Add introduction and release documents. * Fix some typos in documents.
-
- 25 Jun, 2021 1 commit
-
-
Yifan Xiong authored
Update SuperBench documents.
-