- 27 Feb, 2026 5 commits
- 26 Feb, 2026 4 commits
- 24 Feb, 2026 4 commits
- 22 Feb, 2026 1 commit
one authored
- 21 Feb, 2026 6 commits
- 20 Feb, 2026 1 commit
one authored
- 11 Feb, 2026 3 commits
one authored
  - Rename transfer fields for clarity and introduce separate methods for reporting non-P2P and P2P transfers.
  - Add new P2P fields extraction and sorting logic to improve data presentation.
  - Update method names and comments for better understanding of functionality.
one authored
one authored
  - Introduce `rccl_log_parser.py` for parsing RCCL logs, extracting system information, user-defined environment variables, graph info, and transfer arguments.
  - Add usage examples in `README.md` for running the parser as a wrapper and processing existing log files.
- 05 Feb, 2026 1 commit
one authored
  - Update `run.sh` to include new options for warmups and prompt stretching.
  - Refactor `test_evo2_generation_batched.py` to improve trace output formatting and add support for warmup sequences.
  - Adjust batch processing to include detailed profiling for each step.
- 04 Feb, 2026 1 commit
one authored
  - Remove the `prompt_stretch` option from `run.sh`.
  - Adjust the condition in `test_evo2_generation_batched.py` to allow prompt stretching based on batch size.
- 03 Feb, 2026 1 commit
one authored
  - Update `run.sh` to include new command-line options for prompt stretching and token limits.
  - Modify `test_evo2_generation_batched.py` to adjust profiling settings and improve output formatting.
  - Add support for stretching prompts to the longest length for batch processing.
- 01 Feb, 2026 4 commits
one authored
  - Update `run.sh` to include trace logging options with gzip support.
  - Modify `test_evo2_generation_batched.py` to add command-line arguments for trace log directory and gzip option.
  - Refactor custom trace handler to utilize gzip compression for trace outputs.
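The gzip option described in this commit can be pictured with a small sketch. This is not the repository's trace handler: the function names `write_trace` and `read_trace`, the `traceEvents` wrapper key, and the file naming are all assumptions made for illustration, using only the Python standard library.

```python
import gzip
import json
from pathlib import Path


def write_trace(events, log_dir, name, use_gzip=True):
    """Write trace events as JSON, gzip-compressed when use_gzip is set.

    Hypothetical helper: the name, arguments, and JSON layout are
    assumptions, not taken from the repository.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / (f"{name}.json.gz" if use_gzip else f"{name}.json")
    # gzip.open accepts text mode, so the same json.dump call serves both cases.
    opener = gzip.open if use_gzip else open
    with opener(path, "wt", encoding="utf-8") as fh:
        json.dump({"traceEvents": events}, fh)
    return path


def read_trace(path):
    """Read a trace written by write_trace, compressed or not."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

Choosing the opener from a flag keeps the serialization code identical for compressed and uncompressed output, which is one plausible way to wire a gzip switch into an existing handler.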
one authored
  - Remove `gemv_export.cpp`.
  - Update `Makefile` and `README` for compiler variable changes.
  - Adjust `run-all.sh` for consistent build commands.
one authored
  - Introduce `kernel_launch_overhead.cu` to measure kernel launch latency, system throughput, CPU dispatch overhead, and GPU dispatch time.
  - Create `Makefile` for building the benchmark with support for `nvcc` and `hipcc`.
  - Add `run-all.sh` script to execute the benchmark with specified device settings.
one authored
  - Add `fix-pt-trace.sh` for repairing non-UTF-8 traces.
  - Remove deprecated `run-rocblas.sh`.
  - Update trace handler (worker names) and tune GPU bindings in `run-all.sh`.
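For readers unfamiliar with what repairing a non-UTF-8 trace involves, here is a minimal Python sketch of one common approach: decode the raw bytes and substitute the Unicode replacement character for any invalid sequence. It is a stand-in for illustration only and does not reproduce `fix-pt-trace.sh`, whose actual method is not shown in this log; `repair_utf8` and `repair_file` are hypothetical names.

```python
def repair_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def repair_file(src, dst):
    """Rewrite src as valid UTF-8 at dst.

    Returns the number of U+FFFD characters in the output, which equals
    the number of replacements only if the input held none already.
    """
    with open(src, "rb") as fh:
        text = repair_utf8(fh.read())
    with open(dst, "w", encoding="utf-8") as fh:
        fh.write(text)
    return text.count("\ufffd")
```

Replacement keeps the file structure intact, so a JSON trace stays parseable, at the cost of losing the original bytes in the affected names.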
- 31 Jan, 2026 2 commits
- 30 Jan, 2026 3 commits
- 29 Jan, 2026 3 commits
- 28 Jan, 2026 1 commit
one authored