Pipeline parallelism implementation with periodic full-pipeline syncs
Also includes the following changes for the inter-layer model-parallel implementation:
- Refactoring of model implementations
- Training loop changes to support inter-layer communication using `ring_exchange`
- New groups for inter-layer communication
- Checkpoint changes
- Command-line arguments