    Interleaved pipeline execution and code refactoring · dd889062
    Deepak Narayanan authored
    - Split a model's computation into multiple virtual stages as needed,
    and schedule communication correctly between these virtual stages
    - Move schedule code into `schedules.py` and communication code into
    `p2p_communication.py`
    - Use hyphens instead of spaces in all time logging for consistency
    - Factor out code in megatron/training.py into helper functions
    - Refactor evaluate() function: make it use forward_backward_schedule
    functions
initialize.py 6.32 KB