"megatron/legacy/mpu/tests/test_random.py" did not exist on "1afe3541ecd4327070471684daa454c9fb1a1006"
Lawrence McAfee authored
merge sequence parallelism's layernorm all-reduce into distributed optimizer.
c9a59554