    Add AWS Neuron torchrun support (#20806) · c59d71b2
    jeffhataws authored
    * Add XLA torchrun support
    
    * Clarify that DDP doesn't work with the torch.distributed XLA backend yet
    
    * Enable DDP with torchrun and XLA (now available in PT-XLA 1.13)
    
    * Add check for AWS Neuron availability and AWS Neuron specific compiler flag
    
    * Change the new test's name to TestTrainerDistributedNeuronCore
    
    * Remove "assert" and replace it with a raised exception
    
    * Remove the compiler flag since it is optional. If needed, it will be added in another PR.
    
    * Use TORCHELASTIC_RUN_ID to determine whether torchrun is used
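
    A minimal sketch of the two checks mentioned above, not the code from this commit. It assumes torchrun exports TORCHELASTIC_RUN_ID to its workers, and that Neuron availability can be approximated by checking whether a package is importable. The helper names and the package name "torch_neuronx" are illustrative assumptions.

```python
import importlib.util
import os


def launched_by_torchrun(env=None):
    """Return True if TORCHELASTIC_RUN_ID is set in the given environment.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    return "TORCHELASTIC_RUN_ID" in env


def package_available(name):
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def neuron_torchrun_active(env=None, package="torch_neuronx"):
    """Hypothetical combined check; the package name is an assumption."""
    return launched_by_torchrun(env) and package_available(package)


if __name__ == "__main__":
    print("torchrun detected:", launched_by_torchrun())
    print("Neuron package found:", package_available("torch_neuronx"))
```

    Checking for the variable's presence rather than its value keeps the test independent of whatever run ID the launcher assigns.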
testing_utils.py 56.4 KB