W0526 15:40:34.779000 23412721211200 torch/distributed/run.py:779]
W0526 15:40:34.779000 23412721211200 torch/distributed/run.py:779] *****************************************
W0526 15:40:34.779000 23412721211200 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0526 15:40:34.779000 23412721211200 torch/distributed/run.py:779] *****************************************
INFO 05-26 15:40:38 __init__.py:193] Automatically detected platform rocm.
INFO 05-26 15:40:39 __init__.py:193] Automatically detected platform rocm.
Could not load Sliding Tile Attention.
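The launcher's banner above notes that torchrun pins `OMP_NUM_THREADS=1` per rank by default and suggests tuning it. In practice the variable is usually exported in the launch shell before `torchrun` runs; the sketch below shows the same idea from Python, with purely hypothetical machine numbers (the heuristic and values are illustrative, not taken from this run):

```python
import os

# Illustrative heuristic: give each rank an equal slice of the node's cores.
cores_per_node = 64      # hypothetical machine size
ranks_per_node = 8       # matches the 8 ranks in this log
os.environ["OMP_NUM_THREADS"] = str(cores_per_node // ranks_per_node)
print(os.environ["OMP_NUM_THREADS"])    # 8
```

In a real run this must happen before the compute libraries spin up their thread pools, which is why exporting it in the shell that invokes `torchrun` is the usual approach.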
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0526 15:40:40.336937 89349 ProcessGroupNCCL.cpp:869] [PG 0 Rank 3] ProcessGroupNCCL initialization options: size: 8, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0526 15:40:40.336980 89349 ProcessGroupNCCL.cpp:878] [PG 0 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
I0526 15:40:40.337304 89349 ProcessGroupNCCL.cpp:869] [PG 1 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 1
I0526 15:40:40.337319 89349 ProcessGroupNCCL.cpp:878] [PG 1 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I0526 15:40:40.390194 89348 ProcessGroupNCCL.cpp:869] [PG 0 Rank 2] ProcessGroupNCCL initialization options: size: 8, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0526 15:40:40.390235 89348 ProcessGroupNCCL.cpp:878] [PG 0 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
I0526 15:40:40.390537 89348 ProcessGroupNCCL.cpp:869] [PG 1 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 1
I0526 15:40:40.390552 89348 ProcessGroupNCCL.cpp:878] [PG 1 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I0526 15:40:40.468447 89351 ProcessGroupNCCL.cpp:869] [PG 0 Rank 5] ProcessGroupNCCL initialization options: size: 8, global rank: 5, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0526 15:40:40.468519 89351 ProcessGroupNCCL.cpp:878] [PG 0 Rank 5] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
I0526 15:40:40.469101 89351 ProcessGroupNCCL.cpp:869] [PG 2 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 5, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 2
I0526 15:40:40.469116 89351 ProcessGroupNCCL.cpp:878] [PG 2 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I0526 15:40:41.540874 89352 ProcessGroupNCCL.cpp:869] [PG 0 Rank 6] ProcessGroupNCCL initialization options: size: 8, global rank: 6, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0526 15:40:41.540922 89352 ProcessGroupNCCL.cpp:878] [PG 0 Rank 6] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
I0526 15:40:41.541432 89352 ProcessGroupNCCL.cpp:869] [PG 2 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 6, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 2
I0526 15:40:41.541446 89352 ProcessGroupNCCL.cpp:878] [PG 2 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I0526 15:40:41.548353 89346 ProcessGroupNCCL.cpp:869] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 8, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0526 15:40:41.548408 89346 ProcessGroupNCCL.cpp:878] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
I0526 15:40:41.548835 89346 ProcessGroupNCCL.cpp:869] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 1
I0526 15:40:41.548847 89346 ProcessGroupNCCL.cpp:878] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
--> loading model from /public/model/HunyuanVideo/hunyuan-video-t2v-720p
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I0526 15:40:41.837021 89353 ProcessGroupNCCL.cpp:869] [PG 0 Rank 7] ProcessGroupNCCL initialization options: size: 8, global rank: 7, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0526 15:40:41.837060 89353 ProcessGroupNCCL.cpp:878] [PG 0 Rank 7] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
I0526 15:40:41.837443 89353 ProcessGroupNCCL.cpp:869] [PG 2 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 7, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 2
I0526 15:40:41.837458 89353 ProcessGroupNCCL.cpp:878] [PG 2 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I0526 15:40:41.848428 89347 ProcessGroupNCCL.cpp:869] [PG 0 Rank 1] ProcessGroupNCCL initialization options: size: 8, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0526 15:40:41.848471 89347 ProcessGroupNCCL.cpp:878] [PG 0 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
I0526 15:40:41.848771 89347 ProcessGroupNCCL.cpp:869] [PG 1 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 1
I0526 15:40:41.848786 89347 ProcessGroupNCCL.cpp:878] [PG 1 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
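The per-rank records above imply a two-level topology: PG 0 spans all 8 ranks, while the size-4 groups split them so ranks 0-3 form PG 1 and ranks 4-7 form PG 2 (global rank 5, for instance, logs as "[PG 2 Rank 1]", i.e. local rank 1 of its subgroup). A minimal sketch of that layout in plain Python; the variable names are illustrative and no torch.distributed calls are made:

```python
# Reconstruct the subgroup layout implied by the ProcessGroupNCCL logs.
world_size = 8   # size of PG 0 in the log
sp_size = 4      # size of PG 1 / PG 2 in the log

# Contiguous size-4 subgroups of the world.
subgroups = [list(range(start, start + sp_size))
             for start in range(0, world_size, sp_size)]

def local_rank(global_rank: int) -> tuple[int, int]:
    """Map a global rank to (subgroup index, rank within the subgroup)."""
    return global_rank // sp_size, global_rank % sp_size

print(subgroups)        # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(local_rank(5))    # (1, 1) -- matches "[PG 2 Rank 1]" for global rank 5
```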
I0526 15:40:42.108510 89350 ProcessGroupNCCL.cpp:869] [PG 0 Rank 4] ProcessGroupNCCL initialization options: size: 8, global rank: 4, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0526 15:40:42.108565 89350 ProcessGroupNCCL.cpp:878] [PG 0 Rank 4] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
I0526 15:40:42.109011 89350 ProcessGroupNCCL.cpp:869] [PG 2 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 4, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 2
I0526 15:40:42.109025 89350 ProcessGroupNCCL.cpp:878] [PG 2 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Total training parameters = 12821.012544 M
--> Initializing FSDP with sharding strategy: full
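The parameter counts in this log are mutually consistent: 12821.012544 M total, fully sharded across the 8 ranks, gives exactly the 1.602626568 B per-shard figure reported further down. A quick check (numbers copied from the log; variable names are illustrative):

```python
total_params_m = 12821.012544   # "Total training parameters = 12821.012544 M"
num_shards = 8                  # FSDP full sharding over the 8 ranks
per_shard_b = total_params_m / num_shards / 1000.0
print(per_shard_b)              # ~1.602626568, the per-FSDP-shard figure below
```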
--> applying fdsp activation checkpointing...
--> model loaded
optimizer: AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 1e-05
    maximize: False
    weight_decay: 0.01
)
***** Running training *****
Num examples = 101
Dataloader size = 13
Num Epochs = 39
Resume training from step 0
Instantaneous batch size per device = 1
Total train batch size (w. data & sequence parallel, accumulation) = 2.0
Gradient Accumulation steps = 1
Total optimization steps = 2000
Total training parameters per FSDP shard = 1.602626568 B
Master weight dtype: torch.float32
Steps: 0%| | 0/2000 [00:00<?, ?it/s]
I0526 15:42:10.140130 89346 ProcessGroupNCCL.cpp:2074] [PG 1 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.106799 ms
I0526 15:42:10.140599 89349 ProcessGroupNCCL.cpp:2074] [PG 1 Rank 3] ProcessGroupNCCL broadcast unique ID through store took 4345.63 ms
I0526 15:42:10.140630 89348 ProcessGroupNCCL.cpp:2074] [PG 1 Rank 2] ProcessGroupNCCL broadcast unique ID through store took 4589.22 ms
I0526 15:42:10.436164 89350 ProcessGroupNCCL.cpp:2074] [PG 2 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.197698 ms
I0526 15:42:10.436353 89353 ProcessGroupNCCL.cpp:2074] [PG 2 Rank 3] ProcessGroupNCCL broadcast unique ID through store took 1017.89 ms
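The reported total train batch size of 2.0 follows from the world size and the size-4 sequence-parallel groups: ranks inside a sequence-parallel group share one sample, so only the 8/4 = 2 data-parallel replicas contribute independent samples. A hedged reconstruction of that arithmetic (names are illustrative, not the trainer's actual variables):

```python
world_size = 8            # ranks in PG 0
sp_size = 4               # sequence-parallel group size (PG 1 / PG 2)
per_device_batch = 1      # "Instantaneous batch size per device = 1"
grad_accum = 1            # "Gradient Accumulation steps = 1"

dp_replicas = world_size // sp_size               # 2 data-parallel replicas
total_batch = dp_replicas * per_device_batch * grad_accum
print(total_batch)        # 2, matching "Total train batch size ... = 2.0"
```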
I0526 15:42:10.834331 89351 ProcessGroupNCCL.cpp:2074] [PG 2 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.286167 ms
I0526 15:42:10.932520 89352 ProcessGroupNCCL.cpp:2074] [PG 2 Rank 2] ProcessGroupNCCL broadcast unique ID through store took 0.325397 ms
I0526 15:42:11.800187 89350 ProcessGroupNCCL.cpp:2183] [PG 2 Rank 0] ProcessGroupNCCL created ncclComm_ 0x5606a55d29d0 on CUDA device:
I0526 15:42:11.800220 89353 ProcessGroupNCCL.cpp:2183] [PG 2 Rank 3] ProcessGroupNCCL created ncclComm_ 0x563e3849eaf0 on CUDA device:
I0526 15:42:11.800222 89352 ProcessGroupNCCL.cpp:2183] [PG 2 Rank 2] ProcessGroupNCCL created ncclComm_ 0x5571303a5800 on CUDA device:
I0526 15:42:11.800279 89350 ProcessGroupNCCL.cpp:2188] [PG 2 Rank 0] NCCL_DEBUG: N/A
I0526 15:42:11.800294 89353 ProcessGroupNCCL.cpp:2188] [PG 2 Rank 3] NCCL_DEBUG: N/A
I0526 15:42:11.800345 89352 ProcessGroupNCCL.cpp:2188] [PG 2 Rank 2] NCCL_DEBUG: N/A
I0526 15:42:11.801318 89351 ProcessGroupNCCL.cpp:2183] [PG 2 Rank 1] ProcessGroupNCCL created ncclComm_ 0x555ee2337e90 on CUDA device:
I0526 15:42:11.801399 89351 ProcessGroupNCCL.cpp:2188] [PG 2 Rank 1] NCCL_DEBUG: N/A
I0526 15:42:11.974504 89347 ProcessGroupNCCL.cpp:2074] [PG 1 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.300847 ms
I0526 15:42:12.899058 89348 ProcessGroupNCCL.cpp:2183] [PG 1 Rank 2] ProcessGroupNCCL created ncclComm_ 0x557f562a2620 on CUDA device:
I0526 15:42:12.899091 89346 ProcessGroupNCCL.cpp:2183] [PG 1 Rank 0] ProcessGroupNCCL created ncclComm_ 0x55b6f5928900 on CUDA device:
I0526 15:42:12.899174 89348 ProcessGroupNCCL.cpp:2188] [PG 1 Rank 2] NCCL_DEBUG: N/A
I0526 15:42:12.899205 89346 ProcessGroupNCCL.cpp:2188] [PG 1 Rank 0] NCCL_DEBUG: N/A
I0526 15:42:12.899204 89349 ProcessGroupNCCL.cpp:2183] [PG 1 Rank 3] ProcessGroupNCCL created ncclComm_ 0x55de54e073d0 on CUDA device:
I0526 15:42:12.899271 89349 ProcessGroupNCCL.cpp:2188] [PG 1 Rank 3] NCCL_DEBUG: N/A
I0526 15:42:12.899235 89347 ProcessGroupNCCL.cpp:2183] [PG 1 Rank 1] ProcessGroupNCCL created ncclComm_ 0x55c981004470 on CUDA device:
I0526 15:42:12.899341 89347 ProcessGroupNCCL.cpp:2188] [PG 1 Rank 1] NCCL_DEBUG: N/A
I0526 15:42:13.361822 89346 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.086089 ms
I0526 15:42:13.362102 89351 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 5] ProcessGroupNCCL broadcast unique ID through store took 1030.98 ms
I0526 15:42:13.362121 89350 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 4] ProcessGroupNCCL broadcast unique ID through store took 1027.7 ms
I0526 15:42:13.362154 89352 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 6] ProcessGroupNCCL broadcast unique ID through store took 1020.04 ms
I0526 15:42:13.362164 89353 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 7] ProcessGroupNCCL broadcast unique ID through store took 1020.44 ms
I0526 15:42:13.362969 89347 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.251227 ms
I0526 15:42:13.365820 89349 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 3] ProcessGroupNCCL broadcast unique ID through store took 0.212888 ms
I0526 15:42:13.370061 89348 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 2] ProcessGroupNCCL broadcast unique ID through store took 0.133879 ms
I0526 15:42:13.855226 89346 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 0] ProcessGroupNCCL created ncclComm_ 0x55b6f608fa50 on CUDA device:
I0526 15:42:13.855289 89346 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 0] NCCL_DEBUG: N/A
I0526 15:42:13.855336 89349 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 3] ProcessGroupNCCL created ncclComm_ 0x55de54db6c00 on CUDA device:
I0526 15:42:13.855326 89350 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 4] ProcessGroupNCCL created ncclComm_ 0x5606a5797450 on CUDA device:
I0526 15:42:13.855337 89352 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 6] ProcessGroupNCCL created ncclComm_ 0x55713060ba10 on CUDA device:
I0526 15:42:13.855357 89353 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 7] ProcessGroupNCCL created ncclComm_ 0x563e38753a30 on CUDA device:
I0526 15:42:13.855398 89349 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 3] NCCL_DEBUG: N/A
I0526 15:42:13.855408 89351 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 5] ProcessGroupNCCL created ncclComm_ 0x555ee276b710 on CUDA device:
I0526 15:42:13.855451 89348 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 2] ProcessGroupNCCL created ncclComm_ 0x557f565420d0 on CUDA device:
I0526 15:42:13.855479 89347 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 1] ProcessGroupNCCL created ncclComm_ 0x55c980ebde00 on CUDA device:
I0526 15:42:13.856510 89350 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 4] NCCL_DEBUG: N/A
I0526 15:42:13.856552 89352 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 6] NCCL_DEBUG: N/A
I0526 15:42:13.856582 89353 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 7] NCCL_DEBUG: N/A
I0526 15:42:13.856609 89351 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 5] NCCL_DEBUG: N/A
I0526 15:42:13.856634 89348 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 2] NCCL_DEBUG: N/A
I0526 15:42:13.856678 89347 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 1] NCCL_DEBUG: N/A
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context:  # type: ignore[attr-defined]
Steps: 0%| | 0/2000 [02:49<?, ?it/s]
Traceback (most recent call last):
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 255, in launch_agent
    result = agent.run()
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/metrics/api.py", line 124, in wrapper
    result = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/agent/server/api.py", line 694, in run
    self._shutdown()
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/agent/server/local_elastic_agent.py", line 347, in _shutdown
    self._pcontext.close(death_sig)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/api.py", line 544, in close
    self._close(death_sig=death_sig, timeout=timeout)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/api.py", line 868, in _close
    handler.proc.wait(time_to_wait)
  File "/usr/lib/python3.10/subprocess.py", line 1209, in wait
    return self._wait(timeout=timeout)
  File "/usr/lib/python3.10/subprocess.py", line 1953, in _wait
    time.sleep(delay)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/api.py", line 79, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 89279 got signal: 2
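The final SignalException is not a crash inside the training code: signal 2 is SIGINT, so the run was interrupted (e.g. Ctrl-C or a scheduler kill) while still on step 0, and torchrun's elastic agent re-raised the signal during shutdown. The mapping can be checked with the standard library:

```python
import signal

# Decode the "got signal: 2" from the SignalException above.
sig = signal.Signals(2)
print(sig.name)     # SIGINT -- keyboard interrupt / Ctrl-C
```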