start exec
W0528 15:34:50.624000 139759071237952 torch/distributed/run.py:779]
W0528 15:34:50.624000 139759071237952 torch/distributed/run.py:779] *****************************************
W0528 15:34:50.624000 139759071237952 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0528 15:34:50.624000 139759071237952 torch/distributed/run.py:779] *****************************************
Get file(/var/log/hylog/root_python3.10_20250528_104907.002621_73797.log) infomation fail.
INFO 05-28 15:34:54 __init__.py:193] Automatically detected platform rocm.   (printed once per rank, 8x)
Could not load Sliding Tile Attention.   (x5)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0528 15:34:56.525538 276217 ProcessGroupNCCL.cpp:869] [PG 0 Rank 6] ProcessGroupNCCL initialization options: size: 8, global rank: 6, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0528 15:34:56.525607 276217 ProcessGroupNCCL.cpp:878] [PG 0 Rank 6] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
I0528 15:34:56.525978 276217 ProcessGroupNCCL.cpp:869] [PG 2 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 6, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 2
I0528 15:34:56.525995 276217 ProcessGroupNCCL.cpp:878] [PG 2 Rank 2] ProcessGroupNCCL environments: (identical to the PG 0 Rank 6 environments line above; the same settings are reported by every process group below)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Could not load Sliding Tile Attention.   (x3)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0528 15:34:56.766597 276216 ProcessGroupNCCL.cpp:869] [PG 0 Rank 5] ProcessGroupNCCL initialization options: size: 8, global rank: 5, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0528 15:34:56.766664 276216 ProcessGroupNCCL.cpp:878] [PG 0 Rank 5] ProcessGroupNCCL environments: (identical to above)
I0528 15:34:56.767031 276216 ProcessGroupNCCL.cpp:869] [PG 2 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 5, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 2
I0528 15:34:56.767061 276216 ProcessGroupNCCL.cpp:878] [PG 2 Rank 1] ProcessGroupNCCL environments: (identical to above)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0528 15:34:56.792842 276214 ProcessGroupNCCL.cpp:869] [PG 0 Rank 3] ProcessGroupNCCL initialization options: size: 8, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0528 15:34:56.792909 276214 ProcessGroupNCCL.cpp:878] [PG 0 Rank 3] ProcessGroupNCCL environments: (identical to above)
I0528 15:34:56.793273 276214 ProcessGroupNCCL.cpp:869] [PG 1 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 1
I0528 15:34:56.793291 276214 ProcessGroupNCCL.cpp:878] [PG 1 Rank 3] ProcessGroupNCCL environments: (identical to above)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0528 15:34:56.806108 276215 ProcessGroupNCCL.cpp:869] [PG 0 Rank 4] ProcessGroupNCCL initialization options: size: 8, global rank: 4, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0528 15:34:56.806174 276215 ProcessGroupNCCL.cpp:878] [PG 0 Rank 4] ProcessGroupNCCL environments: (identical to above)
I0528 15:34:56.806547 276215 ProcessGroupNCCL.cpp:869] [PG 2 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 4, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 2
I0528 15:34:56.806565 276215 ProcessGroupNCCL.cpp:878] [PG 2 Rank 0] ProcessGroupNCCL environments: (identical to above)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0528 15:34:56.884632 276218 ProcessGroupNCCL.cpp:869] [PG 0 Rank 7] ProcessGroupNCCL initialization options: size: 8, global rank: 7, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0528 15:34:56.884714 276218 ProcessGroupNCCL.cpp:878] [PG 0 Rank 7] ProcessGroupNCCL environments: (identical to above)
I0528 15:34:56.885175 276218 ProcessGroupNCCL.cpp:869] [PG 2 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 7, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 2
I0528 15:34:56.885195 276218 ProcessGroupNCCL.cpp:878] [PG 2 Rank 3] ProcessGroupNCCL environments: (identical to above)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0528 15:34:57.219266 276213 ProcessGroupNCCL.cpp:869] [PG 0 Rank 2] ProcessGroupNCCL initialization options: size: 8, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0528 15:34:57.219333 276213 ProcessGroupNCCL.cpp:878] [PG 0 Rank 2] ProcessGroupNCCL environments: (identical to above)
I0528 15:34:57.219671 276213 ProcessGroupNCCL.cpp:869] [PG 1 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 1
I0528 15:34:57.219688 276213 ProcessGroupNCCL.cpp:878] [PG 1 Rank 2] ProcessGroupNCCL environments: (identical to above)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0528 15:34:57.233049 276212 ProcessGroupNCCL.cpp:869] [PG 0 Rank 1] ProcessGroupNCCL initialization options: size: 8, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0528 15:34:57.233876 276212 ProcessGroupNCCL.cpp:878] [PG 0 Rank 1] ProcessGroupNCCL environments: (identical to above)
I0528 15:34:57.234202 276212 ProcessGroupNCCL.cpp:869] [PG 1 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 1
I0528 15:34:57.234220 276212 ProcessGroupNCCL.cpp:878] [PG 1 Rank 1] ProcessGroupNCCL environments: (identical to above)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0528 15:34:57.257458 276211 ProcessGroupNCCL.cpp:869] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 8, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0528 15:34:57.257526 276211 ProcessGroupNCCL.cpp:878] [PG 0 Rank 0] ProcessGroupNCCL environments: (identical to above)
I0528 15:34:57.257985 276211 ProcessGroupNCCL.cpp:869] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 1
I0528 15:34:57.258001 276211 ProcessGroupNCCL.cpp:878] [PG 1 Rank 0] ProcessGroupNCCL environments: (identical to above)
--> loading model from /home/model/HunyuanVideo/hunyuan-video-t2v-720p
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   (x7)
Total training parameters = 12821.012544 M
--> Initializing FSDP with sharding strategy: full
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
--> applying fdsp activation checkpointing...
--> model loaded
--> applying fdsp activation checkpointing...
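[Editorial note, not part of the log: with sharding strategy "full", FSDP splits the flat parameter set roughly evenly across the data-parallel world. A minimal sanity check, assuming an even split over the 8 ranks seen above; the result is consistent with the per-shard figure reported later in the run.]

```python
# Hedged sketch, not project code: full-shard FSDP stores ~1/world_size
# of the parameters on each rank.
total_params_m = 12821.012544   # "Total training parameters" from the log, in millions
world_size = 8                  # 8 ranks in the default process group

per_shard_b = total_params_m / world_size / 1000  # millions -> billions
# approximately 1.602626568 B, matching
# "Total training parameters per FSDP shard = 1.602626568 B"
```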
FullyShardedDataParallel(
  (_fsdp_wrapped_module): HYVideoDiffusionTransformer(
    (img_in): PatchEmbed(
      (proj): Conv3d(16, 3072, kernel_size=(1, 2, 2), stride=(1, 2, 2))
      (norm): Identity()
    )
    (txt_in): SingleTokenRefiner(
      (input_embedder): Linear(in_features=4096, out_features=3072, bias=True)
      (t_embedder): TimestepEmbedder(
        (mlp): Sequential(
          (0): Linear(in_features=256, out_features=3072, bias=True)
          (1): SiLU()
          (2): Linear(in_features=3072, out_features=3072, bias=True)
        )
      )
      (c_embedder): TextProjection(
        (linear_1): Linear(in_features=4096, out_features=3072, bias=True)
        (act_1): SiLU()
        (linear_2): Linear(in_features=3072, out_features=3072, bias=True)
      )
      (individual_token_refiner): IndividualTokenRefiner(
        (blocks): ModuleList(
          (0-1): 2 x IndividualTokenRefinerBlock(
            (norm1): LayerNorm((3072,), eps=1e-06, elementwise_affine=True)
            (self_attn_qkv): Linear(in_features=3072, out_features=9216, bias=True)
            (self_attn_q_norm): Identity()
            (self_attn_k_norm): Identity()
            (self_attn_proj): Linear(in_features=3072, out_features=3072, bias=True)
            (norm2): LayerNorm((3072,), eps=1e-06, elementwise_affine=True)
            (mlp): MLP(
              (fc1): Linear(in_features=3072, out_features=12288, bias=True)
              (act): SiLU()
              (drop1): Dropout(p=0.0, inplace=False)
              (norm): Identity()
              (fc2): Linear(in_features=12288, out_features=3072, bias=True)
              (drop2): Dropout(p=0.0, inplace=False)
            )
            (adaLN_modulation): Sequential(
              (0): SiLU()
              (1): Linear(in_features=3072, out_features=6144, bias=True)
            )
          )
        )
      )
    )
    (time_in): TimestepEmbedder(
      (mlp): Sequential(
        (0): Linear(in_features=256, out_features=3072, bias=True)
        (1): SiLU()
        (2): Linear(in_features=3072, out_features=3072, bias=True)
      )
    )
    (vector_in): MLPEmbedder(
      (in_layer): Linear(in_features=768, out_features=3072, bias=True)
      (silu): SiLU()
      (out_layer): Linear(in_features=3072, out_features=3072, bias=True)
    )
    (guidance_in): TimestepEmbedder(
      (mlp): Sequential(
        (0): Linear(in_features=256, out_features=3072, bias=True)
        (1): SiLU()
        (2): Linear(in_features=3072, out_features=3072, bias=True)
      )
    )
    (double_blocks): ModuleList(
      (0-19): 20 x FullyShardedDataParallel(
        (_fsdp_wrapped_module): CheckpointWrapper(
          (_checkpoint_wrapped_module): MMDoubleStreamBlock(
            (img_mod): ModulateDiT(
              (act): SiLU()
              (linear): Linear(in_features=3072, out_features=18432, bias=True)
            )
            (img_norm1): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
            (img_attn_qkv): Linear(in_features=3072, out_features=9216, bias=True)
            (img_attn_q_norm): RMSNorm()
            (img_attn_k_norm): RMSNorm()
            (img_attn_proj): Linear(in_features=3072, out_features=3072, bias=True)
            (img_norm2): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
            (img_mlp): MLP(
              (fc1): Linear(in_features=3072, out_features=12288, bias=True)
              (act): GELU(approximate='tanh')
              (drop1): Dropout(p=0.0, inplace=False)
              (norm): Identity()
              (fc2): Linear(in_features=12288, out_features=3072, bias=True)
              (drop2): Dropout(p=0.0, inplace=False)
            )
            (txt_mod): ModulateDiT(
              (act): SiLU()
              (linear): Linear(in_features=3072, out_features=18432, bias=True)
            )
            (txt_norm1): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
            (txt_attn_qkv): Linear(in_features=3072, out_features=9216, bias=True)
            (txt_attn_q_norm): RMSNorm()
            (txt_attn_k_norm): RMSNorm()
            (txt_attn_proj): Linear(in_features=3072, out_features=3072, bias=True)
            (txt_norm2): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
            (txt_mlp): MLP(
              (fc1): Linear(in_features=3072, out_features=12288, bias=True)
              (act): GELU(approximate='tanh')
              (drop1): Dropout(p=0.0, inplace=False)
              (norm): Identity()
              (fc2): Linear(in_features=12288, out_features=3072, bias=True)
              (drop2): Dropout(p=0.0, inplace=False)
            )
          )
        )
      )
    )
    (single_blocks): ModuleList(
      (0-39): 40 x FullyShardedDataParallel(
        (_fsdp_wrapped_module): CheckpointWrapper(
          (_checkpoint_wrapped_module): MMSingleStreamBlock(
            (linear1): Linear(in_features=3072, out_features=21504, bias=True)
            (linear2): Linear(in_features=15360, out_features=3072, bias=True)
            (q_norm): RMSNorm()
            (k_norm): RMSNorm()
            (pre_norm): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
            (mlp_act): GELU(approximate='tanh')
            (modulation): ModulateDiT(
              (act): SiLU()
              (linear): Linear(in_features=3072, out_features=9216, bias=True)
            )
          )
        )
      )
    )
    (final_layer): FinalLayer(
      (norm_final): LayerNorm((3072,), eps=1e-06, elementwise_affine=False)
      (linear): Linear(in_features=3072, out_features=64, bias=True)
      (adaLN_modulation): Sequential(
        (0): SiLU()
        (1): Linear(in_features=3072, out_features=6144, bias=True)
      )
    )
  )
)
--> applying fdsp activation checkpointing...
optimizer: AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 1e-05
    maximize: False
    weight_decay: 0.01
)
***** Running training *****
  Num examples = 101
  Dataloader size = 13
  Num Epochs = 1
  Resume training from step 0
  Instantaneous batch size per device = 1
  Total train batch size (w. data & sequence parallel, accumulation) = 2.0
  Gradient Accumulation steps = 1
  Total optimization steps = 12
  Total training parameters per FSDP shard = 1.602626568 B
  Master weight dtype: torch.float32
Steps:   0%|          | 0/12 [00:00<?, ?it/s]
--> applying fdsp activation checkpointing...
--> applying fdsp activation checkpointing...
I0528 15:36:24.743131 276211 ProcessGroupNCCL.cpp:2074] [PG 1 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.173662 ms
I0528 15:36:25.019989 276215 ProcessGroupNCCL.cpp:2074] [PG 2 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.131744 ms
I0528 15:36:25.020220 276216 ProcessGroupNCCL.cpp:2074] [PG 2 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 264.896 ms
I0528 15:36:25.020196 276217 ProcessGroupNCCL.cpp:2074] [PG 2 Rank 2] ProcessGroupNCCL broadcast unique ID through store took 1207.97 ms
I0528 15:36:25.033840 276214 ProcessGroupNCCL.cpp:2074] [PG 1 Rank 3] ProcessGroupNCCL broadcast unique ID through store took 0.302216 ms
--> applying fdsp activation checkpointing...
--> applying fdsp activation checkpointing...
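[Editorial note, not part of the log: "Total train batch size (w. data & sequence parallel, accumulation) = 2.0" is consistent with reading the size-4 process groups (PG 1 / PG 2) as sequence-parallel groups, so only world_size // sp_size ranks contribute distinct samples. A hypothetical helper illustrating that arithmetic; `effective_train_batch` is not a FastVideo function.]

```python
def effective_train_batch(world_size, sp_size, per_device_batch, grad_accum):
    # Ranks inside one sequence-parallel group process the same sample,
    # so the data-parallel degree is world_size // sp_size.
    return world_size // sp_size * per_device_batch * grad_accum

print(effective_train_batch(8, 4, 1, 1))  # 2, matching the log's "= 2.0"
```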
I0528 15:36:25.618300 276213 ProcessGroupNCCL.cpp:2074] [PG 1 Rank 2] ProcessGroupNCCL broadcast unique ID through store took 0.294257 ms
I0528 15:36:25.636817 276212 ProcessGroupNCCL.cpp:2074] [PG 1 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.266938 ms
--> applying fdsp activation checkpointing...
I0528 15:36:26.249289 276218 ProcessGroupNCCL.cpp:2074] [PG 2 Rank 3] ProcessGroupNCCL broadcast unique ID through store took 0.289377 ms
I0528 15:36:26.546187 276211 ProcessGroupNCCL.cpp:2183] [PG 1 Rank 0] ProcessGroupNCCL created ncclComm_ 0x556f407feec0 on CUDA device:
I0528 15:36:26.546268 276211 ProcessGroupNCCL.cpp:2188] [PG 1 Rank 0] NCCL_DEBUG: N/A
I0528 15:36:26.546411 276212 ProcessGroupNCCL.cpp:2183] [PG 1 Rank 1] ProcessGroupNCCL created ncclComm_ 0x561518a6edc0 on CUDA device:
I0528 15:36:26.546412 276213 ProcessGroupNCCL.cpp:2183] [PG 1 Rank 2] ProcessGroupNCCL created ncclComm_ 0x55d7bc9f6c40 on CUDA device:
I0528 15:36:26.546434 276214 ProcessGroupNCCL.cpp:2183] [PG 1 Rank 3] ProcessGroupNCCL created ncclComm_ 0x55fd7d253d30 on CUDA device:
I0528 15:36:26.546558 276212 ProcessGroupNCCL.cpp:2188] [PG 1 Rank 1] NCCL_DEBUG: N/A
I0528 15:36:26.546612 276213 ProcessGroupNCCL.cpp:2188] [PG 1 Rank 2] NCCL_DEBUG: N/A
I0528 15:36:26.546662 276214 ProcessGroupNCCL.cpp:2188] [PG 1 Rank 3] NCCL_DEBUG: N/A
I0528 15:36:27.051357 276211 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.055747 ms
I0528 15:36:27.054250 276212 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.183772 ms
I0528 15:36:27.061859 276213 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 2] ProcessGroupNCCL broadcast unique ID through store took 0.22375 ms
I0528 15:36:27.070976 276214 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 3] ProcessGroupNCCL broadcast unique ID through store took 0.22206 ms
I0528 15:36:27.151024 276217 ProcessGroupNCCL.cpp:2183] [PG 2 Rank 2] ProcessGroupNCCL created ncclComm_ 0x5581a4833d40 on CUDA device:
I0528 15:36:27.151044 276215 ProcessGroupNCCL.cpp:2183] [PG 2 Rank 0] ProcessGroupNCCL created ncclComm_ 0x5606f21d7a80 on CUDA device:
I0528 15:36:27.151055 276218 ProcessGroupNCCL.cpp:2183] [PG 2 Rank 3] ProcessGroupNCCL created ncclComm_ 0x55bb465cbb40 on CUDA device:
I0528 15:36:27.151074 276216 ProcessGroupNCCL.cpp:2183] [PG 2 Rank 1] ProcessGroupNCCL created ncclComm_ 0x55dbfe8bd7f0 on CUDA device:
I0528 15:36:27.151160 276217 ProcessGroupNCCL.cpp:2188] [PG 2 Rank 2] NCCL_DEBUG: N/A
I0528 15:36:27.151219 276215 ProcessGroupNCCL.cpp:2188] [PG 2 Rank 0] NCCL_DEBUG: N/A
I0528 15:36:27.151248 276218 ProcessGroupNCCL.cpp:2188] [PG 2 Rank 3] NCCL_DEBUG: N/A
I0528 15:36:27.151299 276216 ProcessGroupNCCL.cpp:2188] [PG 2 Rank 1] NCCL_DEBUG: N/A
I0528 15:36:27.662503 276215 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 4] ProcessGroupNCCL broadcast unique ID through store took 0.21784 ms
I0528 15:36:27.665972 276216 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 5] ProcessGroupNCCL broadcast unique ID through store took 0.098776 ms
I0528 15:36:27.669301 276218 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 7] ProcessGroupNCCL broadcast unique ID through store took 0.161632 ms
I0528 15:36:27.669751 276217 ProcessGroupNCCL.cpp:2074] [PG 0 (default_pg) Rank 6] ProcessGroupNCCL broadcast unique ID through store took 0.172792 ms
I0528 15:36:28.128330 276218 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 7] ProcessGroupNCCL created ncclComm_ 0x55bb46232ea0 on CUDA device:
I0528 15:36:28.128350 276211 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 0] ProcessGroupNCCL created ncclComm_ 0x556f3fd50210 on CUDA device:
I0528 15:36:28.128432 276218 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 7] NCCL_DEBUG: N/A
I0528 15:36:28.128438 276216 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 5] ProcessGroupNCCL created ncclComm_ 0x55dbfe001520 on CUDA device:
I0528 15:36:28.128460 276215 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 4] ProcessGroupNCCL created ncclComm_ 0x5606f1fce220 on CUDA device:
I0528 15:36:28.128479 276211 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 0] NCCL_DEBUG: N/A
I0528 15:36:28.128461 276213 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 2] ProcessGroupNCCL created ncclComm_ 0x55d7bd69eb40 on CUDA device:
I0528 15:36:28.128536 276216 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 5] NCCL_DEBUG: N/A
I0528 15:36:28.128558 276215 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 4] NCCL_DEBUG: N/A
I0528 15:36:28.128542 276217 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 6] ProcessGroupNCCL created ncclComm_ 0x5581a53a7130 on CUDA device:
I0528 15:36:28.128611 276213 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 2] NCCL_DEBUG: N/A
I0528 15:36:28.128536 276214 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 3] ProcessGroupNCCL created ncclComm_ 0x55fd7ce19540 on CUDA device:
I0528 15:36:28.128542 276212 ProcessGroupNCCL.cpp:2183] [PG 0 (default_pg) Rank 1] ProcessGroupNCCL created ncclComm_ 0x561518979550 on CUDA device:
I0528 15:36:28.128674 276217 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 6] NCCL_DEBUG: N/A
I0528 15:36:28.128743 276214 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 3] NCCL_DEBUG: N/A
I0528 15:36:28.128811 276212 ProcessGroupNCCL.cpp:2188] [PG 0 (default_pg) Rank 1] NCCL_DEBUG: N/A
/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py:769: UserWarning: Error detected in MulBackward0. Traceback of forward call that caused the error:
  File "/home/wuxk-code/FastVideo-main/fastvideo/models/hunyuan/modules/norm_layers.py", line 60, in forward
    output = output * self.weight
 (Triggered internally at /home/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:111.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
(the same UserWarning and run_backward source line are printed twice more by other ranks)
[rank1]: Traceback (most recent call last):
[rank1]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 693, in <module>
[rank1]:     main(args)
[rank1]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 398, in main
[rank1]:     loss, grad_norm = train_one_step(
[rank1]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 146, in train_one_step
[rank1]:     model_pred = transformer(**input_kwargs)[0]
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 863, in forward
[rank1]:     output = self._fsdp_wrapped_module(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/wuxk-code/FastVideo-main/fastvideo/models/hunyuan/modules/models.py", line 572, in forward
[rank1]:     img, txt = block(*double_block_args)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 863, in forward
[rank1]:     output = self._fsdp_wrapped_module(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 169, in forward
[rank1]:     return self.checkpoint_fn(  # type: ignore[misc]
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 31, in inner
[rank1]:     return disable_fn(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 488, in checkpoint
[rank1]:     ret = function(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/wuxk-code/FastVideo-main/fastvideo/models/hunyuan/modules/models.py", line 137, in forward
[rank1]:     img_q = self.img_attn_q_norm(img_q).to(img_v)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 1116, in __call__
[rank1]:     return self._torchdynamo_orig_callable(
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 948, in __call__
[rank1]:     result = self._inner_convert(
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 472, in __call__
[rank1]:     return _compile(
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_utils_internal.py", line 84, in wrapper_function
[rank1]:     return StrobelightCompileTimeProfiler.profile_compile_time(
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time
[rank1]:     return func(*args, **kwargs)
[rank1]:   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank1]:     return func(*args, **kwds)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 817, in _compile
[rank1]:     guarded_code = compile_inner(code, one_graph, hooks, transform)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
[rank1]:     r = func(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 636, in compile_inner
[rank1]:     out_code = transform_code_object(code, transform)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/bytecode_transformation.py", line 1185, in transform_code_object
[rank1]:     transformations(instructions, code_options)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 178, in _fn
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 582, in transform
[rank1]:     tracer.run()
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2451, in run
[rank1]:     super().run()
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 893, in run
[rank1]:     while self.step():
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 805, in step
[rank1]:     self.dispatch_table[inst.opcode](self, inst)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2642, in RETURN_VALUE
[rank1]:     self._return(inst)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2627, in _return
[rank1]:     self.output.compile_subgraph(
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1099, in compile_subgraph
[rank1]:     self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
[rank1]:   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank1]:     return func(*args, **kwds)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1319, in compile_and_call_fx_graph
[rank1]:     compiled_fn = self.call_user_compiler(gm)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
[rank1]:     r = func(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1410, in call_user_compiler
[rank1]:     raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1391, in call_user_compiler
[rank1]:     compiled_fn = compiler_fn(gm, self.example_inputs())
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
[rank1]:     compiled_gm = compiler_fn(gm, example_inputs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1956, in __call__
[rank1]:     return compile_fx(model_, inputs_, config_patches=self.config)
[rank1]:   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank1]:     return func(*args, **kwds)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1261, in compile_fx
[rank1]:     return compile_fx(
[rank1]:   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank1]:     return func(*args, **kwds)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1505, in compile_fx
[rank1]:     return aot_autograd(
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/common.py", line 69, in __call__
[rank1]:     cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 954, in aot_module_simplified
[rank1]:     compiled_fn, _ = create_aot_dispatcher_function(
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
[rank1]:     r = func(*args, **kwargs)
[rank1]:   File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 687, in
create_aot_dispatcher_function [rank1]: compiled_fn, fw_metadata = compiler_fn( [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 242, in aot_dispatch_autograd [rank1]: fx_g, joint_inputs, maybe_subclass_meta = aot_dispatch_autograd_graph( [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py", line 276, in aot_dispatch_autograd_graph [rank1]: fx_g = _create_graph(joint_fn_to_trace, updated_joint_inputs, aot_config=aot_config) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py", line 46, in _create_graph [rank1]: fx_g = make_fx( [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1421, in wrapped [rank1]: return make_fx_tracer.trace(f, *args) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1367, in trace [rank1]: return self._trace_inner(f, *args) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1354, in _trace_inner [rank1]: t = dispatch_trace( [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 31, in inner [rank1]: return disable_fn(*args, **kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank1]: return fn(*args, **kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 642, in dispatch_trace [rank1]: graph = tracer.trace(root, concrete_args) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank1]: return fn(*args, **kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 822, in trace [rank1]: (self.create_arg(fn(*args)),), [rank1]: File 
"/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 675, in flatten_fn [rank1]: tree_out = root_fn(*tree_args) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 660, in wrapped [rank1]: out = f(*tensors) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 599, in joint_helper [rank1]: return _functionalized_f_helper(primals, tangents) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 388, in _functionalized_f_helper [rank1]: f_outs = fn(*f_args) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 256, in inner_fn_with_anomaly [rank1]: return inner_fn(*args) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 241, in inner_fn [rank1]: backward_out = torch.autograd.grad( [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 385, in grad [rank1]: return handle_torch_function( [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/overrides.py", line 1642, in handle_torch_function [rank1]: result = mode.__torch_function__(public_api, types, args, kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 705, in __torch_function__ [rank1]: return func(*args, **kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 436, in grad [rank1]: result = _engine_run_backward( [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 769, in _engine_run_backward [rank1]: return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 1116, in 
unpack_hook [rank1]: frame.recompute_fn(*args) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 1388, in recompute_fn [rank1]: with torch.random.fork_rng( [rank1]: File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__ [rank1]: self.gen.throw(typ, value, traceback) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/random.py", line 183, in fork_rng [rank1]: device_mod.set_rng_state(device_rng_state, device) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/cuda/random.py", line 62, in set_rng_state [rank1]: new_state_copy = new_state.clone(memory_format=torch.contiguous_format) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py", line 468, in __torch_dispatch__ [rank1]: outs_unwrapped = func._op_dk( [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_stats.py", line 21, in wrapper [rank1]: return fn(*args, **kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 755, in __torch_dispatch__ [rank1]: return self.inner_torch_dispatch(func, types, args, kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 790, in inner_torch_dispatch [rank1]: return proxy_call(self, func, self.pre_dispatch, args, kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 467, in proxy_call [rank1]: out = func(*args, **kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 667, in __call__ [rank1]: return self_._op(*args, **kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_stats.py", line 21, in wrapper [rank1]: return fn(*args, **kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1061, in __torch_dispatch__ [rank1]: return self.dispatch(func, types, args, kwargs) [rank1]: File 
"/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1450, in dispatch [rank1]: return self._cached_dispatch_impl(func, types, args, kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1153, in _cached_dispatch_impl [rank1]: output = self._dispatch_impl(func, types, args, kwargs) [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1539, in _dispatch_impl [rank1]: (flat_args, flat_arg_fake_tensors) = self.validate_and_convert_non_fake_tensors( [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1832, in validate_and_convert_non_fake_tensors [rank1]: validated_args = [validate(a) for a in flat_args] [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1832, in [rank1]: validated_args = [validate(a) for a in flat_args] [rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1822, in validate [rank1]: raise AssertionError( [rank1]: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: [rank1]: AssertionError: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. 
Found in aten.clone.default(tensor([...], size=(16,), dtype=torch.uint8), memory_format=torch.contiguous_format) [rank1]: Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information [rank1]: You can suppress this exception and fall back to eager by setting: [rank1]: import torch._dynamo [rank1]: torch._dynamo.config.suppress_errors = True [rank5]: Traceback (most recent call last): [rank5]: File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 693, in [rank5]: main(args) [rank5]: File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 398, in main [rank5]: loss, grad_norm = train_one_step( [rank5]: File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 146, in train_one_step [rank5]: model_pred = transformer(**input_kwargs)[0] [rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl [rank5]: return self._call_impl(*args, **kwargs) [rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl [rank5]: return forward_call(*args, **kwargs) [rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 863, in forward [rank5]: output = self._fsdp_wrapped_module(*args, **kwargs) [rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl [rank5]: return self._call_impl(*args, **kwargs) [rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl [rank5]: return forward_call(*args, **kwargs) [rank5]: File "/home/wuxk-code/FastVideo-main/fastvideo/models/hunyuan/modules/models.py", line 572, in forward [rank5]: img, txt = block(*double_block_args) [rank5]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl [rank5]: return self._call_impl(*args, **kwargs) [rank5]: File 
create_aot_dispatcher_function [rank7]: compiled_fn, fw_metadata = compiler_fn( [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 242, in aot_dispatch_autograd [rank7]: fx_g, joint_inputs, maybe_subclass_meta = aot_dispatch_autograd_graph( [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py", line 276, in aot_dispatch_autograd_graph [rank7]: fx_g = _create_graph(joint_fn_to_trace, updated_joint_inputs, aot_config=aot_config) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py", line 46, in _create_graph [rank7]: fx_g = make_fx( [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1421, in wrapped [rank7]: return make_fx_tracer.trace(f, *args) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1367, in trace [rank7]: return self._trace_inner(f, *args) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1354, in _trace_inner [rank7]: t = dispatch_trace( [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 31, in inner [rank7]: return disable_fn(*args, **kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank7]: return fn(*args, **kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 642, in dispatch_trace [rank7]: graph = tracer.trace(root, concrete_args) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank7]: return fn(*args, **kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 822, in trace [rank7]: (self.create_arg(fn(*args)),), [rank7]: File 
"/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 675, in flatten_fn [rank7]: tree_out = root_fn(*tree_args) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 660, in wrapped [rank7]: out = f(*tensors) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 599, in joint_helper [rank7]: return _functionalized_f_helper(primals, tangents) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 388, in _functionalized_f_helper [rank7]: f_outs = fn(*f_args) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 256, in inner_fn_with_anomaly [rank7]: return inner_fn(*args) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 241, in inner_fn [rank7]: backward_out = torch.autograd.grad( [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 385, in grad [rank7]: return handle_torch_function( [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/overrides.py", line 1642, in handle_torch_function [rank7]: result = mode.__torch_function__(public_api, types, args, kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 705, in __torch_function__ [rank7]: return func(*args, **kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 436, in grad [rank7]: result = _engine_run_backward( [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 769, in _engine_run_backward [rank7]: return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 1116, in 
unpack_hook [rank7]: frame.recompute_fn(*args) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 1388, in recompute_fn [rank7]: with torch.random.fork_rng( [rank7]: File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__ [rank7]: self.gen.throw(typ, value, traceback) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/random.py", line 183, in fork_rng [rank7]: device_mod.set_rng_state(device_rng_state, device) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/cuda/random.py", line 62, in set_rng_state [rank7]: new_state_copy = new_state.clone(memory_format=torch.contiguous_format) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py", line 468, in __torch_dispatch__ [rank7]: outs_unwrapped = func._op_dk( [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_stats.py", line 21, in wrapper [rank7]: return fn(*args, **kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 755, in __torch_dispatch__ [rank7]: return self.inner_torch_dispatch(func, types, args, kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 790, in inner_torch_dispatch [rank7]: return proxy_call(self, func, self.pre_dispatch, args, kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 467, in proxy_call [rank7]: out = func(*args, **kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 667, in __call__ [rank7]: return self_._op(*args, **kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_stats.py", line 21, in wrapper [rank7]: return fn(*args, **kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1061, in __torch_dispatch__ [rank7]: return self.dispatch(func, types, args, kwargs) [rank7]: File 
"/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1450, in dispatch [rank7]: return self._cached_dispatch_impl(func, types, args, kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1153, in _cached_dispatch_impl [rank7]: output = self._dispatch_impl(func, types, args, kwargs) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1539, in _dispatch_impl [rank7]: (flat_args, flat_arg_fake_tensors) = self.validate_and_convert_non_fake_tensors( [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1832, in validate_and_convert_non_fake_tensors [rank7]: validated_args = [validate(a) for a in flat_args] [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1832, in [rank7]: validated_args = [validate(a) for a in flat_args] [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1822, in validate [rank7]: raise AssertionError( [rank7]: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: [rank7]: AssertionError: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.clone.default(tensor([...], size=(16,), dtype=torch.uint8), memory_format=torch.contiguous_format) [rank7]: Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information [rank7]: You can suppress this exception and fall back to eager by setting: [rank7]: import torch._dynamo [rank7]: torch._dynamo.config.suppress_errors = True /usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py:769: UserWarning: Error detected in MulBackward0. 
Traceback of forward call that caused the error:
  File "/home/wuxk-code/FastVideo-main/fastvideo/models/hunyuan/modules/norm_layers.py", line 60, in forward
    output = output * self.weight
 (Triggered internally at /home/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:111.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank6]: Traceback (most recent call last):
[rank6]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 693, in <module>
[rank6]:     main(args)
[rank6]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 398, in main
[rank6]:     loss, grad_norm = train_one_step(
[rank6]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 146, in train_one_step
[rank6]:     model_pred = transformer(**input_kwargs)[0]
[rank6]:   [... remaining frames identical to the [rank7] traceback above ...]
[rank6]: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
[rank6]: AssertionError: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.clone.default(tensor([...], size=(16,), dtype=torch.uint8), memory_format=torch.contiguous_format)
[rank6]: Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[rank6]: You can suppress this exception and fall back to eager by setting:
[rank6]:     import torch._dynamo
[rank6]:     torch._dynamo.config.suppress_errors = True
[... further identical MulBackward0 UserWarning blocks from other ranks elided ...]
[... further identical MulBackward0 UserWarning blocks from other ranks elided ...]
[rank2]: Traceback (most recent call last):
[rank2]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 693, in <module>
[rank2]:     main(args)
[rank2]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 398, in main
[rank2]:     loss, grad_norm = train_one_step(
[rank2]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 146, in train_one_step
[rank2]:     model_pred = transformer(**input_kwargs)[0]
[rank2]:   [... intermediate frames identical to the [rank7] traceback above ...]
[rank2]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1410, in call_user_compiler
[rank2]:     raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
[rank2]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1391, in call_user_compiler
[rank2]:
compiled_fn = compiler_fn(gm, self.example_inputs()) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__ [rank2]: compiled_gm = compiler_fn(gm, example_inputs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1956, in __call__ [rank2]: return compile_fx(model_, inputs_, config_patches=self.config) [rank2]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner [rank2]: return func(*args, **kwds) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1261, in compile_fx [rank2]: return compile_fx( [rank2]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner [rank2]: return func(*args, **kwds) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1505, in compile_fx [rank2]: return aot_autograd( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/common.py", line 69, in __call__ [rank2]: cg = aot_module_simplified(gm, example_inputs, **self.kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 954, in aot_module_simplified [rank2]: compiled_fn, _ = create_aot_dispatcher_function( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 231, in time_wrapper [rank2]: r = func(*args, **kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 687, in create_aot_dispatcher_function [rank2]: compiled_fn, fw_metadata = compiler_fn( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 242, in aot_dispatch_autograd [rank2]: fx_g, joint_inputs, maybe_subclass_meta = aot_dispatch_autograd_graph( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py", line 276, in aot_dispatch_autograd_graph [rank2]: fx_g = 
_create_graph(joint_fn_to_trace, updated_joint_inputs, aot_config=aot_config) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py", line 46, in _create_graph [rank2]: fx_g = make_fx( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1421, in wrapped [rank2]: return make_fx_tracer.trace(f, *args) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1367, in trace [rank2]: return self._trace_inner(f, *args) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1354, in _trace_inner [rank2]: t = dispatch_trace( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 31, in inner [rank2]: return disable_fn(*args, **kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank2]: return fn(*args, **kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 642, in dispatch_trace [rank2]: graph = tracer.trace(root, concrete_args) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank2]: return fn(*args, **kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 822, in trace [rank2]: (self.create_arg(fn(*args)),), [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 675, in flatten_fn [rank2]: tree_out = root_fn(*tree_args) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 660, in wrapped [rank2]: out = f(*tensors) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 599, in joint_helper [rank2]: return _functionalized_f_helper(primals, tangents) [rank2]: File 
"/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 388, in _functionalized_f_helper [rank2]: f_outs = fn(*f_args) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 256, in inner_fn_with_anomaly [rank2]: return inner_fn(*args) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 241, in inner_fn [rank2]: backward_out = torch.autograd.grad( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 385, in grad [rank2]: return handle_torch_function( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/overrides.py", line 1642, in handle_torch_function [rank2]: result = mode.__torch_function__(public_api, types, args, kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 705, in __torch_function__ [rank2]: return func(*args, **kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 436, in grad [rank2]: result = _engine_run_backward( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 769, in _engine_run_backward [rank2]: return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 1116, in unpack_hook [rank2]: frame.recompute_fn(*args) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 1388, in recompute_fn [rank2]: with torch.random.fork_rng( [rank2]: File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__ [rank2]: self.gen.throw(typ, value, traceback) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/random.py", line 183, in fork_rng [rank2]: device_mod.set_rng_state(device_rng_state, device) [rank2]: File 
"/usr/local/lib/python3.10/dist-packages/torch/cuda/random.py", line 62, in set_rng_state [rank2]: new_state_copy = new_state.clone(memory_format=torch.contiguous_format) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py", line 468, in __torch_dispatch__ [rank2]: outs_unwrapped = func._op_dk( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_stats.py", line 21, in wrapper [rank2]: return fn(*args, **kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 755, in __torch_dispatch__ [rank2]: return self.inner_torch_dispatch(func, types, args, kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 790, in inner_torch_dispatch [rank2]: return proxy_call(self, func, self.pre_dispatch, args, kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 467, in proxy_call [rank2]: out = func(*args, **kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 667, in __call__ [rank2]: return self_._op(*args, **kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_stats.py", line 21, in wrapper [rank2]: return fn(*args, **kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1061, in __torch_dispatch__ [rank2]: return self.dispatch(func, types, args, kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1450, in dispatch [rank2]: return self._cached_dispatch_impl(func, types, args, kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1153, in _cached_dispatch_impl [rank2]: output = self._dispatch_impl(func, types, args, kwargs) [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1539, in _dispatch_impl [rank2]: (flat_args, 
flat_arg_fake_tensors) = self.validate_and_convert_non_fake_tensors( [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1832, in validate_and_convert_non_fake_tensors [rank2]: validated_args = [validate(a) for a in flat_args] [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1832, in [rank2]: validated_args = [validate(a) for a in flat_args] [rank2]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1822, in validate [rank2]: raise AssertionError( [rank2]: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: [rank2]: AssertionError: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.clone.default(tensor([...], size=(16,), dtype=torch.uint8), memory_format=torch.contiguous_format) [rank2]: Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information [rank2]: You can suppress this exception and fall back to eager by setting: [rank2]: import torch._dynamo [rank2]: torch._dynamo.config.suppress_errors = True [rank4]: Traceback (most recent call last): [rank4]: File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 693, in [rank4]: main(args) [rank4]: File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 398, in main [rank4]: loss, grad_norm = train_one_step( [rank4]: File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 146, in train_one_step [rank4]: model_pred = transformer(**input_kwargs)[0] [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl [rank4]: return self._call_impl(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl [rank4]: return forward_call(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 
863, in forward [rank4]: output = self._fsdp_wrapped_module(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl [rank4]: return self._call_impl(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl [rank4]: return forward_call(*args, **kwargs) [rank4]: File "/home/wuxk-code/FastVideo-main/fastvideo/models/hunyuan/modules/models.py", line 572, in forward [rank4]: img, txt = block(*double_block_args) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl [rank4]: return self._call_impl(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl [rank4]: return forward_call(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 863, in forward [rank4]: output = self._fsdp_wrapped_module(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl [rank4]: return self._call_impl(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl [rank4]: return forward_call(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 169, in forward [rank4]: return self.checkpoint_fn( # type: ignore[misc] [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 31, in inner [rank4]: return disable_fn(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank4]: return fn(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 488, in checkpoint [rank4]: ret = 
function(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl [rank4]: return self._call_impl(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl [rank4]: return forward_call(*args, **kwargs) [rank4]: File "/home/wuxk-code/FastVideo-main/fastvideo/models/hunyuan/modules/models.py", line 137, in forward [rank4]: img_q = self.img_attn_q_norm(img_q).to(img_v) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl [rank4]: return self._call_impl(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl [rank4]: return forward_call(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 433, in _fn [rank4]: return fn(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 1116, in __call__ [rank4]: return self._torchdynamo_orig_callable( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 948, in __call__ [rank4]: result = self._inner_convert( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 472, in __call__ [rank4]: return _compile( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_utils_internal.py", line 84, in wrapper_function [rank4]: return StrobelightCompileTimeProfiler.profile_compile_time( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time [rank4]: return func(*args, **kwargs) [rank4]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner [rank4]: return func(*args, **kwds) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 817, in _compile [rank4]: 
guarded_code = compile_inner(code, one_graph, hooks, transform) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 231, in time_wrapper [rank4]: r = func(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 636, in compile_inner [rank4]: out_code = transform_code_object(code, transform) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/bytecode_transformation.py", line 1185, in transform_code_object [rank4]: transformations(instructions, code_options) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 178, in _fn [rank4]: return fn(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 582, in transform [rank4]: tracer.run() [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2451, in run [rank4]: super().run() [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 893, in run [rank4]: while self.step(): [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 805, in step [rank4]: self.dispatch_table[inst.opcode](self, inst) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2642, in RETURN_VALUE [rank4]: self._return(inst) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2627, in _return [rank4]: self.output.compile_subgraph( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1099, in compile_subgraph [rank4]: self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root) [rank4]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner [rank4]: return func(*args, **kwds) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1319, in compile_and_call_fx_graph 
[rank4]: compiled_fn = self.call_user_compiler(gm) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 231, in time_wrapper [rank4]: r = func(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1410, in call_user_compiler [rank4]: raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1391, in call_user_compiler [rank4]: compiled_fn = compiler_fn(gm, self.example_inputs()) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__ [rank4]: compiled_gm = compiler_fn(gm, example_inputs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1956, in __call__ [rank4]: return compile_fx(model_, inputs_, config_patches=self.config) [rank4]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner [rank4]: return func(*args, **kwds) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1261, in compile_fx [rank4]: return compile_fx( [rank4]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner [rank4]: return func(*args, **kwds) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1505, in compile_fx [rank4]: return aot_autograd( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/common.py", line 69, in __call__ [rank4]: cg = aot_module_simplified(gm, example_inputs, **self.kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 954, in aot_module_simplified [rank4]: compiled_fn, _ = create_aot_dispatcher_function( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 231, in time_wrapper [rank4]: r = func(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", 
line 687, in create_aot_dispatcher_function [rank4]: compiled_fn, fw_metadata = compiler_fn( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 242, in aot_dispatch_autograd [rank4]: fx_g, joint_inputs, maybe_subclass_meta = aot_dispatch_autograd_graph( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py", line 276, in aot_dispatch_autograd_graph [rank4]: fx_g = _create_graph(joint_fn_to_trace, updated_joint_inputs, aot_config=aot_config) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py", line 46, in _create_graph [rank4]: fx_g = make_fx( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1421, in wrapped [rank4]: return make_fx_tracer.trace(f, *args) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1367, in trace [rank4]: return self._trace_inner(f, *args) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1354, in _trace_inner [rank4]: t = dispatch_trace( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 31, in inner [rank4]: return disable_fn(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank4]: return fn(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 642, in dispatch_trace [rank4]: graph = tracer.trace(root, concrete_args) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank4]: return fn(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 822, in trace [rank4]: (self.create_arg(fn(*args)),), [rank4]: File 
"/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 675, in flatten_fn [rank4]: tree_out = root_fn(*tree_args) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 660, in wrapped [rank4]: out = f(*tensors) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 599, in joint_helper [rank4]: return _functionalized_f_helper(primals, tangents) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 388, in _functionalized_f_helper [rank4]: f_outs = fn(*f_args) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 256, in inner_fn_with_anomaly [rank4]: return inner_fn(*args) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 241, in inner_fn [rank4]: backward_out = torch.autograd.grad( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 385, in grad [rank4]: return handle_torch_function( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/overrides.py", line 1642, in handle_torch_function [rank4]: result = mode.__torch_function__(public_api, types, args, kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 705, in __torch_function__ [rank4]: return func(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 436, in grad [rank4]: result = _engine_run_backward( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 769, in _engine_run_backward [rank4]: return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 1116, in 
unpack_hook [rank4]: frame.recompute_fn(*args) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 1388, in recompute_fn [rank4]: with torch.random.fork_rng( [rank4]: File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__ [rank4]: self.gen.throw(typ, value, traceback) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/random.py", line 183, in fork_rng [rank4]: device_mod.set_rng_state(device_rng_state, device) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/cuda/random.py", line 62, in set_rng_state [rank4]: new_state_copy = new_state.clone(memory_format=torch.contiguous_format) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py", line 468, in __torch_dispatch__ [rank4]: outs_unwrapped = func._op_dk( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_stats.py", line 21, in wrapper [rank4]: return fn(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 755, in __torch_dispatch__ [rank4]: return self.inner_torch_dispatch(func, types, args, kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 790, in inner_torch_dispatch [rank4]: return proxy_call(self, func, self.pre_dispatch, args, kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 467, in proxy_call [rank4]: out = func(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 667, in __call__ [rank4]: return self_._op(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_stats.py", line 21, in wrapper [rank4]: return fn(*args, **kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1061, in __torch_dispatch__ [rank4]: return self.dispatch(func, types, args, kwargs) [rank4]: File 
"/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1450, in dispatch [rank4]: return self._cached_dispatch_impl(func, types, args, kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1153, in _cached_dispatch_impl [rank4]: output = self._dispatch_impl(func, types, args, kwargs) [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1539, in _dispatch_impl [rank4]: (flat_args, flat_arg_fake_tensors) = self.validate_and_convert_non_fake_tensors( [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1832, in validate_and_convert_non_fake_tensors [rank4]: validated_args = [validate(a) for a in flat_args] [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1832, in [rank4]: validated_args = [validate(a) for a in flat_args] [rank4]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1822, in validate [rank4]: raise AssertionError( [rank4]: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: [rank4]: AssertionError: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.clone.default(tensor([...], size=(16,), dtype=torch.uint8), memory_format=torch.contiguous_format) [rank4]: Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information [rank4]: You can suppress this exception and fall back to eager by setting: [rank4]: import torch._dynamo [rank4]: torch._dynamo.config.suppress_errors = True /usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py:769: UserWarning: Error detected in MulBackward0. 
Traceback of forward call that caused the error:
  File "/home/wuxk-code/FastVideo-main/fastvideo/models/hunyuan/modules/norm_layers.py", line 60, in forward
    output = output * self.weight
 (Triggered internally at /home/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:111.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank3]: Traceback (most recent call last):
[rank3]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 693, in <module>
[rank3]:     main(args)
[rank3]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 398, in main
[rank3]:     loss, grad_norm = train_one_step(
[rank3]:   File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 146, in train_one_step
[rank3]:     model_pred = transformer(**input_kwargs)[0]
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 863, in forward
[rank3]:     output = self._fsdp_wrapped_module(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/home/wuxk-code/FastVideo-main/fastvideo/models/hunyuan/modules/models.py", line 572, in forward
[rank3]:     img, txt = block(*double_block_args)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 863, in forward
[rank3]:     output = self._fsdp_wrapped_module(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 169, in forward
[rank3]:     return self.checkpoint_fn(  # type: ignore[misc]
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 31, in inner
[rank3]:     return disable_fn(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
[rank3]:     return fn(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 488, in checkpoint
[rank3]:     ret = function(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/home/wuxk-code/FastVideo-main/fastvideo/models/hunyuan/modules/models.py", line 137, in forward
[rank3]:     img_q = self.img_attn_q_norm(img_q).to(img_v)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
[rank3]:     return fn(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 1116, in __call__
[rank3]:     return self._torchdynamo_orig_callable(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 948, in __call__
[rank3]:     result = self._inner_convert(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 472, in __call__
[rank3]:     return _compile(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_utils_internal.py", line 84, in wrapper_function
[rank3]:     return StrobelightCompileTimeProfiler.profile_compile_time(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time
[rank3]:     return func(*args, **kwargs)
[rank3]:   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank3]:     return func(*args, **kwds)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 817, in _compile
[rank3]:     guarded_code = compile_inner(code, one_graph, hooks, transform)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
[rank3]:     r = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 636, in compile_inner
[rank3]:     out_code = transform_code_object(code, transform)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/bytecode_transformation.py", line 1185, in transform_code_object
[rank3]:     transformations(instructions, code_options)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 178, in _fn
[rank3]:     return fn(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 582, in transform
[rank3]:     tracer.run()
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2451, in run
[rank3]:     super().run()
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 893, in run
[rank3]:     while self.step():
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 805, in step
[rank3]:     self.dispatch_table[inst.opcode](self, inst)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2642, in RETURN_VALUE
[rank3]:     self._return(inst)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2627, in _return
[rank3]:     self.output.compile_subgraph(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1099, in compile_subgraph
[rank3]:     self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
[rank3]:   File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank3]:     return func(*args, **kwds)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1319, in compile_and_call_fx_graph
[rank3]:     compiled_fn = self.call_user_compiler(gm)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
[rank3]:     r = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1410, in call_user_compiler
[rank3]:     raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1391, in call_user_compiler
[rank3]:     compiled_fn = compiler_fn(gm, self.example_inputs())
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
[rank3]:
compiled_gm = compiler_fn(gm, example_inputs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1956, in __call__ [rank3]: return compile_fx(model_, inputs_, config_patches=self.config) [rank3]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner [rank3]: return func(*args, **kwds) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1261, in compile_fx [rank3]: return compile_fx( [rank3]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner [rank3]: return func(*args, **kwds) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1505, in compile_fx [rank3]: return aot_autograd( [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/common.py", line 69, in __call__ [rank3]: cg = aot_module_simplified(gm, example_inputs, **self.kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 954, in aot_module_simplified [rank3]: compiled_fn, _ = create_aot_dispatcher_function( [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 231, in time_wrapper [rank3]: r = func(*args, **kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 687, in create_aot_dispatcher_function [rank3]: compiled_fn, fw_metadata = compiler_fn( [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 242, in aot_dispatch_autograd [rank3]: fx_g, joint_inputs, maybe_subclass_meta = aot_dispatch_autograd_graph( [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py", line 276, in aot_dispatch_autograd_graph [rank3]: fx_g = _create_graph(joint_fn_to_trace, updated_joint_inputs, aot_config=aot_config) [rank3]: File 
"/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py", line 46, in _create_graph [rank3]: fx_g = make_fx( [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1421, in wrapped [rank3]: return make_fx_tracer.trace(f, *args) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1367, in trace [rank3]: return self._trace_inner(f, *args) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 1354, in _trace_inner [rank3]: t = dispatch_trace( [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 31, in inner [rank3]: return disable_fn(*args, **kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank3]: return fn(*args, **kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 642, in dispatch_trace [rank3]: graph = tracer.trace(root, concrete_args) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn [rank3]: return fn(*args, **kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 822, in trace [rank3]: (self.create_arg(fn(*args)),), [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/_symbolic_trace.py", line 675, in flatten_fn [rank3]: tree_out = root_fn(*tree_args) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 660, in wrapped [rank3]: out = f(*tensors) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 599, in joint_helper [rank3]: return _functionalized_f_helper(primals, tangents) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 388, in 
_functionalized_f_helper [rank3]: f_outs = fn(*f_args) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 256, in inner_fn_with_anomaly [rank3]: return inner_fn(*args) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/traced_function_transforms.py", line 241, in inner_fn [rank3]: backward_out = torch.autograd.grad( [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 385, in grad [rank3]: return handle_torch_function( [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/overrides.py", line 1642, in handle_torch_function [rank3]: result = mode.__torch_function__(public_api, types, args, kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 705, in __torch_function__ [rank3]: return func(*args, **kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 436, in grad [rank3]: result = _engine_run_backward( [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 769, in _engine_run_backward [rank3]: return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 1116, in unpack_hook [rank3]: frame.recompute_fn(*args) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 1388, in recompute_fn [rank3]: with torch.random.fork_rng( [rank3]: File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__ [rank3]: self.gen.throw(typ, value, traceback) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/random.py", line 183, in fork_rng [rank3]: device_mod.set_rng_state(device_rng_state, device) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/cuda/random.py", line 62, in set_rng_state [rank3]: new_state_copy = 
new_state.clone(memory_format=torch.contiguous_format) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py", line 468, in __torch_dispatch__ [rank3]: outs_unwrapped = func._op_dk( [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_stats.py", line 21, in wrapper [rank3]: return fn(*args, **kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 755, in __torch_dispatch__ [rank3]: return self.inner_torch_dispatch(func, types, args, kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 790, in inner_torch_dispatch [rank3]: return proxy_call(self, func, self.pre_dispatch, args, kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/proxy_tensor.py", line 467, in proxy_call [rank3]: out = func(*args, **kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 667, in __call__ [rank3]: return self_._op(*args, **kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_stats.py", line 21, in wrapper [rank3]: return fn(*args, **kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1061, in __torch_dispatch__ [rank3]: return self.dispatch(func, types, args, kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1450, in dispatch [rank3]: return self._cached_dispatch_impl(func, types, args, kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1153, in _cached_dispatch_impl [rank3]: output = self._dispatch_impl(func, types, args, kwargs) [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1539, in _dispatch_impl [rank3]: (flat_args, flat_arg_fake_tensors) = self.validate_and_convert_non_fake_tensors( [rank3]: File 
"/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1832, in validate_and_convert_non_fake_tensors [rank3]: validated_args = [validate(a) for a in flat_args] [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1832, in [rank3]: validated_args = [validate(a) for a in flat_args] [rank3]: File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 1822, in validate [rank3]: raise AssertionError( [rank3]: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: [rank3]: AssertionError: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.clone.default(tensor([...], size=(16,), dtype=torch.uint8), memory_format=torch.contiguous_format) [rank3]: Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information [rank3]: You can suppress this exception and fall back to eager by setting: [rank3]: import torch._dynamo [rank3]: torch._dynamo.config.suppress_errors = True [rank0]: Traceback (most recent call last): [rank0]: File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 693, in [rank0]: main(args) [rank0]: File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 398, in main [rank0]: loss, grad_norm = train_one_step( [rank0]: File "/home/wuxk-code/FastVideo-main/fastvideo/train.py", line 146, in train_one_step [rank0]: model_pred = transformer(**input_kwargs)[0] [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl [rank0]: return self._call_impl(*args, **kwargs) [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl [rank0]: return forward_call(*args, **kwargs) [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 863, in forward [rank0]: output = self._fsdp_wrapped_module(*args, **kwargs) [rank0]: 
Steps:   0%|          | 0/12 [00:14
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/wuxk-code/FastVideo-main/fastvideo/train.py FAILED
------------------------------------------------------------
Failures:
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-05-28_15:36:41
  host      : BW1000
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 276212)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
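Editor's sketch of the workaround the error message itself suggests: the compile of the q-norm module fails while the checkpoint recomputation's `fork_rng` restores RNG state under FakeTensor tracing, and the log recommends falling back to eager for failing frames. This is a minimal sketch under the assumption of a torch 2.x install with `torch._dynamo` available; it trades the inductor speedup for a run that completes, and is not a fix for the underlying FakeTensor assertion.

```python
# Fall back to eager execution when a torch.compile backend (here,
# inductor) raises, exactly as the BackendCompilerFailed message above
# recommends. Place this before the first compiled forward call, e.g.
# near the top of train.py.
import torch._dynamo

# With this flag set, compile errors (such as the FakeTensor assertion
# triggered by set_rng_state's clone during checkpoint recomputation)
# are logged and the affected frame simply runs uncompiled, instead of
# crashing the whole distributed job.
torch._dynamo.config.suppress_errors = True
```

For diagnosing rather than suppressing, the same message points at `TORCH_LOGS="+dynamo"` and `TORCHDYNAMO_VERBOSE=1` in the environment.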