Multi-node training fails with the following error:
2024-11-06 14:27:53.726764: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2024-11-06 14:27:53.729357: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2024-11-06 14:27:53.729776: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2024-11-06 14:27:53.741318: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2024-11-06 14:27:53.746229: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2024-11-06 14:27:53.849614: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2024-11-06 14:27:53.854715: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
MIOpen(HIP): Error [get_module] hipModuleLoad FAIL /opt/dtk-24.04.2/share/miopen/db//pool.kernel.hipfb
MIOpen(HIP): Error [get_module] hipModuleLoad FAIL /opt/dtk-24.04.2/share/miopen/db//pool.kernel.hipfb
MIOpen(HIP): Error [get_module] hipModuleLoad FAIL /opt/dtk-24.04.2/share/miopen/db//pool.kernel.hipfb
MIOpen(HIP): Error [get_module] hipModuleLoad FAIL /opt/dtk-24.04.2/share/miopen/db//pool.kernel.hipfb
MIOpen(HIP): Error [get_module] hipModuleLoad FAIL /opt/dtk-24.04.2/share/miopen/db//pool.kernel.hipfb
MIOpen(HIP): Error [get_module] hipModuleLoad FAIL /opt/dtk-24.04.2/share/miopen/db//pool.kernel.hipfb