Inference error: RuntimeError: No HIP GPUs are available
The environment was set up with Docker and the dependencies were installed. Running the inference command fails with: RuntimeError: No HIP GPUs are available
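For reference, the following quick check (a minimal sketch using only the standard torch API, nothing specific to this repo) can be run inside the container to confirm whether this PyTorch build targets ROCm/HIP and whether it sees any device at all:

import torch

# ROCm/HIP builds of PyTorch reuse the torch.cuda API;
# torch.version.hip is None on a CUDA-only build.
print("torch version:", torch.__version__)
print("HIP (ROCm) build:", torch.version.hip)
print("device visible:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())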
root@node2:/home/ssd_pytorch# python test.py --config-file configs/vgg_ssd300_coco2017_50_trainval.yaml
2025-06-04 16:13:31,743 SSD INFO: Using 1 GPUs
2025-06-04 16:13:31,744 SSD INFO: Namespace(config_file='configs/vgg_ssd300_coco2017_50_trainval.yaml', local_rank=0, ckpt=None, output_dir='eval_results', opts=[])
2025-06-04 16:13:31,744 SSD INFO: Loaded configuration file configs/vgg_ssd300_coco2017_50_trainval.yaml
2025-06-04 16:13:31,744 SSD INFO:
MODEL:
  NUM_CLASSES: 81
  PRIORS:
    FEATURE_MAPS: [38, 19, 10, 5, 3, 1]
    STRIDES: [8, 16, 32, 64, 100, 300]
    MIN_SIZES: [21, 45, 99, 153, 207, 261]
    MAX_SIZES: [45, 99, 153, 207, 261, 315]
    ASPECT_RATIOS: [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
    BOXES_PER_LOCATION: [4, 6, 6, 6, 4, 4]
INPUT:
  IMAGE_SIZE: 300
DATASETS:
  TRAIN: ("coco2017_50_train", )
  TEST: ("coco2017_50_val", )
SOLVER:
  MAX_ITER: 400000
  LR_STEPS: [280000, 360000]
  GAMMA: 0.1
  BATCH_SIZE: 32
  LR: 1e-3
OUTPUT_DIR: 'outputs/vgg_ssd300_coco2017_trainval'
2025-06-04 16:13:31,744 SSD INFO: Running with config:
DATASETS:
  TEST: ('coco2017_50_val',)
  TRAIN: ('coco2017_50_train',)
DATA_LOADER:
  NUM_WORKERS: 8
  PIN_MEMORY: True
INPUT:
  IMAGE_SIZE: 300
  PIXEL_MEAN: [123, 117, 104]
MODEL:
  BACKBONE:
    NAME: vgg
    OUT_CHANNELS: (512, 1024, 512, 256, 256, 256)
    PRETRAINED: True
  BOX_HEAD:
    NAME: SSDBoxHead
    PREDICTOR: SSDBoxPredictor
  CENTER_VARIANCE: 0.1
  DEVICE: cuda
  META_ARCHITECTURE: SSDDetector
  NEG_POS_RATIO: 3
  NUM_CLASSES: 81
  PRIORS:
    ASPECT_RATIOS: [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
    BOXES_PER_LOCATION: [4, 6, 6, 6, 4, 4]
    CLIP: True
    FEATURE_MAPS: [38, 19, 10, 5, 3, 1]
    MAX_SIZES: [45, 99, 153, 207, 261, 315]
    MIN_SIZES: [21, 45, 99, 153, 207, 261]
    STRIDES: [8, 16, 32, 64, 100, 300]
  SIZE_VARIANCE: 0.2
  THRESHOLD: 0.5
OUTPUT_DIR: outputs/vgg_ssd300_coco2017_trainval
SOLVER:
  BATCH_SIZE: 32
  GAMMA: 0.1
  LR: 0.001
  LR_STEPS: [280000, 360000]
  MAX_ITER: 400000
  MOMENTUM: 0.9
  WARMUP_FACTOR: 0.3333333333333333
  WARMUP_ITERS: 500
  WEIGHT_DECAY: 0.0005
TEST:
  BATCH_SIZE: 10
  CONFIDENCE_THRESHOLD: 0.01
  MAX_PER_CLASS: -1
  MAX_PER_IMAGE: 100
  NMS_THRESHOLD: 0.45
Traceback (most recent call last):
  File "/home/ssd_pytorch/test.py", line 87, in <module>
    main()
  File "/home/ssd_pytorch/test.py", line 83, in main
    evaluation(cfg, ckpt=args.ckpt, distributed=distributed)
  File "/home/ssd_pytorch/test.py", line 23, in evaluation
    model.to(device)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available
root@node2:/home/ssd_pytorch#
DCU devices:
root@node2:/home/ssd_pytorch# rocm-smi
============================ System Management Interface =============================
======================================================================================
DCU Temp AvgPwr Perf PwrCap VRAM% DCU% Mode
0 56.0C 95.0W high 400.0W 0% 0% Normal
1 56.0C 98.0W high 400.0W 0% 0% Normal
2 56.0C 89.0W high 400.0W 0% 0% Normal
3 57.0C 93.0W high 400.0W 0% 0% Normal
4 59.0C 100.0W high 400.0W 0% 0% Normal
5 59.0C 102.0W high 400.0W 0% 0% Normal
6 58.0C 95.0W high 400.0W 0% 0% Normal
7 57.0C 96.0W high 400.0W 0% 0% Normal
======================================================================================
=================================== End of SMI Log ===================================
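Since rocm-smi on the host sees all 8 DCUs, the part I am unsure about is whether the container itself was started with the GPU device nodes mapped in. Below is a minimal sketch of that check; /dev/kfd and /dev/dri are the usual ROCm device node paths (I am assuming DCU uses the same ones), and Docker typically needs --device=/dev/kfd --device=/dev/dri for them to appear inside the container:

import os

# Check whether the ROCm/DCU device nodes are visible inside the container.
for node in ("/dev/kfd", "/dev/dri"):
    print(node, "exists" if os.path.exists(node) else "MISSING")

# A restrictive HIP_VISIBLE_DEVICES setting can also hide every device from the process.
print("HIP_VISIBLE_DEVICES =", os.environ.get("HIP_VISIBLE_DEVICES"))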
Is there any way to fix this?