2024-10-29 10:22:36.670387581 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-10-29 10:22:36.670406950 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-10-29 10:23:39.494164276 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-10-29 10:23:39.494180436 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 4, QPS: 375, Avg Latency:10.66, Tail Latency:12.69
2024-10-29 10:23:41.914490180 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-10-29 10:23:41.914506778 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 8, QPS: 448, Avg Latency:17.82, Tail Latency:19.7
2024-10-29 10:23:45.306684295 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-10-29 10:23:45.306699721 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 16, QPS: 486, Avg Latency:32.92, Tail Latency:34.87
2024-10-29 10:23:50.633233948 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-10-29 10:23:50.633250336 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 32, QPS: 514, Avg Latency:62.17, Tail Latency:63.92
2024-10-29 10:23:59.768382259 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-10-29 10:23:59.768398238 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 64, QPS: 537, Avg Latency:119.1, Tail Latency:121.3
2024-10-29 10:24:16.353528795 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-10-29 10:24:16.353545488 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 128, QPS: 514, Avg Latency:248.69, Tail Latency:277.44
INFO:PerfEngine:Testing Finish. Report is saved in path: [ general_perf/reports/DCU/bert-onnxruntime-fp16/result-fp16.json ]
INFO:PerfEngine:PDF Version is saved in path: [ general_perf/reports/DCU/bert-onnxruntime-fp16/BERT-ONNXRUNTIME-FP16-TO-FP16.JSON.pdf ]
Writing predictions to: /home/workspace/ByteMLPerf/byte_infer_perf/general_perf/reports/DCU/predictions.json
2024-10-29 10:25:05.286401290 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-10-29 10:25:05.286419867 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/engine/training_arrays_v1.py:37: UserWarning: A NumPy version >=1.23.5 and <2.3.0 is required for this version of SciPy (detected version 1.23.0)
  from scipy.sparse import issparse # pylint: disable=g-import-not-at-top
INFO:PerfEngine:******************************************* Start to test model: bert-torch-fp16. *******************************************
2024-11-13 16:58:27.053598215 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:58:27.053618792 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-11-13 16:58:33.081943533 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:58:33.081959626 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 4, QPS: 715, Avg Latency:5.59, Tail Latency:5.6
2024-11-13 16:58:35.290634617 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:58:35.290650522 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 8, QPS: 890, Avg Latency:8.98, Tail Latency:11.34
2024-11-13 16:58:37.784560529 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:58:37.784578336 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 16, QPS: 1168, Avg Latency:13.69, Tail Latency:15.7
2024-11-13 16:58:41.038908480 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:58:41.038926325 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 24, QPS: 1226, Avg Latency:19.57, Tail Latency:21.52
INFO:PerfEngine:Testing Finish. Report is saved in path: [ general_perf/reports/DCU/clip-onnx-fp32/result-fp32.json ]
INFO:PerfEngine:PDF Version is saved in path: [ general_perf/reports/DCU/clip-onnx-fp32/CLIP-ONNX-FP32-TO-FP32.JSON.pdf ]
INFO:PerfEngine:******************************************* Start to test model: conformer-encoder-onnx-fp32. *******************************************
2024-11-13 16:38:21.047174738 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:38:21.047194536 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:TestAccuracy:Mean Diff: 4.478812343222671e-07, Std Diff: 4.088608136498806e-07, Max Diff: 1.52587890625e-05, Max Rel-Diff: 94.11724853515625, Mean Rel-Diff: 3.595714588300325e-05
2024-11-13 16:38:36.816556986 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:38:36.816574466 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 4, QPS: 394, Avg Latency:10.13, Tail Latency:12.36
2024-11-13 16:38:40.921813010 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:38:40.921831696 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 8, QPS: 462, Avg Latency:17.29, Tail Latency:19.83
2024-11-13 16:38:46.118517247 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:38:46.118534420 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 16, QPS: 499, Avg Latency:32.04, Tail Latency:34.71
2024-11-13 16:38:53.338765036 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:38:53.338781099 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 32, QPS: 894, Avg Latency:35.77, Tail Latency:37.95
2024-11-13 16:39:01.398073419 [W:onnxruntime:, session_state.cc:1169 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-13 16:39:01.398090794 [W:onnxruntime:, session_state.cc:1171 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
INFO:BackendDCU:Batch size is 64, QPS: 942, Avg Latency:67.93, Tail Latency:70.5
INFO:PerfEngine:Testing Finish. Report is saved in path: [ general_perf/reports/DCU/conformer-encoder-onnx-fp32/result-fp32.json ]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/engine/training_arrays_v1.py:37: UserWarning: A NumPy version >=1.23.5 and <2.3.0 is required for this version of SciPy (detected version 1.23.0)
  from scipy.sparse import issparse # pylint: disable=g-import-not-at-top
INFO:PerfEngine:******************************************* Start to test model: resnet50-onnxruntime-fp16. *******************************************
/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/engine/training_arrays_v1.py:37: UserWarning: A NumPy version >=1.23.5 and <2.3.0 is required for this version of SciPy (detected version 1.23.0)
  from scipy.sparse import issparse # pylint: disable=g-import-not-at-top
INFO:PerfEngine:******************************************* Start to test model: resnet50-onnxruntime-fp32. *******************************************
/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/engine/training_arrays_v1.py:37: UserWarning: A NumPy version >=1.23.5 and <2.3.0 is required for this version of SciPy (detected version 1.23.0)
  from scipy.sparse import issparse # pylint: disable=g-import-not-at-top
INFO:PerfEngine:******************************************* Start to test model: resnet50-torch-fp16. *******************************************