wangsen / mlcommon_inference · Commit ea84d414
authored Dec 24, 2025 by wangsen
add bert99.md
parent dfd6402d
Showing 2 changed files with 188 additions and 0 deletions:
.mlc-log.txt (+59, -0)
README_BERT99.md (+129, -0)
.mlc-log.txt · 0 → 100644
[2025-12-23 11:33:01,663 module.py:579 INFO] - * mlcr get,dataset,squad,language-processing
[2025-12-23 11:33:01,685 module.py:579 INFO] - * mlcr get,sys-utils-mlc
[2025-12-23 11:33:01,707 module.py:579 INFO] - * mlcr detect,os
[2025-12-23 11:33:01,707 module.py:1288 INFO] - ! load /root/MLC/repos/local/cache/detect-os_f5e42c55/mlc-cached-state.json
[2025-12-23 11:33:01,715 customize.py:86 INFO] -
[2025-12-23 11:33:01,715 customize.py:87 INFO] - ***********************************************************************
[2025-12-23 11:33:01,715 customize.py:89 INFO] - This script will attempt to install minimal system dependencies for MLC.
[2025-12-23 11:33:01,715 customize.py:91 INFO] - Note that you may be asked for your SUDO password ...
[2025-12-23 11:33:01,715 customize.py:92 INFO] - ***********************************************************************
[2025-12-23 11:33:01,715 module.py:5403 INFO] - ! cd /root/MLC/repos/local/cache/get-sys-utils-mlc_17b756ec
[2025-12-23 11:33:01,715 module.py:5404 INFO] - ! call /root/MLC/repos/mlcommons@mlperf-automations/script/get-sys-utils-mlc/run-ubuntu.sh from tmp-run.sh
[2025-12-23 13:30:33,955 module.py:2170 INFO] - - cache UID: 17b756ec0a4c4a7f
[2025-12-23 13:30:33,975 module.py:579 INFO] - * mlcr download-and-extract,_wget,_url.https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:30:34,011 module.py:579 INFO] - * mlcr download,file,_wget,_url.https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:30:34,033 module.py:579 INFO] - * mlcr detect,os
[2025-12-23 13:30:34,033 module.py:1288 INFO] - ! load /root/MLC/repos/local/cache/detect-os_f5e42c55/mlc-cached-state.json
[2025-12-23 13:30:34,041 customize.py:78 INFO] -
[2025-12-23 13:30:34,041 customize.py:79 INFO] - Downloading from https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:30:34,041 customize.py:204 INFO] - wget -nc --tries=3 -O 'dev-v1.1.json' https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:30:34,041 module.py:5403 INFO] - ! cd /root
[2025-12-23 13:30:34,041 module.py:5404 INFO] - ! call /root/MLC/repos/mlcommons@mlperf-automations/script/download-file/run.sh from tmp-run.sh
[2025-12-23 13:35:36,829 module.py:579 INFO] - * mlcr get,dataset,squad,language-processing
[2025-12-23 13:35:36,845 module.py:579 INFO] - * mlcr get,sys-utils-mlc
[2025-12-23 13:35:36,845 module.py:1288 INFO] - ! load /root/MLC/repos/local/cache/get-sys-utils-mlc_17b756ec/mlc-cached-state.json
[2025-12-23 13:35:36,866 module.py:579 INFO] - * mlcr download-and-extract,_wget,_url.https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:35:36,893 module.py:579 INFO] - * mlcr download,file,_wget,_url.https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:35:36,906 module.py:579 INFO] - * mlcr detect,os
[2025-12-23 13:35:36,906 module.py:1288 INFO] - ! load /root/MLC/repos/local/cache/detect-os_f5e42c55/mlc-cached-state.json
[2025-12-23 13:35:36,914 customize.py:78 INFO] -
[2025-12-23 13:35:36,914 customize.py:79 INFO] - Downloading from https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:35:36,914 customize.py:204 INFO] - wget -nc --tries=3 -O 'dev-v1.1.json' https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json
[2025-12-23 13:35:36,915 module.py:5403 INFO] - ! cd /root
[2025-12-23 13:35:36,915 module.py:5404 INFO] - ! call /root/MLC/repos/mlcommons@mlperf-automations/script/download-file/run.sh from tmp-run.sh
[2025-12-23 13:41:49,374 module.py:5549 INFO] - ! call "postprocess" from /root/MLC/repos/mlcommons@mlperf-automations/script/download-file/customize.py
[2025-12-23 13:41:49,392 module.py:2170 INFO] - - cache UID: f947e0e1c67046d8
[2025-12-23 13:41:49,393 module.py:5549 INFO] - ! call "postprocess" from /root/MLC/repos/mlcommons@mlperf-automations/script/download-and-extract/customize.py
[2025-12-23 13:41:49,410 module.py:2170 INFO] - - cache UID: 2dd2ae98e6c04853
[2025-12-23 13:41:49,410 module.py:5549 INFO] - ! call "postprocess" from /root/MLC/repos/mlcommons@mlperf-automations/script/get-dataset-squad/customize.py
[2025-12-23 13:41:49,427 module.py:2170 INFO] - - cache UID: 3e5d8df6a7e047bd
[2025-12-23 13:41:49,427 module.py:2201 INFO] - {
"return": 0,
"env": {
"MLC_DATASET_SQUAD_VAL_PATH": "/root/dev-v1.1.json",
"MLC_DATASET_SQUAD_PATH": "/root",
"MLC_DATASET_PATH": "/root"
},
"new_env": {
"MLC_DATASET_SQUAD_VAL_PATH": "/root/dev-v1.1.json",
"MLC_DATASET_SQUAD_PATH": "/root",
"MLC_DATASET_PATH": "/root"
},
"state": {},
"new_state": {},
"deps": [
"get,sys-utils-mlc",
"download-and-extract,_wget,_url.https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json"
]
}
[2025-12-23 13:41:49,427 module.py:2245 INFO] - Path to SQUAD dataset: /root/dev-v1.1.json
README_BERT99.md · 0 → 100644
---
# MLPerf™ Inference v5.1: BERT-99 on DCU (TensorFlow)
This guide explains in detail how to run the **BERT-Large** inference task from MLPerf™ Inference v5.1 with the **TensorFlow** framework on a Hygon **DCU (K100_AI)** system.
## 1. Environment Setup
### 1.1 Start the Docker Container
Use the official TensorFlow image that bundles Hygon DTK 25.04.2:
```bash
docker run -it \
  --network=host \
  --ipc=host \
  --shm-size=16G \
  --device=/dev/kfd \
  --device=/dev/mkfd \
  --device=/dev/dri \
  -v /opt/hyhal:/opt/hyhal \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.18.0-ubuntu22.04-dtk25.04.2-py3.10
```
### 1.2 Install Core Components
Inside the container, install MLPerf LoadGen and the MLCommons automation tool `mlcr` (CM framework):
```bash
# Install LoadGen
cd inference/loadgen && pip install .
# Install the MLCommons automation framework
pip install cmind mlc-scripts
```
---
## 2. Dataset Preparation (SQuAD v1.1)
The BERT task uses the **SQuAD v1.1** (Stanford Question Answering Dataset) validation set.
```bash
# Register the dataset via mlcr (adapted for v5.1)
mlcr get,dataset,squad,language-processing,_v1.1 --outdirname=/root -j

# Download SQuAD dev-v1.1.json directly (fallback when the network is unreliable)
wget https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json -O /root/dev-v1.1.json
```
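After downloading, it can be worth sanity-checking the JSON before wiring it into the benchmark. A minimal sketch (the `count_questions` helper is illustrative, not part of the benchmark tooling; the SQuAD v1.1 dev split is documented to contain 10,570 questions):

```python
import json

def count_questions(squad):
    """Count question/answer pairs in a SQuAD-format dict."""
    return sum(
        len(paragraph["qas"])
        for article in squad["data"]
        for paragraph in article["paragraphs"]
    )

# Tiny inline sample mirroring the SQuAD v1.1 schema.
sample = {
    "version": "1.1",
    "data": [
        {"title": "Example", "paragraphs": [
            {"context": "BERT is a language model.",
             "qas": [{"id": "q1", "question": "What is BERT?", "answers": []}]}
        ]}
    ],
}
print(count_questions(sample))  # 1

# Against the real file, the dev split should report 10570:
# with open("/root/dev-v1.1.json") as f:
#     print(count_questions(json.load(f)))
```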
---
## 3. Model Acquisition (BERT-Large)
BERT-99 means the model's accuracy must reach **99%** of the reference (FP32) model's, i.e. an F1 score of at least 89.96%.
```bash
mlcr run-mlperf,inference,_full,_r5.1 --model=bert-99 --implementation=reference --framework=tensorflow --download

# Create the cache directory and download with aria2c (multi-threaded, for unreliable networks)
apt-get install -y aria2
mkdir -p /root/MLC/repos/local/cache/download-file_bert-large-ml-m_229ad317/
aria2c -x 16 -s 16 -k 1M https://zenodo.org/record/3939747/files/model.pb \
  -d /root/MLC/repos/local/cache/download-file_bert-large-ml-m_229ad317/ -o model.pb
```
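The 89.96% figure follows from the MLPerf reference accuracy: the FP32 BERT-Large reference scores F1 = 90.874 on SQuAD v1.1, and BERT-99 requires 99% of that. A quick check of the arithmetic (the helper is illustrative):

```python
REFERENCE_F1 = 90.874  # FP32 BERT-Large reference F1 on SQuAD v1.1
TARGET_RATIO = 0.99    # the "99" in BERT-99

threshold = REFERENCE_F1 * TARGET_RATIO
print(f"BERT-99 minimum F1: {threshold:.3f}")  # 89.965

def passes_bert99(f1: float) -> bool:
    """True if a measured F1 score meets the BERT-99 accuracy target."""
    return f1 >= threshold
```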
---
## 4. Directory Layout and Symlink Setup
To make sure `run.py` can locate the model and data, initialize the directories as follows:
```bash
cd inference/language/bert

# Copy the dependency code
cp -r /root/MLC/repos/local/cache/get-git-repo_inference-src_7b09f8ca/inference/language/bert/DeepLearningExamples .

# Initialize the build directory structure
mkdir -p build/data/bert_tf_v1_1_large_fp32_384_v2/
mkdir -p build/result/

# Create symlinks (prevents FileNotFoundError)
ln -sf /root/dev-v1.1.json build/data/dev-v1.1.json
ln -sf /root/MLC/repos/local/cache/download-file_bert-large-ml-m_229ad317/model.pb \
  build/data/bert_tf_v1_1_large_fp32_384_v2/model.pb

# Link the vocabulary file
ln -sf /root/MLC/repos/local/cache/download-file_bert-get-datase_8f14db6c/vocab.txt \
  build/data/bert_tf_v1_1_large_fp32_384_v2/vocab.txt
```
---
## 5. Run the Inference Test
Run the `SingleStream` scenario in accuracy mode:
```bash
# Launch the inference test (preview mode: 100 samples)
python3 run.py --backend tf --scenario SingleStream --accuracy --max_examples 100
```
### Parameter Descriptions:
* `--backend tf`: use the TensorFlow backend.
* `--scenario SingleStream`: simulate a single-stream, low-latency inference scenario.
* `--accuracy`: enable accuracy-validation mode.
* `--max_examples 100`: quick environment check; remove this flag for an official run over the full dataset.
---
## 6. Expected Results
After the test completes, results are saved under `build/result/`.
* **Accuracy validation**: use `accuracy-squad.py` to check whether the F1 score meets the target.
* **Performance validation**: inspect `mlperf_log_summary.txt` for latency and QPS (throughput) figures.
---