# MLPerf™ Inference v5.1: BERT-99 on DCU (TensorFlow)

This guide explains in detail how to run the **BERT-Large** inference task from MLPerf™ Inference v5.1 with the **TensorFlow** framework on a Hygon **DCU (K100_AI)**.

## 1. Environment Setup

### 1.1 Start the Docker Container

Use the official TensorFlow image that bundles Hygon DTK 25.04.2:

```bash
docker run -it \
  --network=host \
  --ipc=host \
  --shm-size=16G \
  --device=/dev/kfd \
  --device=/dev/mkfd \
  --device=/dev/dri \
  -v /opt/hyhal:/opt/hyhal \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.18.0-ubuntu22.04-dtk25.04.2-py3.10

```

### 1.2 Install Core Components

Inside the container, install MLPerf LoadGen and the MLCommons automation tool `mlcr` (CM Framework):

```bash
# Install LoadGen (run from the directory that contains the inference checkout)
cd inference/loadgen && pip install .

# Install the MLCommons automation framework
pip install cmind mlc-scripts

```
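
Before moving on, it can help to confirm that the tools installed above are actually reachable. The following is a minimal sketch, not part of the MLPerf tooling: `missing_tools` is a hypothetical helper name, and the list of commands checked is an assumption about what the later steps need.

```bash
# Hypothetical helper: print every command from the argument list that is
# not found on PATH, and return non-zero if at least one is missing.
missing_tools() {
  local rc tool
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# Assumption: these are the commands the later steps rely on.
missing_tools python3 pip mlcr wget || echo "Install the tools listed above before continuing."

# LoadGen installs a Python module named mlperf_loadgen.
python3 -c "import mlperf_loadgen" 2>/dev/null || echo "mlperf_loadgen is not importable"
```

The helper only reports problems; it does not install anything, so it is safe to rerun after each fix.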

---

## 2. Dataset Preparation (SQuAD v1.1)

The BERT task uses the **SQuAD v1.1** (Stanford Question Answering Dataset) validation set.

```bash
# Register the dataset with mlcr (adapted for v5.1)
mlcr get,dataset,squad,language-processing,_v1.1 --outdirname=/root -j
# Download SQuAD dev-v1.1.json directly (a fallback when the network is unreliable)
wget https://raw.githubusercontent.com/rajpurkar/SQuAD-explorer/master/dataset/dev-v1.1.json -O /root/dev-v1.1.json

```
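
An interrupted download can leave an empty or truncated file behind, so a quick sanity check is worthwhile. The sketch below assumes the dataset file carries a top-level `"version": "1.1"` tag; `is_squad_v11` is a hypothetical helper name, and the check is deliberately shallow (it does not validate the full JSON).

```bash
# Hypothetical helper: succeed only if the file exists, is non-empty, and
# contains the SQuAD version tag "1.1".
is_squad_v11() {
  [ -s "$1" ] && grep -Eq '"version"[[:space:]]*:[[:space:]]*"1\.1"' "$1"
}

if is_squad_v11 /root/dev-v1.1.json; then
  echo "dev-v1.1.json looks like SQuAD v1.1"
else
  echo "dev-v1.1.json is missing, empty, or not tagged as version 1.1"
fi
```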

---

## 3. Model Download (BERT-Large)

BERT-99 means the model's accuracy must reach **99%** of the FP32 reference model's accuracy (F1 score ≥ 89.96%).
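
The threshold is simple arithmetic. The sketch below assumes an FP32 reference F1 of 90.874%, the figure given in the MLPerf Inference rules rather than in this guide; `meets_target` is a hypothetical helper for comparing a measured score against the result.

```bash
# Assumption: 90.874 is the FP32 reference F1 (in percent) for BERT-Large on SQuAD v1.1.
REFERENCE_F1=90.874
TARGET_RATIO=0.99

# awk does the floating-point math that POSIX shell arithmetic lacks.
MIN_F1=$(awk -v ref="$REFERENCE_F1" -v ratio="$TARGET_RATIO" \
  'BEGIN { printf "%.3f", ref * ratio }')
echo "minimum F1: $MIN_F1"

# Hypothetical helper: succeed when a measured F1 meets the minimum.
meets_target() {
  awk -v f1="$1" -v min="$2" 'BEGIN { exit !(f1 >= min) }'
}

meets_target 90.10 "$MIN_F1" && echo "90.10 passes"
```

Under that assumption the minimum works out to 89.965, which is the 89.96% figure quoted above at two decimal places.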

```bash
mlcr run-mlperf,inference,_full,_r5.1 --model=bert-99 --implementation=reference --framework=tensorflow --download

# Create the directory and download with multi-connection aria2c (when the network is unreliable)
apt-get install -y aria2
mkdir -p /root/MLC/repos/local/cache/download-file_bert-large-ml-m_229ad317/
aria2c -x 16 -s 16 -k 1M https://zenodo.org/record/3939747/files/model.pb \
  -d /root/MLC/repos/local/cache/download-file_bert-large-ml-m_229ad317/ -o model.pb

```

---

## 4. Directory Layout and Symlink Setup

To make sure `run.py` can find the model and data, initialize the directories as follows:

```bash
cd inference/language/bert

# Copy the dependency code
cp -r /root/MLC/repos/local/cache/get-git-repo_inference-src_7b09f8ca/inference/language/bert/DeepLearningExamples  .

# Initialize the build directory layout
mkdir -p build/data/bert_tf_v1_1_large_fp32_384_v2/
mkdir -p build/result/

# Create symlinks to the files (avoids FileNotFoundError)
ln -sf /root/dev-v1.1.json build/data/dev-v1.1.json
ln -sf /root/MLC/repos/local/cache/download-file_bert-large-ml-m_229ad317/model.pb \
       build/data/bert_tf_v1_1_large_fp32_384_v2/model.pb
# Link the vocabulary file
ln -sf /root/MLC/repos/local/cache/download-file_bert-get-datase_8f14db6c/vocab.txt \
       build/data/bert_tf_v1_1_large_fp32_384_v2/vocab.txt

```
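
Because `ln -sf` succeeds even when its target does not exist, a dangling link only surfaces later as a `FileNotFoundError`. The sketch below checks the three inputs linked above; `check_bert_inputs` is a hypothetical helper name, and the file list simply mirrors the paths used in this section.

```bash
# Hypothetical helper: report each expected input under the given data
# directory that is missing or is a dangling symlink.
check_bert_inputs() {
  local data_dir rel rc
  data_dir=$1
  rc=0
  for rel in \
    dev-v1.1.json \
    bert_tf_v1_1_large_fp32_384_v2/model.pb \
    bert_tf_v1_1_large_fp32_384_v2/vocab.txt
  do
    # -e follows symlinks, so a link whose target is gone fails this test.
    if [ ! -e "$data_dir/$rel" ]; then
      echo "unresolved: $data_dir/$rel"
      rc=1
    fi
  done
  return "$rc"
}

# Run from inference/language/bert.
check_bert_inputs build/data || echo "Fix the entries above before running run.py."
```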

---

## 5. Run the Inference Test

Run the `SingleStream` scenario in accuracy mode:

```bash
# Start the inference test (preview mode: 100 samples)
python3 run.py --backend tf --scenario SingleStream --accuracy --max_examples 100
```

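### Parameters

* `--backend tf`: selects the TensorFlow backend.
* `--scenario SingleStream`: simulates a single-stream, low-latency inference scenario.
* `--accuracy`: enables accuracy verification mode.
* `--max_examples 100`: limits the run to 100 samples for a quick environment check; remove this flag for an official run so the full dataset is processed.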
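
How these flags combine can be sketched with a small dry-run helper. `bert_cmd` is hypothetical and only prints a command line without executing it. It also assumes that omitting `--accuracy` selects performance mode, which is worth confirming with `python3 run.py --help` for the checkout in use.

```bash
# Hypothetical helper: print (not execute) the run.py command for a given
# mode ("accuracy" or "performance") and an optional sample limit.
bert_cmd() {
  local mode limit cmd
  mode=$1
  limit=${2:-}
  cmd="python3 run.py --backend tf --scenario SingleStream"
  if [ "$mode" = "accuracy" ]; then
    cmd="$cmd --accuracy"
  fi
  if [ -n "$limit" ]; then
    cmd="$cmd --max_examples $limit"
  fi
  echo "$cmd"
}

bert_cmd accuracy 100   # quick smoke test, same as the command above
bert_cmd accuracy       # full accuracy run over the whole validation set
bert_cmd performance    # assumption: no --accuracy flag means performance mode
```

Printing the command first makes it easy to review the exact flags before starting a full run.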

---

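## 6. Expected Results

When the test finishes, the results are saved under `build/result/`.

* **Accuracy check**: use `accuracy-squad.py` to confirm that the F1 score meets the target.
* **Performance check**: read `mlperf_log_summary.txt` for the latency and QPS (throughput) figures.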
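
For a quick look at the key figures, the summary can be filtered with `grep`. This is a sketch under assumptions: `summarize_run` is a hypothetical helper, and the patterns assume the summary contains a `Result is` line and `percentile latency (ns)` lines, so adjust them if the LoadGen version in use formats its summary differently.

```bash
# Hypothetical helper: print the validity verdict and latency lines from a
# LoadGen summary file. The patterns are assumptions about the summary layout.
summarize_run() {
  grep -E 'Result is|percentile latency \(ns\)' "$1"
}

SUMMARY=build/result/mlperf_log_summary.txt
if [ -f "$SUMMARY" ]; then
  summarize_run "$SUMMARY"
else
  echo "No summary found at $SUMMARY"
fi
```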

---