add Readme

Commit dfd6402d authored Dec 23, 2025 by wangsen

Showing with 118 additions and 0 deletions

README_resnet50.md +118 -0

--- a/README_resnet50.md
+++ b/README_resnet50.md
+This is a `README.md` tailored to your setup. It combines the DCU-specific container environment you provided, the MLPerf v5.1 workflow, and a fix for the missing `val_map.txt` file.
+
+---
+
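+# MLPerf Inference v5.1 - ResNet50 (TensorFlow on DCU) Guide
+
+This guide walks through running the ResNet50 inference benchmark from the **MLPerf Inference v5.1** suite on DCU hardware.
+
+## 1. Environment Setup
+
+### 1.1 Container Deployment
+
+First, pull and run the TensorFlow image optimized for DCU. The image already includes DTK and the other required driver support.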
+
+```bash
+# Pull the image
+docker pull image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.18.0-ubuntu22.04-dtk25.04.2-py3.10
+
+# Create and enter the container
+docker run -it \
+  --network=host \
+  --ipc=host \
+  --shm-size=16G \
+  --device=/dev/kfd \
+  --device=/dev/mkfd \
+  --device=/dev/dri \
+  -v /opt/hyhal:/opt/hyhal \
+  --group-add video \
+  --cap-add=SYS_PTRACE \
+  --security-opt seccomp=unconfined \
+  image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.18.0-ubuntu22.04-dtk25.04.2-py3.10 \
+  /bin/bash
+
+```
+
+### 1.2 Getting the Source
+
+Clone the v5.1 branch of the MLPerf inference repository:
+
+```bash
+git clone -b v5.1 https://github.com/mlcommons/inference.git
+
+```
+
+## 2. Installing Software Dependencies
+
+Enter the source tree and install LoadGen and the Python packages for the vision tasks:
+
+```bash
+cd inference/loadgen && pip install .
+cd ../vision/classification_and_detection && python setup.py install
+
+# Install the MLCommons automation tools
+pip install cmind mlc-scripts
+
+```
+
+## 3. Preparing the Dataset and Model
+
+### 3.1 Downloading the Dataset
+
+Use the `mlcr` tool to download the ImageNet-2012 validation set:
+
+```bash
+mlcr get,dataset,imagenet,validation --outdirname=<YOUR_DATA_PATH> -j
+
+```
+
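+### 3.2 Fixing the Label Map File (Important)
+
+The downloaded dataset usually lacks `val_map.txt`. Add it manually as described below; otherwise the inference script cannot read the labels:
+
+1. **Get the file**: download it from this [GitHub resource](https://github.com/Abhishekghosh1998/MLPerf_ImageNet_val_vap_map_txt/blob/main/val_map.txt).
+2. **Place the file**: put it in the root of the extracted `imagenet-2012-val` folder.
+```bash
+# Example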
+ls <YOUR_DATA_PATH>/imagenet-2012-val/val_map.txt
+
+```
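+
+Before running the benchmark, it can help to sanity-check the file you placed. The sketch below assumes the usual layout of `val_map.txt`, one `<image file name> <integer label>` pair per line (for example `ILSVRC2012_val_00000001.JPEG 65`), with 50000 entries for the full validation set. The function name `check_val_map` is hypothetical and not part of MLPerf.
+
+```bash
+# Hypothetical helper: verify that a label map exists and is well formed.
+# Assumes each line has the form "<file name> <integer label>".
+check_val_map() {
+  local map_file="$1"
+  [ -f "$map_file" ] || { echo "missing: $map_file"; return 1; }
+  # Count the lines that do NOT match the expected two-column format.
+  local bad
+  bad=$(grep -Evc '^[^[:space:]]+[[:space:]]+[0-9]+$' "$map_file")
+  echo "entries: $(wc -l < "$map_file" | tr -d ' '), malformed: $bad"
+  [ "$bad" -eq 0 ]
+}
+```
+
+Call it as `check_val_map <YOUR_DATA_PATH>/imagenet-2012-val/val_map.txt`; a non-zero exit status means the file is missing or contains lines in an unexpected format.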
+
+
+
+### 3.3 Downloading the Model
+
+Automatically fetch the official MLPerf ResNet50 pretrained TensorFlow model:
+
+```bash
+mlcr run-mlperf,inference,_full,_r5.1 --model=resnet50 --implementation=reference --framework=tensorflow --download
+
+```
+
+## 4. Running the Inference Test
+
+Enter the vision task directory and use the local script to start inference on the GPU (DCU):
+
+```bash
+cd vision/classification_and_detection  # relative to the root of the cloned inference repository
+./run_local.sh tf resnet50 gpu
+
+```
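+
+In the upstream reference implementation, `run_local.sh` appears to take the model and dataset locations from the `MODEL_DIR` and `DATA_DIR` environment variables and to exit early if either one is unset. A minimal sketch of setting them first, assuming both paths are placeholders to replace with your own, and assuming the TensorFlow ResNet50 model file (commonly named `resnet50_v1.pb`) is inside the model directory:
+
+```bash
+# Placeholder paths (assumptions): adjust them to where you stored the files.
+export MODEL_DIR=/path/to/models                    # directory holding the ResNet50 model file
+export DATA_DIR=/path/to/data/imagenet-2012-val     # directory holding the images and val_map.txt
+```
+
+Export these variables in the same shell session that you run `./run_local.sh tf resnet50 gpu` from.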
+
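+## 5. Analyzing the Results
+
+When the test finishes, log files are written under the `output` directory.
+
+| Metric | Meaning |
+| --- | --- |
+| **QPS** | Number of images processed per second; higher is better. |
+| **Mean Latency** | Average latency; lower is better. |
+| **99th Percentile** | The latency within which 99% of requests complete; an indicator of system stability. |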
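+
+The metrics above can be pulled out of the LoadGen summary with a short filter. This is a sketch under assumptions: it assumes the summary file is named `mlperf_log_summary.txt` and contains lines such as `Samples per second`, `Mean latency (ns)`, and `99.00 percentile latency (ns)`; the exact labels depend on the scenario and LoadGen version. The function name `summarize_log` is hypothetical.
+
+```bash
+# Hypothetical helper: print the headline metrics from a LoadGen summary file.
+# The matched labels are assumptions about the summary format.
+summarize_log() {
+  local summary="$1"
+  grep -E 'Samples per second|Mean latency|99\.00 percentile latency' "$summary"
+}
+
+# Locate every summary under output/ and print its metrics.
+find output -name 'mlperf_log_summary.txt' 2>/dev/null | while read -r f; do
+  echo "== $f"
+  summarize_log "$f"
+done
+```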
+
+---
+
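+### Notes
+
+* **Path configuration**: If `run_local.sh` reports that the model or dataset cannot be found, check whether the default path arguments in `/workspace/inference/vision/classification_and_detection/python/main.py` match your download paths.
+* **DCU monitoring**: While the test is running, you can open another terminal and use the `rocm-smi` command to monitor device memory usage and compute utilization.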
+