# TVM 
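## Model Introduction
This project uses the deep learning compiler TVM to run inference on and auto-tune the ResNet50 network model.
## Model Structure
The ResNet50 network contains 49 convolutional layers and 1 fully connected layer, among other layers.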
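
The count of 49 convolutional layers can be reproduced with a little arithmetic. The sketch below assumes the standard ResNet50 layout (one stem convolution, then four stages of 3, 4, 6 and 3 bottleneck blocks with three convolutions each) and follows the usual convention of not counting the 1x1 projection convolutions on the shortcut paths. The variable names are illustrative only.

```python
# Standard ResNet50 layout (assumption: projection shortcuts are not counted).
STAGE_BLOCKS = [3, 4, 6, 3]   # bottleneck blocks in each of the four stages
CONVS_PER_BLOCK = 3           # 1x1 -> 3x3 -> 1x1 inside every bottleneck block
STEM_CONVS = 1                # the initial 7x7 convolution
FC_LAYERS = 1                 # the final classifier

conv_layers = STEM_CONVS + CONVS_PER_BLOCK * sum(STAGE_BLOCKS)
weighted_layers = conv_layers + FC_LAYERS

print(conv_layers, weighted_layers)
```

The total of weighted layers is where the "50" in the model name comes from.
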
## Dataset and Model File
Model file download URL: https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v2-7.onnx
## Inference and Auto-Tuning
### Environment Setup
Pull the image:

    docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:tvm-0.10_dtk-22.10_py38_centos-7.6
 
### Run Inference and Tuning
After downloading the model file, run the following command to test inference and tuning:

    python tune_resnet50-v2.py
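
The contents of `tune_resnet50-v2.py` are not reproduced in this README. As a rough sketch of what a tuning script of this kind might look like, the example below follows the TVM auto-scheduler workflow. Everything specific in it is an assumption rather than a description of the actual script: the `rocm` target, the input tensor name `data`, the log file name, and the function name `tune_and_build` are all hypothetical. TVM and `onnx` are imported inside the function so the module can be loaded on machines where they are not installed.

```python
import os

MODEL_PATH = "resnet50-v2-7.onnx"   # the model file named in this README
INPUT_NAME = "data"                 # assumed input tensor name of the ONNX model
INPUT_SHAPE = (1, 3, 224, 224)      # batch size 1
LOG_FILE = "resnet50-tune.json"     # hypothetical tuning-log path
NUM_TRIALS = 200                    # raise to 20000 or more for best results


def tune_and_build(target_name="rocm", num_trials=NUM_TRIALS):
    """Optionally tune the model, then compile it (sketch, not the real script)."""
    import onnx
    import tvm
    from tvm import auto_scheduler, relay

    # Import the ONNX graph into Relay.
    onnx_model = onnx.load(MODEL_PATH)
    mod, params = relay.frontend.from_onnx(onnx_model, {INPUT_NAME: INPUT_SHAPE})
    target = tvm.target.Target(target_name)

    if num_trials > 0:
        # Split the network into search tasks and tune them jointly.
        tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
        measure_ctx = auto_scheduler.LocalRPCMeasureContext(
            repeat=1, min_repeat_ms=300, timeout=10
        )
        tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
        tuner.tune(
            auto_scheduler.TuningOptions(
                num_measure_trials=num_trials,
                runner=measure_ctx.runner,
                measure_callbacks=[auto_scheduler.RecordToFile(LOG_FILE)],
            )
        )

    if os.path.exists(LOG_FILE):
        # Compile with the best schedules found during tuning.
        with auto_scheduler.ApplyHistoryBest(LOG_FILE):
            with tvm.transform.PassContext(
                opt_level=3, config={"relay.backend.use_auto_scheduler": True}
            ):
                return relay.build(mod, target=target, params=params)

    # No tuning log available: fall back to the default schedules.
    with tvm.transform.PassContext(opt_level=3):
        return relay.build(mod, target=target, params=params)
```

Calling `tune_and_build(num_trials=0)` without an existing log file would correspond to an untuned build, which is one way to obtain a baseline for comparison.
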
    
        
## TVM Version
   TVM-0.10
## Performance and Accuracy Data
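Inference was run on a DCU Z100 accelerator card, and the reported performance is the average over 100 repeated inference runs. Note: these results used 200 TVM tuning trials; reaching the best performance requires at least 20,000 tuning trials.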
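
Averaging over repeated runs can be sketched with a small timing helper. This is a minimal illustration, not the measurement code used for the numbers here: `measure_throughput`, its `warmup` parameter, and the `run_once` callable are hypothetical names, and `run_once` stands in for one forward pass of the compiled model.

```python
import time


def measure_throughput(run_once, batch_size=1, repeats=100, warmup=10):
    """Return average examples/second over `repeats` calls of `run_once`."""
    # Warm-up calls are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    elapsed = time.perf_counter() - start

    mean_latency = elapsed / repeats      # seconds per batch
    return batch_size / mean_latency      # examples per second


if __name__ == "__main__":
    # Stand-in workload: replace with a real inference call.
    print(measure_throughput(lambda: time.sleep(0.005)))
```

On a GPU-like device, `run_once` should block until the result is available, otherwise the timer only measures kernel launch time.
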

| Cards | Batch size | Data type | Performance | MIOpen used | Tuning used |
| :------: | :------: | :------: | :------: |:------: | :------:|
| 1 | 1 | FP32 | 202.50 examples/second | Yes | No |
| 1 | 1 | FP32 | 177.83 examples/second | No | No |
| 1 | 1 | FP32 | 190.62 examples/second | No | Yes |
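
To make the three throughput figures easier to compare, they can be converted to per-example latency and to speedup over the untuned run without MIOpen. The numbers below are copied from the table; the dictionary keys and helper names are illustrative only.

```python
# Throughput in examples/second, copied from the table above.
THROUGHPUT = {
    "miopen_untuned": 202.50,
    "baseline_untuned": 177.83,
    "tuned_200_trials": 190.62,
}


def latency_ms(examples_per_second):
    """Average time per example in milliseconds (batch size 1)."""
    return 1000.0 / examples_per_second


def speedup(examples_per_second, baseline=THROUGHPUT["baseline_untuned"]):
    """Throughput relative to the untuned run without MIOpen."""
    return examples_per_second / baseline


for name, eps in THROUGHPUT.items():
    print(f"{name}: {latency_ms(eps):.2f} ms/example, {speedup(eps):.3f}x")
```

With these figures, 200 tuning trials give roughly a 7% gain over the untuned baseline, while the MIOpen configuration is roughly 14% faster than that baseline.
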


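## References
* [https://tvm.apache.org/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-cuda-py](https://tvm.apache.org/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-cuda-py)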