# bert_large_squad_onnx
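## Model Introduction
A BERT-large model for SQuAD.
## Model Structure
The model is based on the Transformer architecture.
## Inference
### Environment Setup
The Docker image for inference can be pulled from [光源 (SourceFind)](https://www.sourcefind.cn/#/service-details), and the onnxruntime installation package can be downloaded from the [光合 developer community](https://cancon.hpccube.com:65024/4/main/). The recommended image for bert_large_squad_onnx is: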
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:ort1.14.0_migraphx3.0.0-dtk22.10.1
```
Download the model file `model.onnx` from [Hugging Face](https://huggingface.co/ctuning/mlperf-inference-bert-onnx-fp32-squad-v1.1) into the current directory, then run the fp16 conversion:
```
python3 fp16-convert.py
```
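The contents of `fp16-convert.py` are not shown here, so the following is only a minimal sketch of what an fp32-to-fp16 conversion of an ONNX model can look like. It assumes the `onnx` and `onnxconverter-common` Python packages are installed. The output name `model_fp16.onnx` and the helper names are hypothetical and not taken from the repository.

```python
from pathlib import Path


def fp16_output_path(src: str) -> str:
    """Derive a hypothetical output name, e.g. model.onnx -> model_fp16.onnx."""
    p = Path(src)
    return str(p.with_name(p.stem + "_fp16" + p.suffix))


def convert_to_fp16(src: str = "model.onnx") -> str:
    """Load an fp32 ONNX model, convert it to fp16, and save it next to the source."""
    # Imported lazily so the path helper above works without these packages.
    import onnx
    from onnxconverter_common import float16

    model = onnx.load(src)
    # keep_io_types=True leaves the graph inputs and outputs in their original
    # types, so the calling code can feed the same tensors as before.
    model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
    dst = fp16_output_path(src)
    onnx.save(model_fp16, dst)
    return dst
```

Calling `convert_to_fp16("model.onnx")` from the directory that holds the downloaded model would write the converted file alongside it. The actual script in the repository may use a different output name or different conversion options.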
### Run Inference
```
python3 main.py
```
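Since `main.py` itself is not reproduced here, the sketch below only illustrates how a single forward pass through an ONNX model can be run with the onnxruntime Python API. It assumes `onnxruntime` and `numpy` are installed, reads the input names, shapes, and types from the model rather than hard-coding them, and feeds zero-filled dummy tensors. The default sequence length of 384 and the function names are assumptions for illustration, not values taken from the repository.

```python
def concrete_shape(shape, dynamic_dims=(1, 384)):
    """Replace symbolic or unknown dimensions with concrete (assumed) sizes."""
    out = []
    for i, dim in enumerate(shape):
        if isinstance(dim, int) and dim > 0:
            out.append(dim)  # already a fixed size
        else:
            # Assumed defaults: batch size 1, sequence length 384.
            out.append(dynamic_dims[i] if i < len(dynamic_dims) else 1)
    return out


def run_once(model_path="model.onnx"):
    """Run one inference with zero-filled inputs and return the output shapes."""
    # Imported lazily so concrete_shape can be used without these packages.
    import numpy as np
    import onnxruntime as ort

    dtypes = {
        "tensor(int64)": np.int64,
        "tensor(int32)": np.int32,
        "tensor(float)": np.float32,
        "tensor(float16)": np.float16,
    }
    # Use whichever execution providers this onnxruntime build offers.
    session = ort.InferenceSession(
        model_path, providers=ort.get_available_providers()
    )
    feed = {
        inp.name: np.zeros(concrete_shape(inp.shape), dtype=dtypes[inp.type])
        for inp in session.get_inputs()
    }
    outputs = session.run(None, feed)
    return [out.shape for out in outputs]
```

A real question-answering run would tokenize a question and context into the model inputs instead of using zeros; the dummy tensors here only exercise the graph.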
## Performance Data

fp32
| loop | time(ms) |
| :------: | :------: |
| 1 | 0.09298863005824387 | 
| 2 | 0.04267867305316031 | 
| 3 | 0.04294574190862477 | 
| 4 | 0.042622152948752046 | 
| 5 | 0.042897791834548116 | 
| 6 | 0.04309680196456611 | 
| 7 | 0.04240077408030629 | 
| 8 | 0.042515473905950785 | 
| 9 | 0.0424974428024143 | 
| 10 | 0.04259936395101249 | 

fp16
| loop | time(ms) |
| :------: | :------: |
| 1 | 0.059390615904703736 | 
| 2 | 0.04876187210902572 | 
| 3 | 0.04870052193291485 | 
| 4 | 0.04873379203490913 | 
| 5 | 0.04842417314648628 | 
| 6 | 0.04876326210796833 | 
| 7 | 0.04846481396816671 | 
| 8 | 0.04872900294139981 | 
| 9 | 0.048555332934483886 | 
| 10 | 0.048343464033678174 | 
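In both tables the first loop is slower than the later ones, which is consistent with one-time warm-up cost. As a rough sketch of how per-loop timings like these could be collected and summarized, the code below times a callable over several loops and averages the loops after a warm-up. The function names are hypothetical, and `fn` stands in for whatever inference call is being measured. Note that `time.perf_counter` returns seconds, so values need to be multiplied by 1000 to be reported in milliseconds.

```python
import time


def time_loops(fn, loops=10):
    """Call fn once per loop and return the elapsed time of each call in seconds."""
    durations = []
    for _ in range(loops):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def steady_state_mean(durations, warmup=1):
    """Average the durations after skipping the first `warmup` loops."""
    steady = durations[warmup:]
    return sum(steady) / len(steady)
```

Reporting the steady-state mean alongside the first-loop time keeps the warm-up cost visible without letting it skew the average.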

## Source Repository and Issue Reporting
https://developer.hpccube.com/codes/modelzoo/bert_large_squad_onnx
## References
https://github.com/google-research/bert