# MiniGo
## Paper

Mastering the game of Go without human knowledge

* https://www.nature.com/articles/nature24270/

## Model Structure

MiniGo is a Go-playing program based on deep reinforcement learning. Its model is inspired by the AlphaGo algorithm developed by Google DeepMind.

![figure1](模型结构.png)

## Algorithm

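The program is implemented with the TensorFlow framework. The core of MiniGo is the reinforcement learning loop described in the AlphaZero paper. In short, selfplay with the current generation of network weights generates games, and those games are used as training data to produce the next generation of network weights.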

![figure2](算法原理.jpg)
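
The loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the code in this repository: `Weights`, `selfplay`, `train`, and `reinforcement_loop` are hypothetical names, and the game records and the weight update are placeholders.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Weights:
    """Hypothetical stand-in for one generation of network weights."""
    generation: int


def selfplay(weights: Weights, num_games: int, rng: random.Random) -> List[dict]:
    """Generate games with the current weights (placeholder records only)."""
    return [
        {"generation": weights.generation, "result": rng.choice([-1, 1])}
        for _ in range(num_games)
    ]


def train(weights: Weights, games: List[dict]) -> Weights:
    """Use the selfplay games as training data to produce the next generation."""
    if not games:
        raise ValueError("training needs at least one selfplay game")
    # Placeholder update: a real trainer would fit the network to the games.
    return Weights(generation=weights.generation + 1)


def reinforcement_loop(num_generations: int, games_per_generation: int,
                       seed: int = 0) -> Weights:
    """Alternate selfplay and training for a fixed number of generations."""
    rng = random.Random(seed)
    weights = Weights(generation=0)  # assumed to come from a bootstrap checkpoint
    for _ in range(num_generations):
        games = selfplay(weights, games_per_generation, rng)
        weights = train(weights, games)
    return weights


if __name__ == "__main__":
    final = reinforcement_loop(num_generations=3, games_per_generation=4)
    print(final.generation)
```

Each pass through the loop consumes only games produced by the weights of the previous pass, which is the property the paragraph above describes.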

## Environment Setup

A Docker image for training can be pulled from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details):

    docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:mlperf-minigo-latest
    # Replace <Image ID> with the ID of the Docker image pulled above
    # <Host Path>: path on the host
    # <Container Path>: path it is mapped to inside the container
    docker run -it --name mlperf_minigo --shm-size=32G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash

Image version dependencies:

* DTK driver: dtk22.04.2
* Python: python3.8.2

Test directory:

```
/root/minigo
```
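
Before starting a run, it can help to confirm that the container sees what the setup above expects. The following is a minimal sketch, assuming the device nodes passed with `--device` are visible at the same paths inside the container; `missing_devices` and `python_matches` are hypothetical helper names, not part of this repository.

```python
import os
import sys

# Device nodes passed via --device in the docker run command above.
REQUIRED_DEVICES = ("/dev/kfd", "/dev/dri")
# The image lists python3.8.2; only major.minor is compared here.
EXPECTED_PYTHON = (3, 8)


def missing_devices(paths=REQUIRED_DEVICES, exists=os.path.exists):
    """Return the device paths that are not present."""
    return [p for p in paths if not exists(p)]


def python_matches(expected=EXPECTED_PYTHON, version=None):
    """Check whether the interpreter's major.minor matches the expected one."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) == tuple(expected)


if __name__ == "__main__":
    print("missing devices:", missing_devices())
    print("python matches 3.8:", python_matches())
    print("test directory exists:", os.path.isdir("/root/minigo"))
```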

## Dataset

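Training data: all training data is generated during the selfplay phase of the reinforcement learning loop.
The only data that must be downloaded is the checkpoint and the target model, which can be fetched as follows: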

    # Download & extract bootstrap checkpoint.
    gsutil cp gs://minigo-pub/ml_perf/0.7/checkpoint.tar.gz .
    tar xfz checkpoint.tar.gz -C ml_perf/
    # Download and freeze the target model.
    mkdir -p ml_perf/target/
    gsutil cp gs://minigo-pub/ml_perf/0.7/target.* ml_perf/target/
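
After the download, a quick check that the target model files landed where the commands above put them can save a failed run. This sketch assumes only what the commands show, namely that files named `target.*` end up in `ml_perf/target/`; `find_target_files` is a hypothetical helper, and the layout of the extracted checkpoint is not checked because it is not stated here.

```python
import glob
import os


def find_target_files(root="ml_perf/target"):
    """Return the sorted paths of files matching target.* under root."""
    return sorted(glob.glob(os.path.join(root, "target.*")))


if __name__ == "__main__":
    files = find_target_files()
    if files:
        print("found target model files:", files)
    else:
        print("no target.* files under ml_perf/target; rerun the download")
```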

## Training

### Single Node, Multiple Cards

Run the performance and accuracy test on a single node with 8 cards:

    bash sbatch.sh >& log.txt &

## Result

![dataset](result.png)

## Accuracy

With the input data above and 8 Z100L accelerator cards, training reaches the official convergence requirement, i.e. the target accuracy of a 50% win rate vs. checkpoint.

| Cards | Type | Processes | Accuracy reached       |
| ---- | ---- | ------ | --------------------------- |
| 8    | FP32 | 8      | 50% win rate vs. checkpoint |
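
The convergence criterion is a simple ratio over evaluation games. The sketch below shows the arithmetic only; `win_rate` and `reached_target` are hypothetical helpers, and treating the threshold as inclusive (at least 50%) and every game as either a win or a loss are assumptions, not details taken from the benchmark rules.

```python
def win_rate(results):
    """Fraction of evaluation games won; results is an iterable of booleans."""
    results = list(results)
    if not results:
        raise ValueError("no evaluation games")
    return sum(1 for won in results if won) / len(results)


def reached_target(results, target=0.5):
    """True when the win rate meets the target (inclusive, by assumption)."""
    return win_rate(results) >= target


if __name__ == "__main__":
    games = [True, False, True, False]  # illustrative outcomes, not measured data
    print(win_rate(games), reached_target(games))
```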

## Application Scenarios

### Algorithm Category

Reinforcement learning

### Key Application Industries

Broadcast media, scientific research

## Source Repository and Issue Feedback
* https://developer.hpccube.com/codes/modelzoo/mlperf_minigo_tensorflow
## References
* https://mlcommons.org/en/
* https://github.com/mlcommons
* https://github.com/mlcommons/training_results_v2.1/tree/main/NVIDIA/benchmarks/minigo/implementations/tensorflow-22.09