# MiniGo
## Paper

Mastering the game of Go without human knowledge

* https://www.nature.com/articles/nature24270/

## Model Architecture

MiniGo is a Go-playing program based on deep reinforcement learning. The model is inspired by the AlphaGo algorithm developed by Google DeepMind.

![figure1](模型结构.png)

## Algorithm

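The program is implemented with the TensorFlow framework. At the core of MiniGo is the reinforcement learning loop described in the AlphaZero paper. In short, self-play with the current generation of network weights generates games, and those games serve as the training data that produces the next generation of network weights.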

![figure2](算法原理.jpg)
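
The generation-by-generation structure of this loop can be sketched in a few lines of shell. This is only an illustration of the control flow: `selfplay`, `train`, and `run_loop` are hypothetical stand-ins that echo labels, not the commands this repository actually runs.

```shell
# Hypothetical stand-ins: a real run would invoke the self-play and training
# programs here instead of echoing labels.
selfplay() { echo "games_gen_$1"; }                  # games played with generation $1 weights
train()    { echo "weights_gen_$(( $1 + 1 ))"; }     # generation $1 + 1, trained on games $2

# Run the loop for a given number of generations.
run_loop() {
    generations=$1
    gen=0
    while [ "$gen" -lt "$generations" ]; do
        games=$(selfplay "$gen")          # current weights produce games
        next=$(train "$gen" "$games")     # games produce the next weights
        echo "generation $gen: $games -> $next"
        gen=$(( gen + 1 ))
    done
}

run_loop 3
```

Each iteration consumes the weights of one generation and emits the weights of the next, which is the only property of the loop the sketch is meant to show.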

## Environment Setup

**Docker (Method 1)**

Pull the training Docker image provided on [光源](https://www.sourcefind.cn/#/service-details):

    docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:mlperf-minigo-latest
    # <Image ID>: the ID of the Docker image pulled above
    # <Host Path>: path on the host
    # <Container Path>: path the host directory is mounted to inside the container
    docker run -it --name mlperf_minigo --shm-size=32G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash
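
The `docker run` command passes `/dev/kfd` and `/dev/dri/` into the container, so it may help to confirm on the host that those device nodes exist before starting it. The helper below is a minimal sketch; `check_devices` is a hypothetical name, not part of this repository or of Docker.

```shell
# Return 0 only if every path given as an argument exists.
# Missing paths are reported on stderr.
check_devices() {
    missing=0
    for dev in "$@"; do
        if [ ! -e "$dev" ]; then
            echo "missing: $dev" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the host, before `docker run`:
# check_devices /dev/kfd /dev/dri && echo "device nodes present"
```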

**Dockerfile (Method 2)**

    docker build --no-cache -t mlperf_minigo:latest .
    # <Image ID>: the ID (or tag, mlperf_minigo:latest) of the Docker image built above
    # <Host Path>: path on the host
    # <Container Path>: path the host directory is mounted to inside the container
    docker run -it --name mlperf_minigo --shm-size=32G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash

Image version dependencies:

* DTK driver: dtk22.04.2
* Python: python3.8.2

Note: this image currently supports only Z100/Z100L series cards.

Test directory:

```
/root/minigo
```

## Dataset

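Training data: all training data is generated during the self-play phase of the reinforcement learning loop.
The only data that must be downloaded are the bootstrap checkpoint and the target model, which can be fetched as follows: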

    # Download & extract bootstrap checkpoint.
    gsutil cp gs://minigo-pub/ml_perf/0.7/checkpoint.tar.gz .
    tar xfz checkpoint.tar.gz -C ml_perf/
    # Download and freeze the target model.
    mkdir -p ml_perf/target/
    gsutil cp gs://minigo-pub/ml_perf/0.7/target.* ml_perf/target/
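
After the download, a quick check that `ml_perf/target/` is not empty can catch a failed `gsutil` transfer early. The following is a minimal sketch; `dir_has_files` is a hypothetical helper name, and the check only looks for the presence of files, not their integrity.

```shell
# Return 0 if the directory exists and contains at least one entry.
dir_has_files() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Usage after the download steps:
# dir_has_files ml_perf/target || echo "target model files are missing" >&2
```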

## Training

### Single Node, Multiple Cards

Run the performance and accuracy test on a single node with 8 cards:

    cd /root
    source env.sh
    cd /root/minigo
    bash sbatch.sh >& log.txt &
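
Because the job runs in the background with its output redirected to `log.txt`, it can be useful to keep its process ID so that the exit status can be checked later. The sketch below assumes nothing about what `sbatch.sh` does internally; `start_logged` and `job_pid` are hypothetical names introduced for illustration.

```shell
# Start a command in the background with stdout and stderr written to a log
# file, and remember its PID in the variable job_pid.
start_logged() {
    log=$1
    shift
    "$@" > "$log" 2>&1 &
    job_pid=$!
}

# Usage, mirroring the training command above:
# start_logged log.txt bash sbatch.sh
# tail -f log.txt                 # follow progress; Ctrl-C stops following only
# wait "$job_pid"; echo "exit status: $?"
```

The redirection `> "$log" 2>&1` is the portable spelling of the `>&` form used above.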

## Results

![result](result.png)

### Accuracy

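With the input data above and 8 Z100L accelerator cards, training reaches the official convergence requirement, that is, the target accuracy of a 50% win rate vs. checkpoint.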

| Cards | Precision | Processes | Accuracy reached            |
| ----- | --------- | --------- | --------------------------- |
| 8     | FP32      | 8         | 50% win rate vs. checkpoint |
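
The 50% threshold is simple arithmetic: the target is met when wins divided by games played is at least 0.5. The sketch below illustrates that comparison with integer arithmetic only. `meets_target` is a hypothetical helper, and the counts in the usage line are made-up numbers; the benchmark's own evaluation code decides convergence in a real run.

```shell
# Return 0 if wins/games is at least the target percentage (default 50).
# Integer arithmetic avoids division: wins * 100 >= target * games.
meets_target() {
    wins=$1
    games=$2
    target=${3:-50}
    [ "$games" -gt 0 ] && [ $(( wins * 100 )) -ge $(( target * games )) ]
}

# Hypothetical counts, for illustration only:
# meets_target 52 100 && echo "target reached"
```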

## Application Scenarios

### Algorithm Category

Reinforcement learning

### Key Application Industries

Broadcast media, scientific research, finance

## Source Repository and Issue Reporting
* https://developer.sourcefind.cn/codes/modelzoo/mlperf_minigo_tensorflow
## References
* https://mlcommons.org/en/
* https://github.com/mlcommons
* https://github.com/mlcommons/training_results_v2.1/tree/main/NVIDIA/benchmarks/minigo/implementations/tensorflow-22.09