# MiniGo
## Paper

Mastering the game of Go without human knowledge

* https://www.nature.com/articles/nature24270/

## Model architecture

MiniGo is a Go-playing program based on deep reinforcement learning. The model is inspired by the AlphaGo algorithm developed by Google DeepMind.

![figure1](模型结构.png)

## Algorithm

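The program is implemented on the TensorFlow framework. The core of MiniGo is the reinforcement learning loop described in the AlphaZero paper. In short, self-play with the current generation of network weights generates games, and those games are used as training data to produce the next generation of network weights.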
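
The data flow of this loop can be sketched in a few lines of shell. This is only an illustration: `selfplay`, `train`, the temporary directory layout, and the three iterations are hypothetical stand-ins, not the scripts shipped in this repository.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: selfplay() and train() are hypothetical stubs
# that mimic the data flow of the loop, not MiniGo's real entry points.
set -euo pipefail

workdir="$(mktemp -d)"

# Stub: "play" games with the weights of generation $1 and record them.
selfplay() {
    local gen="$1"
    echo "games played with weights-${gen}" > "${workdir}/games-${gen}.txt"
}

# Stub: "train" on the games of generation $1 and emit generation $1 + 1.
train() {
    local gen="$1"
    local next=$((gen + 1))
    cp "${workdir}/games-${gen}.txt" "${workdir}/weights-${next}.txt"
    echo "${next}"
}

gen=0
echo "bootstrap" > "${workdir}/weights-0.txt"
for _ in 1 2 3; do
    selfplay "${gen}"         # current weights -> new games
    gen="$(train "${gen}")"   # new games -> next-generation weights
done
echo "final generation: ${gen}"
```

Each pass consumes the weights of one generation and produces the next, which is the only property of the real loop that this sketch tries to capture.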

![figure2](算法原理.jpg)

## Environment setup

**Docker (method 1)**

A Docker image for training can be pulled from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details):

    docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:mlperf-minigo-latest
    # <Image ID>: replace with the ID of the Docker image pulled above
    # <Host Path>: path on the host
    # <Container Path>: path the host directory is mapped to inside the container
    docker run -it --name mlperf_minigo --shm-size=32G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash
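
The two `--device` flags in the command above assume that the host exposes `/dev/kfd` and `/dev/dri`. A small pre-flight check can report which of them are absent before the container is started. The helper `missing_paths` below is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of this repository): prints every
# path given as an argument that does not exist on the host.
missing_paths() {
    local p
    for p in "$@"; do
        [ -e "${p}" ] || echo "${p}"
    done
}

missing="$(missing_paths /dev/kfd /dev/dri)"
if [ -n "${missing}" ]; then
    echo "missing device nodes:"
    echo "${missing}"
else
    echo "all device nodes present"
fi
```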

**Dockerfile (method 2)**

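    # Run from the directory that contains the Dockerfile; the trailing "." is the build context
    docker build --no-cache -t mlperf_minigo:latest .
    docker run -it --name mlperf_minigo --shm-size=32G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash
    # <Image ID>: replace with the ID of the Docker image built above
    # <Host Path>: path on the host
    # <Container Path>: path the host directory is mapped to inside the container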

Image version dependencies:

* DTK driver: dtk22.04.2
* Python: python3.8.2

Test directory:

```
/root/minigo
```

## Dataset

Training data: all training data is generated during the self-play phase of the reinforcement learning loop.
The only data that needs to be downloaded is the bootstrap checkpoint and the target model, which can be fetched as follows:

    # Download & extract bootstrap checkpoint.
    gsutil cp gs://minigo-pub/ml_perf/0.7/checkpoint.tar.gz .
    tar xfz checkpoint.tar.gz -C ml_perf/
    # Download and freeze the target model.
    mkdir -p ml_perf/target/
    gsutil cp gs://minigo-pub/ml_perf/0.7/target.* ml_perf/target/
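
Before starting a long run, it can be worth confirming that the download actually produced something. The sketch below uses a hypothetical helper `dir_has_files`. The paths are the ones used in the commands above, and no specific file names inside them are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): succeeds only if the
# directory $1 exists and contains at least one entry.
dir_has_files() {
    [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# -s is true when the file exists and is not empty.
if [ -s checkpoint.tar.gz ]; then
    echo "checkpoint.tar.gz: ok"
else
    echo "checkpoint.tar.gz: missing or empty"
fi

if dir_has_files ml_perf/target; then
    echo "ml_perf/target: ok"
else
    echo "ml_perf/target: missing or empty"
fi
```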

## Training

### Single node, multiple cards

Run the performance and accuracy test on a single node with 8 cards:

    cd /root
    source env.sh
    cd /root/minigo
    bash sbatch.sh >& log.txt &
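
Because the job is launched in the background, the shell returns immediately. One way to keep track of it is to record the PID from `$!` and `wait` on it to obtain the exit status. The sketch below uses a stand-in subshell instead of `sbatch.sh` and a temporary log file, both of which are assumptions for illustration. `> file 2>&1` is the portable spelling of `>& file`.

```shell
#!/usr/bin/env bash
# Sketch with a stand-in command; replace the subshell with `bash sbatch.sh`.
log="$(mktemp)"
( echo "training started"; sleep 1; echo "training finished" ) > "${log}" 2>&1 &
pid=$!                      # PID of the background job
echo "job running as PID ${pid}"

wait "${pid}"               # blocks until the job exits
status=$?                   # exit status of the job
echo "job exited with status ${status}"
tail -n 1 "${log}"          # last line written to the log
```

While the job is still running, `tail -f log.txt` in another terminal follows the log as it grows.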

## Result

![result](result.png)

## Accuracy

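With the input data above and 8 Z100L accelerator cards, training reaches the official convergence requirement, that is, the target accuracy of a 50% win rate vs. checkpoint.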

| Cards | Data type | Processes | Accuracy reached            |
| ----- | --------- | --------- | --------------------------- |
| 8     | FP32      | 8         | 50% win rate vs. checkpoint |
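
The target is a simple ratio: wins divided by games played must reach 50%. As a worked example with made-up numbers, the check can be done in integer arithmetic. The helper `meets_target` and the game counts are hypothetical, not output from this benchmark, and the threshold is assumed to be inclusive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if wins/games >= 50%.
# Compares 2 * wins with games to avoid floating point.
meets_target() {
    local wins="$1" games="$2"
    [ "${games}" -gt 0 ] && [ $((2 * wins)) -ge "${games}" ]
}

if meets_target 52 100; then echo "52/100: target reached"; fi
if ! meets_target 49 100; then echo "49/100: target not reached"; fi
```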

## Application scenarios

### Algorithm category

Reinforcement learning

### Key application industries

Broadcast media, scientific research

## Source repository and issue feedback
* https://developer.hpccube.com/codes/modelzoo/mlperf_minigo_tensorflow
## References
* https://mlcommons.org/en/
* https://github.com/mlcommons
* https://github.com/mlcommons/training_results_v2.1/tree/main/NVIDIA/benchmarks/minigo/implementations/tensorflow-22.09