# GNMT v2
## Deployment
### 1. Install the PyTorch environment
```
numpy==1.19.5
python==3.6.13
PyTorch==1.10.0
torchvision==0.10.0 
```
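
The pinned versions above can be checked before going further. The following is a minimal sketch, not part of the repository: the helper names `find_mismatches` and `installed_versions` are hypothetical, and it assumes the `PyTorch` entry corresponds to the importable module `torch`. Local build suffixes (anything after a `+` in the version string) are ignored when comparing.

```python
import sys

# Versions pinned in this README ("PyTorch" is imported as "torch").
EXPECTED = {
    "python": "3.6.13",
    "numpy": "1.19.5",
    "torch": "1.10.0",
    "torchvision": "0.10.0",
}


def find_mismatches(expected, installed):
    """Return {name: (expected, installed)} for every entry that differs.

    A local build suffix after "+" is ignored; a missing package shows up
    with an installed value of None.
    """
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        have_base = have.split("+")[0] if have else None
        if have_base != want:
            mismatches[name] = (want, have)
    return mismatches


def installed_versions():
    """Collect versions from the running interpreter; absent packages map to None."""
    found = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in ("numpy", "torch", "torchvision"):
        try:
            module = __import__(name)
            found[name] = getattr(module, "__version__", None)
        except Exception:  # not installed, or the import itself failed
            found[name] = None
    return found


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED, installed_versions())
    for name, (want, have) in sorted(problems.items()):
        print("%s: expected %s, found %s" % (name, want, have))
```

Running the script prints one line per mismatch and nothing when the environment matches the pinned versions.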

### 2. Install dependencies
1. Install the libraries listed in requirements.txt
```
pip install -r requirements.txt
```
2. Install the apex package
```
pip install apex-0.1-cp36-cp36m-linux_x86_64.whl
```
3. Install dllogger
```
pip install dllogger-master.zip
```
4. Install subword-nmt
Run the following from the GNMT/subword-nmt directory:
```
python3 setup.py install
```

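## Testing
### 1. Download the datasets and clean the corpora
The whole process is performed by **scripts/wmt16_en_de.sh** and consists of the following steps:
+ Download the datasets
+ Extract the datasets
+ Concatenate the datasets into new **train** and **test** datasets
+ Download the [mosesdecoder](https://github.com/moses-smt/mosesdecoder) tools
+ Use **mosesdecoder** to convert **newstest2016** and the other datasets to raw txt format
+ Tokenize the corpora with the **tokenizer** from **mosesdecoder**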
+ Clean all corpora
+ Apply **BPE**
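
To make the corpus-cleaning step more concrete, here is a minimal sketch of one common kind of parallel-corpus filter. This README does not state the actual rules used by **scripts/wmt16_en_de.sh**, so everything here is an assumption for illustration: the function name `clean_pairs` is hypothetical, and the thresholds `min_len`, `max_len`, and `max_ratio` are placeholder values, not settings read from the script.

```python
def clean_pairs(pairs, min_len=1, max_len=80, max_ratio=9.0):
    """Yield (source, target) sentence pairs that pass simple length checks.

    Lengths are counted in whitespace-separated tokens. A pair is dropped if
    either side is shorter than min_len or longer than max_len, or if one
    side is more than max_ratio times longer than the other. All thresholds
    are illustrative assumptions.
    """
    for src, tgt in pairs:
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        n_src, n_tgt = len(src_tokens), len(tgt_tokens)
        if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
            continue
        # Guard against division by zero when min_len is set to 0.
        shorter = max(1, min(n_src, n_tgt))
        if max(n_src, n_tgt) / shorter > max_ratio:
            continue
        # Re-join tokens so that surrounding and repeated whitespace is normalized.
        yield " ".join(src_tokens), " ".join(tgt_tokens)
```

Because the function is a generator, it can be fed line pairs streamed from two aligned files (for example with `zip(src_file, tgt_file)`) without loading the whole corpus into memory.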

### 2. Dataset preprocessing
This step is already integrated into **train.py**.

### 3. Modify the training script
Edit the parameters in the training script **run_fp32_singleCard.sh**:
```
--GPUS:               number of GPU cards
--TRAIN_BATCH_SIZE:   batch size
--NUMEPOCHS:          number of epochs
--TRAIN_SEQ_LEN:      maximum sentence length
--MATH:               precision
```
### 4. Run training
```
bash run_fp32_singleCard.sh
```

### 5. Test results
| Cards | Precision | Batch size | Test result | NVIDIA GPU comparison | GPU memory usage |
|:---:|:---:|:---:|:---:|:---:|:---:|
| 1 | FP32 | 64 | 11332 | | 54% | 
| 1 | FP32 | 128 | 14025 | 21860 | 80% |
| 1 | FP16 | 128 | 11404 | | 58% |
| 1 | FP16 | 256 | 13584 | | 97% |
| 4 | FP32 | 64 | 9784 * 4 | | 66% |
| 4 | FP32 | 128 | 12495 * 4 | 80224 | 92% |
| 4 | FP16 | 128 | 10937 * 4 | | 71% |
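
The table can be summarized with a little arithmetic. The sketch below rests on assumptions that the README does not confirm: that the result column is a throughput figure where higher is better, that `N * 4` means a per-card value multiplied by 4 cards, that the batch size is per card (so 1-card and 4-card rows with the same batch size are comparable), and that the NVIDIA column is a total for the same number of cards. The function names are hypothetical.

```python
# Rows copied from the results table:
# (cards, precision, batch size, per-card result, NVIDIA comparison or None).
ROWS = [
    (1, "FP32", 64, 11332, None),
    (1, "FP32", 128, 14025, 21860),
    (1, "FP16", 128, 11404, None),
    (1, "FP16", 256, 13584, None),
    (4, "FP32", 64, 9784, None),
    (4, "FP32", 128, 12495, 80224),
    (4, "FP16", 128, 10937, None),
]


def total(row):
    """Total result for a row: per-card value times the number of cards."""
    cards, _, _, per_card, _ = row
    return cards * per_card


def scaling_efficiency(rows, precision, batch_size, cards=4):
    """Multi-card total divided by (cards x single-card total) for one configuration."""
    by_cards = {r[0]: r for r in rows if r[1] == precision and r[2] == batch_size}
    return total(by_cards[cards]) / (cards * total(by_cards[1]))


def ratio_to_nvidia(row):
    """Row total divided by the NVIDIA figure, or None when no figure is listed."""
    return None if row[4] is None else total(row) / row[4]


if __name__ == "__main__":
    for precision, batch_size in (("FP32", 64), ("FP32", 128), ("FP16", 128)):
        eff = scaling_efficiency(ROWS, precision, batch_size)
        print("%s bs=%d: 4-card scaling efficiency %.3f" % (precision, batch_size, eff))
    for row in ROWS:
        ratio = ratio_to_nvidia(row)
        if ratio is not None:
            print("%d card(s) %s bs=%d: %.3f of the NVIDIA figure" % (row[0], row[1], row[2], ratio))
```

Under those assumptions, the 4-card FP32 run at batch size 128 reaches about 0.89 of ideal linear scaling, and the two rows with an NVIDIA figure come out at roughly 0.64 (1 card) and 0.62 (4 cards) of it.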