# BigScience Large Open-science Open-access Multilingual Language Model (BLOOM)
## Model Introduction
BLOOM is an open-source large language model with 176B parameters that supports up to 59 languages. It was trained with a modified version of Megatron-LM GPT-2, its main techniques being a decoder-only architecture, layer normalization applied to the word-embedding layer, GeLU activation functions, and ALiBi (Attention with Linear Biases) positional encoding. Its training corpus covers 46 natural languages and 13 programming languages; 1.5TB of preprocessed text was converted into 350B unique tokens. The bloom models published by bigscience on Hugging Face come in multiple parameter sizes and versions.

## BLOOM-Inference
When a model grows so large that a single GPU can no longer hold its parameters, convenient distributed training and inference become a necessity, and the industry has produced tools to meet it.

The LiBai model library, built on OneFlow, reduces the barrier to distributed execution to a minimum: users do not need to care about how the model is placed across different cards, and can switch between distributed strategies by changing just a few configuration values. Its acceleration performance is also outstanding.

A BLOOM model built with LiBai can conveniently run model-parallel + pipeline-parallel inference, which nicely solves the problem of a large model not fitting on a single card.

### The natural advantages of distributed inference

A model's parameters are really just tensors, i.e., matrices; a large model means large matrices. A parallelism strategy splits those large matrices into smaller ones and assigns them to different cards or devices. The basic LinearLayer is implemented in LiBai as follows:

```python
class Linear1D(nn.Module):
    def __init__(self, in_features, out_features, parallel="data", layer_idx=0, ...):
        super().__init__()

        if parallel == "col":
            # column parallel: split the weight along the output-feature dim
            weight_sbp = dist.get_nd_sbp([flow.sbp.broadcast, flow.sbp.split(0)])
        elif parallel == "row":
            # row parallel: split the weight along the input-feature dim
            weight_sbp = dist.get_nd_sbp([flow.sbp.broadcast, flow.sbp.split(1)])
        elif parallel == "data":
            # data parallel: every device keeps a full replica of the weight
            weight_sbp = dist.get_nd_sbp([flow.sbp.broadcast, flow.sbp.broadcast])
        else:
            raise KeyError(f"{parallel} is not supported! Only support ('data', 'row' and 'col')")

        self.weight = flow.nn.Parameter(
            flow.empty(
                (out_features, in_features),
                dtype=flow.float32,
                placement=dist.get_layer_placement(layer_idx),  # for pipeline parallelism placement
                sbp=weight_sbp,
            )
        )
        init_method(self.weight)
        ...
    
    def forward(self, x):
        ...
```

Here the user chooses how to slice the Linear layer's weight matrix and the data matrix: OneFlow's SBP controls whether a matrix is split vertically, horizontally, or by some other scheme (model parallelism, data parallelism), while the Placement setting controls which card this LinearLayer lives on (pipeline parallelism).

So, thanks to the design of the layers in LiBai and the natural advantage of the SBP and Placement attributes that every OneFlow tensor carries, models built by users can implement data parallelism, model parallelism, and pipeline parallelism with very little effort.
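To make the `col`/`row` distinction above concrete, here is a plain-NumPy sketch (no OneFlow required) of how the two schemes partition a `(out_features, in_features)` weight matrix across devices, and why both recover the single-device result. The helper name `split_weight` is illustrative, not part of LiBai:

```python
import numpy as np

def split_weight(weight, parallel, num_devices):
    """Partition a weight matrix the way the 'col'/'row' SBP settings do.

    'col'  -> split along output features (sbp.split(0) on an (out, in) matrix)
    'row'  -> split along input features  (sbp.split(1))
    'data' -> every device holds a full replica (broadcast)
    """
    if parallel == "col":
        return np.split(weight, num_devices, axis=0)
    if parallel == "row":
        return np.split(weight, num_devices, axis=1)
    if parallel == "data":
        return [weight.copy() for _ in range(num_devices)]
    raise KeyError(f"{parallel} is not supported!")

# A toy (out_features=4, in_features=6) weight across 2 "devices".
w = np.arange(24, dtype=np.float32).reshape(4, 6)
x = np.ones((1, 6), dtype=np.float32)
full = x @ w.T  # single-device reference result

# Column-parallel: each device computes a slice of the output features;
# the per-device results are concatenated (an all-gather).
col_shards = split_weight(w, "col", 2)   # two (2, 6) shards
col_out = np.concatenate([x @ s.T for s in col_shards], axis=1)

# Row-parallel: each device consumes a slice of the input features;
# the per-device partial results are summed (an all-reduce).
row_shards = split_weight(w, "row", 2)   # two (4, 3) shards
row_out = sum(x[:, i * 3:(i + 1) * 3] @ s.T for i, s in enumerate(row_shards))

assert np.allclose(full, col_out) and np.allclose(full, row_out)
```

Both schemes reproduce the full matmul exactly; they differ only in which communication collective stitches the shards back together.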

## BLOOMZ-7B1 Inference
### Environment Setup
A docker image for both training and inference can be pulled via [Sourcefind](https://www.sourcefind.cn/#/service-details): image.sourcefind.cn:5000/dcu/admin/base/oneflow:0.9.1-centos7.6-dtk-22.10.1-py39-latest

    cd libai
    pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
    pip3 install pybind11 -i https://mirrors.aliyun.com/pypi/simple
    pip3 install -e . -i https://mirrors.aliyun.com/pypi/simple

The model weights need to be prepared first: https://huggingface.co/bigscience/bloomz-7b1/tree/main

### File structure of bloomz-7b1

```
path/to/bloomz-7b1
├── tokenizer_config.json
├── tokenizer.json
├── special_tokens_map.json
├── config.json
└── pytorch_model.bin
```
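Since a missing weight or tokenizer file only surfaces once all ranks have started, it can help to verify the directory up front. A minimal sketch; the helper `check_bloom_files` is hypothetical, not part of LiBai:

```python
import os

# Files the loader and tokenizer expect, per the tree listing above.
REQUIRED_FILES = [
    "tokenizer_config.json",
    "tokenizer.json",
    "special_tokens_map.json",
    "config.json",
    "pytorch_model.bin",
]

def check_bloom_files(model_dir):
    """Fail fast with a clear message if any required file is missing."""
    missing = [
        f for f in REQUIRED_FILES
        if not os.path.isfile(os.path.join(model_dir, f))
    ]
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    return True
```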

### Inference

Inference uses 1 node with 4 DCU Z100 16G cards, in a tp=4, pp=1 parallel configuration.

Run the following:

    cd projects/BLOOM
    # before running, set `min_length=64` in configs/bloom_inference.py
    python3 -m oneflow.distributed.launch --nproc_per_node 4 demo.py

demo.py is as follows:

    # model parallel + pipeline parallel demo
    
    import oneflow as flow
    from omegaconf import DictConfig
    from transformers import BloomTokenizerFast
    
    from libai.utils import distributed as dist
    from projects.BLOOM.configs.bloom_inference import cfg
    from projects.BLOOM.modeling.bloom_model import BloomForCausalLM
    from projects.BLOOM.utils.model_loader import BlooMLoaderHuggerFace
    import time
    
    parallel_config = DictConfig(
        dict(
            data_parallel_size=1,
            tensor_parallel_size=4,
            pipeline_parallel_size=1,
            pipeline_num_layers=30,
        )
    )
    dist.setup_dist_util(parallel_config)
    
    tokenizer = BloomTokenizerFast.from_pretrained("bloomz-7b1")
    res = tokenizer("How to improve sleep quality?")
    inputs = {
        "input_ids": flow.tensor([res.input_ids]),
        "attention_mask": flow.tensor([res.attention_mask]),
    }
    
    sbp = dist.get_nd_sbp([flow.sbp.broadcast, flow.sbp.broadcast])
    placement = dist.get_layer_placement(0)
    
    loader = BlooMLoaderHuggerFace(BloomForCausalLM, cfg, "bloomz-7b1")
    model = loader.load()
    
    start_t = time.time()
    outputs = model.generate(
        inputs=inputs["input_ids"].to_global(sbp=sbp, placement=placement), max_length=128
    )
    end_t = time.time()
    if dist.is_main_process():
        print('model.generate: %s seconds' % (end_t - start_t))
    
    res = tokenizer.decode(outputs[0])
    if dist.is_main_process():
        print(res)
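The parallel sizes in demo.py are not independent of the launch command: their product must equal the number of launched processes (`--nproc_per_node 4` above), and `pipeline_num_layers` is divided among the pipeline stages. A plain-Python sanity-check sketch; the function name is illustrative, not a LiBai API:

```python
def check_parallel_conf(data_parallel_size, tensor_parallel_size,
                        pipeline_parallel_size, pipeline_num_layers,
                        world_size):
    """Sanity-check a dp/tp/pp configuration against the launched world size."""
    # Every rank occupies exactly one (dp, tp, pp) coordinate, so the three
    # sizes must multiply to the total number of launched processes.
    assert (data_parallel_size * tensor_parallel_size
            * pipeline_parallel_size) == world_size, \
        "dp * tp * pp must equal nproc_per_node * number_of_nodes"
    # Each pipeline stage receives a contiguous slice of transformer layers.
    layers_per_stage = pipeline_num_layers // pipeline_parallel_size
    return layers_per_stage

# The configuration used above: 4 cards, tensor parallelism only, so the
# single pipeline stage holds all 30 layers.
layers = check_parallel_conf(1, 4, 1, 30, world_size=4)
```

With tp=4, pp=1 as above, the single stage holds all 30 layers; switching to tp=2, pp=2 on the same 4 cards would give each stage 15.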
Output:

```
How to improve sleep quality? keep your bedroom dark and quiet. Avoid electronics and bright lights. Keep your bedroom cool. Use a white noise machine. Use a humidifier. Use a diffuser. Use essential oils. Use a sleep aid. Try acupuncture. Try hypnotherapy. Try acupressure.</s>
```

## Previous Versions
- https://developer.hpccube.com/codes/modelzoo/bloom_oneflow/-/tree/rel_v1.0

## References
* https://github.com/Oneflow-Inc/libai
* https://huggingface.co/bigscience/bloomz