:warning: **This content may be outdated since the major update of ColossalChat. We will update this content soon.**

# Distributed PPO Training on Stage 3

## Detach Experience Makers and Trainers

We can completely separate the trainers and makers.

<p align="center">
<img src="https://github.com/hpcaitech/public_assets/blob/main/applications/chat/basic_structure.png?raw=true" width=600/>
</p>

- The experience maker performs inference, produces experience, and remotely delivers it to the trainer (1).
- The trainer consumes experience to train models, and periodically transmits new model parameters to the maker (2.1, 2.2).
- An experience buffer is used to overlap transmission and computation.

In this manner, each node works continuously with no model idle time, and different optimization strategies can be applied to inference and training to trade off speed against memory. This design also helps with scalability.
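
The overlap can be pictured as a plain producer/consumer queue. The sketch below is illustrative only; `make_experience` and `send_to_trainer` are hypothetical stand-ins, not the actual ColossalAI classes. A background thread ships finished batches while the maker keeps running inference.

```python
import queue
import threading

buffer = queue.Queue(maxsize=8)  # bounded buffer between inference and transmission

def make_experience():
    return {"sequences": [], "rewards": []}  # hypothetical stand-in for a rollout batch

def send_to_trainer(exp):
    pass  # hypothetical stand-in for remote delivery to a trainer (1)

def sender_loop():
    while True:
        send_to_trainer(buffer.get())  # transmit in the background

threading.Thread(target=sender_loop, daemon=True).start()

for _ in range(100):
    buffer.put(make_experience())  # inference continues while sends are in flight
```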

`DetachedPPOTrainer` and `ExperienceMakerHolder` are Ray Actors (not to be confused with the PPO actor model), representing the Trainer and the Experience Maker in the diagram above, respectively.

[More about Ray Core](https://docs.ray.io/en/latest/ray-core/walkthrough.html)
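
As a quick refresher on named Ray actors (generic Ray usage, not ColossalAI-specific code): an actor created with `.options(name=...)` can be looked up by that name from anywhere in the cluster, which is what makes the name lists below work.

```python
import ray

ray.init()

@ray.remote
class Counter:
    def __init__(self):
        self.n = 0

    def incr(self):
        self.n += 1
        return self.n

# Create a named actor; the name is visible cluster-wide.
counter = Counter.options(name="counter_0").remote()

# Any other process can fetch a handle by name and call methods remotely.
handle = ray.get_actor("counter_0")
print(ray.get(handle.incr.remote()))  # -> 1
```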

## Usage

See the examples at `ColossalAI/applications/Chat/examples/ray`.

### Setup Makers

- define makers' environment variables:

  ```python
  env_info_makers = [{
      'local_rank': '0',
      'rank': str(rank),
      'world_size': str(num_makers),
      'master_port': maker_port,
      'master_addr': master_addr
  } for rank in range(num_makers)]
  ```

- define maker models:

  ```python
  def model_fn():
      actor = get_actor_from_args(...)                 # policy being optimized
      critic = get_critic_from_args(...)               # value function
      reward_model = get_reward_model_from_args(...)   # scores the generated responses
      initial_model = get_actor_from_args(...)         # frozen reference policy for the KL penalty
      return actor, critic, reward_model, initial_model
  ```

- set `experience_holder_refs`:

  ```python
  experience_holder_refs = [
      ExperienceMakerHolder.options(
          name=f"maker_{i}",
          num_gpus=1,
          max_concurrency=2
      ).remote(
          detached_trainer_name_list=[f"trainer_{x}" for x in target_trainers(...)],
          model_fn=model_fn,
          ...)
      for i, env_info_maker in enumerate(env_info_makers)
  ]
  ```

  The names in `detached_trainer_name_list` are the target trainers that this maker sends experience to.
  A trainer gets its name the same way a maker does, via `.options(name=...)`; see below.

### Setup Trainers

- define trainers' environment variables:
  ```python
  env_info_trainers = [{
      'local_rank': '0',
      'rank': str(rank),
      'world_size': str(num_trainers),
      'master_port': trainer_port,
      'master_addr': master_addr
  } for rank in range(num_trainers)]
  ```
- define trainer models:

  ```python
  def trainer_model_fn():
      actor = get_actor_from_args(...)
      critic = get_critic_from_args(...)
      return actor, critic
  ```

- set `trainer_refs`:
  ```python
  trainer_refs = [
      DetachedPPOTrainer.options(
          name=f"trainer_{i}",
          num_gpus=1,
          max_concurrency=2
      ).remote(
          experience_maker_holder_name_list=[f"maker_{x}" for x in target_makers(...)],
          model_fn=trainer_model_fn,  # pass the function itself, not its return value
          ...)
      for i, env_info_trainer in enumerate(env_info_trainers)
  ]
  ```
  The names in `experience_maker_holder_name_list` are the target makers that the trainer sends updated model parameters to.
  By setting `detached_trainer_name_list` and `experience_maker_holder_name_list`, we can customize the transmission graph, as sketched below.
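
For instance, a fully connected graph, where every maker feeds every trainer and every trainer pushes updated weights back to every maker, could be expressed with implementations of the elided `target_trainers` / `target_makers` helpers along these lines (illustrative only; their real signatures are up to you):

```python
# Hypothetical helpers: the examples above elide their actual arguments.
def target_trainers(maker_rank, num_trainers):
    return range(num_trainers)  # this maker feeds every trainer

def target_makers(trainer_rank, num_makers):
    return range(num_makers)  # this trainer updates every maker
```

Restricting these ranges yields sparser topologies, such as those in the Flexible Structure section below.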

### Launch Jobs

- define `data_loader`:

  ```python
  def data_loader_fn():
      return torch.utils.data.DataLoader(dataset=dataset)
  ```

121
122
- launch makers:

  ```python
  wait_tasks = []
  for experience_holder_ref in experience_holder_refs:
      wait_tasks.append(
          experience_holder_ref.workingloop.remote(data_loader_fn(),
                                                   num_steps=experience_steps))
  ```

- launch trainers:

  ```python
  for trainer_ref in trainer_refs:
      wait_tasks.append(trainer_ref.fit.remote(total_steps, update_steps, train_epochs))
  ```

- wait for completion:
  ```python
  ray.get(wait_tasks)
  ```
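
  `ray.get` blocks until every maker and trainer loop has finished. Note that all of the above assumes a running Ray cluster; a driver script would typically connect to it first (standard Ray API, shown for completeness):

  ```python
  import ray

  # Connect to an existing cluster; "auto" discovers the local head node.
  # Call ray.init() with no address to start a single-node instance instead.
  ray.init(address="auto")
  ```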

## Flexible Structure

We can deploy different strategies to makers and trainers. Here are some example configurations.

### 2 Makers 1 Trainer

<p align="center">
<img src="https://github.com/hpcaitech/public_assets/blob/main/applications/chat/2m1t.png?raw=true" width=600/>
</p>
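
In terms of the name lists above, this layout could be wired as follows (a sketch of just the routing arguments):

```python
# Both makers send experience to the single trainer...
detached_trainer_name_list = ["trainer_0"]  # passed to maker_0 and maker_1
# ...and the trainer broadcasts updated weights back to both makers.
experience_maker_holder_name_list = ["maker_0", "maker_1"]  # passed to trainer_0
```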

### 2 Makers 2 Trainers

<p align="center">
<img src="https://github.com/hpcaitech/public_assets/blob/main/applications/chat/2m2t.png?raw=true" width=600/>
</p>

### Maker Inference Quantization

<p align="center">
<img src="https://github.com/hpcaitech/public_assets/blob/main/applications/chat/2m2t_quantize.png?raw=true" width=600/>
</p>
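
Because makers and trainers are configured independently, makers can run inference with quantized weights to save memory while trainers keep full precision for stable optimization.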

### Tensor Parallel

<p align="center">
<img src="https://github.com/hpcaitech/public_assets/blob/main/applications/chat/tp_ddp_hybrid.png?raw=true" width=600/>
</p>
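
Likewise, each side can use its own parallel layout, e.g. tensor parallelism on one side combined with DDP on the other, as in the hybrid setup above.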

## TODO

- [ ] Support LoRA
- [ ] Support TP & PP