"graphbolt/vscode:/vscode.git/clone" did not exist on "22a2513d0a85f973ce414755c1f26f5f8f12ef3b"
release-note.md 3.17 KB
Newer Older
Rick Ho's avatar
Rick Ho committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
## v1.0.0

### FasterMoE

* The new performance-boosting features from the PPoPP'22 paper FasterMoE, detailed in the documentation (see the sketch after this list).
	* Expert shadowing.
	* Smart scheduling.
	* Topology-aware gate.
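
How the features are enabled is described in the FasterMoE document. Below is a minimal sketch assuming they are switched on through environment variables; the variable names are assumptions and should be verified against that document.

```python
# A minimal sketch (not the authoritative interface): FasterMoE's optimizations
# are assumed to be toggled via environment variables before MoE layers are built.
# The flag names below are assumptions taken from the FasterMoE document.
import os

os.environ["FMOE_FASTER_SCHEDULE_ENABLE"] = "1"  # smart scheduling (assumed name)
os.environ["FMOE_FASTER_SHADOW_ENABLE"] = "1"    # expert shadowing (assumed name)

from fmoe import FMoETransformerMLP  # construct MoE layers only after setting the flags
```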

### Bug fixes

* Transformer-XL examples.
* Compatibility with PyTorch versions.
* Megatron-LM documents.
* GShardGate.

## v0.3.0

### FMoE core

* The previous `mp_group` is renamed to `slice_group`, indicating that all workers in the group receive the same input batch and each processes a slice of it. `mp_group` will be deprecated in our next release (see the sketch after this list).
* ROCm supported.
* `FMoELinear` is moved to a stand-alone file.
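
A minimal sketch of the renamed argument, assuming the usual `FMoETransformerMLP` constructor; the sizes shown are illustrative, not prescriptions.

```python
import torch
from fmoe import FMoETransformerMLP

# `slice_group` replaces the deprecated `mp_group`: every rank in the group
# receives the same batch and processes a slice of it.
moe = FMoETransformerMLP(
    num_expert=4,
    d_model=1024,
    d_hidden=4096,
    slice_group=None,  # pass a torch.distributed process group when running distributed
).cuda()               # FastMoE's kernels assume a CUDA (or ROCm) build

y = moe(torch.randn(8, 1024, device="cuda"))
```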

### Grouped data parallel

* Support arbitrary group names, referenced by their tag names.

### Load balancing

* A brand-new balancing strategy, SWIPE, contributed by the authors of a (currently unpublished) paper.
* A property `has_loss` is added to each gate to indicate whether a balance loss should be collected (see the sketch after this list).
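
A minimal sketch of how `has_loss` might be used when accumulating the balance loss across all MoE layers; the `get_loss()` accessor name is an assumption to check against the gate base class.

```python
def collect_balance_loss(model):
    """Sum the balance loss from every gate that reports one.

    `get_loss()` as the accessor name is an assumption; check the gate base class.
    """
    total = 0.0
    for module in model.modules():
        gate = getattr(module, "gate", None)
        if gate is not None and getattr(gate, "has_loss", False):
            total = total + gate.get_loss()
    return total
```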

### Megatron-LM support

* Experts are partitioned by tensor model parallelism in `mp_group`, instead of expert parallelism.
* Support arbitrary customized gates in `MegatronMLP` (see the sketch after this list).
* Move the patches to a stand-alone file.
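
A minimal sketch of plugging a customized gate into the Megatron patching; `fmoefy` is the injection entry point, but the keyword names shown (`num_experts`, `gate`) are assumptions to check against the patch file.

```python
from fmoe.megatron import fmoefy
from fmoe.gates import NaiveGate

class MyGate(NaiveGate):
    """A customized gate; here it simply reuses NaiveGate's top-k routing."""
    pass

def add_moe_layers(megatron_model):
    # Keyword names (num_experts, gate) are assumptions; check fmoe.megatron.
    return fmoefy(megatron_model, num_experts=8, gate=MyGate)
```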

### Tests

* Move util functions into `test_ddp.py`.

## v0.2.1

### Load balancing

* Fix gradient for balance loss.

### Misc

* Fix typos.
* Update benchmark interface.
* Remove some redundant code for performance improvement.
* Enable `USE_NCCL` by default.
* Compatibility with PyTorch `<1.8.0` and `>=1.8.0`.

### Megatron adaption

* Patch for numerical correctness of gradient clipping.
* Support pipeline parallelism.

## v0.2.0

### Load balancing

* A brand-new gate module with capacity-related utilities.
* GShard's and Switch Transformer's balance strategies are implemented as integrated gates (see the sketch after this list).
* Balance loss is enabled.
* A balance monitor is provided.
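
A minimal sketch of selecting one of the integrated gates when building an MoE layer; the gate classes are assumed to live in `fmoe.gates`, and the other constructor arguments are illustrative.

```python
from fmoe import FMoETransformerMLP
from fmoe.gates import GShardGate, SwitchGate

# Pick one of the integrated balance-aware gates; other arguments are illustrative.
moe = FMoETransformerMLP(num_expert=8, d_model=512, d_hidden=2048, gate=SwitchGate)
# balance_loss = moe.gate.get_loss()  # accessor name is an assumption
```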

### Checkpointing

* MoE models can be loaded and saved by fmoe's checkpointing module.

### Performance

* FP16 training performance is improved.

### Misc

* The CUDA code directory is restructured.
* More tests are added.

## v0.1.2

### Compilation

- Remove dependency on the CUDA examples repository.

### Distributed

- Fix a bug related to PyTorch v1.8.0. FastMoE can now operate on multiple GPUs
on multiple nodes with PyTorch v1.8.0.

### Misc

- Fix tons of typos.
- Format the code.

## v0.1.1

### Distributed

- Broadcast data-parallel parameters before training.

### Megatron adaption

- Initialize `FMoELinear` parameters with different seeds across model-parallel ranks, even when Megatron uses the same random seed.
- Use the proper communicators for model parallelism and data parallelism.

### Transformer-XL example

- Improve scripts.

### Misc

- Add a logo and a Slack workspace link.
- Add documentation in Chinese.
- Add figures explaining how FastMoE works.

## v0.1.0

### Functions

- An easy-to-use, model-injection-style interface for Megatron-LM.
- Support data parallelism, model parallelism, and a hybrid of the two.
- Provide a new customized DDP module that synchronizes within different communication groups.
- Support any customized `nn.Module` as an expert (see the sketch after this list).
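
A minimal sketch covering the last two points, assuming `FMoE` accepts an `expert` constructor and that the customized DDP class is `fmoe.distributed.DistributedGroupedDataParallel`; the expert's forward signature is also an assumption.

```python
import torch.nn as nn
from fmoe import FMoE
from fmoe.distributed import DistributedGroupedDataParallel as GroupedDDP

class MyExpert(nn.Module):
    """A customized expert; the extra forward argument is assumed from FMoE's dispatch."""

    def __init__(self, d_model):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x, fwd_expert_count=None):
        return self.net(x)

# Under an initialized torch.distributed run:
moe = FMoE(num_expert=4, d_model=512, expert=MyExpert)
model = GroupedDDP(moe)  # synchronizes parameters within the appropriate comm groups
```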

### Document and infrastructure

- Use PyTest.
- Set up PyLint.
- Installation and usage guide.
- In-code explanations of functions and the code structure.

### Performance

- A benchmark comparing FastMoE with the old PyTorch implementation.