Merge pull request #86 from laekov/v0.3.0-rc

v0.3.0 Release

Commit acf8bec8, authored Nov 08, 2021 by Rick Ho; committed by GitHub Nov 08, 2021

Showing with 50 additions and 22 deletions

cuda/utils/cublas_wrapper.h +10 -10
cuda/utils/helper_cuda.h +11 -10
doc/release-note.md +29 -2

--- a/cuda/utils/cublas_wrapper.h
+++ b/cuda/utils/cublas_wrapper.h
--- a/cuda/utils/helper_cuda.h
+++ b/cuda/utils/helper_cuda.h
@@ -627,3 +627,4 @@ void check(T result, char const *const func, const char *const file,
 #define checkCudaErrors(val) check((val), #val, __FILE__, __LINE__)
 #endif  // HELPER_CUDA_H
--- a/doc/release-note.md
+++ b/doc/release-note.md
+## v0.3.0
+### FMoE core
+* The previous `mp_group` is renamed to `slice_group`, indicating that all workers in the group receive the same input batch and each processes a slice of that input. `mp_group` will be deprecated in our next release.
+* ROCm supported.
+* `FMoELinear` is moved to a stand-alone file.
+### Grouped data parallel
+* Support any group name by its relative tag name.
+### Load balancing
+* A brand new balancing strategy - SWIPE. Contributed by authors of a (currently unpublished) paper.
+* A property `has_loss` is added to each gate, in order to identify whether balance loss should be collected.
+### Megatron-LM support
+* Experts are partitioned by tensor model parallelism in `mp_group`, instead of expert parallelism.
+* Support arbitrary customized gate in `MegatronMLP`.
+* Move the patches to a stand-alone file.
+### Tests
+* Move util functions into `test_ddp.py`.
 ## v0.2.1
 ## Load balancing
 * Fix gradient for balance loss.
-## Misc
+### Misc
 * Typos.
 * Update benchmark interface.
@@ -12,7 +39,7 @@
 * Enable `USE_NCCL` by default.
 * Compatibility for PyTorch `<1.8.0` and `>=1.8.0`.
-## Megatron adaption
+### Megatron adaption
 * Patch for numerical correctness of gradient clipping.
 * Support for pipeline parallelism.
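
Reviewer note on the `mp_group` to `slice_group` rename in the v0.3.0 notes: one common way to handle such a transition is to keep accepting the old keyword for a release while forwarding it to the new one. The sketch below is only an illustration of that pattern. `MoELayerSketch` is a hypothetical stand-in, not FastMoE's actual class, and the warning text and precedence rule are assumptions.

```python
import warnings


class MoELayerSketch:
    """Hypothetical layer that takes a process-group argument.

    Illustrates a keyword rename only; it is not FastMoE's real API.
    """

    def __init__(self, slice_group=None, mp_group=None):
        if mp_group is not None:
            # The old keyword is still accepted, but it triggers a warning.
            warnings.warn(
                "mp_group is deprecated; use slice_group instead",
                DeprecationWarning,
                stacklevel=2,
            )
            # Assumption: an explicit slice_group takes precedence.
            if slice_group is None:
                slice_group = mp_group
        self.slice_group = slice_group


# Callers using the old keyword keep working during the transition.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    legacy = MoELayerSketch(mp_group="group-0")
current = MoELayerSketch(slice_group="group-0")
```

Both `legacy.slice_group` and `current.slice_group` hold the same value here, so downstream code only needs to read the new attribute.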
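
Reviewer note on the `has_loss` property from the v0.3.0 load-balancing notes: the idea, as described, is that each gate reports whether it has a balance loss to collect. A minimal sketch of how a training loop might use such a flag follows. `GateSketch`, `set_loss`, `get_loss`, and `collect_balance_loss` are hypothetical names chosen for illustration, and plain floats stand in for tensors.

```python
class GateSketch:
    """Hypothetical gate that may or may not hold a balance loss."""

    def __init__(self):
        self._loss = None

    def set_loss(self, loss):
        # Called by a gate's forward pass when it computes a balance loss.
        self._loss = loss

    def get_loss(self, clear=True):
        # Return the stored loss, optionally resetting it for the next step.
        loss = self._loss
        if clear:
            self._loss = None
        return loss

    @property
    def has_loss(self):
        # True only while a loss is stored and not yet collected.
        return self._loss is not None


def collect_balance_loss(gates):
    """Sum the balance loss over gates that report one; skip the rest."""
    total = 0.0
    for gate in gates:
        if gate.has_loss:
            total += gate.get_loss()
    return total
```

Checking the flag before reading avoids adding `None` to the running total for gates that do not produce a balance loss.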