Commits · da39d21b71b79462a0f922a3cb8ca480a06743ed · OpenDAS / ColossalAI

07 Feb, 2024 2 commits

[moe] support mixtral (#5309) · da39d21b

Hongxin Liu authored Jan 25, 2024

* [moe] add mixtral block for single expert

* [moe] mixtral block fwd support uneven ep

* [moe] mixtral block bwd support uneven ep

* [moe] add mixtral moe layer

* [moe] simplify replace

* [moe] support save sharded mixtral

* [moe] support load sharded mixtral

* [moe] support save sharded optim

* [moe] integrate moe manager into plug

* [moe] fix optimizer load

* [moe] fix mixtral layer

[moe] init mixtral impl · 7d8e0338

Xuanlei Zhao authored Dec 14, 2023

09 Jan, 2024 1 commit

[npu] change device to accelerator api (#5239) · d202cc28

Hongxin Liu authored Jan 09, 2024

* update accelerator

* fix timer

* fix amp

* update

* fix

* update bug

* add error raise

* fix autocast

* fix set device

* remove doc accelerator

* update doc

* update doc

* update doc

* use nullcontext

* update cpu

* update null context

* change time limit for example

* update

* update

* update

* update

* [npu] polish accelerator code

---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>

09 Nov, 2023 1 commit

[moe]: fix ep/tp tests, add hierarchical all2all (#4982) · 72444127

Wenhao Chen authored Nov 09, 2023

* fix: add warning for EP different behavior

* fix: use shard_data in ep & tp model

* to: add used_capacity

* fix: fix router test

* feat: add create_ep_node_group

* feat: add create_ep_hierarchical_group fn

* feat: add HierarchicalAllToAll

* test: add hierarchical all2all test

* fix: fix test errors

* fix: simplify create_ep_hierarchical_group

* fix: add hierarchical_alltoall arg

* fix: fix environ typo

* revert: revert process mesh order

* to: add todo mark

* fix: skip hierarchical_comm if torch < 1.13.1

08 Nov, 2023 1 commit

[moe] support optimizer checkpoint (#5015) · f71e63b0

Xuanlei Zhao authored Nov 08, 2023

* Refactor MoE Manager setup method

* unshard optim ckpt

* optim io

* update transformer version

* update requirements

* update ckpt

* update ckpt

* update ckpt

* fix engine

* fix engine

02 Nov, 2023 1 commit

[moe] merge moe into main (#4978) · dc003c30

Xuanlei Zhao authored Nov 02, 2023

* update moe module

* support openmoe