Commits · 89049b0d899477a3b31f02b31fde1a839e31c6fc · OpenDAS / ColossalAI

15 Apr, 2024 1 commit

[doc] fix ColossalMoE readme (#5599) · 89049b0d

Camille Zhong authored Apr 15, 2024

* fix readme

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

08 Apr, 2024 1 commit

[devops] remove post commit ci (#5566) · 641b1ee7

Hongxin Liu authored Apr 08, 2024

* [devops] remove post commit ci

* [misc] run pre-commit on all files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

07 Apr, 2024 1 commit

[fix] fix typo s/muiti-node /multi-node etc. (#5448) · a799ca34

digger yu authored Apr 07, 2024

01 Apr, 2024 1 commit

[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous... · e614aa34

Wenhao Chen authored Apr 01, 2024

[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508)

* feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig`

* feat: apply `GradientCheckpointConfig` to policy and llama_forward

* feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager

* fix: add optional args for `distribute_layer` and `get_stage_index`

* fix: fix changed API calls

* test: update llama tests

* style: polish `GradientCheckpointConfig`

* fix: fix pipeline utils tests

27 Mar, 2024 1 commit

[shardformer] fix pipeline forward error if custom layer distribution is used (#5189) · 00525f77

Insu Jang authored Mar 27, 2024

* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution

* Change static methods for t5 layer distribution to member functions

* Change static methods for whisper layer distribution to member functions

* Replace whisper policy usage with self one

* Fix test case to use non-static layer distribution methods

* fix: fix typo

---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>

25 Mar, 2024 1 commit

[hotfix] set return_outputs=False in examples and polish code (#5404) · bb0a668f

Wenhao Chen authored Mar 25, 2024

* fix: simplify merge_batch

* fix: use return_outputs=False to eliminate extra memory consumption

* feat: add return_outputs warning

* style: remove `return_outputs=False` as it is the default value

07 Feb, 2024 6 commits

[moe] fix mixtral optim checkpoint (#5344) · 65e5d6ba

Hongxin Liu authored Feb 01, 2024

[moe] fix mixtral forward default value (#5329) · 956b561b

Hongxin Liu authored Jan 30, 2024

[moe] fix mixtral checkpoint io (#5314) · b60be18d

Hongxin Liu authored Jan 27, 2024

[moe] support mixtral (#5309) · da39d21b

Hongxin Liu authored Jan 25, 2024

* [moe] add mixtral block for single expert

* [moe] mixtral block fwd support uneven ep

* [moe] mixtral block bwd support uneven ep

* [moe] add mixtral moe layer

* [moe] simplify replace

* [meo] support save sharded mixtral

* [meo] support load sharded mixtral

* [meo] support save sharded optim

* [meo] integrate moe manager into plug

* [meo] fix optimizer load

* [meo] fix mixtral layer

[moe] update capacity computing (#5253) · c904d2ae

Hongxin Liu authored Jan 11, 2024

* [moe] top2 allow uneven input

* [moe] update capacity computing

* [moe] remove debug info

* [moe] update capacity computing

* [moe] update capacity computing

[moe] init mixtral impl · 7d8e0338

Xuanlei Zhao authored Dec 14, 2023