Commits · e094933da1d0a574eda105ab6ec0f171d8ddaebb · OpenDAS / ColossalAI

15 Apr, 2024 1 commit

[doc] fix ColossalMoE readme (#5599) · 89049b0d

Camille Zhong authored Apr 15, 2024

* fix readme

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

08 Apr, 2024 1 commit

[devops] remove post commit ci (#5566) · 641b1ee7

Hongxin Liu authored Apr 08, 2024

* [devops] remove post commit ci

* [misc] run pre-commit on all files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

07 Apr, 2024 1 commit
- [fix] fix typo s/muiti-node /multi-node etc. (#5448) · a799ca34
  digger yu authored Apr 07, 2024
  
01 Apr, 2024 1 commit

[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508) · e614aa34

Wenhao Chen authored Apr 01, 2024

* feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig`

* feat: apply `GradientCheckpointConfig` to policy and llama_forward

* feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager

* fix: add optional args for `distribute_layer` and `get_stage_index`

* fix: fix changed API calls

* test: update llama tests

* style: polish `GradientCheckpointConfig`

* fix: fix pipeline utils tests

29 Mar, 2024 1 commit

[ColossalChat] Update RLHF V2 (#5286) · df5e9c53

YeAnbang authored Mar 29, 2024



* Add dpo. Fix sft, ppo, lora. Refactor all

* fix and tested ppo

* 2 nd round refactor

* add ci tests

* fix ci

* fix ci

* fix readme, style

* fix readme style

* fix style, fix benchmark

* reproduce benchmark result, remove useless files

* rename to ColossalChat

* use new image

* fix ci workflow

* fix ci

* use local model/tokenizer for ci tests

* fix ci

* fix ci

* fix ci

* fix ci timeout

* fix rm progress bar. fix ci timeout

* fix ci

* fix ci typo

* remove 3d plugin from ci temporary

* test environment

* cannot save optimizer

* support chat template

* fix readme

* fix path

* test ci locally

* restore build_or_pr

* fix ci data path

* fix benchmark

* fix ci, move ci tests to 3080, disable fast tokenizer

* move ci to 85

* support flash attention 2

* add all-in-one data preparation script. Fix colossal-llama2-chat chat template

* add hardware requirements

* move ci test data

* fix save_model, add unwrap

* fix missing bos

* fix missing bos; support grad accumulation with gemini

* fix ci

* fix ci

* fix ci

* fix llama2 chat template config

* debug sft

* debug sft

* fix colossalai version requirement

* fix ci

* add sanity check to prevent NaN loss

* fix requirements

* add dummy data generation script

* add dummy data generation script

* add dummy data generation script

* add dummy data generation script

* update readme

* update readme

* update readme and ignore

* fix logger bug

* support parallel_output

* modify data preparation logic

* fix tokenization

* update lr

* fix inference

* run pre-commit

---------
Co-authored-by: Tong Li <tong.li352711588@gmail.com>

27 Mar, 2024 1 commit

[shardformer] fix pipeline forward error if custom layer distribution is used (#5189) · 00525f77

Insu Jang authored Mar 27, 2024



* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution

* Change static methods for t5 layer distribution to member functions

* Change static methods for whisper layer distribution to member functions

* Replace whisper policy usage with self one

* Fix test case to use non-static layer distribution methods

* fix: fix typo

---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>

25 Mar, 2024 1 commit

[hotfix] set return_outputs=False in examples and polish code (#5404) · bb0a668f

Wenhao Chen authored Mar 25, 2024

* fix: simplify merge_batch

* fix: use return_outputs=False to eliminate extra memory consumption

* feat: add return_outputs warning

* style: remove `return_outputs=False` as it is the default value

20 Mar, 2024 1 commit

[doc] update open-sora demo (#5479) · d158fc0e

binmakeswell authored Mar 20, 2024

* [doc] update open-sora demo

* [doc] update open-sora demo

* [doc] update open-sora demo

12 Mar, 2024 1 commit
- [hotfix] fix typo s/keywrods/keywords etc. (#5429) · 385e85af
  digger yu authored Mar 12, 2024
  
11 Mar, 2024 1 commit
- fix tensor data update for gemini loss caluculation (#5442) · da885ed5
  Camille Zhong authored Mar 11, 2024
  
07 Mar, 2024 1 commit
- [colossal-llama2] add stream chat examlple for chat version model (#5428) · 743e7fad
  Camille Zhong authored Mar 07, 2024
```
* add stream chat for chat version

* remove os.system clear

* modify function name
```
05 Mar, 2024 3 commits
- [doc] Fix typo s/infered/inferred/ (#5288) · c8003d46
  hugo-syn authored Mar 05, 2024
```
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
```
- [eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) · a7ae2b5b
  Dongruixuan Li authored Mar 05, 2024
  
- [doc] sora release (#5425) · 822241a9
  binmakeswell authored Mar 05, 2024
```
* [doc] sora release

* [doc] sora release

* [doc] sora release

* [doc] sora release
```
01 Mar, 2024 1 commit
- fix sft single turn inference example (#5416) · 4b8312c0
  Camille Zhong authored Mar 01, 2024
  
28 Feb, 2024 1 commit
- update requirements (#5407) · a28c9715
  Tong Li authored Feb 28, 2024
  
19 Feb, 2024 2 commits
- [hotfix] fix variable type for top_p (#5313) · b833153f
  CZYCW authored Feb 19, 2024
```
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
```
- [llama] fix training and inference scripts (#5384) · 73038018
  Hongxin Liu authored Feb 19, 2024
```
* [llama] refactor inference example to fit sft

* [llama] fix training script to fit gemini

* [llama] fix inference script
```
07 Feb, 2024 6 commits

[moe] fix mixtral optim checkpoint (#5344) · 65e5d6ba
Hongxin Liu authored Feb 01, 2024

[moe] fix mixtral forward default value (#5329) · 956b561b
Hongxin Liu authored Jan 30, 2024

[moe] fix mixtral checkpoint io (#5314) · b60be18d
Hongxin Liu authored Jan 27, 2024

[moe] support mixtral (#5309) · da39d21b

Hongxin Liu authored Jan 25, 2024

* [moe] add mixtral block for single expert

* [moe] mixtral block fwd support uneven ep

* [moe] mixtral block bwd support uneven ep

* [moe] add mixtral moe layer

* [moe] simplify replace

* [meo] support save sharded mixtral

* [meo] support load sharded mixtral

* [meo] support save sharded optim

* [meo] integrate moe manager into plug

* [meo] fix optimizer load

* [meo] fix mixtral layer

[moe] update capacity computing (#5253) · c904d2ae

Hongxin Liu authored Jan 11, 2024

* [moe] top2 allow uneven input

* [moe] update capacity computing

* [moe] remove debug info

* [moe] update capacity computing

* [moe] update capacity computing

[moe] init mixtral impl · 7d8e0338
Xuanlei Zhao authored Dec 14, 2023

06 Feb, 2024 3 commits
- [llama] fix memory issue (#5371) · 084c9124
  Hongxin Liu authored Feb 06, 2024
```
* [llama] fix memory issue

* [llama] add comment
```
- [llama] polish training script and fix optim ckpt (#5368) · eb4f2d90
  Hongxin Liu authored Feb 06, 2024
  
- [eval] update llama npu eval (#5366) · a5756a87
  Camille Zhong authored Feb 06, 2024
  
05 Feb, 2024 4 commits
- [llama] fix neftune & pbar with start_step (#5364) · 44ca61a2
  Camille Zhong authored Feb 05, 2024
  
- [llama] add flash attn patch for npu (#5362) · a4cec171
  Hongxin Liu authored Feb 05, 2024
  
- [llama] update training script (#5360) · 73f9f23f
  Hongxin Liu authored Feb 05, 2024
```
* [llama] update training script

* [doc] polish docstr
```
- [llama] fix dataloader for hybrid parallel (#5358) · 6c0fa7b9
  Hongxin Liu authored Feb 05, 2024
```
* [plugin] refactor prepare dataloader

* [plugin] update train script
```
01 Feb, 2024 1 commit
- [Chat] fix sft loss nan (#5345) · c5239840
  YeAnbang authored Feb 01, 2024
```
* fix script

* fix script

* fix chat nan

* fix chat nan
```
25 Jan, 2024 1 commit

[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228) · ec912b1b

李文军 authored Jan 25, 2024

22 Jan, 2024 1 commit
- fix bug for mefture (#5299) · ddf879e2
  Desperado-Jia authored Jan 22, 2024
  
18 Jan, 2024 1 commit
- fix auto loading gpt2 tokenizer (#5279) · 32cb7449
  Michelle authored Jan 18, 2024
  
11 Jan, 2024 1 commit
- fix typo in applications/ColossalEval/README.md (#5250) · 756c400a
  digger yu authored Jan 11, 2024
  
10 Jan, 2024 1 commit
- [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) · 41e52c1c
  digger yu authored Jan 10, 2024
  
09 Jan, 2024 1 commit

[npu] change device to accelerator api (#5239) · d202cc28

Hongxin Liu authored Jan 09, 2024



* update accelerator

* fix timer

* fix amp

* update

* fix

* update bug

* add error raise

* fix autocast

* fix set device

* remove doc accelerator

* update doc

* update doc

* update doc

* use nullcontext

* update cpu

* update null context

* change time limit for example

* udpate

* update

* update

* update

* [npu] polish accelerator code

---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>

08 Jan, 2024 1 commit

[doc] SwiftInfer release (#5236) · 7bc6969c

binmakeswell authored Jan 08, 2024

* [doc] SwiftInfer release

* [doc] SwiftInfer release

* [doc] SwiftInfer release

* [doc] SwiftInfer release

* [doc] SwiftInfer release

07 Jan, 2024 1 commit
- [format] applied code formatting on changed files in pull request 5234 (#5235) · 4fb4a22a
  github-actions[bot] authored Jan 07, 2024
```
Co-authored-by: github-actions <github-actions@github.com>
```