- 05 Mar, 2024 3 commits
MickeyCHAN authored
* fix import error
* Update dpt_depth.py
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Hongxin Liu authored
binmakeswell authored
* [doc] sora release
- 04 Mar, 2024 1 commit
flybird11111 authored
* benchmark gpt2
* [doc] fix typo in Colossal-LLaMA-2/README.md (#5247)
* [workflow] fixed build CI (#5240)
* [ci] fixed booster test (#5251)
* [ci] fixed ddp test (#5254)
* fix typo in applications/ColossalEval/README.md (#5250)
* [ci] fix shardformer tests (#5255)
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
* [doc] fix doc typo (#5256)
* [hotfix] add pp sanity check and fix mbs arg (#5268)
* [workflow] fixed incomplete bash command (#5272)
* [workflow] fixed oom tests (#5275)
* [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276)
* [shardformer] hybridparallelplugin support gradients accumulation (#5246); a sketch of the accumulation pattern follows this entry
* [hotfix] fix ShardFormer test execution path when using sequence parallelism (#5230)
* fix auto loading gpt2 tokenizer (#5279)
* [doc] add llama2-13B display (#5285)
* fix llama pretrain (#5287)
* Update shardformer.py
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
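The headline item in the squash above is gradient accumulation support in HybridParallelPlugin (#5246). Below is a minimal sketch of the accumulation pattern in plain PyTorch, not the plugin's internals; the model, data, and `accum_steps` value are illustrative placeholders.

```python
import torch

# Minimal gradient-accumulation sketch (plain PyTorch, not the
# HybridParallelPlugin internals): micro-batch gradients are summed
# into .grad buffers before a single optimizer step, growing the
# effective batch size without extra activation memory.
accum_steps = 4  # assumed accumulation factor
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(8, 16), torch.randint(0, 2, (8,))) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    # Scale each micro-batch loss so the accumulated gradient is an average.
    loss = criterion(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```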
- 01 Mar, 2024 1 commit
Camille Zhong authored
- 29 Feb, 2024 3 commits
binmakeswell authored
binmakeswell authored
Frank Lee authored
- 28 Feb, 2024 1 commit
Tong Li authored
- 27 Feb, 2024 4 commits
flybird11111 authored
* gather llama logits (see the sketch below)
* fix
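Under tensor parallelism the LLaMA head produces vocab-sharded logits, so "gather llama logits" means collecting the shards onto every rank. A minimal sketch of that pattern, assuming logits sharded along the last (vocabulary) dimension and an initialized process group; `gather_logits` is a hypothetical helper, not the repository's actual function.

```python
import torch
import torch.distributed as dist

def gather_logits(local_logits: torch.Tensor, tp_group=None) -> torch.Tensor:
    """Collect vocab-sharded logits from all tensor-parallel ranks and
    concatenate them along the vocabulary (last) dimension."""
    world_size = dist.get_world_size(group=tp_group)
    if world_size == 1:
        return local_logits
    shards = [torch.empty_like(local_logits) for _ in range(world_size)]
    dist.all_gather(shards, local_logits.contiguous(), group=tp_group)
    return torch.cat(shards, dim=-1)
```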
Frank Lee authored
QinLuo authored
Hongxin Liu authored
- 26 Feb, 2024 1 commit
Hongxin Liu authored
- 20 Feb, 2024 1 commit
Stephan Kölker authored
- 19 Feb, 2024 4 commits
CZYCW authored
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Frank Lee authored
yixiaoer authored
Hongxin Liu authored
* [llama] refactor inference example to fit sft
* [llama] fix training script to fit gemini
* [llama] fix inference script
- 08 Feb, 2024 4 commits
Hongxin Liu authored
Frank Lee authored
Frank Lee authored
[llama] support npu for Colossal-LLaMA-2
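NPU support usually starts with device selection that degrades gracefully when the Ascend stack is absent. A hedged sketch, assuming the optional `torch_npu` plugin (which registers the "npu" device with PyTorch when imported); `pick_device` is a hypothetical helper, not the code added by this commit.

```python
import torch

def pick_device() -> torch.device:
    # Prefer Ascend NPU (requires the optional torch_npu plugin),
    # then CUDA, then CPU as the final fallback.
    try:
        import torch_npu  # noqa: F401  # registers torch.npu when installed
        if torch.npu.is_available():
            return torch.device("npu")
    except ImportError:
        pass
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")
```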
ver217 authored
- 07 Feb, 2024 6 commits
Hongxin Liu authored
Hongxin Liu authored
Hongxin Liu authored
Hongxin Liu authored
* [moe] add mixtral block for single expert
* [moe] mixtral block fwd support uneven ep
* [moe] mixtral block bwd support uneven ep
* [moe] add mixtral moe layer
* [moe] simplify replace
* [moe] support save sharded mixtral
* [moe] support load sharded mixtral
* [moe] support save sharded optim
* [moe] integrate moe manager into plugin
* [moe] fix optimizer load
* [moe] fix mixtral layer
Hongxin Liu authored
* [moe] top2 allow uneven input
* [moe] update capacity computing (see the sketch below)
* [moe] remove debug info
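"Capacity computing" bounds how many tokens each expert may receive under top-2 routing. The sketch below uses a generic GShard-style formula rather than ColossalAI's exact code; `capacity_factor` and both helper names are assumptions.

```python
import torch

def top2_capacity(num_tokens: int, num_experts: int,
                  capacity_factor: float = 1.25) -> int:
    # Each token goes to 2 experts, so 2 * num_tokens assignments are
    # spread over num_experts slots, padded by capacity_factor to
    # absorb routing imbalance (e.g. uneven input).
    return max(int(capacity_factor * 2 * num_tokens / num_experts), 1)

def top2_route(gates: torch.Tensor):
    """Pick the two best experts per token from [tokens, experts] gate scores."""
    weights, experts = torch.topk(gates, k=2, dim=-1)
    return weights / weights.sum(dim=-1, keepdim=True), experts
```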
Xuanlei Zhao authored
- 06 Feb, 2024 4 commits
Hongxin Liu authored
* [llama] fix memory issue
* [llama] add comment
Hongxin Liu authored
Hongxin Liu authored
Camille Zhong authored
- 05 Feb, 2024 4 commits
Camille Zhong authored
Hongxin Liu authored
Hongxin Liu authored
* [llama] update training script
* [doc] polish docstr
Hongxin Liu authored
* [plugin] refactor prepare dataloader (see the sketch below)
* [plugin] update train script
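In a data-parallel setting, "prepare dataloader" typically means wrapping the dataset so each rank reads a disjoint shard. A generic sketch using `torch.utils.data.DistributedSampler`, not ColossalAI's actual implementation; it assumes the default process group is already initialized.

```python
from torch.utils.data import DataLoader, Dataset, DistributedSampler

def prepare_dataloader(dataset: Dataset, batch_size: int, shuffle: bool = True,
                       seed: int = 42, drop_last: bool = False) -> DataLoader:
    # DistributedSampler splits indices across ranks; the shared seed keeps
    # the per-epoch shuffle identical on every rank so shards stay disjoint.
    sampler = DistributedSampler(dataset, shuffle=shuffle, seed=seed)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler,
                      drop_last=drop_last, pin_memory=True)
```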
- 04 Feb, 2024 1 commit
Hongxin Liu authored
* [gemini] fix param op hook when output is tuple (see the sketch below)
* [gemini] fix param op hook
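The tuple fix guards against modules whose forward returns a tuple rather than a single tensor. A generic illustration of the pitfall using a standard PyTorch forward hook, not Gemini's actual hook code; `_post_process` is a hypothetical placeholder.

```python
import torch
from torch import nn

def _post_process(t: torch.Tensor) -> torch.Tensor:
    return t  # stand-in for the real per-tensor bookkeeping

def output_hook(module: nn.Module, inputs, output):
    # Modules like nn.LSTM return tuples; apply the per-tensor step to
    # every tensor element instead of assuming a single tensor output.
    if isinstance(output, tuple):
        return tuple(_post_process(o) if isinstance(o, torch.Tensor) else o
                     for o in output)
    return _post_process(output)

layer = nn.LSTM(8, 8)  # forward returns (output, (h_n, c_n))
layer.register_forward_hook(output_hook)
out, state = layer(torch.randn(5, 3, 8))
```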
- 02 Feb, 2024 1 commit
Wenhao Chen authored
* fix: remove unnecessary assert
* test: add more 3d plugin tests
* fix: add warning
- 01 Feb, 2024 1 commit
Hongxin Liu authored
* [checkpointio] fix hybrid parallel optim checkpoint
* [extension] fix cuda extension
* [checkpointio] fix gemini optimizer checkpoint
* polish code