1. 25 Mar, 2024 3 commits
    • [shardformer]Fix lm parallel. (#5480) · 0688d92e
      flybird11111 authored
      * fix
      
      * padding vocab_size when using pipeline parallelism

      padding vocab_size when using pipeline parallelism
      
      fix
      
      fix
      
      * fix
      
      * fix
      
      fix
      
      fix
      
      * fix gather output
      
      * fix
      
      * fix
      
      * fix
      
      fix resize embedding
      
      fix resize embedding
      
      * fix resize embedding
      
      fix
      
      * revert
      
      * revert
      
      * revert
      
      * fix lm forward distribution
      
      * fix
      
      * test ci
      
      * fix
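      The vocab-padding change above rounds the embedding table up so it divides evenly across parallel ranks. A minimal sketch of that rounding rule (standalone Python; the function name and the divisor values are illustrative, not ColossalAI's API):

      ```python
      def pad_vocab_size(vocab_size: int, divisor: int) -> int:
          """Round vocab_size up to the nearest multiple of divisor so the
          embedding rows can be split evenly across parallel ranks."""
          remainder = vocab_size % divisor
          if remainder == 0:
              return vocab_size
          return vocab_size + (divisor - remainder)
      ```

      For example, GPT-2's 50257-token vocab padded to a multiple of 8 becomes 50264; the extra rows carry no real tokens and are typically masked out of the logits.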
    • [release] grok-1 inference benchmark (#5500) · 34e90925
      binmakeswell authored
      * [release] grok-1 inference benchmark
      
      * [release] grok-1 inference benchmark
      
      * [release] grok-1 inference benchmark
      
      * [release] grok-1 inference benchmark
      
      * [release] grok-1 inference benchmark
    • [hotfix] set return_outputs=False in examples and polish code (#5404) · bb0a668f
      Wenhao Chen authored
      * fix: simplify merge_batch
      
      * fix: use return_outputs=False to eliminate extra memory consumption
      
      * feat: add return_outputs warning
      
      * style: remove `return_outputs=False` as it is the default value
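      The return_outputs change above trades convenience for memory: retaining every micro-batch's output until the step ends is pure overhead when only the loss is needed. A hypothetical sketch of the pattern (plain Python; pipeline_forward and its signature are invented for illustration, not the actual plugin API):

      ```python
      import warnings

      def pipeline_forward(micro_batches, forward_fn, return_outputs=False):
          """Run forward_fn over micro-batches, summing losses.

          With return_outputs=True every per-batch output stays alive until
          the caller drops it -- the extra memory consumption the commit
          eliminates by defaulting to False.
          """
          if return_outputs:
              warnings.warn("return_outputs=True keeps all micro-batch outputs in memory")
          total_loss = 0.0
          outputs = [] if return_outputs else None
          for mb in micro_batches:
              loss, out = forward_fn(mb)
              total_loss += loss
              if return_outputs:
                  outputs.append(out)
          return total_loss, outputs
      ```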
  2. 24 Mar, 2024 1 commit
    • [example] update Grok-1 inference (#5495) · 5fcd7795
      Yuanheng Zhao authored
      * revise grok-1 example
      
      * remove unused arg in scripts
      
      * prevent re-installing torch
      
      * update readme
      
      * revert modifying colossalai requirements
      
      * add perf
      
      * trivial
      
      * add tokenizer url
  3. 22 Mar, 2024 1 commit
  4. 21 Mar, 2024 1 commit
    • [example] add grok-1 inference (#5485) · 848a574c
      Hongxin Liu authored
      * [misc] add submodule
      
      * remove submodule
      
      * [example] support grok-1 tp inference
      
      * [example] add grok-1 inference script
      
      * [example] refactor code
      
      * [example] add grok-1 readme
      
      * [example] add test ci

      * [example] update readme
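      The tp (tensor-parallel) inference mentioned above splits each large weight matrix across devices so no single GPU must hold the full model. A toy sketch of column-parallel sharding using plain Python lists (helper names are hypothetical; a real implementation shards torch tensors and reassembles results with an all-gather collective):

      ```python
      def split_columns(weight, tp_size):
          """Shard a 2-D weight (list of rows) column-wise across tp_size ranks."""
          cols = len(weight[0])
          assert cols % tp_size == 0, "width must divide evenly across ranks"
          shard = cols // tp_size
          return [[row[r * shard:(r + 1) * shard] for row in weight]
                  for r in range(tp_size)]

      def gather_columns(shards):
          """Concatenate per-rank column shards back into full rows
          (what an all-gather does across devices)."""
          n_rows = len(shards[0])
          return [sum((rows[i] for rows in shards), []) for i in range(n_rows)]
      ```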
  5. 20 Mar, 2024 1 commit
  6. 18 Mar, 2024 2 commits
  7. 13 Mar, 2024 1 commit
    • [devops] fix compatibility (#5444) · f2e8b9ef
      Hongxin Liu authored
      * [devops] fix compatibility
      
      * [hotfix] update compatibility test on pr
      
      * [devops] fix compatibility
      
      * [devops] record duration during comp test
      
      * [test] decrease test duration
      
      * fix falcon
  8. 12 Mar, 2024 1 commit
  9. 11 Mar, 2024 1 commit
  10. 07 Mar, 2024 2 commits
  11. 05 Mar, 2024 11 commits
  12. 04 Mar, 2024 1 commit
    • [example]add gpt2 benchmark example script. (#5295) · 29695cf7
      flybird11111 authored
      
      
      * benchmark gpt2
      
      * fix
      
      fix
      
      fix
      
      fix
      
      * [doc] fix typo in Colossal-LLaMA-2/README.md (#5247)
      
      * [workflow] fixed build CI (#5240)
      
      * [workflow] fixed build CI
      
      * polish
      
      * polish
      
      * polish
      
      * polish
      
      * polish
      
      * [ci] fixed booster test (#5251)
      
      * [ci] fixed booster test
      
      * [ci] fixed booster test
      
      * [ci] fixed booster test
      
      * [ci] fixed ddp test (#5254)
      
      * [ci] fixed ddp test
      
      * polish
      
      * fix typo in applications/ColossalEval/README.md (#5250)
      
      * [ci] fix shardformer tests. (#5255)
      
      * fix ci
      
      fix
      
      * revert: revert p2p
      
      * feat: add enable_metadata_cache option
      
      * revert: enable t5 tests
      
      ---------
      Co-authored-by: Wenhao Chen <cwher@outlook.com>
      
      * [doc] fix doc typo (#5256)
      
      * [doc] fix annotation display
      
      * [doc] fix llama2 doc
      
      * [hotfix]: add pp sanity check and fix mbs arg (#5268)
      
      * fix: fix misleading mbs arg
      
      * feat: add pp sanity check
      
      * fix: fix 1f1b sanity check
      
      * [workflow] fixed incomplete bash command (#5272)
      
      * [workflow] fixed oom tests (#5275)
      
      * [workflow] fixed oom tests
      
      * polish
      
      * polish
      
      * polish
      
      * [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276)
      
      * fix ci
      
      fix
      
      * fix test
      
      * revert: revert p2p
      
      * feat: add enable_metadata_cache option
      
      * revert: enable t5 tests
      
      * fix
      
      ---------
      Co-authored-by: Wenhao Chen <cwher@outlook.com>
      
      * [shardformer] hybridparallelplugin support gradients accumulation. (#5246)
      
      * support gradients acc
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      * fix
      
      fix
      
      * fix
      
      fix
      
      fix
      
      * [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230)
      
      * fix auto loading gpt2 tokenizer (#5279)
      
      * [doc] add llama2-13B display (#5285)
      
      * Update README.md
      
      * fix 13b typo
      
      ---------
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      
      * fix llama pretrain (#5287)
      
      * fix
      
      * fix
      
      * fix
      
      fix
      
      * fix
      
      fix
      
      fix
      
      * fix
      
      fix
      
      * benchmark gpt2
      
      * fix
      
      fix
      
      fix
      
      fix
      
      * [workflow] fixed build CI (#5240)
      
      * [workflow] fixed build CI
      
      * polish
      
      * polish
      
      * polish
      
      * polish
      
      * polish
      
      * [ci] fixed booster test (#5251)
      
      * [ci] fixed booster test
      
      * [ci] fixed booster test
      
      * [ci] fixed booster test
      
      * fix
      
      fix
      
      * fix
      
      fix
      
      fix
      
      * fix
      
      * fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      * fix
      
      * Update shardformer.py
      
      ---------
      Co-authored-by: digger yu <digger-yu@outlook.com>
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: Wenhao Chen <cwher@outlook.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
      Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
      Co-authored-by: Desperado-Jia <502205863@qq.com>
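      Among the squashed changes above is gradient-accumulation support for HybridParallelPlugin. The core idea reduces to averaging the gradients from several micro-batches before taking one optimizer step, sketched here as a standalone helper (hypothetical names, not the plugin's actual API):

      ```python
      def accumulate_gradients(micro_grads):
          """Average per-micro-batch gradients so a single update matches one
          step on the combined batch (assuming a mean-reduced loss)."""
          n = len(micro_grads)
          assert n > 0, "need at least one micro-batch"
          return [sum(g) / n for g in zip(*micro_grads)]
      ```

      Accumulating over k micro-batches lets a device simulate a k-times larger batch at the cost of k forward/backward passes per parameter update.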
  13. 01 Mar, 2024 1 commit
  14. 29 Feb, 2024 3 commits
  15. 28 Feb, 2024 1 commit
  16. 27 Feb, 2024 4 commits
  17. 26 Feb, 2024 1 commit
  18. 20 Feb, 2024 1 commit
  19. 19 Feb, 2024 3 commits