- 13 Apr, 2023 2 commits
- 12 Apr, 2023 5 commits
- digger-yu authored
  Fixing document link errors using absolute paths
- natalie_cao authored
- Hongxin Liu authored
  * [gemini] fix nvme optimizer init
  * [gemini] gemini supports lazy init
  * [gemini] add init example
  * [gemini] add fool model
  * [zero] update gemini ddp
  * [zero] update init example
  * add chunk method
  * [lazyinit] fix lazy tensor tolist
  * [gemini] fix buffer materialization
  * [misc] remove useless file
  * [booster] update gemini plugin
  * [test] update gemini plugin test
  * [test] fix gemini plugin test
  * [gemini] fix import
  * [lazyinit] use new metatensor
  * [lazyinit] fix __set__ method
- jiangmingyan authored
  [checkpoint] Sharded checkpoints need to be compatible with the naming format of HF checkpoint files (#3479)
  * [checkpoint] support Hugging Face style sharded checkpoints, to be compatible with the HF file naming format
  * [checkpoint] add a 'variant' field to sharded checkpoint saving to customize the filename
  Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local>
  Co-authored-by: luchen <luchen@luchendeMBP.lan>
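The HF-compatible shard filenames this commit targets follow a fixed pattern; here is a minimal sketch in plain Python. This is a hypothetical helper, not the actual ColossalAI code; the `variant` handling mirrors how `transformers` inserts the variant before the file extension.

```python
from typing import Optional

def shard_filename(index: int, total: int, variant: Optional[str] = None) -> str:
    """Build a Hugging Face style shard filename such as
    pytorch_model-00001-of-00002.bin. An optional variant (e.g. "fp16")
    is inserted before the extension. Illustrative sketch only."""
    name = f"pytorch_model-{index:05d}-of-{total:05d}.bin"
    if variant is not None:
        stem, ext = name.rsplit(".", 1)
        name = f"{stem}.{variant}.{ext}"
    return name

print(shard_filename(1, 2))           # pytorch_model-00001-of-00002.bin
print(shard_filename(2, 2, "fp16"))   # pytorch_model-00002-of-00002.fp16.bin
```

A loader that understands this pattern can discover all shards of a checkpoint from the index file alone, which is why matching the naming format matters for interoperability with `transformers`.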
- Yuanchen authored
  Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
- 11 Apr, 2023 1 commit
- zhang-yi-chi authored
- 10 Apr, 2023 5 commits
- ver217 authored
- binmakeswell authored
  - [ ] Stable Diffusion
  - [ ] Dreambooth
  It is easy for users to think that we do not support them yet. Add them after migrating them from example to application: https://github.com/hpcaitech/ColossalAI/tree/main/examples/images
- binmakeswell authored
  * [doc] add requirement and highlight application
  * [doc] link example and application
- NatalieC323 authored
  * Update README.md
  Co-authored-by: Fazzie-Maqianli <55798671+Fazziekey@users.noreply.github.com>
- YH authored
- 07 Apr, 2023 3 commits
- gongenlei authored
  * mv LlamaForCausalLM to LlamaModel
  * rm unused imports
  Co-authored-by: gongenlei <gongenlei@baidu.com>
- mandoxzhang authored
  * update roberta example
  * modify conflict & update roberta
- mandoxzhang authored
  * update roberta example
- 06 Apr, 2023 14 commits
- NatalieC323 authored
- binmakeswell authored
- NatalieC323 authored
  * Update requirements.txt
  * Update environment.yaml
  * Update README.md
  * Delete requirements_colossalai.txt
- Frank Lee authored
- jiangmingyan authored
  * [checkpoint] support Hugging Face style sharded checkpoints
  Co-authored-by: luchen <luchen@luchendeMBP.lan>
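A Hugging Face style sharded checkpoint is accompanied by an index file (`pytorch_model.bin.index.json`) that maps each parameter name to the shard file storing it. The sketch below shows that layout with stdlib `json` only; the parameter names and size are made up for illustration and are not taken from ColossalAI.

```python
import json

# Illustrative index file for a two-shard checkpoint: metadata plus a
# weight_map from parameter name to shard filename. Values are invented.
index = {
    "metadata": {"total_size": 16_000_000},
    "weight_map": {
        "embed.weight": "pytorch_model-00001-of-00002.bin",
        "lm_head.weight": "pytorch_model-00002-of-00002.bin",
    },
}
doc = json.dumps(index, indent=2)
print(doc)
```

When loading, a reader consults `weight_map` to open only the shards it needs, so a sharded save that follows this format stays loadable by tooling that expects the HF layout.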
- Fazzie-Maqianli authored
- Frank Lee authored
  * [test] added spawn decorator
  * polish code
- YY Lin authored
  * Update ppo.py: fix the bug of fetching the wrong batch data
  * Add PEFT model support in SFT and prompt training: in stage 1 and stage 3, PEFT model support is added, so the trained artifacts are only small LoRA additions instead of the whole set of files
  * Delete test_prompts.txt
  * Delete test_pretrained.txt
  * Move the PEFT-related files to a community folder
  * Move the demo SFT to community
  * Delete dirty files
  * Add instructions to install PEFT from source
  * Remove Chinese comments
- Dr-Corgi authored
  The function save_model should be a part of PPOTrainer.
- kingkingofall authored
  * fix stage 2
  * add torch
- Frank Lee authored
- YH authored
- ver217 authored
- Camille Zhong authored
  * Add RoBERTa for RLHF Stage 2 & 3 (still in testing)
  * Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" (reverts commit 06741d894dcbe958acd4e10d771f22275e20e368)
  * Add RoBERTa for RLHF stage 2 & 3:
    1. add roberta folder under model folder
    2. add roberta option in train_reward_model.py
    3. add some tests in testci
  * Update test_ci.sh
  * Revert "Update test_ci.sh" (reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a)
  * update roberta with coati
  * chat ci update
  * Revert "chat ci update" (reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846)
  * [Chat] fix the tokenizer "int too big to convert" error in SFT training using Bloom and OPT
- 05 Apr, 2023 2 commits
- Hakjin Lee authored
- Yuanchen authored
  Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
- 04 Apr, 2023 6 commits
- YuliangLiu0306 authored
  * [autoparallel] integrate new analyzer in module level
  * unify the profiling method
  * polish
  * fix no codegen bug
  * fix pass bug
  * fix liveness test
  * polish
- ver217 authored
  * [zero] update legacy import
  * [zero] update examples
  * [example] fix opt tutorial
  * [example] fix import
- Yuanchen authored
  Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
- Frank Lee authored
  * [checkpoint] refactored the API and added safetensors support
  * polish code
- ver217 authored
  * [zero] refactor low-level zero folder structure
  * [zero] fix legacy zero import path
  * [zero] remove useless import
  * [zero] refactor gemini folder structure
  * [zero] refactor legacy zero import path
  * [zero] fix test import path
  * [zero] fix test
  * [zero] fix circular import
  * [zero] update import
- Yuanchen authored
  Fix SFT training for BLOOM, GPT and OPT
- 03 Apr, 2023 2 commits
- Frank Lee authored
  * [test] fixed gemini plugin test
  * polish code
- Camille Zhong authored
  * Add RoBERTa for RLHF Stage 2 & 3 (still in testing)
  * Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" (reverts commit 06741d894dcbe958acd4e10d771f22275e20e368)
  * Add RoBERTa for RLHF stage 2 & 3:
    1. add roberta folder under model folder
    2. add roberta option in train_reward_model.py
    3. add some tests in testci
  * add test for reward model training
  * Update test_ci.sh
  * Revert "Update test_ci.sh" (reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a)
  * update roberta with coati