Commits · 816add7e7f4844aa381d14af17cabcca121799e2 · OpenDAS / ColossalAI

27 Apr, 2023 2 commits

[chat] refactor model save/load logic (#3654) · 842768a1

Hongxin Liu authored Apr 27, 2023

* [chat] strategy refactor unwrap model

* [chat] strategy refactor save model

* [chat] add docstr

* [chat] refactor trainer save model

* [chat] fix strategy typing

* [chat] refactor trainer save model

* [chat] update readme

* [chat] fix unit test

[chat] remove lm model class (#3653) · 6ef70114

Hongxin Liu authored Apr 27, 2023

* [chat] refactor lora

* [chat] remove lm class

* [chat] refactor save model

* [chat] refactor train sft

* [chat] fix ci

* [chat] fix ci

26 Apr, 2023 3 commits

[chat] refactor trainer (#3648) · 2a951955

Hongxin Liu authored Apr 26, 2023

* [chat] ppo trainer remove useless args

* [chat] update examples

* [chat] update benchmark

* [chat] update examples

* [chat] fix sft training with wandb

* [chat] polish docstr

[chat] polish performance evaluator (#3647) · f8288315

Hongxin Liu authored Apr 26, 2023

[gemini] accelerate inference (#3641) · 50793b35

Hongxin Liu authored Apr 26, 2023

* [gemini] support don't scatter after inference

* [chat] update colossalai strategy

* [chat] fix opt benchmark

* [chat] update opt benchmark

* [gemini] optimize inference

* [test] add gemini inference test

* [chat] fix unit test ci

* [chat] fix ci

* [chat] fix ci

* [chat] skip checkpoint test

24 Apr, 2023 1 commit

[Chat] Remove duplicate functions (#3625) · df309fc6

ddobokki authored Apr 24, 2023

20 Apr, 2023 1 commit

[chat] polish code note typo (#3612) · d7bf2847

digger-yu authored Apr 20, 2023

18 Apr, 2023 1 commit

reconstruct chat trainer and fix training script (#3588) · 1ec0d386

Yuanchen authored Apr 18, 2023

```
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
```

17 Apr, 2023 2 commits

fix: fix sft (#3568) · 7788e0b0

tingfeng cao authored Apr 17, 2023

[chatgpt] Detached PPO Training (#3195) · e3551443

csric authored Apr 17, 2023



* run the base

* working on dist ppo

* sync

* detached trainer

* update detached trainer. no maker update function

* facing init problem

* 1 maker 1 trainer detached run. but no model update

* facing cuda problem

* fix save functions

* verified maker update

* nothing

* add ignore

* analyze loss issue

* remove some debug codes

* facing 2m1t stuck issue

* 2m1t verified

* do not use torchrun

* working on 2m2t

* working on 2m2t

* initialize strategy in ray actor env

* facing actor's init order issue

* facing ddp model update issue (need to unwrap ddp)

* unwrap ddp actor

* checking 1m2t stuck problem

* nothing

* set timeout for trainer choosing. It solves the stuck problem!

* delete some debug output

* rename to sync with upstream

* rename to sync with upstream

* coati rename

* nothing

* I am going to detach the replay buffer from the trainer and make it a Ray Actor. Two benefits: 1. support TP trainer. 2. asynchronous buffer operations

* experience_maker_holder performs target-revolving _send_experience() instead of length comparison.

* move code to ray subfolder

* working on pipeline inference

* apply comments

---------
Co-authored-by: csric <richcsr256@gmail.com>

11 Apr, 2023 1 commit

[chat]: add vf_coef argument for PPOTrainer (#3318) · e6a132a4

zhang-yi-chi authored Apr 11, 2023

06 Apr, 2023 2 commits

[Chat]Add Peft support & fix the ptx bug (#3433) · 62f4e2eb

YY Lin authored Apr 06, 2023

* Update ppo.py

Fix the bug of fetching wrong batch data

* Add peft model support in SFT and Prompts training

Peft model support is added in stage-1 and stage-3, so the trained artifacts are only small LoRA additions instead of the full set of model files.

* Delete test_prompts.txt

* Delete test_pretrained.txt

* Move the peft stuff to a community folder.

* Move the demo sft to community

* delete dirty files

* Add instructions to install peft using source

* Remove Chinese comments

* remove the Chinese comments

[chat]fix save_model(#3377) · 73afb635

Dr-Corgi authored Apr 06, 2023
```
The function save_model should be a part of PPOTrainer.
```
05 Apr, 2023 1 commit

fix save_model indent error in ppo trainer (#3450) · b9231390

Yuanchen authored Apr 05, 2023

```
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
```

04 Apr, 2023 2 commits

fix save_model in naive and ddp strategy (#3436) · 773955ab

Yuanchen authored Apr 04, 2023
```
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
```
[zero] reorganize zero/gemini folder structure (#3424) · 26b7aac0

ver217 authored Apr 04, 2023

* [zero] refactor low-level zero folder structure

* [zero] fix legacy zero import path

* [zero] fix legacy zero import path

* [zero] remove useless import

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor legacy zero import path

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor legacy zero import path

* [zero] fix test import path

* [zero] fix test

* [zero] fix circular import

* [zero] update import

28 Mar, 2023 1 commit

[Coati] first commit (#3283) · b0ce5a10

Fazzie-Maqianli authored Mar 28, 2023