Commits · d86ddd9b2910ef0e9a093039d70c3789d3af3517 · OpenDAS / ColossalAI

11 Aug, 2023 1 commit
- [hotfix] fix unsafe async comm in zero (#4404) · d86ddd9b
  LuGY authored Aug 11, 2023
```
* improve stability of zero

* fix wrong index

* add record stream
```
  d86ddd9b
10 Aug, 2023 1 commit
- [gemini] fix tensor storage cleaning in state dict collection (#4396) · 6ccecc0c
  Baizhou Zhang authored Aug 10, 2023
  
  6ccecc0c
09 Aug, 2023 1 commit
- [kernel] updated unittests for coloattention (#4389) · 458ae331
  flybird1111 authored Aug 09, 2023
```
Updated coloattention tests to check outputs and gradients
```
  458ae331
04 Aug, 2023 4 commits
- [doc] add Series A Funding and NeurIPS news (#4377) · 089c365f
  binmakeswell authored Aug 04, 2023
```
* [doc] add Series A Funding and NeurIPS news

* [kernel] fix mha kernel

* [CI] skip moe

* [CI] fix requirements
```
  089c365f
- [doc] Fix gradient accumulation doc. (#4349) · f40b7189
  flybird1111 authored Aug 04, 2023
```
* [doc] fix gradient accumulation doc

* [doc] fix gradient accumulation doc
```
  f40b7189
- [coloattention] fix import error (#4380) · 38b792aa
  flybird1111 authored Aug 04, 2023
```
fixed an import error
```
  38b792aa
- [fix] coloattention support flash attention 2 (#4347) · 25c57b9f
  flybird1111 authored Aug 04, 2023
```
Improved ColoAttention interface to support flash attention 2. Solved #4322 
```
  25c57b9f
02 Aug, 2023 1 commit

- [chat] fix bugs and add unit tests (#4213) · da4f7b85
  Wenhao Chen authored Aug 02, 2023

* style: rename replay buffer

Experience replay is typically used for off-policy algorithms.
Using this name in PPO may be misleading.

* fix: fix wrong zero2 default arg

* test: update experience tests

* style: rename zero_pad fn

* fix: defer init in CycledDataLoader

* test: add benchmark test

* style: rename internal fn of generation

* style: rename internal fn of lora

* fix: remove unused loss fn

* fix: remove unused utils fn

* refactor: remove generate_with_actor fn

* fix: fix type annotation

* test: add models tests

* fix: skip llama due to long execution time

* style: modify dataset

* style: apply formatter

* perf: update reward dataset

* fix: fix wrong IGNORE_INDEX in sft dataset

* fix: remove DataCollatorForSupervisedDataset

* test: add dataset tests

* style: apply formatter

* style: rename test_ci to test_train

* feat: add llama in inference

* test: add inference tests

* test: change test scripts directory

* fix: update ci

* fix: fix typo

* fix: skip llama due to oom

* fix: fix file mod

* style: apply formatter

* refactor: remove duplicated llama_gptq

* style: apply formatter

* to: update rm test

* feat: add tokenizer arg

* feat: add download model script

* test: update train tests

* fix: modify gemini load and save pretrained

* test: update checkpoint io test

* to: modify nproc_per_node

* fix: do not remove existing dir

* fix: modify save path

* test: add random choice

* fix: fix sft path

* fix: enlarge nproc_per_node to avoid oom

* fix: add num_retry

* fix: make lora config of rm and critic consistent

* fix: add warning about lora weights

* fix: skip some gpt2 tests

* fix: remove grad ckpt in rm and critic due to errors

* refactor: directly use Actor in train_sft

* test: add more arguments

* fix: disable grad ckpt when using lora

* fix: fix save_pretrained and related tests

* test: enable zero2 tests

* revert: remove useless fn

* style: polish code

* test: modify test args

da4f7b85

01 Aug, 2023 5 commits
- [test] remove useless tests (#4359) · 16bf4c02
  Hongxin Liu authored Aug 01, 2023
```
* [test] remove legacy zero test

* [test] remove lazy distribute test

* [test] remove outdated checkpoint io
```
  16bf4c02
- [hotfix] update gradio 3.11 to 3.34.0 (#4329) · 16c0acc0
  caption authored Aug 01, 2023
  
  16c0acc0
- [release] update version (#4332) · 80647712
  Hongxin Liu authored Aug 01, 2023
```
* [release] update version

* [devops] hotfix cuda extension building

* [devops] pytest ignore useless folders
```
  80647712
- [chat] fix compute_approx_kl (#4338) · 75c53890
  Wenhao Chen authored Aug 01, 2023
  
  75c53890
- fix localhost measurement (#4320) · 03654c0c
  LuGY authored Aug 01, 2023
  
  03654c0c
31 Jul, 2023 6 commits

- [zero] optimize the optimizer step time (#4221) · 45b08f08
  LuGY authored Jul 18, 2023

* optimize the optimizer step time

* fix corner case

* polish

* replace all-reduce with all-gather

* set comm device to cuda

45b08f08

- [zero] support shard optimizer state dict of zero (#4194) · 1a49a5ea
  LuGY authored Jul 11, 2023
```
* support shard optimizer of zero

* polish code

* support sync grad manually
```
  1a49a5ea
- [zero] add state dict for low level zero (#4179) · dd7cc582
  LuGY authored Jul 06, 2023
```
* add state dict for zero

* fix unit test

* polish
```
  dd7cc582
- [zero] allow passing process group to zero12 (#4153) · c668801d
  LuGY authored Jul 04, 2023
```
* allow passing process group to zero12

* union tp-zero and normal-zero

* polish code
```
  c668801d
- [zero]support no_sync method for zero1 plugin (#4138) · 79cf1b5f
  LuGY authored Jul 04, 2023
```
* support no sync for zero1 plugin

* polish

* polish
```
  79cf1b5f

- [zero] refactor low level zero for shard evenly (#4030) · c6ab9698
  LuGY authored Jun 30, 2023

* refactor low level zero

* fix zero2 and support cpu offload

* avg gradient and modify unit test

* refactor grad store, support layer drop

* refactor bucket store, support grad accumulation

* fix and update unit test of zero and ddp

* compatible with tp, ga and unit test

* fix memory leak and polish

* add zero layer drop unittest

* polish code

* fix import err in unit test

* support different comm dtype, modify docstring style

* polish code

* test padding and fix

* fix unit test of low level zero

* fix pad recording in bucket store

* support some models

* polish

c6ab9698

28 Jul, 2023 1 commit
- support session-based training (#4313) · 5187c96b
  Yuanchen authored Jul 28, 2023
```
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
```
  5187c96b
26 Jul, 2023 20 commits
- add llama example CI · ef4b99eb
  binmakeswell authored Jul 22, 2023
  
  ef4b99eb
- [NFC] polish applications/Chat/coati/models/utils.py codestyle (#4277) · 09914053
  yuxuan-lou authored Jul 19, 2023
```
* [NFC] polish colossalai/context/random/__init__.py code style

* [NFC] polish applications/Chat/coati/models/utils.py code style
```
  09914053
- [NFC] polish applications/Chat/coati/trainer/strategies/base.py code style (#4278) · 9e512938
  Zirui Zhu authored Jul 19, 2023
  
  9e512938
- applications/Chat/.gitignore (#4279) · c972d653
  Ziheng Qin authored Jul 19, 2023
```
Co-authored-by: henryqin1997 <henryqin1997@gamil.com>
```
  c972d653
- [NFC] polish applications/Chat/coati/models/generation.py code style (#4275) · 709e121c
  RichardoLuo authored Jul 18, 2023
  
  709e121c
- [NFC] polish applications/Chat/inference/server.py code style (#4274) · dc1b6127
  Yuanchen authored Jul 18, 2023
```
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
```
  dc1b6127
- [NFC] fix format of application/Chat/coati/trainer/utils.py (#4273) · caa44330
  アマデウス authored Jul 18, 2023
  
  caa44330
- [NFC] polish applications/Chat/examples/train_reward_model.py code style (#4271) · 1ce997da
  Xu Kai authored Jul 18, 2023
  
  1ce997da
- [NFC] fix: format (#4270) · a50d39a1
  dayellow authored Jul 18, 2023
```
* [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style

* [NFC] polish colossalai/communication/utils.py code style

---------
Co-authored-by: Minghao Huang <huangminghao@luchentech.com>
```
  a50d39a1
- [NFC] polish runtime_preparation_pass style (#4266) · fee55328
  Wenhao Chen authored Jul 18, 2023
  
  fee55328
- [NFC] polish unary_elementwise_generator.py code style (#4267) · 3883db45
  YeAnbang authored Jul 18, 2023
```
Co-authored-by: aye42 <aye42@gatech.edu>
```
  3883db45
- [NFC] polish applications/Chat/coati/trainer/base.py code style (#4260) · 798cb729
  shenggan authored Jul 18, 2023
  
  798cb729
- [NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (#4259) · b2debdc0
  Zheng Zangwei (Alex Zheng) authored Jul 18, 2023
  
  b2debdc0
- [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) · abe4f971
  梁爽 authored Jul 18, 2023
```
Co-authored-by: supercooledith <893754954@qq.com>
```
  abe4f971
- [NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255) · c614a99d
  Yanjia0 authored Jul 18, 2023
  
  c614a99d
- [NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) · 85774f0c
  ocd_with_naming authored Jul 18, 2023
  
  85774f0c
- [NFC] policy applications/Chat/examples/ray/mmmt_prompt.py code style (#4250) · dee1c963
  CZYCW authored Jul 18, 2023
  
  dee1c963
- [NFC] polish applications/Chat/coati/models/base/actor.py code style (#4248) · 77c469e1
  Junming Wu authored Jul 18, 2023
  
  77c469e1
- [NFC] polish applications/Chat/inference/requirements.txt code style (#4265) · 915ed8be
  Camille Zhong authored Jul 18, 2023
  
  915ed8be
- Fix/format (#4261) · 86cf6aed
  Michelle authored Jul 18, 2023
```
* revise shardformer readme (#4246)

* [example] add llama pretraining (#4257)

* [NFC] polish colossalai/communication/p2p.py code style

---------
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Qianran Ma <qianranm@luchentech.com>
```
  86cf6aed