1. 27 Mar, 2024 1 commit
    • [shardformer] update colo attention to support custom mask (#5510) · 19e1a5cf
      Hongxin Liu authored
      * [feature] refactor colo attention (#5462)
      
      * [extension] update api
      
      * [feature] add colo attention
      
      * [feature] update sdpa
      
      * [feature] update npu attention
      
      * [feature] update flash-attn
      
      * [test] add flash attn test
      
      * [test] update flash attn test
      
      * [shardformer] update modeling to fit colo attention (#5465)
      
      * [misc] refactor folder structure
      
      * [shardformer] update llama flash-attn
      
      * [shardformer] fix llama policy
      
      * [devops] update tensornvme install
      
      * [test] update llama test
      
      * [shardformer] update colo attn kernel dispatch
      
      * [shardformer] update blip2
      
      * [shardformer] update chatglm
      
      * [shardformer] update gpt2
      
      * [shardformer] update gptj
      
      * [shardformer] update opt
      
      * [shardformer] update vit
      
      * [shardformer] update colo attention mask prep
      
      * [shardformer] update whisper
      
      * [test] fix shardformer tests (#5514)
      
      * [test] fix shardformer tests
      
      * [test] fix shardformer tests
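The "custom mask" support in the entry above amounts to letting callers pass an arbitrary visibility mask into the attention kernel, with masked positions excluded before the softmax. As a minimal, framework-agnostic sketch (illustrative only — not ColossalAI's actual colo attention API), masked scaled-dot-product attention weights can be computed as:

```python
import math

def masked_attention_weights(q, k, mask):
    """Per-query softmax attention weights over keys.

    q, k: lists of equal-length float vectors (queries and keys).
    mask: mask[i][j] is True when query i may attend to key j.
    Masked positions receive -inf before softmax, so their weight is 0.
    Assumes each query can see at least one key.
    """
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            s = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            scores.append(s if mask[i][j] else float("-inf"))
        m = max(scores)                      # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

A causal mask, an sdpa-style boolean mask, or a fully custom pattern all reduce to choosing `mask[i][j]`; a kernel like flash-attn dispatches on the mask type rather than materializing it.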
  2. 26 Mar, 2024 1 commit
  3. 13 Mar, 2024 1 commit
    • [devops] fix compatibility (#5444) · f2e8b9ef
      Hongxin Liu authored
      * [devops] fix compatibility
      
      * [hotfix] update compatibility test on pr
      
      * [devops] fix compatibility
      
      * [devops] record duration during comp test
      
      * [test] decrease test duration
      
      * fix falcon
  4. 05 Mar, 2024 1 commit
  5. 04 Mar, 2024 1 commit
    • [example] add gpt2 benchmark example script (#5295) · 29695cf7
      flybird11111 authored
      
      
      * benchmark gpt2
      
      * fix
      
      fix
      
      fix
      
      fix
      
      * [doc] fix typo in Colossal-LLaMA-2/README.md (#5247)
      
      * [workflow] fixed build CI (#5240)
      
      * [workflow] fixed build CI
      
      * polish
      
      * polish
      
      * polish
      
      * polish
      
      * polish
      
      * [ci] fixed booster test (#5251)
      
      * [ci] fixed booster test
      
      * [ci] fixed booster test
      
      * [ci] fixed booster test
      
      * [ci] fixed ddp test (#5254)
      
      * [ci] fixed ddp test
      
      * polish
      
      * fix typo in applications/ColossalEval/README.md (#5250)
      
      * [ci] fix shardformer tests. (#5255)
      
      * fix ci
      
      fix
      
      * revert: revert p2p
      
      * feat: add enable_metadata_cache option
      
      * revert: enable t5 tests
      
      ---------
      Co-authored-by: Wenhao Chen <cwher@outlook.com>
      
      * [doc] fix doc typo (#5256)
      
      * [doc] fix annotation display
      
      * [doc] fix llama2 doc
      
      * [hotfix]: add pp sanity check and fix mbs arg (#5268)
      
      * fix: fix misleading mbs arg
      
      * feat: add pp sanity check
      
      * fix: fix 1f1b sanity check
      
      * [workflow] fixed incomplete bash command (#5272)
      
      * [workflow] fixed oom tests (#5275)
      
      * [workflow] fixed oom tests
      
      * polish
      
      * polish
      
      * polish
      
      * [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276)
      
      * fix ci
      
      fix
      
      * fix test
      
      * revert: revert p2p
      
      * feat: add enable_metadata_cache option
      
      * revert: enable t5 tests
      
      * fix
      
      ---------
      Co-authored-by: Wenhao Chen <cwher@outlook.com>
      
      * [shardformer] hybridparallelplugin supports gradient accumulation (#5246)
      
      * support gradients acc
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      * fix
      
      fix
      
      * fix
      
      fix
      
      fix
      
      * [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230)
      
      * fix auto loading gpt2 tokenizer (#5279)
      
      * [doc] add llama2-13B display (#5285)
      
      * Update README.md
      
      * fix 13b typo
      
      ---------
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      
      * fix llama pretrain (#5287)
      
      * fix
      
      * fix
      
      * fix
      
      fix
      
      * fix
      
      fix
      
      fix
      
      * fix
      
      fix
      
      * benchmark gpt2
      
      * fix
      
      fix
      
      fix
      
      fix
      
      * [workflow] fixed build CI (#5240)
      
      * [workflow] fixed build CI
      
      * polish
      
      * polish
      
      * polish
      
      * polish
      
      * polish
      
      * [ci] fixed booster test (#5251)
      
      * [ci] fixed booster test
      
      * [ci] fixed booster test
      
      * [ci] fixed booster test
      
      * fix
      
      fix
      
      * fix
      
      fix
      
      fix
      
      * fix
      
      * fix
      
      fix
      
      fix
      
      fix
      
      fix
      
      * fix
      
      * Update shardformer.py
      
      ---------
      Co-authored-by: digger yu <digger-yu@outlook.com>
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: Wenhao Chen <cwher@outlook.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
      Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
      Co-authored-by: Desperado-Jia <502205863@qq.com>
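Gradient accumulation, as added to the hybrid parallel plugin in the entry above, boils down to scaling each micro-batch loss by 1/N and stepping the optimizer only every N micro-batches. A plugin-free sketch of the control flow (illustrative only, not the HybridParallelPlugin API; the "gradient" here is just a number standing in for a backward pass):

```python
def train_with_accumulation(losses, accum_steps):
    """Simulate gradient accumulation over a stream of micro-batch losses.

    Each micro-batch contributes loss / accum_steps to the pending gradient;
    after accum_steps micro-batches, one optimizer step is taken and the
    gradient is zeroed. Returns (gradients applied per step, step count).
    """
    grad = 0.0
    steps = 0
    applied = []
    for i, loss in enumerate(losses, start=1):
        grad += loss / accum_steps      # backward() on the scaled loss
        if i % accum_steps == 0:
            applied.append(grad)        # optimizer.step()
            grad = 0.0                  # optimizer.zero_grad()
            steps += 1
    return applied, steps
```

This gives the effective batch size of N micro-batches while only ever holding one micro-batch of activations, which is why it pairs naturally with pipeline and ZeRO-style sharding.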
  6. 29 Feb, 2024 1 commit
  7. 27 Feb, 2024 1 commit
  8. 29 Jan, 2024 1 commit
  9. 25 Jan, 2024 1 commit
  10. 16 Jan, 2024 2 commits
  11. 11 Jan, 2024 1 commit
  12. 10 Jan, 2024 1 commit
  13. 09 Jan, 2024 1 commit
  14. 03 Jan, 2024 1 commit
  15. 28 Nov, 2023 1 commit
  16. 23 Nov, 2023 1 commit
    • [Feature] Add document retrieval QA (#5020) · e53e729d
      YeAnbang authored
      
      
      * add langchain
      
      * add langchain
      
      * Add files via upload
      
      * add langchain
      
      * fix style
      
      * fix style: remove extra space
      
      * add pytest; modified retriever
      
      * add pytest; modified retriever
      
      * add tests to build_on_pr.yml
      
      * fix build_on_pr.yml
      
      * fix build on pr; fix environ vars
      
      * separate unit tests for colossalqa from build on pr
      
      * fix container setting; fix environ vars
      
      * commented dev code
      
      * add incremental update
      
      * remove stale code
      
      * fix style
      
      * change to sha3 224
      
      * fix retriever; fix style; add unit test for document loader
      
      * fix ci workflow config
      
      * fix ci workflow config
      
      * add set cuda visible device script in ci
      
      * fix doc string
      
      * fix style; update readme; refactored
      
      * add force log info
      
      * change build on pr, ignore colossalqa
      
      * fix docstring, capitalize all initial letters
      
      * fix indexing; fix text-splitter
      
      * remove debug code, update reference
      
      * reset previous commit
      
      * update LICENSE, update README, add key-value mode, fix bugs
      
      * add files back
      
      * revert force push
      
      * remove junk file
      
      * add test files
      
      * fix retriever bug, add intent classification
      
      * change conversation chain design
      
      * rewrite prompt and conversation chain
      
      * add ui v1
      
      * ui v1
      
      * fix avatar
      
      * add header
      
      * Refactor the RAG Code and support Pangu
      
      * Refactor the ColossalQA chain into object-oriented style and update the UI demo.
      
      * resolved conversation. tested scripts under examples. web demo still buggy
      
      * fix ci tests
      
      * Some modifications to add ChatGPT api
      
      * modify llm.py and remove unnecessary files
      
      * Delete applications/ColossalQA/examples/ui/test_frontend_input.json
      
      * Remove OpenAI api key
      
      * add colossalqa
      
      * move files
      
      * move files
      
      * move files
      
      * move files
      
      * fix style
      
      * Add Readme and fix some bugs.
      
      * Add something to readme and modify some code
      
      * modify a directory name for clarity
      
      * remove redundant directory
      
      * Correct a typo in llm.py
      
      * fix AI prefix
      
      * fix test_memory.py
      
      * fix conversation
      
      * fix some errors and typos
      
      * Fix a missing import in RAG_ChatBot.py
      
      * add colossalcloud LLM wrapper, correct issues in code review
      
      ---------
      Co-authored-by: YeAnbang <anbangy2@outlook.com>
      Co-authored-by: Orion-Zheng <zheng_zian@u.nus.edu>
      Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com>
      Co-authored-by: Orion-Zheng <zhengzian@u.nus.edu>
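The retrieval step at the heart of the document-QA pipeline above can be illustrated with a minimal bag-of-words retriever. This is a sketch only — ColossalQA itself builds on langchain with embedding-based vector stores — but the ranking idea is the same: vectorize the query and each document, then return the most similar documents:

```python
import math
from collections import Counter

def retrieve(query, documents, top_k=1):
    """Rank documents by cosine similarity of bag-of-words vectors."""
    def vec(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        num = sum(a[t] * b[t] for t in set(a) & set(b))
        den = (math.sqrt(sum(v * v for v in a.values()))
               * math.sqrt(sum(v * v for v in b.values())))
        return num / den if den else 0.0

    qv = vec(query)
    ranked = sorted(documents, key=lambda d: cosine(qv, vec(d)), reverse=True)
    return ranked[:top_k]
```

In the real pipeline the word-count vectors are replaced by dense embeddings and the linear scan by an indexed vector store, but the retriever's contract — query in, top-k relevant chunks out for the LLM to answer over — is unchanged.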
  17. 08 Nov, 2023 1 commit
  18. 01 Nov, 2023 1 commit
  19. 27 Sep, 2023 1 commit
  20. 26 Sep, 2023 1 commit
  21. 20 Sep, 2023 1 commit
    • [chat]: update rm, add wandb and fix bugs (#4471) · 7b9b8644
      Wenhao Chen authored
      
      
      * feat: modify forward fn of critic and reward model
      
      * feat: modify calc_action_log_probs
      
      * to: add wandb in sft and rm trainer
      
      * feat: update train_sft
      
      * feat: update train_rm
      
      * style: modify type annotation and add warning
      
      * feat: pass tokenizer to ppo trainer
      
      * to: modify trainer base and maker base
      
      * feat: add wandb in ppo trainer
      
      * feat: pass tokenizer to generate
      
      * test: update generate fn tests
      
      * test: update train tests
      
      * fix: remove action_mask
      
      * feat: remove unused code
      
      * fix: fix wrong ignore_index
      
      * fix: fix mock tokenizer
      
      * chore: update requirements
      
      * revert: modify make_experience
      
      * fix: fix inference
      
      * fix: add padding side
      
      * style: modify _on_learn_batch_end
      
      * test: use mock tokenizer
      
      * fix: use bf16 to avoid overflow
      
      * fix: fix workflow
      
      * [chat] fix gemini strategy
      
      * [chat] fix
      
      * sync: update colossalai strategy
      
      * fix: fix args and model dtype
      
      * fix: fix checkpoint test
      
      * fix: fix requirements
      
      * fix: fix missing import and wrong arg
      
      * fix: temporarily skip gemini test in stage 3
      
      * style: apply pre-commit
      
      * fix: temporarily skip gemini test in stage 1&2
      
      ---------
      Co-authored-by: Mingyan Jiang <1829166702@qq.com>
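The `calc_action_log_probs` change above concerns gathering, for each generated token, the log-probability the policy assigned to it — the quantity PPO's ratio and the reward-model loss are built from. Stripped of tensors, the operation is an indexed gather over per-step log-prob distributions (a hedged sketch; the actual function in the chat trainer operates on logits tensors and differs in signature):

```python
def calc_action_log_probs(log_probs, action_ids):
    """Pick the log-probability of each taken action.

    log_probs: per-step distributions; log_probs[s][t] holds
               log P(token t | context at step s).
    action_ids: token id chosen at each step.
    Returns one log-prob per step; their sum is the sequence log-prob.
    """
    return [step[a] for step, a in zip(log_probs, action_ids)]
```

In tensor form this is a `gather` along the vocabulary axis, typically followed by masking out prompt and padding positions.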
  22. 19 Sep, 2023 1 commit
  23. 18 Sep, 2023 1 commit
    • [legacy] clean up legacy code (#4743) · b5f9e37c
      Hongxin Liu authored
      * [legacy] remove outdated codes of pipeline (#4692)
      
      * [legacy] remove cli of benchmark and update optim (#4690)
      
      * [legacy] remove cli of benchmark and update optim
      
      * [doc] fix cli doc test
      
      * [legacy] fix engine clip grad norm
      
      * [legacy] remove outdated colo tensor (#4694)
      
      * [legacy] remove outdated colo tensor
      
      * [test] fix test import
      
      * [legacy] move outdated zero to legacy (#4696)
      
      * [legacy] clean up utils (#4700)
      
      * [legacy] clean up utils
      
      * [example] update examples
      
      * [legacy] clean up amp
      
      * [legacy] fix amp module
      
      * [legacy] clean up gpc (#4742)
      
      * [legacy] clean up context
      
      * [legacy] clean core, constants and global vars
      
      * [legacy] refactor initialize
      
      * [example] fix examples ci
      
      * [example] fix examples ci
      
      * [legacy] fix tests
      
      * [example] fix gpt example
      
      * [example] fix examples ci
      
      * [devops] fix ci installation
      
      * [example] fix examples ci
  24. 11 Sep, 2023 1 commit
  25. 08 Sep, 2023 1 commit
  26. 01 Sep, 2023 1 commit
  27. 30 Aug, 2023 2 commits
  28. 16 Aug, 2023 1 commit
  29. 02 Aug, 2023 1 commit
    • [chat] fix bugs and add unit tests (#4213) · da4f7b85
      Wenhao Chen authored
      * style: rename replay buffer
      
      Experience replay is typically used for off-policy algorithms,
      so using this name in PPO may be misleading.
      
      * fix: fix wrong zero2 default arg
      
      * test: update experience tests
      
      * style: rename zero_pad fn
      
      * fix: defer init in CycledDataLoader
      
      * test: add benchmark test
      
      * style: rename internal fn of generation
      
      * style: rename internal fn of lora
      
      * fix: remove unused loss fn
      
      * fix: remove unused utils fn
      
      * refactor: remove generate_with_actor fn
      
      * fix: fix type annotation
      
      * test: add models tests
      
      * fix: skip llama due to long execution time
      
      * style: modify dataset
      
      * style: apply formatter
      
      * perf: update reward dataset
      
      * fix: fix wrong IGNORE_INDEX in sft dataset
      
      * fix: remove DataCollatorForSupervisedDataset
      
      * test: add dataset tests
      
      * style: apply formatter
      
      * style: rename test_ci to test_train
      
      * feat: add llama in inference
      
      * test: add inference tests
      
      * test: change test scripts directory
      
      * fix: update ci
      
      * fix: fix typo
      
      * fix: skip llama due to oom
      
      * fix: fix file mod
      
      * style: apply formatter
      
      * refactor: remove duplicated llama_gptq
      
      * style: apply formatter
      
      * to: update rm test
      
      * feat: add tokenizer arg
      
      * feat: add download model script
      
      * test: update train tests
      
      * fix: modify gemini load and save pretrained
      
      * test: update checkpoint io test
      
      * to: modify nproc_per_node
      
      * fix: do not remove existing dir
      
      * fix: modify save path
      
      * test: add random choice
      
      * fix: fix sft path
      
      * fix: enlarge nproc_per_node to avoid oom
      
      * fix: add num_retry
      
      * fix: make lora config of rm and critic consistent
      
      * fix: add warning about lora weights
      
      * fix: skip some gpt2 tests
      
      * fix: remove grad ckpt in rm and critic due to errors
      
      * refactor: directly use Actor in train_sft
      
      * test: add more arguments
      
      * fix: disable grad ckpt when using lora
      
      * fix: fix save_pretrained and related tests
      
      * test: enable zero2 tests
      
      * revert: remove useless fn
      
      * style: polish code
      
      * test: modify test args
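The rename rationale in the entry above — PPO is on-policy, so "replay buffer" is misleading — comes down to lifetime: on-policy experience is consumed once per update and then discarded, whereas an off-policy replay buffer is sampled from repeatedly. A minimal sketch of the on-policy pattern (illustrative only, with a hypothetical class name, not the actual chat-trainer API):

```python
class ExperienceBuffer:
    """On-policy buffer: filled during rollout, drained after one update."""

    def __init__(self):
        self._items = []

    def append(self, experience):
        self._items.append(experience)

    def sample_all(self):
        # PPO consumes every collected experience exactly once,
        # then the buffer is cleared -- unlike off-policy experience
        # replay, which keeps old transitions around for resampling.
        items, self._items = self._items, []
        return items

    def __len__(self):
        return len(self._items)
```

Keeping the name honest matters because reusing stale experience in PPO silently violates the algorithm's on-policy assumption.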
  30. 01 Aug, 2023 1 commit
  31. 21 Jul, 2023 1 commit
  32. 04 Jul, 2023 2 commits
    • [workflow] show test duration (#4159) · cc3cbe9f
      Frank Lee authored
    • [chat] use official transformers and fix some issues (#4117) · 3d8d5d0d
      Wenhao Chen authored
      * feat: remove on_learn_epoch fn as not used
      
      * revert: add _on_learn_epoch fn
      
      * feat: remove NaiveStrategy
      
      * test: update train_prompts tests
      
      * fix: remove prepare_llama_tokenizer_and_embedding
      
      * test: add lora arg
      
      * feat: remove roberta support in train_prompts due to runtime errs
      
      * feat: remove deberta & roberta in rm as not used
      
      * test: remove deberta and roberta tests
      
      * feat: remove deberta and roberta models as not used
      
      * fix: remove calls to roberta
      
      * fix: remove prepare_llama_tokenizer_and_embedding
      
      * chore: update transformers version
      
      * docs: update transformers version
      
      * fix: fix actor inference
      
      * fix: fix ci
      
      * feat: change llama pad token to unk
      
      * revert: revert ddp setup_distributed
      
      * fix: change llama pad token to unk
      
      * revert: undo unnecessary changes
      
      * fix: use pip to install transformers
  33. 28 Jun, 2023 1 commit
  34. 22 Jun, 2023 1 commit
  35. 19 Jun, 2023 1 commit
  36. 13 Jun, 2023 1 commit
  37. 12 Jun, 2023 1 commit