- 05 Mar, 2024 2 commits
  - Hongxin Liu authored
  - binmakeswell authored
    * [doc] sora release
- 29 Feb, 2024 1 commit
  - binmakeswell authored
- 19 Feb, 2024 2 commits
- 25 Jan, 2024 1 commit
  - digger yu authored
- 09 Jan, 2024 1 commit
  - Hongxin Liu authored
    * update accelerator
    * fix timer
    * fix amp
    * update
    * fix
    * update bug
    * add error raise
    * fix autocast
    * fix set device
    * remove doc accelerator
    * update doc
    * use nullcontext
    * update cpu
    * update null context
    * change time limit for example
    * [npu] polish accelerator code
    Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
    Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
- 08 Jan, 2024 1 commit
  - binmakeswell authored
    * [doc] SwiftInfer release
- 07 Jan, 2024 1 commit
  - binmakeswell authored
    * [doc] add Colossal-LLaMA-2-13B
- 15 Dec, 2023 1 commit
  - flybird11111 authored
    * fix
    * test ci
    * fix ci
    * update pytorch version in documents
- 28 Nov, 2023 2 commits
  - binmakeswell authored
    * [doc] add moe news
  - Wenhao Chen authored
    * [shardformer] implement policy for all GPT-J models and test
    * [shardformer] support interleaved pipeline parallel for bert finetune
    * [shardformer] shardformer support falcon (#4883)
    * [shardformer] fix interleaved pipeline for bert model (#5048)
    * [hotfix] disable seq parallel for gptj and falcon, and polish code (#5093)
    * Add Mistral support for Shardformer (#5103)
    * [shardformer] add tests to mistral (#5105)
    Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
    Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
    Co-authored-by: flybird11111 <1829166702@qq.com>
    Co-authored-by: eric8607242 <e0928021388@gmail.com>
- 27 Nov, 2023 1 commit
  - digger yu authored
- 24 Nov, 2023 1 commit
  - digger yu authored
- 22 Nov, 2023 1 commit
  - digger yu authored
- 21 Nov, 2023 1 commit
  - digger yu authored
- 31 Oct, 2023 1 commit
  - ppt0011 authored
- 18 Oct, 2023 1 commit
  - digger yu authored
- 17 Oct, 2023 1 commit
  - Baizhou Zhang authored
    * add test
    * fix no_sync bug in low level zero plugin
    * fix test
    * add argument for grad accum
    * add grad accum in backward hook for gemini
    * finish implementation, rewrite tests
    * skip stuck model in low level zero test
    * update doc
    * optimize communication & fix gradient checkpoint
    * modify doc
    * clean up code
    * update cpu adam fp16 case
- 10 Oct, 2023 1 commit
  - flybird11111 authored
    * [doc] update advanced tutorials, training gpt with hybrid parallelism
    * update vit tutorials
    * update en/train_vit_with_hybrid_parallel.py
    * fix
    * resolve comments
- 05 Oct, 2023 1 commit
  - Zhongkai Zhao authored
- 27 Sep, 2023 2 commits
  - binmakeswell authored
  - Hongxin Liu authored
- 26 Sep, 2023 2 commits
  - Baizhou Zhang authored
    * support unsharded saving/loading for model
    * support optimizer unsharded saving
    * update doc
    * support unsharded loading for optimizer
    * small fix
  - Baizhou Zhang authored
    * fix example format in docstring
    * polish shardformer doc
- 25 Sep, 2023 1 commit
  - binmakeswell authored
    * [doc] add llama2 domain-specific solution news
- 21 Sep, 2023 2 commits
  - Baizhou Zhang authored
  - Hongxin Liu authored
    * [doc] clean up outdated docs
    * [doc] fix linking
- 20 Sep, 2023 1 commit
  - Pengtai Xu authored
- 19 Sep, 2023 3 commits
  - Pengtai Xu authored
  - Pengtai Xu authored
  - Pengtai Xu authored
- 18 Sep, 2023 1 commit
  - Hongxin Liu authored
    * [legacy] remove outdated codes of pipeline (#4692)
    * [legacy] remove cli of benchmark and update optim (#4690)
    * [doc] fix cli doc test
    * [legacy] fix engine clip grad norm
    * [legacy] remove outdated colo tensor (#4694)
    * [test] fix test import
    * [legacy] move outdated zero to legacy (#4696)
    * [legacy] clean up utils (#4700)
    * [example] update examples
    * [legacy] clean up amp
    * [legacy] fix amp module
    * [legacy] clean up gpc (#4742)
    * [legacy] clean up context
    * [legacy] clean core, constants and global vars
    * [legacy] refactor initialize
    * [example] fix examples ci
    * [legacy] fix tests
    * [example] fix gpt example
    * [devops] fix ci installation
- 15 Sep, 2023 7 commits
  - Baizhou Zhang authored
  - Baizhou Zhang authored
    * arrange position of chapters
    * fix typos in seq parallel doc
  - Bin Jia authored
    * update doc of seq parallel
    * fix typo
  - flybird11111 authored
    * [shardformer] update pipeline parallel document
  - Baizhou Zhang authored
    * add compatibility matrix for shardformer doc
    * update tp doc
  - github-actions[bot] authored
    Co-authored-by: github-actions <github-actions@github.com>
  - Baizhou Zhang authored
    * create shardformer doc files
    * add docstring for seq-parallel
    * update ShardConfig docstring
    * add links to llama example
    * add outdated message
    * finish introduction & supporting information
    * finish 'how shardformer works'
    * finish shardformer.md English doc
    * fix doctest fail
    * add Chinese document