1. 07 Apr, 2024 3 commits
  2. 03 Apr, 2024 1 commit
• [shardformer] Sequence Parallelism Optimization (#5533) · 8e412a54
      Zhongkai Zhao authored
      
      
      * sequence parallel optimization
      
      * validate sequence parallel in llama (code to be polished)
      
      * shardformer api writing
      
      * integrate sequence parallel in ShardFormer
      
* fix pp bugs and sp bugs for Llama model
      
      * integrating ring-based sequence parallelism into ShardFormer
      
      * [sequence parallelism]: Add fused megatron function
      
      * integrating ring-based sequence parallelism into ShardFormer
      
      ---------
Co-authored-by: linsj20 <linsj20@mails.tsinghua.edu.cn>
      
* fix bugs when using sp and flash attention together
      
      * fix operation function name
      
      * support flash attention for ulysses-style sp
      
      * clarify sp process group
      
      * fix compatibility bugs in moe plugin
      
      * fix fused linear bugs
      
      * fix linear layer test
      
      * support gpt model all-to-all sp
      
      * modify shard data dimension (meant to be dim=-1)
      
* support megatron-style sp and distributed attn for llama model
      
      * [shardformer] add megatron sp to llama
      
      * support llama7B 128k with distributed attention
      
      * [shardformer] robustness enhancement
      
      * add block attn
      
      * sp mode 1: keep input as a complete sequence
      
* fix sp compatibility
      
      * finish sp mode 3 support for gpt
      
      * using all_to_all_single when batch size is 1
      
      * support mode 2 sp in gpt2 (#5)
      
      * [shardformer] add megatron sp to llama
      
      * support llama7B 128k with distributed attention
      
      * [shardformer] robustness enhancement
      
      * add block attn
      
      * sp mode 1: keep input as a complete sequence
      
* fix sp compatibility
      
      * refactor ring implementation
      
      * support mode 2 sp in gpt2
      
      * polish code
      
      * enable distributed attn mask when using sp mode 2 and 3 in llama
      
      * automatically enable flash attn when using sp mode 2 and 3 in llama
      
      * inplace attn mask
      
      * add zero2 support for sequence parallel
      
      * polish code
      
      * fix bugs
      
      * fix gemini checkpoint io
      
      * loose tensor checking atol and rtol
      
      * add comment
      
      * fix llama layernorm grad
      
      * fix zero grad
      
      * fix zero grad
      
      * fix conflict
      
      * update split and gather auto grad func
      
      * sequence parallel: inside text split (#6)
      
      * polish code (part 1)
      
      * polish code (part 2)
      
      * polish code (part 2.5)
      
      * polish code (part 3)
      
      * sequence parallel: inside text split
      
      * miscellaneous minor fixes
      
      * polish code
      
      * fix ulysses style ZeRO
      
      * sequence parallel: inside text split
      
      * miscellaneous minor fixes
      
* disaggregate sp group and dp group for sp
      
      * fix llama and gpt sp
      
      * polish code
      
      * move ulysses grad sync to ddp (#9)
      
      * remove zero_stage and unbind the grad sync for alltoall sp
      
      * add 2d group creation test
      
      * move ulysses grad sync to ddp
      
      * add 2d group creation test
      
      * remove useless code
      
      * change shard config not to enable sp when enable_all_optimizations
      
* add sp warnings for several models
      
      * remove useless code
      
      ---------
Co-authored-by: linsj20 <linsj20@mails.tsinghua.edu.cn>
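The Ulysses-style (all-to-all) sp mentioned above reshards activations from a sequence split to a head split before attention. A minimal single-process sketch of that resharding, using numpy and made-up shapes (the real implementation uses `torch.distributed.all_to_all_single` across ranks):

```python
import numpy as np

def simulate_seq_alltoall(shards, world_size):
    """Simulate the Ulysses-style all-to-all: each rank starts with a
    sequence shard of shape (seq/P, heads, dim) and ends up with the
    full sequence for heads/P of the heads, shape (seq, heads/P, dim)."""
    heads_per_rank = shards[0].shape[1] // world_size
    out = []
    for rank in range(world_size):
        # gather this rank's head slice from every rank's sequence shard
        pieces = [
            shards[src][:, rank * heads_per_rank:(rank + 1) * heads_per_rank, :]
            for src in range(world_size)
        ]
        out.append(np.concatenate(pieces, axis=0))  # full sequence, head shard
    return out

world_size = 4
seq, heads, dim = 16, 8, 3
full = np.arange(seq * heads * dim, dtype=np.float32).reshape(seq, heads, dim)
shards = np.split(full, world_size, axis=0)   # sequence split before attention
resharded = simulate_seq_alltoall(shards, world_size)
assert resharded[0].shape == (seq, heads // world_size, dim)
assert np.array_equal(resharded[1], full[:, 2:4, :])
```

After attention, a second all-to-all inverts the exchange so the sequence split is restored for the MLP; the shard dimension noted in the log ("meant to be dim=-1") depends on the layout each model uses.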
  3. 02 Apr, 2024 1 commit
  4. 01 Apr, 2024 1 commit
• [shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogeneous shard policy for llama (#5508) · e614aa34
Wenhao Chen authored
      
      * feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig`
      
      * feat: apply `GradientCheckpointConfig` to policy and llama_forward
      
      * feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager
      
      * fix: add optional args for `distribute_layer` and `get_stage_index`
      
      * fix: fix changed API calls
      
      * test: update llama tests
      
      * style: polish `GradientCheckpointConfig`
      
      * fix: fix pipeline utils tests
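The `gradient_checkpointing_ratio` idea is to checkpoint only a fraction of each pipeline stage's layers instead of all or none. A sketch of the arithmetic under one plausible rounding rule (illustrative only; the actual `GradientCheckpointConfig` may round or distribute differently):

```python
import math

def num_ckpt_layers(num_layers: int, ratio: float) -> int:
    """Checkpoint roughly `ratio` of a stage's layers.
    Illustrative rule: round the fractional count up."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("gradient_checkpointing_ratio must be in [0, 1]")
    return math.ceil(num_layers * ratio)

assert num_ckpt_layers(8, 0.0) == 0   # no recomputation, max memory
assert num_ckpt_layers(8, 0.5) == 4   # trade half the activations for compute
assert num_ckpt_layers(7, 0.5) == 4   # ceil(3.5)
assert num_ckpt_layers(8, 1.0) == 8   # full checkpointing
```

A heterogeneous policy can then pass a different ratio per stage, which is useful because early pipeline stages hold activations longer than late ones.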
  5. 29 Mar, 2024 1 commit
• [ColossalChat] Update RLHF V2 (#5286) · df5e9c53
      YeAnbang authored
      
      
      * Add dpo. Fix sft, ppo, lora. Refactor all
      
      * fix and tested ppo
      
* 2nd round refactor
      
      * add ci tests
      
      * fix ci
      
      * fix ci
      
      * fix readme, style
      
      * fix readme style
      
      * fix style, fix benchmark
      
      * reproduce benchmark result, remove useless files
      
      * rename to ColossalChat
      
      * use new image
      
      * fix ci workflow
      
      * fix ci
      
      * use local model/tokenizer for ci tests
      
      * fix ci
      
      * fix ci
      
      * fix ci
      
      * fix ci timeout
      
      * fix rm progress bar. fix ci timeout
      
      * fix ci
      
      * fix ci typo
      
* remove 3d plugin from ci temporarily
      
      * test environment
      
      * cannot save optimizer
      
      * support chat template
      
      * fix readme
      
      * fix path
      
      * test ci locally
      
      * restore build_or_pr
      
      * fix ci data path
      
      * fix benchmark
      
      * fix ci, move ci tests to 3080, disable fast tokenizer
      
      * move ci to 85
      
      * support flash attention 2
      
      * add all-in-one data preparation script. Fix colossal-llama2-chat chat template
      
      * add hardware requirements
      
      * move ci test data
      
      * fix save_model, add unwrap
      
      * fix missing bos
      
      * fix missing bos; support grad accumulation with gemini
      
      * fix ci
      
      * fix ci
      
      * fix ci
      
      * fix llama2 chat template config
      
      * debug sft
      
      * debug sft
      
      * fix colossalai version requirement
      
      * fix ci
      
      * add sanity check to prevent NaN loss
      
      * fix requirements
      
      * add dummy data generation script
      
      
      * update readme
      
      * update readme
      
      * update readme and ignore
      
      * fix logger bug
      
      * support parallel_output
      
      * modify data preparation logic
      
      * fix tokenization
      
      * update lr
      
      * fix inference
      
      * run pre-commit
      
      ---------
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
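The "Add dpo" item refers to Direct Preference Optimization. The standard DPO loss on a chosen/rejected pair can be sketched from summed sequence log-probs as below (a scalar sketch; ColossalChat's batched, masked implementation differs in detail):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin)).
    Inputs are summed log-probs of the chosen/rejected responses under the
    policy and the frozen reference model."""
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# at initialization (policy == reference) the loss is exactly log 2
assert abs(dpo_loss(-5.0, -9.0, -5.0, -9.0) - math.log(2.0)) < 1e-12
# once the policy prefers the chosen response more than the reference does,
# the loss drops below log 2
assert dpo_loss(-5.0, -9.0, -6.0, -8.0) < math.log(2.0)
```

This is why DPO needs no reward model or PPO rollout loop, which is what lets the refactor fold it in alongside sft, ppo, and lora.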
  6. 28 Mar, 2024 1 commit
  7. 27 Mar, 2024 3 commits
• [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) · 00525f77
      Insu Jang authored
      
      
      * Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution
      
      * Change static methods for t5 layer distribution to member functions
      
      * Change static methods for whisper layer distribution to member functions
      
      * Replace whisper policy usage with self one
      
      * Fix test case to use non-static layer distribution methods
      
      * fix: fix typo
      
      ---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
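The fix above hinges on calling `self.distribute_layers` / `self.get_stage_index` so a policy subclass can override the split. A sketch of what such a pair might compute, under one plausible default (even split, remainder to the later stages; the real methods may differ):

```python
def distribute_layers(num_layers: int, num_stages: int) -> list:
    """Split num_layers across pipeline stages as evenly as possible,
    giving the remainder to the later stages (illustrative default)."""
    base, rem = divmod(num_layers, num_stages)
    return [base + (1 if s >= num_stages - rem else 0) for s in range(num_stages)]

def get_stage_index(layers_per_stage: list, stage: int) -> tuple:
    """Half-open [start, end) range of layer indices owned by `stage`."""
    start = sum(layers_per_stage[:stage])
    return start, start + layers_per_stage[stage]

per_stage = distribute_layers(10, 4)
assert per_stage == [2, 2, 3, 3]
assert get_stage_index(per_stage, 2) == (4, 7)
```

If the forward path calls the static default instead of `self.`, a custom override is silently ignored and each stage slices the wrong layers, which is the pipeline forward error this commit fixes.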
• github-actions[bot]
• [shardformer] update colo attention to support custom mask (#5510) · 19e1a5cf
      Hongxin Liu authored
      * [feature] refactor colo attention (#5462)
      
      * [extension] update api
      
      * [feature] add colo attention
      
      * [feature] update sdpa
      
      * [feature] update npu attention
      
      * [feature] update flash-attn
      
      * [test] add flash attn test
      
      * [test] update flash attn test
      
      * [shardformer] update modeling to fit colo attention (#5465)
      
      * [misc] refactor folder structure
      
      * [shardformer] update llama flash-attn
      
      * [shardformer] fix llama policy
      
      * [devops] update tensornvme install
      
      * [test] update llama test
      
      * [shardformer] update colo attn kernel dispatch
      
      * [shardformer] update blip2
      
      * [shardformer] update chatglm
      
      * [shardformer] update gpt2
      
      * [shardformer] update gptj
      
      * [shardformer] update opt
      
      * [shardformer] update vit
      
      * [shardformer] update colo attention mask prep
      
      * [shardformer] update whisper
      
      * [test] fix shardformer tests (#5514)
      
      * [test] fix shardformer tests
      
      * [test] fix shardformer tests
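The custom-mask support above follows the usual pattern of injecting an additive mask into scaled dot-product attention before the kernel dispatch. A minimal numpy sketch of that pattern with a causal mask as the "custom" mask (single head, made-up shapes; the actual ColoAttention API differs):

```python
import numpy as np

def sdpa(q, k, v, mask=None):
    """Scaled dot-product attention with an optional additive mask.
    Masked-out slots carry -inf so their softmax weight is exactly 0."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = scores + mask
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

seq, dim = 4, 8
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((seq, dim)) for _ in range(3))
causal = np.triu(np.full((seq, seq), -np.inf), k=1)  # custom mask: causal
out = sdpa(q, k, v, causal)
# row 0 may only attend to position 0, so its output must equal v[0]
assert np.allclose(out[0], v[0])
```

The "mask prep" commits in the list build such additive tensors (padding, causal, or model-specific) once, so every backend (SDPA, flash-attn, npu) sees the same mask semantics.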
  8. 26 Mar, 2024 6 commits
  9. 25 Mar, 2024 3 commits
• [shardformer] Fix lm parallel (#5480) · 0688d92e
      flybird11111 authored
      * fix
      
* padding vocab_size when using pipeline parallelism
      
      * fix
      
* fix
      
      * fix gather output
      
      * fix
      
      * fix
      
* fix

* fix resize embedding
      
      * revert
      
      * revert
      
      * revert
      
      * fix lm forward distribution
      
      * fix
      
      * test ci
      
      * fix
• [release] grok-1 inference benchmark (#5500) · 34e90925
      binmakeswell authored
      * [release] grok-1 inference benchmark
      
• [hotfix] set return_outputs=False in examples and polish code (#5404) · bb0a668f
      Wenhao Chen authored
      * fix: simplify merge_batch
      
      * fix: use return_outputs=False to eliminate extra memory consumption
      
      * feat: add return_outputs warning
      
      * style: remove `return_outputs=False` as it is the default value
  10. 24 Mar, 2024 1 commit
• [example] update Grok-1 inference (#5495) · 5fcd7795
      Yuanheng Zhao authored
      * revise grok-1 example
      
      * remove unused arg in scripts
      
      * prevent re-installing torch
      
      * update readme
      
      * revert modifying colossalai requirements
      
      * add perf
      
      * trivial
      
      * add tokenizer url
  11. 22 Mar, 2024 1 commit
  12. 21 Mar, 2024 1 commit
• [example] add grok-1 inference (#5485) · 848a574c
      Hongxin Liu authored
      * [misc] add submodule
      
      * remove submodule
      
      * [example] support grok-1 tp inference
      
      * [example] add grok-1 inference script
      
      * [example] refactor code
      
      * [example] add grok-1 readme
      
* [example] add test ci
      
* [example] update readme
  13. 20 Mar, 2024 1 commit
  14. 18 Mar, 2024 2 commits
  15. 13 Mar, 2024 1 commit
• [devops] fix compatibility (#5444) · f2e8b9ef
      Hongxin Liu authored
      * [devops] fix compatibility
      
      * [hotfix] update compatibility test on pr
      
      * [devops] fix compatibility
      
      * [devops] record duration during comp test
      
      * [test] decrease test duration
      
      * fix falcon
  16. 12 Mar, 2024 1 commit
  17. 11 Mar, 2024 1 commit
  18. 07 Mar, 2024 2 commits
  19. 05 Mar, 2024 9 commits