- 07 Apr, 2024 3 commits
-
-
digger yu authored
-
digger yu authored
-
Edenzzzz authored
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
-
- 03 Apr, 2024 1 commit
-
-
Zhongkai Zhao authored
* sequence parallel optimization
* validate sequence parallel in llama (code to be polished)
* shardformer api writing
* integrate sequence parallel in ShardFormer
* fix pp bugs and sp bugs for LlaMa model
* integrating ring-based sequence parallelism into ShardFormer
* [sequence parallelism]: Add fused megatron function
* integrating ring-based sequence parallelism into ShardFormer
---------
Co-authored-by: linsj20 <linsj20@mails.tsinghua.edu.cn>
* fix bugs when using sp and flashattention together
* fix operation function name
* support flash attention for ulysses-style sp
* clarify sp process group
* fix compatibility bugs in moe plugin
* fix fused linear bugs
* fix linear layer test
* support gpt model all-to-all sp
* modify shard data dimension (meant to be dim=-1)
* support megatron-style sp and distributed attn for llama model
* [shardformer] add megatron sp to llama
* support llama7B 128k with distributed attention
* [shardformer] robustness enhancement
* add block attn
* sp mode 1: keep input as a complete sequence
* fix sp compatibility
* finish sp mode 3 support for gpt
* using all_to_all_single when batch size is 1
* support mode 2 sp in gpt2 (#5)
* [shardformer] add megatron sp to llama
* support llama7B 128k with distributed attention
* [shardformer] robustness enhancement
* add block attn
* sp mode 1: keep input as a complete sequence
* fix sp compatibility
* refactor ring implementation
* support mode 2 sp in gpt2
* polish code
* enable distributed attn mask when using sp mode 2 and 3 in llama
* automatically enable flash attn when using sp mode 2 and 3 in llama
* inplace attn mask
* add zero2 support for sequence parallel
* polish code
* fix bugs
* fix gemini checkpoint io
* loosen tensor checking atol and rtol
* add comment
* fix llama layernorm grad
* fix zero grad
* fix zero grad
* fix conflict
* update split and gather autograd func
* sequence parallel: inside text split (#6)
* polish code (part 1)
* polish code (part 2)
* polish code (part 2.5)
* polish code (part 3)
* sequence parallel: inside text split
* miscellaneous minor fixes
* polish code
* fix ulysses style ZeRO
* sequence parallel: inside text split
* miscellaneous minor fixes
* disaggregate sp group and dp group for sp
* fix llama and gpt sp
* polish code
* move ulysses grad sync to ddp (#9)
* remove zero_stage and unbind the grad sync for alltoall sp
* add 2d group creation test
* move ulysses grad sync to ddp
* add 2d group creation test
* remove useless code
* change shard config not to enable sp when enable_all_optimizations
* add sp warnings for several models
* remove useless code
---------
Co-authored-by: linsj20 <linsj20@mails.tsinghua.edu.cn>
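The core of the Ulysses-style ("sp mode 3") path described in this entry is an all-to-all exchange that turns a sequence-sharded activation into a head-sharded one before attention. Below is a minimal, hedged sketch of that one step, not the ShardFormer implementation itself; the tensor layout and function name are illustrative.

```python
# Minimal sketch (assumptions: torch.distributed is initialized, sp_group is a valid
# process group, and num_heads is divisible by the sequence-parallel world size).
# Before attention each rank holds a slice of the sequence with all heads; after the
# all-to-all it holds the full sequence for a slice of the heads.
import torch
import torch.distributed as dist


def seq_shard_to_head_shard(x: torch.Tensor, sp_group) -> torch.Tensor:
    """x: [batch, seq_len // sp_size, num_heads, head_dim]
    -> [batch, seq_len, num_heads // sp_size, head_dim]"""
    sp_size = dist.get_world_size(sp_group)
    # Split the head dimension into one chunk per rank ...
    inputs = [t.contiguous() for t in x.chunk(sp_size, dim=2)]
    outputs = [torch.empty_like(inputs[0]) for _ in range(sp_size)]
    # ... exchange chunks so rank i receives every rank's sequence shard
    # for its own head group ...
    dist.all_to_all(outputs, inputs, group=sp_group)
    # ... and stitch the sequence shards back into a full sequence.
    return torch.cat(outputs, dim=1)
```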
-
- 02 Apr, 2024 1 commit
-
-
Edenzzzz authored
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
-
- 01 Apr, 2024 1 commit
-
-
Wenhao Chen authored
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogeneous shard policy for llama (#5508)
* feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig`
* feat: apply `GradientCheckpointConfig` to policy and llama_forward
* feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager
* fix: add optional args for `distribute_layer` and `get_stage_index`
* fix: fix changed API calls
* test: update llama tests
* style: polish `GradientCheckpointConfig`
* fix: fix pipeline utils tests
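As a rough illustration of what a `gradient_checkpointing_ratio` controls, the sketch below checkpoints only the first fraction of the layers a pipeline stage owns, trading recomputation for activation memory. The class and method names are invented for the example and are not the actual `GradientCheckpointConfig` API.

```python
from dataclasses import dataclass


@dataclass
class StageCheckpointPlan:
    """Hypothetical helper: decide which of a stage's layers recompute activations."""
    num_layers_in_stage: int
    gradient_checkpointing_ratio: float = 1.0  # 1.0 checkpoints every layer

    def num_ckpt_layers(self) -> int:
        return int(self.num_layers_in_stage * self.gradient_checkpointing_ratio)

    def should_checkpoint(self, layer_idx: int) -> bool:
        # Checkpoint the leading layers; the rest keep their activations in memory.
        return layer_idx < self.num_ckpt_layers()


plan = StageCheckpointPlan(num_layers_in_stage=8, gradient_checkpointing_ratio=0.5)
print([plan.should_checkpoint(i) for i in range(8)])  # first 4 True, last 4 False
```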
-
- 29 Mar, 2024 1 commit
-
-
YeAnbang authored
* Add dpo. Fix sft, ppo, lora. Refactor all
* fix and tested ppo
* 2nd round refactor
* add ci tests
* fix ci
* fix ci
* fix readme, style
* fix readme style
* fix style, fix benchmark
* reproduce benchmark result, remove useless files
* rename to ColossalChat
* use new image
* fix ci workflow
* fix ci
* use local model/tokenizer for ci tests
* fix ci
* fix ci
* fix ci
* fix ci timeout
* fix rm progress bar. fix ci timeout
* fix ci
* fix ci typo
* remove 3d plugin from ci temporarily
* test environment
* cannot save optimizer
* support chat template
* fix readme
* fix path
* test ci locally
* restore build_or_pr
* fix ci data path
* fix benchmark
* fix ci, move ci tests to 3080, disable fast tokenizer
* move ci to 85
* support flash attention 2
* add all-in-one data preparation script. Fix colossal-llama2-chat chat template
* add hardware requirements
* move ci test data
* fix save_model, add unwrap
* fix missing bos
* fix missing bos; support grad accumulation with gemini
* fix ci
* fix ci
* fix ci
* fix llama2 chat template config
* debug sft
* debug sft
* fix colossalai version requirement
* fix ci
* add sanity check to prevent NaN loss
* fix requirements
* add dummy data generation script
* add dummy data generation script
* add dummy data generation script
* add dummy data generation script
* update readme
* update readme
* update readme and ignore
* fix logger bug
* support parallel_output
* modify data preparation logic
* fix tokenization
* update lr
* fix inference
* run pre-commit
---------
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
-
- 28 Mar, 2024 1 commit
-
-
Yuanheng Zhao authored
* [fix] use tokenizer from the same pretrained path
* trust remote code
-
- 27 Mar, 2024 3 commits
-
-
Insu Jang authored
* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution
* Change static methods for t5 layer distribution to member functions
* Change static methods for whisper layer distribution to member functions
* Replace whisper policy usage with self one
* Fix test case to use non-static layer distribution methods
* fix: fix typo
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
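The point of turning `distribute_layers` / `get_stage_index` into instance methods is that a policy subclass can now override them to customize how transformer blocks are split across pipeline stages. A hypothetical sketch of such an override follows; the base class here is a stand-in, only the method name follows the commit.

```python
from typing import List


class BasePolicy:  # stand-in for the real shardformer policy base class
    def distribute_layers(self, num_layers: int, num_stages: int) -> List[int]:
        # Default: split as evenly as possible, remainder on the leading stages.
        base, rem = divmod(num_layers, num_stages)
        return [base + (1 if i < rem else 0) for i in range(num_stages)]


class FrontLightPolicy(BasePolicy):
    def distribute_layers(self, num_layers: int, num_stages: int) -> List[int]:
        layers = super().distribute_layers(num_layers, num_stages)
        # Custom distribution: give the first and last stages one block less, since
        # they also host the embedding and the LM head.
        if num_stages >= 3 and layers[0] > 1 and layers[-1] > 1:
            layers[0] -= 1
            layers[1] += 1
            layers[-1] -= 1
            layers[-2] += 1
        return layers


print(FrontLightPolicy().distribute_layers(32, 4))  # [7, 9, 9, 7]
```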
-
github-actions[bot] authored
Co-authored-by: github-actions <github-actions@github.com>
-
Hongxin Liu authored
* [feature] refactor colo attention (#5462)
* [extension] update api
* [feature] add colo attention
* [feature] update sdpa
* [feature] update npu attention
* [feature] update flash-attn
* [test] add flash attn test
* [test] update flash attn test
* [shardformer] update modeling to fit colo attention (#5465)
* [misc] refactor folder structure
* [shardformer] update llama flash-attn
* [shardformer] fix llama policy
* [devops] update tensornvme install
* [test] update llama test
* [shardformer] update colo attn kernel dispatch
* [shardformer] update blip2
* [shardformer] update chatglm
* [shardformer] update gpt2
* [shardformer] update gptj
* [shardformer] update opt
* [shardformer] update vit
* [shardformer] update colo attention mask prep
* [shardformer] update whisper
* [test] fix shardformer tests (#5514)
* [test] fix shardformer tests
* [test] fix shardformer tests
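The "colo attn kernel dispatch" mentioned above boils down to choosing an attention backend at call time. A hedged sketch of that pattern is below; it is not the ColoAttention code, it only mirrors the idea of preferring FlashAttention when available and falling back to PyTorch SDPA.

```python
import torch
import torch.nn.functional as F

try:
    from flash_attn import flash_attn_func  # optional dependency
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False


def attention(q, k, v, dropout_p: float = 0.0, causal: bool = True):
    """q, k, v: [batch, num_heads, seq_len, head_dim]."""
    if HAS_FLASH_ATTN and q.is_cuda and q.dtype in (torch.float16, torch.bfloat16):
        # flash_attn_func expects [batch, seq_len, num_heads, head_dim]
        out = flash_attn_func(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
            dropout_p=dropout_p, causal=causal,
        )
        return out.transpose(1, 2)
    # Fallback: PyTorch's fused scaled-dot-product attention
    return F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p, is_causal=causal)
```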
-
- 26 Mar, 2024 6 commits
-
-
Edenzzzz authored
Fix layout convertor caching
-
Edenzzzz authored
-
Edenzzzz authored
-
Rocky Duan authored
-
Hongxin Liu authored
-
Yuanheng Zhao authored
-
- 25 Mar, 2024 3 commits
-
-
flybird11111 authored
* fix
* padding vocab_size when using pipeline parallelism
* fix
* fix
* fix gather output
* fix
* fix
* fix resize embedding
* fix resize embedding
* revert
* revert
* revert
* fix lm forward distribution
* fix
* test ci
* fix
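The vocab padding this commit deals with is a small arithmetic detail: when the embedding and LM head are sharded, the vocabulary size must divide evenly across ranks, so it is rounded up and the extra rows are ignored at the output. A minimal sketch, with an illustrative function name:

```python
def padded_vocab_size(vocab_size: int, divisor: int) -> int:
    """Round vocab_size up to the next multiple of `divisor` (e.g. the TP world size)."""
    return ((vocab_size + divisor - 1) // divisor) * divisor


# LLaMA's 32000-token vocab already divides by 8; an odd vocab size gets padded up.
print(padded_vocab_size(32000, 8))  # 32000
print(padded_vocab_size(32001, 8))  # 32008
```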
-
binmakeswell authored
* [release] grok-1 inference benchmark
-
Wenhao Chen authored
* fix: simplify merge_batch
* fix: use return_outputs=False to eliminate extra memory consumption
* feat: add return_outputs warning
* style: remove `return_outputs=False` as it is the default value
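Why `return_outputs=False` saves memory: the pipeline schedule only has to keep (and later merge) every microbatch's outputs when the caller actually asks for them. A simplified sketch of that behaviour, with made-up function names rather than the real schedule code:

```python
import torch


def run_microbatches(model_step, microbatches, return_outputs: bool = False):
    """model_step(mb) -> (loss, output); illustrative stand-in for a pipeline schedule."""
    total_loss = torch.zeros(())
    kept = [] if return_outputs else None
    for mb in microbatches:
        loss, output = model_step(mb)
        total_loss += loss.detach()
        if return_outputs:
            kept.append(output)  # every microbatch's logits stay alive until the merge
        # with return_outputs=False, `output` can be freed as soon as the loop advances
    merged = torch.cat(kept) if return_outputs else None
    return total_loss, merged
```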
-
- 24 Mar, 2024 1 commit
-
-
Yuanheng Zhao authored
* revise grok-1 example
* remove unused arg in scripts
* prevent re-installing torch
* update readme
* revert modifying colossalai requirements
* add perf
* trivial
* add tokenizer url
-
- 22 Mar, 2024 1 commit
-
-
binmakeswell authored
* [release] grok-1 inference
-
- 21 Mar, 2024 1 commit
-
-
Hongxin Liu authored
* [misc] add submodule
* remove submodule
* [example] support grok-1 tp inference
* [example] add grok-1 inference script
* [example] refactor code
* [example] add grok-1 readme
* [example] add test ci
* [example] update readme
-
- 20 Mar, 2024 1 commit
-
-
binmakeswell authored
* [doc] update open-sora demo
-
- 18 Mar, 2024 2 commits
-
-
binmakeswell authored
* [doc] release Open-Sora 1.0 with model weights
-
flybird11111 authored
* fix
* padding vocab_size when using pipeline parallelism
* fix
* fix
* fix gather output
* fix
* fix
* fix resize embedding
* fix resize embedding
* revert
* revert
* revert
-
- 13 Mar, 2024 1 commit
-
-
Hongxin Liu authored
* [devops] fix compatibility
* [hotfix] update compatibility test on pr
* [devops] fix compatibility
* [devops] record duration during comp test
* [test] decrease test duration
* fix falcon
-
- 12 Mar, 2024 1 commit
-
-
digger yu authored
-
- 11 Mar, 2024 1 commit
-
-
Camille Zhong authored
-
- 07 Mar, 2024 2 commits
-
-
Hongxin Liu authored
-
Camille Zhong authored
* add stream chat for chat version
* remove os.system clear
* modify function name
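A stream-chat loop of the kind added here yields tokens as they are generated instead of waiting for the full reply. The sketch below uses Hugging Face's `TextIteratorStreamer` as one way to do that; the model path and function name are placeholders, not the Colossal-LLaMA code.

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_PATH = "path/to/chat-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)


def stream_chat(prompt: str) -> None:
    inputs = tokenizer(prompt, return_tensors="pt")
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    # generate() runs in a background thread and pushes decoded text into the streamer
    thread = Thread(target=model.generate,
                    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256))
    thread.start()
    for text_chunk in streamer:
        print(text_chunk, end="", flush=True)  # print each chunk as soon as it arrives
    thread.join()
```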
-
- 05 Mar, 2024 9 commits
-
-
Youngon authored
* Update train_ddp.yaml: delete "strategy" to fix the DDP config loading bug in "main.py"
* Update train_ddp.yaml: fix the config file load bug when running inference with scripts/txt2img.py
* Update README.md: add pretrained model test code
-
hugo-syn authored
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
-
digger yu authored
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
-
Dongruixuan Li authored
-
digger yu authored
-
digger yu authored
-
digger yu authored
-
Luo Yihang authored
-
MickeyCHAN authored
* fix import error
* Update dpt_depth.py
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
-