- 26 Apr, 2024 1 commit
Tong Li authored

- 25 Apr, 2024 1 commit
Hongxin Liu authored
* [shardformer] refactor pipeline grad ckpt config
* [pipeline] fix stage manager

- 23 Apr, 2024 2 commits
binmakeswell authored
* [release] llama3

Hongxin Liu authored
* [plugin] support dp inside for hybrid parallel
* [example] update llama benchmark
* [example] update llama readme

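For context on the plugin change above, here is a minimal sketch of configuring `HybridParallelPlugin` as the llama benchmark does, where the data-parallel size is derived from the remaining ranks; the argument values are illustrative and the benchmark wiring is an assumption, not the exact example code.

```python
# Hedged sketch: dp_size is derived as world_size // (tp_size * pp_size).
# Assumes colossalai.launch_from_torch(...) has already initialized the distributed environment.
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

plugin = HybridParallelPlugin(
    tp_size=2,            # tensor parallel degree
    pp_size=2,            # pipeline parallel degree
    num_microbatches=4,   # microbatches per pipeline step
    precision="bf16",
)
booster = Booster(plugin=plugin)
```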
- 18 Apr, 2024 1 commit
Edenzzzz authored
* fix no pad token bug
* fixed some auto parallel codegen bugs, but might not run on torch 2.1
Co-authored-by: Edenzzzz <wtan45@wisc.edu>

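The usual remedy for the missing pad token in llama-family tokenizers is to reuse the EOS token; a hedged sketch (the checkpoint path is a placeholder, not necessarily what this commit touched):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder path
if tokenizer.pad_token is None:
    # Llama tokenizers ship without a pad token; fall back to EOS so padding/batching works.
    tokenizer.pad_token = tokenizer.eos_token
```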
- 08 Apr, 2024 1 commit
Hongxin Liu authored
* [devops] remove post-commit ci
* [misc] run pre-commit on all files
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

- 07 Apr, 2024 3 commits
digger yu authored

digger yu authored

Edenzzzz authored
Co-authored-by: Edenzzzz <wtan45@wisc.edu>

- 01 Apr, 2024 1 commit
Wenhao Chen authored
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogeneous shard policy for llama (#5508)
* feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig`
* feat: apply `GradientCheckpointConfig` to policy and llama_forward
* feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager
* fix: add optional args for `distribute_layer` and `get_stage_index`
* fix: fix changed API calls
* test: update llama tests
* style: polish `GradientCheckpointConfig`
* fix: fix pipeline utils tests

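A minimal sketch of how `gradient_checkpointing_ratio` might be passed through `PipelineGradientCheckpointConfig`; the import path and the plugin keyword are assumptions inferred from the commit message, not verified API.

```python
# Hedged sketch: checkpoint roughly half of the layers per pipeline stage.
# Assumes the distributed environment is already initialized via colossalai.launch_from_torch.
from colossalai.shardformer import PipelineGradientCheckpointConfig  # assumed import path
from colossalai.booster.plugin import HybridParallelPlugin

ckpt_config = PipelineGradientCheckpointConfig(gradient_checkpointing_ratio=0.5)

plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=2,
    num_microbatches=4,
    gradient_checkpoint_config=ckpt_config,  # assumed keyword routing the config to the shard policy
)
```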
- 28 Mar, 2024 1 commit
Yuanheng Zhao authored
* [fix] use tokenizer from the same pretrained path
* trust remote code

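The fix amounts to loading the tokenizer from the same pretrained path as the model and allowing remote code; a hedged sketch with a placeholder checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

pretrained_path = "hpcai-tech/grok-1"  # placeholder; the point is to reuse one path for both
tokenizer = AutoTokenizer.from_pretrained(pretrained_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(pretrained_path, trust_remote_code=True)
```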
- 27 Mar, 2024 1 commit
Insu Jang authored
* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution
* Change static methods for t5 layer distribution to member functions
* Change static methods for whisper layer distribution to member functions
* Replace whisper policy usage with the policy's own methods
* Fix test case to use non-static layer distribution methods
* fix: fix typo
Co-authored-by: Wenhao Chen <cwher@outlook.com>

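The point of turning these into member functions is that a custom policy can override the layer distribution; a hedged sketch of the idea (the base class path and method signature are assumptions based on the commit message):

```python
# Hedged sketch: a custom policy that puts one extra transformer layer on the first stage.
from colossalai.shardformer.policies.llama import LlamaForCausalLMPolicy  # assumed path

class FrontHeavyLlamaPolicy(LlamaForCausalLMPolicy):
    def distribute_layers(self, num_layers: int, num_stages: int) -> list[int]:
        # split evenly, then give the remainder to the first stage
        base = num_layers // num_stages
        layers = [base] * num_stages
        layers[0] += num_layers - base * num_stages
        return layers
```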
- 26 Mar, 2024 1 commit
Yuanheng Zhao authored

- 25 Mar, 2024 2 commits
binmakeswell authored
* [release] grok-1 inference benchmark

Wenhao Chen authored
* fix: simplify merge_batch
* fix: use return_outputs=False to eliminate extra memory consumption
* feat: add return_outputs warning
* style: remove `return_outputs=False` as it is the default value

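The memory saving comes from not gathering per-microbatch outputs after each pipeline step; a hedged sketch of a training step with `booster.execute_pipeline` (the keyword set is assumed, and `return_outputs` now defaults to `False` per the commit):

```python
# Hedged sketch: pipeline step that only keeps the loss, not the per-microbatch outputs.
outputs = booster.execute_pipeline(
    train_dataloader_iter,
    model,
    criterion=lambda out, batch: out.loss,  # HF-style model output with a .loss field (assumption)
    optimizer=optimizer,
    return_loss=True,
)
loss = outputs["loss"]
optimizer.step()
optimizer.zero_grad()
```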
- 24 Mar, 2024 1 commit
Yuanheng Zhao authored
* revise grok-1 example
* remove unused arg in scripts
* prevent re-installing torch
* update readme
* revert modifying colossalai requirements
* add perf
* add tokenizer url

- 22 Mar, 2024 1 commit
binmakeswell authored
* [release] grok-1 inference

- 21 Mar, 2024 1 commit
Hongxin Liu authored
* [misc] add submodule
* remove submodule
* [example] support grok-1 tp inference
* [example] add grok-1 inference script
* [example] refactor code
* [example] add grok-1 readme
* [example] add test ci
* [example] update readme

- 12 Mar, 2024 1 commit
digger yu authored

- 05 Mar, 2024 4 commits
Youngon authored
* Update train_ddp.yaml: delete "strategy" to fix DDP config loading bug in main.py
* Update train_ddp.yaml: fix config file loading bug for inference with scripts/txt2img.py
* Update README.md: add pretrained model test code

Luo Yihang authored

MickeyCHAN authored
* fix import error
* Update dpt_depth.py
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

Hongxin Liu authored

- 04 Mar, 2024 1 commit
flybird11111 authored
* benchmark gpt2
* [doc] fix typo in Colossal-LLaMA-2/README.md (#5247)
* [workflow] fixed build CI (#5240)
* [ci] fixed booster test (#5251)
* [ci] fixed ddp test (#5254)
* fix typo in applications/ColossalEval/README.md (#5250)
* [ci] fix shardformer tests (#5255)
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
* [doc] fix doc typo (#5256): fix annotation display, fix llama2 doc
* [hotfix] add pp sanity check and fix mbs arg (#5268): fix misleading mbs arg, add pp sanity check, fix 1f1b sanity check
* [workflow] fixed incomplete bash command (#5272)
* [workflow] fixed oom tests (#5275)
* [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276)
* [shardformer] hybridparallelplugin support gradients accumulation (#5246) (see the sketch below)
* [hotfix] fix ShardFormer test execution path when using sequence parallelism (#5230)
* fix auto loading gpt2 tokenizer (#5279)
* [doc] add llama2-13B display (#5285)
* fix llama pretrain (#5287)
* Update shardformer.py
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>

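For the gradient accumulation item above, a hedged sketch of the usage pattern with a booster plugin; the loop structure and `ACC_STEPS` are illustrative, only `booster.backward` is the real entry point:

```python
# Hedged sketch: accumulate gradients over ACC_STEPS iterations before stepping.
ACC_STEPS = 4  # illustrative accumulation factor

for step, batch in enumerate(train_dataloader):
    outputs = model(**batch)
    loss = outputs.loss / ACC_STEPS          # scale so the accumulated gradient matches a big batch
    booster.backward(loss, optimizer)        # plugin-aware backward
    if (step + 1) % ACC_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```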
- 27 Feb, 2024 1 commit
Hongxin Liu authored

- 30 Jan, 2024 1 commit
digger yu authored

- 25 Jan, 2024 2 commits
- 19 Jan, 2024 1 commit
flybird11111 authored

- 15 Jan, 2024 1 commit
Wenhao Chen authored
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check

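The sanity check guards against microbatch settings that cannot fill a 1F1B pipeline schedule; an illustrative sketch of the kind of invariant involved (not the actual code from this commit):

```python
# Illustrative only: the invariants a 1F1B pipeline schedule relies on.
def check_pipeline_config(global_batch_size: int, microbatch_size: int, pp_size: int) -> int:
    assert global_batch_size % microbatch_size == 0, "batch size must divide evenly into microbatches"
    num_microbatches = global_batch_size // microbatch_size
    assert num_microbatches >= pp_size, "1F1B needs at least as many microbatches as pipeline stages"
    return num_microbatches
```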
- 11 Jan, 2024 1 commit
binmakeswell authored
* [doc] fix annotation display
* [doc] fix llama2 doc

- 09 Jan, 2024 1 commit
Hongxin Liu authored
* update accelerator
* fix timer
* fix amp
* fix autocast
* fix set device
* add error raise
* remove doc accelerator
* update doc
* use nullcontext
* update cpu
* change time limit for example
* [npu] polish accelerator code
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>

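The accelerator refactor above routes device handling through a single entry point; a hedged sketch using `get_accelerator` (the helper exists in ColossalAI, but treat the specific method names as assumptions):

```python
# Hedged sketch: device-agnostic code path via the accelerator abstraction (CUDA, NPU, or CPU).
import torch
from colossalai.accelerator import get_accelerator

accelerator = get_accelerator()
device = accelerator.get_current_device()   # assumed method name
x = torch.randn(4, 4, device=device)
accelerator.synchronize()                   # assumed method name
```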
- 08 Jan, 2024 1 commit
Xuanlei Zhao authored
* update extension
* update cpu adam
* add doc for cpu adam
* update kernel
* update flash attention and memory-efficient attention loaders
* update api
* update doc
* update example time limit
* reverse change
* remove useless kernel
* do not use warning

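The CPU Adam extension is consumed through the fused optimizer classes; a hedged sketch (the `CPUAdam` import path matches recent ColossalAI releases, but consider it an assumption, and note the kernel is built or loaded on first use):

```python
# Hedged sketch: CPUAdam runs the Adam update on CPU with a fused kernel,
# which pairs with parameter offloading; HybridAdam mixes CPU and GPU updates.
import torch
from colossalai.nn.optimizer import CPUAdam  # assumed import path

model = torch.nn.Linear(16, 16)              # toy model kept on CPU
optimizer = CPUAdam(model.parameters(), lr=1e-3, weight_decay=0.0)

loss = model(torch.randn(2, 16)).sum()
loss.backward()
optimizer.step()
```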
- 02 Jan, 2024 1 commit
Wenhao Chen authored
* fix: remove drop_last in val & test dataloaders
* feat: add run_forward_only, support arbitrary batch size
* chore: modify ci script

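Keeping the tail batch in evaluation means the val/test loaders must not drop the last incomplete batch; a minimal sketch (the dataset is a placeholder):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

val_dataset = TensorDataset(torch.randn(100, 8))  # placeholder dataset
# drop_last stays False so the final, possibly smaller, batch is still evaluated;
# the pipeline's forward-only path then has to accept an arbitrary batch size.
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, drop_last=False)
```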
- 22 Dec, 2023 1 commit
Wenhao Chen authored
* test: add more p2p tests
* fix: remove send_forward_recv_forward, as the p2p op list needs to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin

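A hedged sketch of how the metadata cache and NCCL buffer size show up on the user side; `NCCL_BUFFSIZE` is a standard NCCL environment variable, while the `enable_metadata_cache` keyword is assumed from the commit message:

```python
import os
# Enlarge the NCCL staging buffer before process groups are created (value is illustrative).
os.environ.setdefault("NCCL_BUFFSIZE", str(128 * 1024 * 1024))

# Assumes colossalai.launch_from_torch(...) has already initialized the distributed environment.
from colossalai.booster.plugin import HybridParallelPlugin

plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=2,
    num_microbatches=4,
    enable_metadata_cache=True,  # assumed flag: reuse p2p tensor metadata across pipeline steps
)
```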
- 08 Dec, 2023 1 commit
flybird11111 authored
* fix
* test ci
* fix ci

- 28 Nov, 2023 2 commits
binmakeswell authored
* [doc] add moe news

Wenhao Chen authored
* [shardformer] implement policy for all GPT-J models and test
* [shardformer] support interleaved pipeline parallel for bert finetune
* [shardformer] shardformer support falcon (#4883)
* [shardformer] fix interleaved pipeline for bert model (#5048)
* [hotfix] disable seq parallel for gptj and falcon, and polish code (#5093)
* Add Mistral support for Shardformer (#5103)
* [shardformer] add tests to mistral (#5105)
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>

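These model policies all plug into the same ShardFormer entry point; a hedged sketch of sharding a Hugging Face model for tensor parallelism (`ShardConfig`/`ShardFormer` are the real class names, but the constructor arguments and process-group choice are assumptions):

```python
# Hedged sketch: apply a ShardFormer policy to a HF model for tensor parallelism.
# Assumes torch.distributed is already initialized (e.g. via colossalai.launch_from_torch).
import torch.distributed as dist
from transformers import AutoModelForCausalLM
from colossalai.shardformer import ShardConfig, ShardFormer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # placeholder checkpoint

shard_config = ShardConfig(
    tensor_parallel_process_group=dist.group.WORLD,  # assumed: use the default group for TP
    enable_tensor_parallelism=True,
    enable_fused_normalization=False,
)
sharded_model, shared_params = ShardFormer(shard_config=shard_config).optimize(model)
```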
- 27 Nov, 2023 1 commit
digger yu authored

- 22 Nov, 2023 1 commit
Xuanlei Zhao authored
* llama 3d
* fix autocast