- 27 Mar, 2024 1 commit
Insu Jang authored
* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution
* Change static methods for t5 layer distribution to member functions
* Change static methods for whisper layer distribution to member functions
* Replace whisper policy usage with self one
* Fix test case to use non-static layer distribution methods
* fix: fix typo

Co-authored-by: Wenhao Chen <cwher@outlook.com>
-
- 26 Mar, 2024 1 commit
Yuanheng Zhao authored
-
- 25 Mar, 2024 2 commits
binmakeswell authored
* [release] grok-1 inference benchmark
-
Wenhao Chen authored
* fix: simplify merge_batch
* fix: use return_outputs=False to eliminate extra memory consumption
* feat: add return_outputs warning
* style: remove `return_outputs=False` as it is the default value
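The memory point behind `return_outputs=False` generalizes: a pipeline runner that collects every microbatch's output keeps all of them alive until the step ends, while returning only the accumulated loss lets each output be freed as soon as its loss is taken. A toy, library-independent sketch (the function and its names are illustrative, not ColossalAI's API):

```python
def run_pipeline_step(microbatches, forward, return_outputs=False):
    """Toy pipeline schedule: accumulate a loss across microbatches.

    With return_outputs=True every microbatch output is retained for the
    whole step (memory grows with the number of microbatches); with the
    default False, only a running scalar has to survive the loop.
    """
    total_loss, outputs = 0.0, []
    for mb in microbatches:
        out = forward(mb)                  # stand-in for a forward pass
        total_loss += sum(out) / len(out)  # only a scalar is kept
        if return_outputs:
            outputs.append(out)            # extra, often unnecessary, memory
    return {"loss": total_loss, "outputs": outputs if return_outputs else None}
```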
-
- 24 Mar, 2024 1 commit
Yuanheng Zhao authored
* revise grok-1 example
* remove unused arg in scripts
* prevent re-installing torch
* update readme
* revert modifying colossalai requirements
* add perf
* trivial
* add tokenizer url
-
- 22 Mar, 2024 1 commit
binmakeswell authored
* [release] grok-1 inference
-
- 21 Mar, 2024 1 commit
Hongxin Liu authored
* [misc] add submodule
* remove submodule
* [example] support grok-1 tp inference
* [example] add grok-1 inference script
* [example] refactor code
* [example] add grok-1 readme
* [example] add test ci
* [example] update readme
-
- 12 Mar, 2024 1 commit
digger yu authored
-
- 05 Mar, 2024 4 commits
Youngon authored
* Update train_ddp.yaml: delete "strategy" to fix a DDP config loading bug in "main.py"
* Update train_ddp.yaml: fix a config file loading bug when running inference with scripts/txt2img.py
* Update README.md: add pretrained-model test code
-
Luo Yihang authored
-
MickeyCHAN authored
* fix import error
* Update dpt_depth.py

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
-
Hongxin Liu authored
-
- 04 Mar, 2024 1 commit
flybird11111 authored
* benchmark gpt2
* [doc] fix typo in Colossal-LLaMA-2/README.md (#5247)
* [workflow] fixed build CI (#5240)
* [ci] fixed booster test (#5251)
* [ci] fixed ddp test (#5254)
* fix typo in applications/ColossalEval/README.md (#5250)
* [ci] fix shardformer tests (#5255)
* feat: add enable_metadata_cache option
* [doc] fix doc typo (#5256)
* [hotfix]: add pp sanity check and fix mbs arg (#5268)
* [workflow] fixed incomplete bash command (#5272)
* [workflow] fixed oom tests (#5275)
* [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276)
* [shardformer] hybridparallelplugin support gradients accumulation (#5246)
* [hotfix] fix ShardFormer test execution path when using sequence parallelism (#5230)
* fix auto loading gpt2 tokenizer (#5279)
* [doc] add llama2-13B display (#5285)
* fix llama pretrain (#5287)
* Update shardformer.py

Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
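The gradient-accumulation support mentioned above (#5246) boils down to a standard pattern: scale each microbatch gradient by 1/accum_steps, sum into a buffer, and step the optimizer only at every accum_steps-th microbatch. A minimal scalar sketch (names illustrative, not the plugin's API):

```python
def accumulate_and_step(micro_grads, accum_steps):
    """Average consecutive microbatch gradients, then 'step'.

    Returns the effective gradient applied at each optimizer step; a real
    plugin would add the buffer into each parameter's .grad and call
    optimizer.step() / zero_grad() at the boundary.
    """
    buffer, applied = 0.0, []
    for i, g in enumerate(micro_grads, start=1):
        buffer += g / accum_steps          # scale so the sum is a mean
        if i % accum_steps == 0:           # accumulation boundary
            applied.append(buffer)
            buffer = 0.0                   # zero_grad for the next window
    return applied
```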
-
- 27 Feb, 2024 1 commit
Hongxin Liu authored
-
- 30 Jan, 2024 1 commit
digger yu authored
-
- 25 Jan, 2024 2 commits
- 19 Jan, 2024 1 commit
flybird11111 authored
-
- 15 Jan, 2024 1 commit
Wenhao Chen authored
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
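A pipeline-parallel sanity check of the kind this commit describes typically validates the relationship between batch size, microbatch size, and pipeline depth: the batch must split evenly into microbatches, and a 1F1B schedule needs at least as many microbatches as pipeline stages to keep every stage busy. A hedged sketch, not the actual ColossalAI check:

```python
def pp_sanity_check(global_batch_size, microbatch_size, pp_size):
    """Validate a 1F1B pipeline config; returns the number of microbatches."""
    if global_batch_size % microbatch_size != 0:
        raise ValueError("batch size must be divisible by microbatch size")
    num_microbatches = global_batch_size // microbatch_size
    if num_microbatches < pp_size:
        # with fewer microbatches than stages, 1F1B cannot fill the
        # pipeline (and some implementations hang outright)
        raise ValueError("need at least pp_size microbatches for 1F1B")
    return num_microbatches
```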
-
- 11 Jan, 2024 1 commit
binmakeswell authored
* [doc] fix annotation display
* [doc] fix llama2 doc
-
- 09 Jan, 2024 1 commit
Hongxin Liu authored
* update accelerator
* fix timer
* fix amp
* fix update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* [npu] polish accelerator code

Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
-
- 08 Jan, 2024 1 commit
Xuanlei Zhao authored
* update extension
* update cpu adam
* add doc for cpu adam
* update kernel
* update commit
* update flash
* update memory efficient
* update flash attn
* update flash attention loader
* update api
* update doc
* update example time limit
* reverse change
* fix doc
* remove useless kernel
* not use warning
-
- 02 Jan, 2024 1 commit
Wenhao Chen authored
* fix: remove drop last in val & test dataloader
* feat: add run_forward_only, support arbitrary bs
* chore: modify ci script
-
- 22 Dec, 2023 1 commit
Wenhao Chen authored
* test: add more p2p tests
* fix: remove send_forward_recv_forward as a p2p op list needs to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
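The "metadata cache" entries refer to a common p2p optimization: when the shapes and dtypes of tensors exchanged between pipeline stages are identical every iteration, the metadata handshake that precedes each send can be skipped after the first one. A minimal illustration of the caching logic (the real P2P code is far more involved):

```python
class P2PMetadataCache:
    """Skip re-sending tensor metadata when it matches the last send."""

    def __init__(self):
        self._last = None

    def metadata_to_send(self, shapes_and_dtypes):
        meta = tuple(shapes_and_dtypes)
        if meta == self._last:
            return None        # receiver reuses its cached copy
        self._last = meta      # shapes changed: refresh the cache
        return meta            # metadata must accompany this send
```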
-
- 08 Dec, 2023 1 commit
flybird11111 authored
* fix
* test ci
* fix ci
-
- 28 Nov, 2023 2 commits
binmakeswell authored
* [doc] add moe news
-
Wenhao Chen authored
* [shardformer] implement policy for all GPT-J models and test
* [shardformer] support interleaved pipeline parallel for bert finetune
* [shardformer] shardformer support falcon (#4883)
* [shardformer]: fix interleaved pipeline for bert model (#5048)
* [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093)
* Add Mistral support for Shardformer (#5103)
* [shardformer] add tests to mistral (#5105)

Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>
-
- 27 Nov, 2023 1 commit
digger yu authored
-
- 22 Nov, 2023 2 commits
Xuanlei Zhao authored
* llama 3d
* update
* fix autocast
-
flybird11111 authored
* fix flash attn
* fix
-
- 21 Nov, 2023 1 commit
Hongxin Liu authored
* [setup] refactor infer setup
* [hotfix] fix inference behavior on 1 1 gpu
* [example] refactor inference examples
-
- 20 Nov, 2023 5 commits
Bin Jia authored
* fix init model with random parameters
* fix example
-
github-actions[bot] authored
Co-authored-by: github-actions <github-actions@github.com>
-
Xu Kai authored
* update examples and engine
* fix choices
* update example
-
Hongxin Liu authored
* [npu] setup device utils (#5047)
* [npu] add npu device support
* [npu] support low level zero
* [test] update npu zero plugin test
* [hotfix] fix import
* [test] recover tests
* [npu] gemini support npu (#5052)
* [npu] refactor device utils
* [gemini] support npu
* [example] llama2+gemini support npu
* [kernel] add arm cpu adam kernel (#5065)
* [optim] update adam optimizer
* [kernel] arm cpu adam remove bf16 support
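The ARM CPU Adam kernel (#5065) vectorizes the standard Adam update over whole tensors; per element, the math it has to implement is just the textbook step:

```python
import math

def adam_step(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One element-wise Adam update (Kingma & Ba, 2015)."""
    m = beta1 * m + (1 - beta1) * g          # first-moment EMA
    v = beta2 * v + (1 - beta2) * g * g      # second-moment EMA
    m_hat = m / (1 - beta1 ** step)          # bias correction
    v_hat = v / (1 - beta2 ** step)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

A fused kernel performs exactly these operations, but with SIMD/NEON intrinsics over contiguous fp32 buffers (the commit notes bf16 support was dropped).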
-
Cuiqing Li (李崔卿) authored
* added flash-decoding of triton based on lightllm kernel
* add req
* clean
* delete build.sh

Co-authored-by: cuiqing.li <lixx336@gmail.com>
-
- 19 Nov, 2023 1 commit
Xu Kai authored
* [inference] support only TP (#4998)
* add support for bloom (#5008)
* [refactor] refactor gptq and smoothquant llama (#5012)
* [Inference Refactor] merge chatglm2 with pp and tp (#5023)
* [Refactor] remove useless inference code (#5022)
* [Refactor] refactor policy search and quant type controlling in inference (#5035)
* [inference] update readme (#5051)
* [inference] update example (#5053)
* fix rebase bug
* fix some errors
* add some features
* update interface
* update benchmark
* add requirements-infer

Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
-
- 18 Nov, 2023 1 commit
flybird11111 authored
fix llama example
-
- 16 Nov, 2023 2 commits
Elsa Granger authored
[pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017)
* Use p2p
* Cannot do bidirectional send in p2p
* Refactor tensor creation and serialization in P2P communication
* Fix llama forward args in flash attention
* Add flop estimate from megatron
* Support loading weight not in weight_map when strict=False in hybrid_parallel
* Use send_forward_recv_backward, etc. in 1f1b
* Use dataclass for metadata; remove torch.cuda.synchronize() as suggested
* Add comment about the torch.cuda.synchronize for potential error
* Fix typo
* Update hybrid_parallel_checkpoint_io.py
* Update p2p.py
* Update one_f_one_b.py

Co-authored-by: flybird11111 <1829166702@qq.com>
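The "flop estimation by megatron" mentioned above usually means the analytic estimate from Narayanan et al. (SC'21): FLOPs per iteration ≈ factor · B·s·l·h² · (1 + s/(6h) + V/(16·l·h)), where B is batch size, s sequence length, l layers, h hidden size, V vocabulary size; the factor is 72 for forward+backward and 96 when activations are recomputed. A sketch under that assumption (the benchmark's actual function may differ):

```python
def megatron_flops_per_iter(batch, seq_len, num_layers, hidden, vocab,
                            recompute_activations=True):
    """Analytic transformer FLOPs per training iteration (Megatron estimate)."""
    # fwd = 24*B*s*l*h^2*(...), bwd = 2x fwd, recompute adds one more fwd
    factor = 96 if recompute_activations else 72
    return factor * batch * seq_len * num_layers * hidden ** 2 * (
        1 + seq_len / (6 * hidden) + vocab / (16 * num_layers * hidden)
    )
```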
-
Cuiqing Li (李崔卿) authored
* update flash-context-attention
* adding kernels
* fix
* reset
* add build script
* add building process
* add llama2 example
* add colossal-llama2 test
* clean
* fall back test setting
* fix test file

Co-authored-by: cuiqing.li <lixx336@gmail.com>
-