1. 07 Apr, 2024 3 commits
  2. 03 Apr, 2024 1 commit
• [shardformer] Sequence Parallelism Optimization (#5533) · 8e412a54
      Zhongkai Zhao authored
      
      
      * sequence parallel optimization
      
      * validate sequence parallel in llama (code to be polished)
      
      * shardformer api writing
      
      * integrate sequence parallel in ShardFormer
      
* fix pp bugs and sp bugs for Llama model
      
      * integrating ring-based sequence parallelism into ShardFormer
      
      * [sequence parallelism]: Add fused megatron function
      
      * integrating ring-based sequence parallelism into ShardFormer
      
      ---------
Co-authored-by: linsj20 <linsj20@mails.tsinghua.edu.cn>
      
* fix bugs when using sp and flash attention together
      
      * fix operation function name
      
      * support flash attention for ulysses-style sp
      
      * clarify sp process group
      
      * fix compatibility bugs in moe plugin
      
      * fix fused linear bugs
      
      * fix linear layer test
      
      * support gpt model all-to-all sp
      
      * modify shard data dimension (meant to be dim=-1)
      
* support megatron-style sp and distributed attn for llama model
      
      * [shardformer] add megatron sp to llama
      
      * support llama7B 128k with distributed attention
      
      * [shardformer] robustness enhancement
      
      * add block attn
      
      * sp mode 1: keep input as a complete sequence
      
* fix sp compatibility
      
      * finish sp mode 3 support for gpt
      
      * using all_to_all_single when batch size is 1
      
      * support mode 2 sp in gpt2 (#5)
      
      * [shardformer] add megatron sp to llama
      
      * support llama7B 128k with distributed attention
      
      * [shardformer] robustness enhancement
      
      * add block attn
      
      * sp mode 1: keep input as a complete sequence
      
* fix sp compatibility
      
      * refactor ring implementation
      
      * support mode 2 sp in gpt2
      
      * polish code
      
      * enable distributed attn mask when using sp mode 2 and 3 in llama
      
      * automatically enable flash attn when using sp mode 2 and 3 in llama
      
      * inplace attn mask
      
      * add zero2 support for sequence parallel
      
      * polish code
      
      * fix bugs
      
      * fix gemini checkpoint io
      
      * loose tensor checking atol and rtol
      
      * add comment
      
      * fix llama layernorm grad
      
      * fix zero grad
      
      * fix zero grad
      
      * fix conflict
      
      * update split and gather auto grad func
      
      * sequence parallel: inside text split (#6)
      
      * polish code (part 1)
      
      * polish code (part 2)
      
      * polish code (part 2.5)
      
      * polish code (part 3)
      
      * sequence parallel: inside text split
      
      * miscellaneous minor fixes
      
      * polish code
      
      * fix ulysses style ZeRO
      
      * sequence parallel: inside text split
      
      * miscellaneous minor fixes
      
* disaggregate sp group and dp group for sp
      
      * fix llama and gpt sp
      
      * polish code
      
      * move ulysses grad sync to ddp (#9)
      
      * remove zero_stage and unbind the grad sync for alltoall sp
      
      * add 2d group creation test
      
      * move ulysses grad sync to ddp
      
      * add 2d group creation test
      
      * remove useless code
      
      * change shard config not to enable sp when enable_all_optimizations
      
* add sp warnings for several models
      
      * remove useless code
      
      ---------
Co-authored-by: linsj20 <linsj20@mails.tsinghua.edu.cn>
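The Ulysses-style (all-to-all) sp mentioned above reshards activations from a sequence split to a head split before attention. A minimal single-process sketch of that resharding, using numpy and made-up shapes (the real implementation uses `torch.distributed.all_to_all_single` across ranks):

```python
import numpy as np

def simulate_seq_alltoall(shards, world_size):
    """Simulate the Ulysses-style all-to-all: each rank starts with a
    sequence shard of shape (seq/P, heads, dim) and ends up with the
    full sequence for heads/P of the heads, shape (seq, heads/P, dim)."""
    heads_per_rank = shards[0].shape[1] // world_size
    out = []
    for rank in range(world_size):
        # gather this rank's head slice from every rank's sequence shard
        pieces = [
            shards[src][:, rank * heads_per_rank:(rank + 1) * heads_per_rank, :]
            for src in range(world_size)
        ]
        out.append(np.concatenate(pieces, axis=0))  # full sequence, head shard
    return out

world_size = 4
seq, heads, dim = 16, 8, 3
full = np.arange(seq * heads * dim, dtype=np.float32).reshape(seq, heads, dim)
shards = np.split(full, world_size, axis=0)   # sequence split before attention
resharded = simulate_seq_alltoall(shards, world_size)
assert resharded[0].shape == (seq, heads // world_size, dim)
assert np.array_equal(resharded[1], full[:, 2:4, :])
```

After attention, a second all-to-all inverts the exchange so the sequence split is restored for the MLP; the shard dimension noted in the log ("meant to be dim=-1") depends on the layout each model uses.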
  3. 02 Apr, 2024 1 commit
  4. 01 Apr, 2024 1 commit
• [shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogeneous shard policy for llama (#5508) · e614aa34
Wenhao Chen authored
      
      * feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig`
      
      * feat: apply `GradientCheckpointConfig` to policy and llama_forward
      
      * feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager
      
      * fix: add optional args for `distribute_layer` and `get_stage_index`
      
      * fix: fix changed API calls
      
      * test: update llama tests
      
      * style: polish `GradientCheckpointConfig`
      
      * fix: fix pipeline utils tests
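The `gradient_checkpointing_ratio` idea is to checkpoint only a fraction of each pipeline stage's layers instead of all or none. A sketch of the arithmetic under one plausible rounding rule (illustrative only; the actual `GradientCheckpointConfig` may round or distribute differently):

```python
import math

def num_ckpt_layers(num_layers: int, ratio: float) -> int:
    """Checkpoint roughly `ratio` of a stage's layers.
    Illustrative rule: round the fractional count up."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("gradient_checkpointing_ratio must be in [0, 1]")
    return math.ceil(num_layers * ratio)

assert num_ckpt_layers(8, 0.0) == 0   # no recomputation, max memory
assert num_ckpt_layers(8, 0.5) == 4   # trade half the activations for compute
assert num_ckpt_layers(7, 0.5) == 4   # ceil(3.5)
assert num_ckpt_layers(8, 1.0) == 8   # full checkpointing
```

A heterogeneous policy can then pass a different ratio per stage, which is useful because early pipeline stages hold activations longer than late ones.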
  5. 29 Mar, 2024 1 commit
• [ColossalChat] Update RLHF V2 (#5286) · df5e9c53
      YeAnbang authored
      
      
      * Add dpo. Fix sft, ppo, lora. Refactor all
      
      * fix and tested ppo
      
* 2nd round refactor
      
      * add ci tests
      
      * fix ci
      
      * fix ci
      
      * fix readme, style
      
      * fix readme style
      
      * fix style, fix benchmark
      
      * reproduce benchmark result, remove useless files
      
      * rename to ColossalChat
      
      * use new image
      
      * fix ci workflow
      
      * fix ci
      
      * use local model/tokenizer for ci tests
      
      * fix ci
      
      * fix ci
      
      * fix ci
      
      * fix ci timeout
      
      * fix rm progress bar. fix ci timeout
      
      * fix ci
      
      * fix ci typo
      
* remove 3d plugin from ci temporarily
      
      * test environment
      
      * cannot save optimizer
      
      * support chat template
      
      * fix readme
      
      * fix path
      
      * test ci locally
      
      * restore build_or_pr
      
      * fix ci data path
      
      * fix benchmark
      
      * fix ci, move ci tests to 3080, disable fast tokenizer
      
      * move ci to 85
      
      * support flash attention 2
      
      * add all-in-one data preparation script. Fix colossal-llama2-chat chat template
      
      * add hardware requirements
      
      * move ci test data
      
      * fix save_model, add unwrap
      
      * fix missing bos
      
      * fix missing bos; support grad accumulation with gemini
      
      * fix ci
      
      * fix ci
      
      * fix ci
      
      * fix llama2 chat template config
      
      * debug sft
      
      * debug sft
      
      * fix colossalai version requirement
      
      * fix ci
      
      * add sanity check to prevent NaN loss
      
      * fix requirements
      
      * add dummy data generation script
      
      
      * update readme
      
      * update readme
      
      * update readme and ignore
      
      * fix logger bug
      
      * support parallel_output
      
      * modify data preparation logic
      
      * fix tokenization
      
      * update lr
      
      * fix inference
      
      * run pre-commit
      
      ---------
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
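The "Add dpo" item refers to Direct Preference Optimization. The standard DPO loss on a chosen/rejected pair can be sketched from summed sequence log-probs as below (a scalar sketch; ColossalChat's batched, masked implementation differs in detail):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin)).
    Inputs are summed log-probs of the chosen/rejected responses under the
    policy and the frozen reference model."""
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# at initialization (policy == reference) the loss is exactly log 2
assert abs(dpo_loss(-5.0, -9.0, -5.0, -9.0) - math.log(2.0)) < 1e-12
# once the policy prefers the chosen response more than the reference does,
# the loss drops below log 2
assert dpo_loss(-5.0, -9.0, -6.0, -8.0) < math.log(2.0)
```

This is why DPO needs no reward model or PPO rollout loop, which is what lets the refactor fold it in alongside sft, ppo, and lora.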
  6. 28 Mar, 2024 1 commit
  7. 27 Mar, 2024 3 commits
• [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) · 00525f77
      Insu Jang authored
      
      
      * Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution
      
      * Change static methods for t5 layer distribution to member functions
      
      * Change static methods for whisper layer distribution to member functions
      
      * Replace whisper policy usage with self one
      
      * Fix test case to use non-static layer distribution methods
      
      * fix: fix typo
      
      ---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
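The fix above hinges on calling `self.distribute_layers` / `self.get_stage_index` so a policy subclass can override the split. A sketch of what such a pair might compute, under one plausible default (even split, remainder to the later stages; the real methods may differ):

```python
def distribute_layers(num_layers: int, num_stages: int) -> list:
    """Split num_layers across pipeline stages as evenly as possible,
    giving the remainder to the later stages (illustrative default)."""
    base, rem = divmod(num_layers, num_stages)
    return [base + (1 if s >= num_stages - rem else 0) for s in range(num_stages)]

def get_stage_index(layers_per_stage: list, stage: int) -> tuple:
    """Half-open [start, end) range of layer indices owned by `stage`."""
    start = sum(layers_per_stage[:stage])
    return start, start + layers_per_stage[stage]

per_stage = distribute_layers(10, 4)
assert per_stage == [2, 2, 3, 3]
assert get_stage_index(per_stage, 2) == (4, 7)
```

If the forward path calls the static default instead of `self.`, a custom override is silently ignored and each stage slices the wrong layers, which is the pipeline forward error this commit fixes.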
• github-actions[bot]
• [shardformer] update colo attention to support custom mask (#5510) · 19e1a5cf
      Hongxin Liu authored
      * [feature] refactor colo attention (#5462)
      
      * [extension] update api
      
      * [feature] add colo attention
      
      * [feature] update sdpa
      
      * [feature] update npu attention
      
      * [feature] update flash-attn
      
      * [test] add flash attn test
      
      * [test] update flash attn test
      
      * [shardformer] update modeling to fit colo attention (#5465)
      
      * [misc] refactor folder structure
      
      * [shardformer] update llama flash-attn
      
      * [shardformer] fix llama policy
      
      * [devops] update tensornvme install
      
      * [test] update llama test
      
      * [shardformer] update colo attn kernel dispatch
      
      * [shardformer] update blip2
      
      * [shardformer] update chatglm
      
      * [shardformer] update gpt2
      
      * [shardformer] update gptj
      
      * [shardformer] update opt
      
      * [shardformer] update vit
      
      * [shardformer] update colo attention mask prep
      
      * [shardformer] update whisper
      
      * [test] fix shardformer tests (#5514)
      
      * [test] fix shardformer tests
      
      * [test] fix shardformer tests
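The custom-mask support above follows the usual pattern of injecting an additive mask into scaled dot-product attention before the kernel dispatch. A minimal numpy sketch of that pattern with a causal mask as the "custom" mask (single head, made-up shapes; the actual ColoAttention API differs):

```python
import numpy as np

def sdpa(q, k, v, mask=None):
    """Scaled dot-product attention with an optional additive mask.
    Masked-out slots carry -inf so their softmax weight is exactly 0."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = scores + mask
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

seq, dim = 4, 8
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((seq, dim)) for _ in range(3))
causal = np.triu(np.full((seq, seq), -np.inf), k=1)  # custom mask: causal
out = sdpa(q, k, v, causal)
# row 0 may only attend to position 0, so its output must equal v[0]
assert np.allclose(out[0], v[0])
```

The "mask prep" commits in the list build such additive tensors (padding, causal, or model-specific) once, so every backend (SDPA, flash-attn, npu) sees the same mask semantics.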
  8. 26 Mar, 2024 6 commits
  9. 25 Mar, 2024 3 commits
• [shardformer] Fix lm parallel (#5480) · 0688d92e
      flybird11111 authored
      * fix
      
* padding vocab_size when using pipeline parallelism
      
      * fix
      
* fix
      
      * fix gather output
      
      * fix
      
      * fix
      
* fix

* fix resize embedding
      
      * revert
      
      * revert
      
      * revert
      
      * fix lm forward distribution
      
      * fix
      
      * test ci
      
      * fix
• [release] grok-1 inference benchmark (#5500) · 34e90925
      binmakeswell authored
      * [release] grok-1 inference benchmark
      
• [hotfix] set return_outputs=False in examples and polish code (#5404) · bb0a668f
      Wenhao Chen authored
      * fix: simplify merge_batch
      
      * fix: use return_outputs=False to eliminate extra memory consumption
      
      * feat: add return_outputs warning
      
      * style: remove `return_outputs=False` as it is the default value
  10. 24 Mar, 2024 1 commit
• [example] update Grok-1 inference (#5495) · 5fcd7795
      Yuanheng Zhao authored
      * revise grok-1 example
      
      * remove unused arg in scripts
      
      * prevent re-installing torch
      
      * update readme
      
      * revert modifying colossalai requirements
      
      * add perf
      
      * trivial
      
      * add tokenizer url
  11. 22 Mar, 2024 1 commit
  12. 21 Mar, 2024 1 commit
• [example] add grok-1 inference (#5485) · 848a574c
      Hongxin Liu authored
      * [misc] add submodule
      
      * remove submodule
      
      * [example] support grok-1 tp inference
      
      * [example] add grok-1 inference script
      
      * [example] refactor code
      
      * [example] add grok-1 readme
      
* [example] add test ci
      
* [example] update readme
  13. 20 Mar, 2024 1 commit
  14. 18 Mar, 2024 2 commits
  15. 13 Mar, 2024 1 commit
• [devops] fix compatibility (#5444) · f2e8b9ef
      Hongxin Liu authored
      * [devops] fix compatibility
      
      * [hotfix] update compatibility test on pr
      
      * [devops] fix compatibility
      
      * [devops] record duration during comp test
      
      * [test] decrease test duration
      
      * fix falcon
  16. 12 Mar, 2024 1 commit
  17. 11 Mar, 2024 1 commit
  18. 07 Mar, 2024 2 commits
  19. 05 Mar, 2024 9 commits