  1. 03 Apr, 2024 1 commit
    • [shardformer] Sequence Parallelism Optimization (#5533) · 8e412a54
      Zhongkai Zhao authored
      
      
      * sequence parallel optimization
      
      * validate sequence parallel in llama (code to be polished)
      
      * shardformer api writing
      
      * integrate sequence parallel in ShardFormer
      
      * fix pp bugs and sp bugs for LLaMA model
      
      * integrating ring-based sequence parallelism into ShardFormer
      
      * [sequence parallelism]: Add fused megatron function
      
      * integrating ring-based sequence parallelism into ShardFormer
      
      ---------
      Co-authored-by: linsj20 <linsj20@mails.tsinghua.edu.cn>
      
      * fix bugs when using sp and FlashAttention together
      
      * fix operation function name
      
      * support flash attention for ulysses-style sp
      
      * clarify sp process group
      
      * fix compatibility bugs in moe plugin
      
      * fix fused linear bugs
      
      * fix linear layer test
      
      * support gpt model all-to-all sp (see the sketch after this commit entry)
      
      * modify shard data dimension (meant to be dim=-1)
      
      * support megatron-style sp and distributed attn for llama model
      
      * [shardformer] add megatron sp to llama
      
      * support llama7B 128k with distributed attention
      
      * [shardformer] robustness enhancement
      
      * add block attn
      
      * sp mode 1: keep input as a complete sequence
      
      * fix sp compatibility
      
      * finish sp mode 3 support for gpt
      
      * using all_to_all_single when batch size is 1
      
      * support mode 2 sp in gpt2 (#5)
      
      * [shardformer] add megatron sp to llama
      
      * support llama7B 128k with distributed attention
      
      * [shardformer] robustness enhancement
      
      * add block attn
      
      * sp mode 1: keep input as a complete sequence
      
      * fix sp compatibility
      
      * refactor ring implementation
      
      * support mode 2 sp in gpt2
      
      * polish code
      
      * enable distributed attn mask when using sp mode 2 and 3 in llama
      
      * automatically enable flash attn when using sp mode 2 and 3 in llama
      
      * inplace attn mask
      
      * add zero2 support for sequence parallel
      
      * polish code
      
      * fix bugs
      
      * fix gemini checkpoint io
      
      * loose tensor checking atol and rtol
      
      * add comment
      
      * fix llama layernorm grad
      
      * fix zero grad
      
      * fix zero grad
      
      * fix conflict
      
      * update split and gather auto grad func
      
      * sequence parallel: inside text split (#6)
      
      * polish code (part 1)
      
      * polish code (part 2)
      
      * polish code (part 2.5)
      
      * polish code (part 3)
      
      * sequence parallel: inside text split
      
      * miscellaneous minor fixes
      
      * polish code
      
      * fix ulysses style ZeRO
      
      * sequence parallel: inside text split
      
      * miscellaneous minor fixes
      
      * disaggregate sp group and dp group for sp
      
      * fix llama and gpt sp
      
      * polish code
      
      * move ulysses grad sync to ddp (#9)
      
      * remove zero_stage and unbind the grad sync for alltoall sp
      
      * add 2d group creation test
      
      * move ulysses grad sync to ddp
      
      * add 2d group creation test
      
      * remove useless code
      
      * change shard config so that enable_all_optimizations does not enable sp
      
      * add sp warnings for several models
      
      * remove useless code
      
      ---------
      Co-authored-by: linsj20 <linsj20@mails.tsinghua.edu.cn>
      8e412a54
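Conceptually, the Ulysses-style (all-to-all) sp mode above keeps activations sharded along the sequence dimension and, just before attention, trades each rank's sequence shard for a head shard, so every rank attends over the full sequence with a subset of heads. Below is a minimal sketch of that exchange, assuming torch.distributed is initialized and rank i of the sp group holds the i-th contiguous sequence chunk; `all_to_all_seq_to_head` and `sp_group` are illustrative names, not the actual ColossalAI API:

```python
# Illustrative sketch of the Ulysses-style all-to-all exchange; not the
# actual ColossalAI implementation.
import torch
import torch.distributed as dist

def all_to_all_seq_to_head(x: torch.Tensor, sp_group) -> torch.Tensor:
    """[b, s/sp, h, d] sequence shard -> [b, s, h/sp, d] head shard."""
    sp = dist.get_world_size(sp_group)
    b, s_local, h, d = x.shape
    # Split the head dim into sp slices; slice g is destined for rank g.
    x = x.reshape(b, s_local, sp, h // sp, d).permute(2, 0, 1, 3, 4).contiguous()
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x, group=sp_group)
    # out[i] is rank i's sequence chunk carrying our head slice; stitch the
    # chunks back together in rank order to recover the full sequence.
    return out.permute(1, 0, 2, 3, 4).reshape(b, sp * s_local, h // sp, d)
```

In attention, q, k, and v each go through this exchange, attention runs over the full sequence with h/sp heads, and the inverse all-to-all on the output restores the sequence shard. Since flash attention computes each head independently, sharding heads across ranks this way composes cleanly with it.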
  2. 01 Apr, 2024 1 commit
    • [shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogeneous... · e614aa34
      Wenhao Chen authored
      [shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogeneous shard policy for llama (#5508)
      
      * feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig` (see the sketch after this commit entry)
      
      * feat: apply `GradientCheckpointConfig` to policy and llama_forward
      
      * feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager
      
      * fix: add optional args for `distribute_layer` and `get_stage_index`
      
      * fix: fix changed API calls
      
      * test: update llama tests
      
      * style: polish `GradientCheckpointConfig`
      
      * fix: fix pipeline utils tests
      e614aa34
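The `gradient_checkpointing_ratio` knob trades memory for recompute by checkpointing only a fraction of a stage's layers rather than all or none. Here is a minimal sketch of the idea, assuming a plain list of transformer layers; the field and function names are illustrative, not the exact `GradientCheckpointConfig` API added by this commit:

```python
# Illustrative sketch of a gradient_checkpointing_ratio knob; not the exact
# GradientCheckpointConfig API from this commit.
from dataclasses import dataclass
import torch.utils.checkpoint as ckpt

@dataclass
class GradientCheckpointConfig:
    # Fraction of layers to recompute in backward: 0.0 = none, 1.0 = all.
    gradient_checkpointing_ratio: float = 0.0

    def num_ckpt_layers(self, num_layers: int) -> int:
        return int(self.gradient_checkpointing_ratio * num_layers)

def forward_layers(layers, hidden_states, cfg: GradientCheckpointConfig):
    n_ckpt = cfg.num_ckpt_layers(len(layers))
    for i, layer in enumerate(layers):
        if i < n_ckpt:
            # Checkpointed layers store no intermediate activations; they
            # are recomputed in backward, saving memory for extra compute.
            hidden_states = ckpt.checkpoint(layer, hidden_states, use_reentrant=False)
        else:
            hidden_states = layer(hidden_states)
    return hidden_states
```

A per-stage variant of the same idea lets a heterogeneous pipeline checkpoint more layers on memory-pressed stages and fewer elsewhere.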
  3. 30 Oct, 2023 1 commit
    • [Inference] Dynamic Batching Inference, online and offline (#4953) · cf579ff4
      Jianghai authored
      
      
      * [inference] Dynamic Batching for Single and Multiple GPUs (#4831)
      
      * finish batch manager
      
      * 1
      
      * first
      
      * fix
      
      * fix dynamic batching
      
      * llama infer
      
      * finish test
      
      * support generating sequences of different lengths
      
      * del prints
      
      * del prints
      
      * fix
      
      * fix bug
      
      ---------
      
      Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
      
      * [inference] Async dynamic batching  (#4894)
      
      * finish input and output logic
      
      * add generate
      
      * test forward
      
      * 1
      
      * [inference]Re push async dynamic batching (#4901)
      
      * adapt to ray server
      
      * finish async
      
      * finish test
      
      * del test
      
      ---------
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      
      * Revert "[inference]Re push async dynamic batching (#4901)" (#4905)
      
      This reverts commit fbf3c09e673794ed18c91d4bab1a7dfea052e95a.
      
      * Revert "[inference] Async dynamic batching  (#4894)"
      
      This reverts commit fced14025043e29ce816b315f440601188f7f79f.
      
      * Revert "[inference] Async dynamic batching  (#4894)" (#4909)
      
      This reverts commit fced14025043e29ce816b315f440601188f7f79f.
      
      * Add Ray Distributed Environment Init Scripts
      
      * support DynamicBatchManager base function (see the sketch after this commit entry)
      
      * revert _set_tokenizer version
      
      * add driver async generate
      
      * add async test
      
      * fix bugs in test_ray_dist.py
      
      * add get_tokenizer.py
      
      * fix code style
      
      * fix "No module named 'pydantic'" bug in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * [infer]Add Ray Distributed Environment Init Scripts (#4911)
      
      * Revert "[inference] Async dynamic batching  (#4894)"
      
      This reverts commit fced14025043e29ce816b315f440601188f7f79f.
      
      * Add Ray Distributed Environment Init Scripts
      
      * support DynamicBatchManager base function
      
      * revert _set_tokenizer version
      
      * add driver async generate
      
      * add async test
      
      * fix bugs in test_ray_dist.py
      
      * add get_tokenizer.py
      
      * fix code style
      
      * fix "No module named 'pydantic'" bug in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * support dynamic batching for bloom model and add is_running function
      
      * [Inference]Test for new Async engine (#4935)
      
      * infer engine
      
      * infer engine
      
      * test engine
      
      * test engine
      
      * new manager
      
      * change step
      
      * add
      
      * test
      
      * fix
      
      * fix
      
      * finish test
      
      * finish test
      
      * finish test
      
      * finish test
      
      * add license
      
      ---------
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      
      * add assertion for config (#4947)
      
      * [Inference] Finish dynamic batching offline test (#4948)
      
      * test
      
      * fix test
      
      * fix quant
      
      * add default
      
      * fix
      
      * fix some bugs
      
      * fix some bugs
      
      * fix
      
      * fix bug
      
      * fix bugs
      
      * reset param
      
      ---------
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
      Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
      cf579ff4
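Dynamic (continuous) batching admits waiting requests into the running batch as soon as finished sequences free a slot, instead of padding a static batch and idling until its longest sequence completes. An illustrative sketch of such a manager follows, assuming a `model_step` callable that decodes one token per running request; the class and field names here are assumptions, not the actual DynamicBatchManager interface:

```python
# Illustrative sketch of a dynamic batching loop; not the actual
# DynamicBatchManager from this commit.
from __future__ import annotations
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_ids: list
    max_new_tokens: int
    output_ids: list = field(default_factory=list)

class DynamicBatchManager:
    def __init__(self, model_step, max_batch_size: int = 8):
        self.model_step = model_step        # fn: list[Request] -> list[int]
        self.max_batch_size = max_batch_size
        self.waiting: deque[Request] = deque()
        self.running: list[Request] = []

    def add_request(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> None:
        # Admit waiting requests whenever the running batch has free slots.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())
        if not self.running:
            return
        # One decoding step over the whole batch: one new token per request.
        for req, tok in zip(self.running, self.model_step(self.running)):
            req.output_ids.append(tok)
        # Evict finished requests so their slots are reused immediately.
        self.running = [r for r in self.running
                        if len(r.output_ids) < r.max_new_tokens]

    def is_running(self) -> bool:
        return bool(self.running) or bool(self.waiting)
```

An async online frontend (the Ray driver scripts above) wraps the same loop: it enqueues requests as they arrive and streams each request's tokens back as `step()` produces them.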
  4. 09 Sep, 2023 1 commit
    • [shardformer] update llama2/opt finetune example and fix llama2 policy (#4645) · 7486ed7d
      flybird11111 authored
      * [shardformer] update shardformer readme
      
      [shardformer] update shardformer readme
      
      [shardformer] update shardformer readme
      
      * [shardformer] update llama2/opt finetune example and shardformer update to llama2
      
      * [shardformer] update llama2/opt finetune example and shardformer update to llama2
      
      * [shardformer] update llama2/opt finetune example and shardformer update to llama2
      
      * [shardformer] change dataset
      
      * [shardformer] change dataset
      
      * [shardformer] fix CI
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      [example] update opt example
      
      [example] resolve comments
      
      fix
      
      fix
      7486ed7d