1. 24 Sep, 2022 1 commit
  2. 23 Sep, 2022 8 commits
  3. 22 Sep, 2022 2 commits
  4. 21 Sep, 2022 1 commit
  5. 20 Sep, 2022 5 commits
  6. 19 Sep, 2022 1 commit
    • Kirigaya Kazuto · edc9e419
      [pipeline/chimera] reconstruct PipelineBase and Worker to support more flexible custom schedules | finish Chimera (#1595)
      
      * [pipeline/tuning] improve dispatch performance in both time and space cost
      
      * [pipeline/converge] add interface for testing convergence
      
      * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
      
      * Update PipelineBase.py
      
      * [pipeline/chimera] reconstruct PipelineBase and Worker to support more flexible custom schedules | finish Chimera
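      The Chimera commit above reworks the worker loop so that custom pipeline schedules can be plugged in. As a torch-free illustration of the kind of schedule such a worker executes (names and layout are ours, not ColossalAI's API), here is a minimal 1F1B order generator for one pipeline stage:

      ```python
      # Hypothetical sketch, not the ColossalAI API: generate the (phase, microbatch)
      # sequence a single pipeline stage executes under a 1F1B schedule.
      def one_f_one_b(num_microbatches: int, num_stages: int, stage: int):
          """Return a list of ("F", i) / ("B", i) actions for the given stage."""
          # warmup: earlier stages run extra forwards before the first backward arrives
          warmup = min(num_stages - stage - 1, num_microbatches)
          actions = [("F", i) for i in range(warmup)]
          f, b = warmup, 0
          # steady state: alternate one forward with one backward
          while f < num_microbatches:
              actions.append(("F", f)); f += 1
              actions.append(("B", b)); b += 1
          # cooldown: drain the remaining backwards
          while b < num_microbatches:
              actions.append(("B", b)); b += 1
          return actions
      ```

      The last stage (no warmup) alternates F/B immediately, while stage 0 front-loads its forwards; a bidirectional schedule like Chimera runs two such pipelines in opposite directions over the same devices.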
  7. 16 Sep, 2022 1 commit
  8. 14 Sep, 2022 3 commits
    • Boyuan Yao · a7cda6f5
      [fx] Add offload codegen (#1598)
      * [fx] add input activation offload to codegen
      
      * [fx] modify unit test
      
      * [fx] remove two skips in torch11
      
      * [fx] use all_input_nodes instead of _input_nodes
    • Super Daniel · c8e9b2ad
      [hotfix/rotor] fix variable names (#1597)
      * [fx] add some comment and docstrings.
      
      * [fx] add dataflow analysis for an autograd graph.
      
      * add interpretation for graph analysis.
      
      * [fx] before doing save_tensor_hooks.
      
      * [fx] provide an accurate estimation of memory except for GPT-2.
      
      * [fx] a very accurate version on GPT-2.
      
      * [fx] refactor code.
      
      * [fx] remove redundant inplace=True.
      
      * [fx] dive into backward memory.
      
      * [fx] fix variable names in ckpt_solvers and unskip tests.
      
      * [fx] commit my changes.
      
      * [fx] restore skips.
      
      * [fx] change stage into phase.
    • YuliangLiu0306 · faa23b9d
      [autoparallel] add reshape handler (#1594)
      * [autoparallel] add reshape handler
      
      * polish code
  9. 13 Sep, 2022 5 commits
  10. 12 Sep, 2022 1 commit
    • Boyuan Yao · f3687e4e
      [fx] Add nested checkpoint in activation checkpoint codegen (#1585)
      * [fx] add nested activation_checkpoint codegen
      
      * undo algorithms commits
      
      * solver
      
      * undo some commits
      
      * [fx] torch11 add nested activation checkpoint codegen
      
      * remove some imports
      
      * [fx] add some comments in activation codegen
      
      * [fx] codegen instance error fix
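      Nested activation-checkpoint codegen, as in the commit above, has to reason about how checkpoint regions sit inside one another. A minimal sketch of that constraint (our own model, not ColossalAI's codegen): treat each region as an interval of node indices and require every pair of regions to be either disjoint or strictly nested before code can be emitted.

      ```python
      # Illustrative sketch, not ColossalAI's codegen: nested checkpoint regions
      # modeled as (start, end) node-index intervals, end inclusive. Codegen can
      # only handle regions that are disjoint or fully nested, never partially
      # overlapping.
      def validate_nesting(regions):
          """Return True iff every pair of regions is disjoint or nested."""
          for i, (s1, e1) in enumerate(regions):
              for s2, e2 in regions[i + 1:]:
                  disjoint = e1 < s2 or e2 < s1
                  nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
                  if not (disjoint or nested):
                      return False  # partial overlap cannot be expressed as nested calls
          return True
      ```

      Valid input maps directly to nested `checkpoint(...)` calls; a partial overlap like `(0, 5)` and `(3, 8)` has no such nesting and must be rejected or split.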
  11. 08 Sep, 2022 1 commit
  12. 07 Sep, 2022 4 commits
  13. 06 Sep, 2022 1 commit
  14. 05 Sep, 2022 1 commit
  15. 02 Sep, 2022 1 commit
  16. 01 Sep, 2022 4 commits
    • Super Daniel · 7dc53237
      [fx] add test for meta tensor. (#1527)
      * [fx] add test for meta tensor.
      
      * [fx] fix error.
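      The point of meta tensors, which the commit above adds tests for, is that they carry only shape and dtype so tracing can estimate memory without allocating or computing real data (in PyTorch they live on `device="meta"`). A torch-free sketch of the same idea, with names of our own invention:

      ```python
      # Torch-free illustration of the meta-tensor idea: propagate shapes and
      # dtypes through an op without touching real data. All names here are
      # hypothetical, not the PyTorch or ColossalAI API.
      from dataclasses import dataclass

      DTYPE_SIZE = {"float32": 4, "float16": 2}

      @dataclass(frozen=True)
      class MetaTensor:
          shape: tuple
          dtype: str = "float32"

          @property
          def nbytes(self) -> int:
              # memory estimate derived purely from metadata
              n = 1
              for d in self.shape:
                  n *= d
              return n * DTYPE_SIZE[self.dtype]

      def meta_matmul(a: MetaTensor, b: MetaTensor) -> MetaTensor:
          # shape propagation only: (m, k) @ (k, n) -> (m, n)
          assert a.shape[-1] == b.shape[0], "inner dimensions must agree"
          return MetaTensor(a.shape[:-1] + b.shape[1:], a.dtype)
      ```

      Running a whole traced graph this way gives per-activation memory estimates at negligible cost, which is what the rotor/checkpoint solvers in this log consume.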
    • YuliangLiu0306 · 4b3d6cae
    • CsRic · 5156d5b4
      [embedding] add tablewise sharding for FAW (#1526)
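      "Tablewise" sharding, as in the FAW commit above, keeps each embedding table whole and places entire tables on ranks, rather than splitting rows or columns. A hedged sketch of one possible placement policy (a greedy largest-first heuristic of our own, not the ColossalAI implementation):

      ```python
      # Hypothetical sketch, not the ColossalAI FAW API: assign whole embedding
      # tables to ranks, greedily giving each table to the currently
      # least-loaded rank to balance total rows.
      def assign_tables(table_rows, world_size):
          """table_rows: {table_name: num_rows}; returns {table_name: rank}."""
          load = [0] * world_size
          placement = {}
          # place the biggest tables first so the tail can even out the load
          for name, rows in sorted(table_rows.items(), key=lambda kv: -kv[1]):
              rank = load.index(min(load))
              placement[name] = rank
              load[rank] += rows
          return placement
      ```

      Each lookup is then routed to the single rank owning the table, trading perfect balance for the simplicity of never splitting a table's rows across devices.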
    • Kirigaya Kazuto · f1e18362
      [pipeline/pipeline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP (#1508)
      
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py (supports communication with List[Any]) | pass test
      
      * [engine/schedule] use p2p_v2 to reconstruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
      
      * [pipeline/pipeline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP
      
      * [pipeline/pipeline_process_group] remove comment
      * [pipeline/pipeline_process_group] skip process group test
      
      * [pipeline/pipeline_process_group] remove test named function
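      A process-group manager like the PipelineProcessGroup above has to convert between a global rank and its local ranks along the TP, DP and PP axes. A minimal sketch of that arithmetic, assuming a pp-major, then dp, then tp grid layout (the ordering is our assumption, not necessarily ColossalAI's):

      ```python
      # Illustrative only: mixed-radix decomposition of a global rank into
      # local (tp, dp, pp) coordinates for a 3D parallel grid laid out
      # pp-major, then dp, then tp. The layout is an assumption for this sketch.
      def grid_coords(global_rank, tp_size, dp_size, pp_size):
          assert 0 <= global_rank < tp_size * dp_size * pp_size
          tp = global_rank % tp_size                 # fastest-varying axis
          dp = (global_rank // tp_size) % dp_size
          pp = global_rank // (tp_size * dp_size)    # slowest-varying axis
          return {"tp": tp, "dp": dp, "pp": pp}

      def to_global_rank(tp, dp, pp, tp_size, dp_size):
          # inverse mapping: coordinates back to the flat global rank
          return (pp * dp_size + dp) * tp_size + tp
      ```

      With these two maps, a manager can enumerate, say, all ranks sharing a pipeline stage (fixed pp) or a tensor-parallel group (fixed dp and pp) without any communication.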