1. 20 Sep, 2022 1 commit
    • [pipeline/chimera] test chimera | fix bug of initializing (#1615) · 170fa810
      Kirigaya Kazuto authored
      * [pipeline/tuning] improve dispatch performance in both time and space cost
      
      * [pipeline/converge] add interface for testing convergence
      
      * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
      
      * Update PipelineBase.py
      
      * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
      
      * [pipeline/chimera] test chimera | fix bug of initializing
  2. 19 Sep, 2022 1 commit
    • [pipeline/chimera] reconstruct PipelineBase and Worker to support more... · edc9e419
      Kirigaya Kazuto authored
      [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595)
      
      * [pipeline/tuning] improve dispatch performance in both time and space cost
      
      * [pipeline/converge] add interface for testing convergence
      
      * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
      
      * Update PipelineBase.py
      
      * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
  3. 07 Sep, 2022 1 commit
  4. 26 Aug, 2022 1 commit
    • [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497) · 5a6fd71f
      Kirigaya Kazuto authored
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py (supports communication with List[Any]) | pass test
      
      * [engine/schedule] use p2p_v2 to reconstruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatching data in work_list to ensure steady 1F1B
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
  5. 25 Aug, 2022 1 commit
    • [pipeline/rpc] implement distributed optimizer | test with assert_close (#1486) · 9145aef2
      Kirigaya Kazuto authored
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py (supports communication with List[Any]) | pass test
      
      * [engine/schedule] use p2p_v2 to reconstruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatching data in work_list to ensure steady 1F1B
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close