  1. 31 Aug, 2022 1 commit
    • [fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521) · b231430b
      Boyuan Yao authored
      * [fx] fix wrong variable name in solver rotor
      
      * [fx] fix the discretize bug
      
      * [fx] fix the first op in activation checkpoint codegen
      
      * [fx] fix some bugs of ckpt solver
      
      * [fx] modify test_ckpt_torchvision
      
      * [fx] set sequence to __sequence__ attr of GraphModule
      
      * [fx] docstring modification
      
      * [fx] remove performance test
  2. 30 Aug, 2022 3 commits
  3. 29 Aug, 2022 2 commits
  4. 27 Aug, 2022 1 commit
  5. 26 Aug, 2022 4 commits
    • [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497) · 5a6fd71f
      Kirigaya Kazuto authored
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test
      
      * [engin/schedule] use p2p_v2 to recontruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B
      
      * [pipeline/rpc] implement distributed optimizer | test with assert_close
      
      * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
      
    • CsRic's avatar
    • YuliangLiu0306's avatar
    • [fx] Add activation checkpoint solver rotor (#1496) · de1e716d
      Boyuan Yao authored
      * [fx] fix defining ckpt functions inside forward
      
      * [fx] Modify activation checkpoint codegen and add ColoGraphModule
      
      * [fx] some modification
      
      * some modifications
      
      * some code modifications
      
      * [automatic_parallel] ckpt solver rotor
      
      * [fx] add ckpt_solver_rotor
      
      * [fx] modification
      
      * code refactor
      
  6. 25 Aug, 2022 5 commits
  7. 24 Aug, 2022 3 commits
    • Jiarui Fang's avatar
      cde7b8a5
    • [fx] add profiler for fx nodes. (#1480) · 32efe8e7
      Super Daniel authored
      * [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
      
      * [fx] merge development into main (#1)
      
      * [fx] activation checkpointing using Chen strategies.
      
      * [fx] add test for ckpt_solver_chen
      
      * [fx] add vanilla activation checkpoint search with test on resnet and densenet
      
      * [fx] add a namespace code for solver_chen.
      
      * [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
      
      * [fx] fix lowercase naming conventions.
      
      * [fx] simplify test for ckpt.
      
      * [fx] add rules to linearize computation graphs for searching. (#2)
      
      * [fx] fix test and algorithm bugs in activation checkpointing.
      
      * [fx] polish ckpt_test.
      
      * [fx] add rules to linearize computation graphs for searching.
      
      * [fx] remove chen_sqrt for sake of simplicity
      
      * [fx] fix inconsistencies.
      
      * [fx] fix MetaInfoProp.
      
      * [fx] consider MetaInfoProp for inplace operands.
      
      * [fx] add profiler for fx nodes.
      
      * [fx] fix error in tests.
      
      * [fx] unfix bug.
      
    • [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when... · a6c87491
      Kirigaya Kazuto authored
      [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B (#1483)
      
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test
      
      * [engin/schedule] use p2p_v2 to recontruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B
  8. 23 Aug, 2022 2 commits
  9. 22 Aug, 2022 3 commits
    • [fx] Fix ckpt functions' definitions in forward (#1476) · 1f2e547f
      Boyuan Yao authored
      * [fx] fix defining ckpt functions inside forward
      
      * [fx] Modify activation checkpoint codegen and add ColoGraphModule
      
      * [fx] some modification
      
      * some modifications
      
      * some code modifications
    • [pipeline/rpc] implement a demo for PP with cuda rpc framework (#1470) · bb5f5289
      Kirigaya Kazuto authored
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test
      
      * [engin/schedule] use p2p_v2 to recontruct pipeline_schedule
      
      * [pipeline/rpc] implement a demo for PP with cuda rpc framework
      
      * Delete p2p_v2.py
      
      * Delete _pipeline_schedule_v2.py
      
      * Delete test_object_list_p2p_v2.py
      
      * Delete test_boardcast_send_recv_v2.py
      
      * Delete test_cifar_with_data_pipeline_tensor_v2.py
    • [autoparallel] added dot handler (#1475) · 628c7e3f
      Frank Lee authored
  10. 19 Aug, 2022 2 commits
  11. 17 Aug, 2022 2 commits
  12. 16 Aug, 2022 2 commits
  13. 15 Aug, 2022 1 commit
    • [fx] fix test and algorithm bugs in activation checkpointing. (#1451) · 0dbd61c2
      Super Daniel authored
      * [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
      
      * [fx] merge development into main (#1)
      
      * [fx] activation checkpointing using Chen strategies.
      
      * [fx] add test for ckpt_solver_chen
      
      * [fx] add vanilla activation checkpoint search with test on resnet and densenet
      
      * [fx] add a namespace code for solver_chen.
      
      * [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
      
      * [fx] fix lowercase naming conventions.
      
      * [fx] simplify test for ckpt.
      
      * [fx] fix test and algorithm bugs in activation checkpointing.
      
      * mend
      
      [fx] fix test and algorithm bugs in activation checkpointing.
      
      * [fx] polish ckpt_test.
      
  14. 12 Aug, 2022 7 commits
    • Geng Zhang's avatar
    • [test] fixed the activation codegen test (#1447) · 5a52e21f
      Frank Lee authored
      * [test] fixed the activation codegen test
      
      * polish code
    • [tensor] shape consistency generate transform path and communication cost (#1435) · 0f304236
      YuliangLiu0306 authored
      * [tensor] shape consistency output transform path and communication cost
      
      * polish code
    • [fx] Use colossalai checkpoint and add offload recognition in codegen (#1439) · 5774fe02
      Boyuan Yao authored
      * [fx] Use colossalai.utils.checkpoint to replace torch.utils.checkpoint for offload activation and add offload annotation recognition in codegen
      
      * Modification of test and add TODO in codegen
      
      * [fx] Modification of colossal ckpt usage
      
      * [fx] add gpc.destroy() to test_codegen
    • [engin/schedule] use p2p_v2 to recontruct pipeline_schedule (#1408) · e9460b45
      Kirigaya Kazuto authored
      * support p2p communication with any type of object | pass test
      
      * reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test
      
      * [communication] add p2p_v2.py to support communication with List[Any]
      
      * Delete _pipeline_schedule_v2.py
      
      * Delete test_cifar_with_data_pipeline_tensor_v2.py
      
      * [engin/schedule] use p2p_v2 to recontruct pipeline_schedule
      
      * Delete p2p_v2.py
      
      * Delete test_boardcast_send_recv_v2.py
      
      * Delete test_object_list_p2p_v2.py
      
      * [engin/schedule] use p2p_v2 to recontruct pipeline_schedule
      
      * [communication] remove print code
      
      * [engin/schedule] shorten the running time of testing file to prevent cancelling in CI
    • [tensor] added linear implementation for the new sharding spec (#1416) · ae1b58cd
      Frank Lee authored
      * [tensor] added linear implementation for the new sharding spec
      
      * polish code
    • [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174. (#1446) · d40a9392
      Super Daniel authored
      * [fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages
      
      * [fx] activation checkpointing using Chen strategies.
      
      * [fx] add test for ckpt_solver_chen
      
      * mend
      
      * [fx] add vanilla activation checkpoint search with test on resnet and densenet
      
      * [fx] add a namespace code for solver_chen.
      
      * [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174.
      
      * [fx] fix lowercase naming conventions.
  15. 11 Aug, 2022 2 commits