1. 11 Mar, 2022 4 commits
      Feature/zero (#279) · 5a560a06
      Jiarui Fang authored
      
      
      * add zero1 (#209)
      
      * add zero1
      
      * add test zero1
      
      * update zero stage 1 develop (#212)
      
      * Implement naive zero3 (#240)
      
      * naive zero3 works well
      
      * add zero3 param manager
      
      * add TODOs in comments
      
      * add gather full param ctx
      
      * fix sub module streams
      
      * add offload
      
      * fix bugs of hook and add unit tests
      
      * fix bugs of hook and add unit tests (#252)
      
      * add gather full param ctx
      
      * fix sub module streams
      
      * add offload
      
      * fix bugs of hook and add unit tests
      
      * polish code and add state dict hook
      
      * fix bug
      
      * update unit test
      
      * refactor reconstructed zero code
      
      * clip_grad support zero3 and add unit test
      
      * add unit test for Zero3ParameterManager
      
      * [WIP] initialize the shard param class
      
      * [WIP] Yet another sharded model implementation (#274)
      
      * [WIP] initialize the shard param class
      
      * [WIP] Yet another implementation of shardModel, using a better hook method.
      
      * torch.concat -> torch.cat
      
      * fix test_zero_level_1.py::test_zero_level_1 unit test
      
      * remove deepspeed implementation and refactor for the reconstructed zero module
      
      * polish zero dp unit tests
      Co-authored-by: ver217 <lhx0217@gmail.com>
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
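      The "gather full param ctx" and shard-param work in this commit follow the
      ZeRO-3 pattern: each rank keeps only a slice of every parameter and
      temporarily all-gathers the full tensor around forward/backward. A minimal
      sketch of that idea, assuming torch.distributed is initialized; the class
      and method names here are hypothetical, not Colossal-AI's actual API:

          from contextlib import contextmanager
          import torch
          import torch.distributed as dist

          class ShardedParam:
              """Keep only this rank's 1/world_size slice of a flattened parameter."""
              def __init__(self, full: torch.Tensor):
                  self.world_size = dist.get_world_size()
                  self.rank = dist.get_rank()
                  flat = full.flatten()
                  pad = (-flat.numel()) % self.world_size  # pad so it divides evenly
                  flat = torch.nn.functional.pad(flat, (0, pad))
                  self.orig_shape, self.orig_numel = full.shape, full.numel()
                  self.shard = flat.chunk(self.world_size)[self.rank].clone()

              @contextmanager
              def gather_full_param(self):
                  """Temporarily materialize the full tensor (cf. 'gather full param ctx')."""
                  parts = [torch.empty_like(self.shard) for _ in range(self.world_size)]
                  dist.all_gather(parts, self.shard)
                  full = torch.cat(parts)[: self.orig_numel].view(self.orig_shape)
                  try:
                      yield full
                  finally:
                      del full  # only the local shard persists after the context

      Inside the context a module can run its forward on the full weight; on exit
      the gathered copy is freed, which is the memory saving ZeRO-3 targets.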
      Added TPExpert for special situations · 82023779
      1SAA authored
      Optimized MoE layer and fixed some bugs; · 219df6e6
      1SAA authored
      Reduced MoE tests;
      
      Added FFNExperts and ViTMoE model
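      "FFNExperts" suggests a bank of per-expert feed-forward blocks behind a
      gating layer. A rough top-1-routing sketch of such a MoE layer; the
      structure is assumed for illustration, not taken from this commit:

          import torch
          import torch.nn as nn

          class NaiveMoE(nn.Module):
              def __init__(self, d_model: int, d_ff: int, num_experts: int):
                  super().__init__()
                  self.gate = nn.Linear(d_model, num_experts)
                  self.experts = nn.ModuleList(
                      nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                    nn.Linear(d_ff, d_model))
                      for _ in range(num_experts))

              def forward(self, x: torch.Tensor) -> torch.Tensor:
                  # x: (tokens, d_model); route each token to its best expert.
                  weight, idx = self.gate(x).softmax(dim=-1).max(dim=-1)
                  out = torch.zeros_like(x)
                  for e, expert in enumerate(self.experts):
                      mask = idx == e
                      if mask.any():
                          out[mask] = weight[mask, None] * expert(x[mask])
                  return out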
      fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial · 3dba0705
      zbian authored
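      The padding-index issue arises because a vocab-parallel embedding gives
      each rank only a slice of the vocabulary, so out-of-range ids (including
      padding) must be masked around the lookup. A hedged sketch of the usual
      Megatron-style scheme, not zbian's exact fix:

          import torch
          import torch.nn as nn
          import torch.distributed as dist

          class VocabParallelEmbedding(nn.Module):
              def __init__(self, vocab_size: int, d_model: int):
                  super().__init__()
                  part = vocab_size // dist.get_world_size()  # assume it divides evenly
                  self.start = dist.get_rank() * part
                  self.end = self.start + part
                  self.weight = nn.Parameter(torch.randn(part, d_model) * 0.02)

              def forward(self, ids: torch.Tensor) -> torch.Tensor:
                  # Clamp ids outside this rank's slice to 0, zero their rows
                  # after lookup, then sum the partial embeddings across ranks.
                  mask = (ids < self.start) | (ids >= self.end)
                  local = (ids - self.start).masked_fill(mask, 0)
                  out = nn.functional.embedding(local, self.weight)
                  out = out.masked_fill(mask.unsqueeze(-1), 0.0)
                  dist.all_reduce(out)
                  return out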
  2. 15 Feb, 2022 1 commit
      moved env variables to global variables (#215) · 9ee197d0
      アマデウス authored
      added branch context;
      added vocab parallel layers;
      moved split_batch from load_batch to tensor parallel embedding layers;
      updated gpt model;
      updated unit test cases;
      fixed a few collective communicator bugs
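      Moving split_batch into the tensor-parallel embedding layers means each
      rank takes its slice of the batch at the model's entry point rather than
      in load_batch. A minimal sketch of the splitting itself (the real layers
      split within a parallel group, not necessarily the whole world):

          import torch
          import torch.distributed as dist

          def split_batch(x: torch.Tensor) -> torch.Tensor:
              # This rank's contiguous slice along the batch dimension.
              return x.chunk(dist.get_world_size(), dim=0)[dist.get_rank()]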
  3. 25 Jan, 2022 1 commit
      add pytorch hooks (#179) · 569357fe
      Jiarui Fang authored
      * add pytorch hooks
      fix #175
      
      * remove licenses in src code
      
      * add gpu memory tracer
      
      * replace print with logger in ophooks
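      The GPU memory tracer pairs naturally with the hooks: register a forward
      hook per module and sample the CUDA allocator as activations are created.
      A small sketch; the tracer class name is made up for illustration:

          import torch

          class MemTracer:
              def __init__(self):
                  self.records = []  # (module name, bytes allocated)

              def attach(self, model: torch.nn.Module):
                  for name, mod in model.named_modules():
                      mod.register_forward_hook(self._make_hook(name))

              def _make_hook(self, name):
                  def hook(module, inputs, output):
                      if torch.cuda.is_available():
                          self.records.append((name, torch.cuda.memory_allocated()))
                  return hook

      Attach it before one forward pass, then inspect records to see which
      modules drive allocation.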
  4. 20 Jan, 2022 1 commit
  5. 17 Jan, 2022 1 commit
  6. 30 Dec, 2021 1 commit
      Optimize pipeline schedule (#94) · 96780e6e
      ver217 authored
      
      
      * add pipeline shared module wrapper and update load batch
      
      * added model parallel process group for amp and clip grad (#86)
      
      * added model parallel process group for amp and clip grad
      
      * update amp and clip with model parallel process group
      
      * remove pipeline_prev/next group (#88)
      
      * micro batch offload
      
      * optimize pipeline gpu memory usage
      
      * pipeline can receive tensor shape (#93)
      
      * optimize pipeline gpu memory usage
      
      * fix grad accumulation step counter
      
      * rename classes and functions
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
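      "pipeline can receive tensor shape" (#93) means a stage sends the
      activation's metadata ahead of the data, so the receiver can allocate an
      exact-size buffer instead of a fixed preallocated one. A hedged
      point-to-point sketch (real schedules do this once per micro-batch):

          import torch
          import torch.distributed as dist

          def send_with_shape(t: torch.Tensor, dst: int):
              shape = torch.tensor(t.shape, dtype=torch.long)
              dist.send(torch.tensor([shape.numel()], dtype=torch.long), dst)  # ndim first
              dist.send(shape, dst)
              dist.send(t.contiguous(), dst)

          def recv_with_shape(src: int, dtype=torch.float32) -> torch.Tensor:
              ndim = torch.empty(1, dtype=torch.long)
              dist.recv(ndim, src)
              shape = torch.empty(ndim.item(), dtype=torch.long)
              dist.recv(shape, src)
              buf = torch.empty(*shape.tolist(), dtype=dtype)
              dist.recv(buf, src)
              return buf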
  7. 29 Dec, 2021 1 commit
  8. 27 Dec, 2021 1 commit
  9. 20 Dec, 2021 2 commits
  10. 16 Dec, 2021 1 commit
  11. 09 Dec, 2021 1 commit
      Develop/experiments (#59) · da01c234
      Frank Lee authored
      
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      
      * Split conv2d, class token, positional embedding in 2d, Fix random number in ddp
      Fix convergence in cifar10, Imagenet1000
      
      * Integrate 1d tensor parallel in Colossal-AI (#39)
      
      * fixed 1D and 2D convergence (#38)
      
      * optimized 2D operations
      
      * fixed 1D ViT convergence problem
      
      * Feature/ddp (#49)
      
      * remove redundant func in setup (#19) (#20)
      
      * use env to control the language of doc (#24) (#25)
      
      * Support TP-compatible Torch AMP and Update trainer API (#27)
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      
      * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)
      
      * add explanation for ViT example (#35) (#36)
      
      * support torch ddp
      
      * fix loss accumulation
      
      * add log for ddp
      
      * change seed
      
      * modify timing hook
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      
      * Feature/pipeline (#40)
      
      * remove redundant func in setup (#19) (#20)
      
      * use env to control the language of doc (#24) (#25)
      
      * Support TP-compatible Torch AMP and Update trainer API (#27)
      
      * Add gradient accumulation, fix lr scheduler
      
      * fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
      
      * fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
      
      * fixed trainer
      
      * Revert "fixed trainer"
      
      This reverts commit 2e0b0b76990e8d4e337add483d878c0f61cf5097.
      
      * improved consistency between trainer, engine and schedule (#23)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      
      * add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)
      
      * add explanation for ViT example (#35) (#36)
      
      * optimize communication of pipeline parallel
      
      * fix grad clip for pipeline
      Co-authored-by: Frank Lee <somerlee.9@gmail.com>
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      
      * optimized 3D layers to fix slow computation; tested ImageNet performance with 3D; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified APIs of 3D layers (#51)
      
      * Update 2.5D layer code to get similar accuracy on the ImageNet-1k dataset
      
      * update API for better usability (#58)
      Co-authored-by: 1SAA <c2h214748@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
      Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
      Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
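      Gradient accumulation, the first item in this commit, boils down to
      scaling each micro-batch loss and deferring optimizer.step(). A generic
      PyTorch sketch, not this commit's exact code (model, loader and
      accum_steps here are placeholders):

          def train_epoch(model, optimizer, loader, loss_fn, accum_steps=4):
              model.train()
              optimizer.zero_grad()
              for i, (x, y) in enumerate(loader):
                  loss = loss_fn(model(x), y) / accum_steps  # average over micro-batches
                  loss.backward()                            # grads sum into .grad
                  if (i + 1) % accum_steps == 0:
                      optimizer.step()
                      optimizer.zero_grad()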
  12. 18 Nov, 2021 1 commit
  13. 28 Oct, 2021 2 commits