    [chat] add distributed PPO trainer (#3740) · b5f05663
    Hongxin Liu authored
    
    
    * Detached ppo (#9)
    
    * run the base
    
    * working on dist ppo
    
    * sync
    
    * detached trainer
    
    * update detached trainer. no maker update function
    
    * facing init problem
    
    * 1 maker 1 trainer detached run. but no model update
    
    * facing cuda problem
    
    * fix save functions
    
    * verified maker update
    
    * nothing
    
    * add ignore
    
    * analyze loss issue
    
    * remove some debug codes
    
    * facing 2m1t stuck issue
    
    * 2m1t verified
    
    * do not use torchrun
    
    * working on 2m2t
    
    * working on 2m2t
    
    * initialize strategy in ray actor env
    
    * facing actor's init order issue
    
    * facing ddp model update issue (need to unwrap ddp)
    
    * unwrap ddp actor
    
    * checking 1m2t stuck problem
    
    * nothing
    
    * set timeout for trainer choosing. It solves the stuck problem!
    
    * delete some debug output
    
    * rename to sync with upstream
    
    * rename to sync with upstream
    
    * coati rename
    
    * nothing
    
    * detach the replay buffer from the trainer and make it a Ray actor. Two benefits: 1. supports TP trainers. 2. asynchronous buffer operations
    
    * experience_maker_holder performs target-revolving _send_experience() instead of length comparison.
    
    * move code to ray subfolder
    
    * working on pipeline inference
    
    * apply comments
    
    * working on pipeline strategy. in progress.
    
    * remove pipeline code. clean this branch
    
    * update remote parameters by state_dict. no test
    
    * nothing
    
    * state_dict sharding transfer
    
    * merge debug branch
    
    * gemini _unwrap_model fix
    
    * simplify code
    
    * simplify code & fix LoRALinear AttributeError
    
    * critic unwrapped state_dict
    
    ---------
    Co-authored-by: csric <richcsr256@gmail.com>
    
    * [chat] add performance evaluator and fix bugs (#10)
    
    * [chat] add performance evaluator for ray
    
    * [chat] refactor debug arg
    
    * [chat] support hf config
    
    * [chat] fix generation
    
    * [chat] add 1mmt dummy example
    
    * [chat] fix gemini ckpt
    
    * split experience to send (#11)
    Co-authored-by: csric <richcsr256@gmail.com>
    
    * [chat] refactor trainer and maker (#12)
    
    * [chat] refactor experience maker holder
    
    * [chat] refactor model init
    
    * [chat] refactor trainer args
    
    * [chat] refactor model init
    
    * [chat] refactor trainer
    
    * [chat] refactor experience sending logic and training loop args (#13)
    
    * [chat] refactor experience send logic
    
    * [chat] refactor trainer
    
    * [chat] refactor trainer
    
    * [chat] refactor experience maker
    
    * [chat] refactor pbar
    
    * [chat] refactor example folder (#14)
    
    * [chat] support quant (#15)
    
    * [chat] add quant
    
    * [chat] add quant example
    
    * prompt example (#16)
    
    * prompt example
    
    * prompt load csv data
    
    * remove legacy try
    
    ---------
    Co-authored-by: csric <richcsr256@gmail.com>
    
    * [chat] add mmmt dummy example and refactor experience sending (#17)
    
    * [chat] add mmmt dummy example
    
    * [chat] refactor naive strategy
    
    * [chat] fix stuck problem
    
    * [chat] fix naive strategy
    
    * [chat] optimize experience maker sending logic
    
    * [chat] refactor sending assignment
    
    * [chat] refactor performance evaluator (#18)
    
    * Prompt Example & requires_grad state_dict & sharding state_dict (#19)
    
    * prompt example
    
    * prompt load csv data
    
    * remove legacy try
    
    * maker models require_grad set to False
    
    * working on zero redundancy update
    
    * mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
    
    * remove legacy examples
    
    * remove legacy examples
    
    * remove replay buffer tp state. bad design
    
    ---------
    Co-authored-by: csric <richcsr256@gmail.com>
    
    * state_dict sending adapts to new unwrap function (#20)
    
    * prompt example
    
    * prompt load csv data
    
    * remove legacy try
    
    * maker models require_grad set to False
    
    * working on zero redundancy update
    
    * mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
    
    * remove legacy examples
    
    * remove legacy examples
    
    * remove replay buffer tp state. bad design
    
    * opt benchmark
    
    * better script
    
    * nothing
    
    * [chat] strategy refactor unwrap model
    
    * [chat] strategy refactor save model
    
    * [chat] add docstr
    
    * [chat] refactor trainer save model
    
    * [chat] fix strategy typing
    
    * [chat] refactor trainer save model
    
    * [chat] update readme
    
    * [chat] fix unit test
    
    * working on lora reconstruction
    
    * state_dict sending adapts to new unwrap function
    
    * remove comments
    
    ---------
    Co-authored-by: csric <richcsr256@gmail.com>
    Co-authored-by: ver217 <lhx0217@gmail.com>
    
    * [chat-ray] add readme (#21)
    
    * add readme
    
    * transparent graph
    
    * add note background
    
    ---------
    Co-authored-by: csric <richcsr256@gmail.com>
    
    * [chat] get images from url (#22)
    
    * Refactor/chat ray (#23)
    
    * [chat] lora add todo
    
    * [chat] remove unused pipeline strategy
    
    * [chat] refactor example structure
    
    * [chat] setup ci for ray
    
    * [chat-ray] Support LoRA trainer. LoRA weights reconstruction. (#24)
    
    * lora support prototype
    
    * lora support
    
    * 1mmt lora & remove useless code
    
    ---------
    Co-authored-by: csric <richcsr256@gmail.com>
    
    * [chat] fix test ci for ray
    
    * [chat] fix test ci requirements for ray
    
    * [chat] fix ray runtime env
    
    * [chat] fix ray runtime env
    
    * [chat] fix example ci docker args
    
    * [chat] add debug info in trainer
    
    * [chat] add nccl debug info
    
    * [chat] skip ray test
    
    * [doc] fix typo
    
    ---------
    Co-authored-by: csric <59389055+CsRic@users.noreply.github.com>
    Co-authored-by: csric <richcsr256@gmail.com>
colossalai.py 9.9 KB