    [chatgpt] Detached PPO Training (#3195) · e3551443
    csric authored
    
    
    * run the base
    
    * working on dist ppo
    
    * sync
    
    * detached trainer
    
    * update detached trainer. no maker update function
    
    * facing init problem
    
    * 1 maker 1 trainer detached run. but no model update
    
    * facing cuda problem
    
    * fix save functions
    
    * verified maker update
    
    * nothing
    
    * add ignore
    
    * analyze loss issue
    
    * remove some debug codes
    
    * facing 2m1t stuck issue
    
    * 2m1t verified
    
    * do not use torchrun
    
    * working on 2m2t
    
    * working on 2m2t
    
    * initialize strategy in ray actor env
    
    * facing actor's init order issue
    
    * facing DDP model update issue (need to unwrap DDP)
    
    * unwrap ddp actor
    
    * checking 1m2t stuck problem
    
    * nothing
    
    * set timeout for trainer choosing. It solves the stuck problem!
    
    * delete some debug output
    
    * rename to sync with upstream
    
    * rename to sync with upstream
    
    * coati rename
    
    * nothing
    
    * I am going to detach the replay buffer from the trainer and make it a Ray Actor. Two benefits: 1. support for a TP trainer. 2. asynchronous buffer operations
    
    * experience_maker_holder now rotates through targets in _send_experience() instead of choosing a target by length comparison.
    
    * move code to ray subfolder
    
    * working on pipeline inference
    
    * apply comments
    
    ---------
    Co-authored-by: csric <richcsr256@gmail.com>
1m1t.sh 825 Bytes