    [chat]: update rm, add wandb and fix bugs (#4471) · 7b9b8644
    Wenhao Chen authored
    
    
    * feat: modify forward fn of critic and reward model
    
    * feat: modify calc_action_log_probs
    
    * to: add wandb in sft and rm trainer
    
    * feat: update train_sft
    
    * feat: update train_rm
    
    * style: modify type annotation and add warning
    
    * feat: pass tokenizer to ppo trainer
    
    * to: modify trainer base and maker base
    
    * feat: add wandb in ppo trainer
    
    * feat: pass tokenizer to generate
    
    * test: update generate fn tests
    
    * test: update train tests
    
    * fix: remove action_mask
    
    * feat: remove unused code
    
    * fix: fix wrong ignore_index
    
    * fix: fix mock tokenizer
    
    * chore: update requirements
    
    * revert: modify make_experience
    
    * fix: fix inference
    
    * fix: add padding side
    
    * style: modify _on_learn_batch_end
    
    * test: use mock tokenizer
    
    * fix: use bf16 to avoid overflow
    
    * fix: fix workflow
    
    * [chat] fix gemini strategy
    
    * [chat] fix
    
    * sync: update colossalai strategy
    
    * fix: fix args and model dtype
    
    * fix: fix checkpoint test
    
    * fix: fix requirements
    
    * fix: fix missing import and wrong arg
    
    * fix: temporarily skip gemini test in stage 3
    
    * style: apply pre-commit
    
    * fix: temporarily skip gemini test in stage 1&2
    
    ---------
    Co-authored-by: Mingyan Jiang <1829166702@qq.com>
train_reward_model.py 7.7 KB