    [chatgpt]Reward Model Training Process update (#3133) · 7548ca5a
    BlueRum authored
    * add normalize function to value_head in bloom rm
    
    * add normalization to value_function in gpt_rm
    
    * add normalization to value_head of opt_rm
    
    * add Anthropic/hh-rlhf dataset
    
    * Update __init__.py
    
    * Add LogExpLoss in RM training
    
    * Update __init__.py
    
    * update rm trainer to use acc as target
    
    * update example/train_rm
    
    * Update train_rm.sh
    
    * code style
    
    * Update README.md
    
    * Update README.md
    
    * add rm test to ci
    
    * fix tokenizer
    
    * fix typo
    
    * change batch size to avoid OOM in CI
    
    * Update test_ci.sh
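
    For readers unfamiliar with the "LogExpLoss" and "acc as target" items above, here is a minimal sketch of a pairwise log-exp loss and a pairwise accuracy metric for reward model training. It is plain Python, not the code from this commit: the function names `log_exp_loss` and `pairwise_accuracy` are hypothetical, and reading "acc" as the fraction of pairs where the chosen response outscores the rejected one is an assumption.

```python
import math


def log_exp_loss(chosen_rewards, rejected_rewards):
    """Mean of log(1 + exp(r_rejected - r_chosen)) over preference pairs.

    Hypothetical helper for illustration; not taken from the commit.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("reward lists must have the same length")
    if not chosen_rewards:
        raise ValueError("reward lists must not be empty")
    total = 0.0
    for chosen, rejected in zip(chosen_rewards, rejected_rewards):
        x = rejected - chosen
        # Numerically stable softplus: log(1 + e^x) = max(x, 0) + log1p(e^-|x|)
        total += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total / len(chosen_rewards)


def pairwise_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen reward is strictly higher (assumed meaning of 'acc')."""
    pairs = list(zip(chosen_rewards, rejected_rewards))
    if not pairs:
        raise ValueError("reward lists must not be empty")
    return sum(1 for chosen, rejected in pairs if chosen > rejected) / len(pairs)
```

    The loss shrinks toward zero as the chosen reward exceeds the rejected one by a wider margin, and equals log 2 when the two rewards are tied.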
train_rm.sh 357 Bytes