    [chat] fix bugs and add unit tests (#4213) · da4f7b85
    Wenhao Chen authored
    * style: rename replay buffer
    
    Experience replay is typically used for off-policy algorithms.
    Using this name in PPO may be misleading.
    
    * fix: fix wrong zero2 default arg
    
    * test: update experience tests
    
    * style: rename zero_pad fn
    
    * fix: defer init in CycledDataLoader
    
    * test: add benchmark test
    
    * style: rename internal fn of generation
    
    * style: rename internal fn of lora
    
    * fix: remove unused loss fn
    
    * fix: remove unused utils fn
    
    * refactor: remove generate_with_actor fn
    
    * fix: fix type annotation
    
    * test: add models tests
    
    * fix: skip llama due to long execution time
    
    * style: modify dataset
    
    * style: apply formatter
    
    * perf: update reward dataset
    
    * fix: fix wrong IGNORE_INDEX in sft dataset
    
    * fix: remove DataCollatorForSupervisedDataset
    
    * test: add dataset tests
    
    * style: apply formatter
    
    * style: rename test_ci to test_train
    
    * feat: add llama in inference
    
    * test: add inference tests
    
    * test: change test scripts directory
    
    * fix: update ci
    
    * fix: fix typo
    
    * fix: skip llama due to oom
    
    * fix: fix file mod
    
    * style: apply formatter
    
    * refactor: remove duplicated llama_gptq
    
    * style: apply formatter
    
    * to: update rm test
    
    * feat: add tokenizer arg
    
    * feat: add download model script
    
    * test: update train tests
    
    * fix: modify gemini load and save pretrained
    
    * test: update checkpoint io test
    
    * to: modify nproc_per_node
    
    * fix: do not remove existing dir
    
    * fix: modify save path
    
    * test: add random choice
    
    * fix: fix sft path
    
    * fix: enlarge nproc_per_node to avoid oom
    
    * fix: add num_retry
    
    * fix: make lora config of rm and critic consistent
    
    * fix: add warning about lora weights
    
    * fix: skip some gpt2 tests
    
    * fix: remove grad ckpt in rm and critic due to errors
    
    * refactor: directly use Actor in train_sft
    
    * test: add more arguments
    
    * fix: disable grad ckpt when using lora
    
    * fix: fix save_pretrained and related tests
    
    * test: enable zero2 tests
    
    * revert: remove useless fn
    
    * style: polish code
    
    * test: modify test args
train_prompts.py 10.3 KB