  1. 29 Mar, 2024 1 commit
    • [ColossalChat] Update RLHF V2 (#5286) · df5e9c53
      YeAnbang authored
      
      
      * Add dpo. Fix sft, ppo, lora. Refactor all
      
      * fix and tested ppo
      
      * 2nd round refactor
      
      * add ci tests
      
      * fix ci
      
      * fix ci
      
      * fix readme, style
      
      * fix readme style
      
      * fix style, fix benchmark
      
      * reproduce benchmark result, remove useless files
      
      * rename to ColossalChat
      
      * use new image
      
      * fix ci workflow
      
      * fix ci
      
      * use local model/tokenizer for ci tests
      
      * fix ci
      
      * fix ci timeout
      
      * fix rm progress bar. fix ci timeout
      
      * fix ci
      
      * fix ci typo
      
      * remove 3d plugin from ci temporarily
      
      * test environment
      
      * cannot save optimizer
      
      * support chat template
      
      * fix readme
      
      * fix path
      
      * test ci locally
      
      * restore build_or_pr
      
      * fix ci data path
      
      * fix benchmark
      
      * fix ci, move ci tests to 3080, disable fast tokenizer
      
      * move ci to 85
      
      * support flash attention 2
      
      * add all-in-one data preparation script. Fix colossal-llama2-chat chat template
      
      * add hardware requirements
      
      * move ci test data
      
      * fix save_model, add unwrap
      
      * fix missing bos
      
      * fix missing bos; support grad accumulation with gemini
      
      * fix ci
      
      * fix llama2 chat template config
      
      * debug sft
      
      * debug sft
      
      * fix colossalai version requirement
      
      * fix ci
      
      * add sanity check to prevent NaN loss
      
      * fix requirements
      
      * add dummy data generation script
      
      * update readme
      
      * update readme
      
      * update readme and ignore
      
      * fix logger bug
      
      * support parallel_output
      
      * modify data preparation logic
      
      * fix tokenization
      
      * update lr
      
      * fix inference
      
      * run pre-commit
      
      ---------
      Co-authored-by: Tong Li <tong.li352711588@gmail.com>
  2. 29 Jan, 2024 1 commit
  3. 20 Sep, 2023 1 commit
    • [chat]: update rm, add wandb and fix bugs (#4471) · 7b9b8644
      Wenhao Chen authored
      
      
      * feat: modify forward fn of critic and reward model
      
      * feat: modify calc_action_log_probs
      
      * to: add wandb in sft and rm trainer
      
      * feat: update train_sft
      
      * feat: update train_rm
      
      * style: modify type annotation and add warning
      
      * feat: pass tokenizer to ppo trainer
      
      * to: modify trainer base and maker base
      
      * feat: add wandb in ppo trainer
      
      * feat: pass tokenizer to generate
      
      * test: update generate fn tests
      
      * test: update train tests
      
      * fix: remove action_mask
      
      * feat: remove unused code
      
      * fix: fix wrong ignore_index
      
      * fix: fix mock tokenizer
      
      * chore: update requirements
      
      * revert: modify make_experience
      
      * fix: fix inference
      
      * fix: add padding side
      
      * style: modify _on_learn_batch_end
      
      * test: use mock tokenizer
      
      * fix: use bf16 to avoid overflow
      
      * fix: fix workflow
      
      * [chat] fix gemini strategy
      
      * [chat] fix
      
      * sync: update colossalai strategy
      
      * fix: fix args and model dtype
      
      * fix: fix checkpoint test
      
      * fix: fix requirements
      
      * fix: fix missing import and wrong arg
      
      * fix: temporarily skip gemini test in stage 3
      
      * style: apply pre-commit
      
      * fix: temporarily skip gemini test in stage 1&2
      
      ---------
      Co-authored-by: Mingyan Jiang <1829166702@qq.com>
  4. 30 Aug, 2023 1 commit
  5. 02 Aug, 2023 1 commit
    • [chat] fix bugs and add unit tests (#4213) · da4f7b85
      Wenhao Chen authored
      * style: rename replay buffer
      
      Experience replay is typically used for off-policy algorithms,
      so using this name in PPO may be misleading.
      
      * fix: fix wrong zero2 default arg
      
      * test: update experience tests
      
      * style: rename zero_pad fn
      
      * fix: defer init in CycledDataLoader
      
      * test: add benchmark test
      
      * style: rename internal fn of generation
      
      * style: rename internal fn of lora
      
      * fix: remove unused loss fn
      
      * fix: remove unused utils fn
      
      * refactor: remove generate_with_actor fn
      
      * fix: fix type annotation
      
      * test: add models tests
      
      * fix: skip llama due to long execution time
      
      * style: modify dataset
      
      * style: apply formatter
      
      * perf: update reward dataset
      
      * fix: fix wrong IGNORE_INDEX in sft dataset
      
      * fix: remove DataCollatorForSupervisedDataset
      
      * test: add dataset tests
      
      * style: apply formatter
      
      * style: rename test_ci to test_train
      
      * feat: add llama in inference
      
      * test: add inference tests
      
      * test: change test scripts directory
      
      * fix: update ci
      
      * fix: fix typo
      
      * fix: skip llama due to oom
      
      * fix: fix file mod
      
      * style: apply formatter
      
      * refactor: remove duplicated llama_gptq
      
      * style: apply formatter
      
      * to: update rm test
      
      * feat: add tokenizer arg
      
      * feat: add download model script
      
      * test: update train tests
      
      * fix: modify gemini load and save pretrained
      
      * test: update checkpoint io test
      
      * to: modify nproc_per_node
      
      * fix: do not remove existing dir
      
      * fix: modify save path
      
      * test: add random choice
      
      * fix: fix sft path
      
      * fix: enlarge nproc_per_node to avoid oom
      
      * fix: add num_retry
      
      * fix: make lora config of rm and critic consistent
      
      * fix: add warning about lora weights
      
      * fix: skip some gpt2 tests
      
      * fix: remove grad ckpt in rm and critic due to errors
      
      * refactor: directly use Actor in train_sft
      
      * test: add more arguments
      
      * fix: disable grad ckpt when using lora
      
      * fix: fix save_pretrained and related tests
      
      * test: enable zero2 tests
      
      * revert: remove useless fn
      
      * style: polish code
      
      * test: modify test args
  6. 04 Jul, 2023 1 commit
    • [chat] use official transformers and fix some issues (#4117) · 3d8d5d0d
      Wenhao Chen authored
      * feat: remove on_learn_epoch fn as not used
      
      * revert: add _on_learn_epoch fn
      
      * feat: remove NaiveStrategy
      
      * test: update train_prompts tests
      
      * fix: remove prepare_llama_tokenizer_and_embedding
      
      * test: add lora arg
      
      * feat: remove roberta support in train_prompts due to runtime errs
      
      * feat: remove deberta & roberta in rm as not used
      
      * test: remove deberta and roberta tests
      
      * feat: remove deberta and roberta models as not used
      
      * fix: remove calls to roberta
      
      * fix: remove prepare_llama_tokenizer_and_embedding
      
      * chore: update transformers version
      
      * docs: update transformers version
      
      * fix: fix actor inference
      
      * fix: fix ci
      
      * feat: change llama pad token to unk
      
      * revert: revert ddp setup_distributed
      
      * fix: change llama pad token to unk
      
      * revert: undo unnecessary changes
      
      * fix: use pip to install transformers
  7. 07 Jun, 2023 1 commit
    • [chat] add distributed PPO trainer (#3740) · b5f05663
      Hongxin Liu authored
      
      
      * Detached ppo (#9)
      
      * run the base
      
      * working on dist ppo
      
      * sync
      
      * detached trainer
      
      * update detached trainer. no maker update function
      
      * facing init problem
      
      * 1 maker 1 trainer detached run. but no model update
      
      * facing cuda problem
      
      * fix save functions
      
      * verified maker update
      
      * nothing
      
      * add ignore
      
      * analyze loss issue
      
      * remove some debug codes
      
      * facing 2m1t stuck issue
      
      * 2m1t verified
      
      * do not use torchrun
      
      * working on 2m2t
      
      * working on 2m2t
      
      * initialize strategy in ray actor env
      
      * facing actor's init order issue
      
      * facing ddp model update issue (need unwrap ddp)
      
      * unwrap ddp actor
      
      * checking 1m2t stuck problem
      
      * nothing
      
      * set timeout for trainer choosing. It solves the stuck problem!
      
      * delete some debug output
      
      * rename to sync with upstream
      
      * rename to sync with upstream
      
      * coati rename
      
      * nothing
      
      * I am going to detach the replay buffer from the trainer and make it a Ray Actor. Two benefits: 1. support TP trainer. 2. asynchronous buffer operations
      
      * experience_maker_holder performs target-revolving _send_experience() instead of length comparison.
      
      * move code to ray subfolder
      
      * working on pipeline inference
      
      * apply comments
      
      * working on pipeline strategy. in progress.
      
      * remove pipeline code. clean this branch
      
      * update remote parameters by state_dict. no test
      
      * nothing
      
      * state_dict sharding transfer
      
      * merge debug branch
      
      * gemini _unwrap_model fix
      
      * simplify code
      
      * simplify code & fix LoRALinear AttributeError
      
      * critic unwrapped state_dict
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * [chat] add performance evaluator and fix bugs (#10)
      
      * [chat] add performance evaluator for ray
      
      * [chat] refactor debug arg
      
      * [chat] support hf config
      
      * [chat] fix generation
      
      * [chat] add 1mmt dummy example
      
      * [chat] fix gemini ckpt
      
      * split experience to send (#11)
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * [chat] refactor trainer and maker (#12)
      
      * [chat] refactor experience maker holder
      
      * [chat] refactor model init
      
      * [chat] refactor trainer args
      
      * [chat] refactor model init
      
      * [chat] refactor trainer
      
      * [chat] refactor experience sending logic and training loop args (#13)
      
      * [chat] refactor experience send logic
      
      * [chat] refactor trainer
      
      * [chat] refactor trainer
      
      * [chat] refactor experience maker
      
      * [chat] refactor pbar
      
      * [chat] refactor example folder (#14)
      
      * [chat] support quant (#15)
      
      * [chat] add quant
      
      * [chat] add quant example
      
      * prompt example (#16)
      
      * prompt example
      
      * prompt load csv data
      
      * remove legacy try
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * [chat] add mmmt dummy example and refactor experience sending (#17)
      
      * [chat] add mmmt dummy example
      
      * [chat] refactor naive strategy
      
      * [chat] fix stuck problem
      
      * [chat] fix naive strategy
      
      * [chat] optimize experience maker sending logic
      
      * [chat] refactor sending assignment
      
      * [chat] refactor performance evaluator (#18)
      
      * Prompt Example & requires_grad state_dict & sharding state_dict (#19)
      
      * prompt example
      
      * prompt load csv data
      
      * remove legacy try
      
      * maker models require_grad set to False
      
      * working on zero redundancy update
      
      * mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
      
      * remove legacy examples
      
      * remove legacy examples
      
      * remove replay buffer tp state. bad design
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * state_dict sending adapts to new unwrap function (#20)
      
      * prompt example
      
      * prompt load csv data
      
      * remove legacy try
      
      * maker models require_grad set to False
      
      * working on zero redundancy update
      
      * mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
      
      * remove legacy examples
      
      * remove legacy examples
      
      * remove replay buffer tp state. bad design
      
      * opt benchmark
      
      * better script
      
      * nothing
      
      * [chat] strategy refactor unwrap model
      
      * [chat] strategy refactor save model
      
      * [chat] add docstr
      
      * [chat] refactor trainer save model
      
      * [chat] fix strategy typing
      
      * [chat] refactor trainer save model
      
      * [chat] update readme
      
      * [chat] fix unit test
      
      * working on lora reconstruction
      
      * state_dict sending adapts to new unwrap function
      
      * remove comments
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      
      * [chat-ray] add readme (#21)
      
      * add readme
      
      * transparent graph
      
      * add note background
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * [chat] get images from url (#22)
      
      * Refactor/chat ray (#23)
      
      * [chat] lora add todo
      
      * [chat] remove unused pipeline strategy
      
      * [chat] refactor example structure
      
      * [chat] setup ci for ray
      
      * [chat-ray] Support LoRA trainer. LoRA weights reconstruction. (#24)
      
      * lora support prototype
      
      * lora support
      
      * 1mmt lora & remove useless code
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * [chat] fix test ci for ray
      
      * [chat] fix test ci requirements for ray
      
      * [chat] fix ray runtime env
      
      * [chat] fix ray runtime env
      
      * [chat] fix example ci docker args
      
      * [chat] add debug info in trainer
      
      * [chat] add nccl debug info
      
      * [chat] skip ray test
      
      * [doc] fix typo
      
      ---------
      Co-authored-by: csric <59389055+CsRic@users.noreply.github.com>
      Co-authored-by: csric <richcsr256@gmail.com>
  8. 24 Apr, 2023 1 commit
  9. 18 Apr, 2023 1 commit
    • Update test_ci.sh · 36a519b4
      Camille Zhong authored
      update
      
      Update test_ci.sh
      
      Update run_chatgpt_examples.yml
      
      Update test_ci.sh
      
      update
      
      Update run_chatgpt_examples.yml
      
      update ci
      
      Update test_ci.sh
      
      Update run_chatgpt_examples.yml
      
      Update test_ci.sh
      
      Update run_chatgpt_examples.yml
      
      Update test_ci.sh
      
      update test ci
      
      RoBERTa for RLHF Stage 2 & 3 (still in testing)
      
      Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"
      
      This reverts commit 06741d894dcbe958acd4e10d771f22275e20e368.
      
      Add RoBERTa for RLHF stage 2 & 3
      
      1. add roberta folder under model folder
      2. add  roberta option in train_reward_model.py
      3. add some test in testci
      
      Update test_ci.sh
      
      Revert "Update test_ci.sh"
      
      This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a.
      
      Add RoBERTa for RLHF Stage 2 & 3 (test)
      
      
      update roberta with coati
      
      chat ci update
      
      Revert "chat ci update"
      
      This reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846.
      
      [test]chat_update_ci
      
      Update test_ci.sh
      
      test
      
      Update gpt_critic.py
      
      Update run_chatgpt_unit_tests.yml
      
      update test ci
      
      update
      
      Update test_ci.sh
      
      update
      
      Update test_ci.sh
      
      Update run_chatgpt_examples.yml
  10. 14 Mar, 2023 1 commit
  11. 15 Feb, 2023 2 commits