  1. 29 Mar, 2024 1 commit
    • [ColossalChat] Update RLHF V2 (#5286) · df5e9c53
      YeAnbang authored
      
      
      * Add dpo. Fix sft, ppo, lora. Refactor all
      
      * fix and tested ppo
      
      * 2nd round refactor
      
      * add ci tests
      
      * fix ci
      
      * fix ci
      
      * fix readme, style
      
      * fix readme style
      
      * fix style, fix benchmark
      
      * reproduce benchmark result, remove useless files
      
      * rename to ColossalChat
      
      * use new image
      
      * fix ci workflow
      
      * fix ci
      
      * use local model/tokenizer for ci tests
      
      * fix ci
      
      * fix ci
      
      * fix ci
      
      * fix ci timeout
      
      * fix rm progress bar. fix ci timeout
      
      * fix ci
      
      * fix ci typo
      
      * temporarily remove 3d plugin from ci
      
      * test environment
      
      * cannot save optimizer
      
      * support chat template
      
      * fix readme
      
      * fix path
      
      * test ci locally
      
      * restore build_or_pr
      
      * fix ci data path
      
      * fix benchmark
      
      * fix ci, move ci tests to 3080, disable fast tokenizer
      
      * move ci to 85
      
      * support flash attention 2
      
      * add all-in-one data preparation script. Fix colossal-llama2-chat chat template
      
      * add hardware requirements
      
      * move ci test data
      
      * fix save_model, add unwrap
      
      * fix missing bos
      
      * fix missing bos; support grad accumulation with gemini
      
      * fix ci
      
      * fix ci
      
      * fix ci
      
      * fix llama2 chat template config
      
      * debug sft
      
      * debug sft
      
      * fix colossalai version requirement
      
      * fix ci
      
      * add sanity check to prevent NaN loss
      
      * fix requirements
      
      * add dummy data generation script
      
      * add dummy data generation script
      
      * add dummy data generation script
      
      * add dummy data generation script
      
      * update readme
      
      * update readme
      
      * update readme and ignore
      
      * fix logger bug
      
      * support parallel_output
      
      * modify data preparation logic
      
      * fix tokenization
      
      * update lr
      
      * fix inference
      
      * run pre-commit
      
      ---------
      Co-authored-by: Tong Li <tong.li352711588@gmail.com>
  2. 27 Sep, 2023 1 commit
    • [chat] fix gemini strategy (#4698) · be400a09
      flybird11111 authored
      * [chat] fix gemini strategy
      
      * [chat] fix gemini strategy
      
      * [chat] fix gemini strategy
      
      * [chat] fix gemini strategy
      
      * # This is a combination of 2 commits.
      
      [chat] fix gemini strategy
      
      fix
      
      * [chat] fix gemini strategy
      
      update llama2 example
      
      [chat] fix gemini strategy
      
      * [fix] fix gemini strategy
      
      * [fix] fix gemini strategy
      
      * [fix] fix gemini strategy
      
      * [fix] fix gemini strategy
      
      * [fix] fix gemini strategy
      
      * [fix] fix gemini strategy
      
      * [fix] fix gemini strategy
      
      * [fix] fix gemini strategy
      
      * [fix] fix gemini strategy
      
      * [fix] fix gemini strategy
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * Update train_prompts.py
  3. 20 Sep, 2023 1 commit
    • [chat]: update rm, add wandb and fix bugs (#4471) · 7b9b8644
      Wenhao Chen authored
      
      
      * feat: modify forward fn of critic and reward model
      
      * feat: modify calc_action_log_probs
      
      * to: add wandb in sft and rm trainer
      
      * feat: update train_sft
      
      * feat: update train_rm
      
      * style: modify type annotation and add warning
      
      * feat: pass tokenizer to ppo trainer
      
      * to: modify trainer base and maker base
      
      * feat: add wandb in ppo trainer
      
      * feat: pass tokenizer to generate
      
      * test: update generate fn tests
      
      * test: update train tests
      
      * fix: remove action_mask
      
      * feat: remove unused code
      
      * fix: fix wrong ignore_index
      
      * fix: fix mock tokenizer
      
      * chore: update requirements
      
      * revert: modify make_experience
      
      * fix: fix inference
      
      * fix: add padding side
      
      * style: modify _on_learn_batch_end
      
      * test: use mock tokenizer
      
      * fix: use bf16 to avoid overflow
      
      * fix: fix workflow
      
      * [chat] fix gemini strategy
      
      * [chat] fix
      
      * sync: update colossalai strategy
      
      * fix: fix args and model dtype
      
      * fix: fix checkpoint test
      
      * fix: fix requirements
      
      * fix: fix missing import and wrong arg
      
      * fix: temporarily skip gemini test in stage 3
      
      * style: apply pre-commit
      
      * fix: temporarily skip gemini test in stage 1&2
      
      ---------
      Co-authored-by: Mingyan Jiang <1829166702@qq.com>
  4. 19 Sep, 2023 1 commit
  5. 15 Sep, 2023 1 commit
  6. 04 Jul, 2023 1 commit
    • [chat] use official transformers and fix some issues (#4117) · 3d8d5d0d
      Wenhao Chen authored
      * feat: remove on_learn_epoch fn as not used
      
      * revert: add _on_learn_epoch fn
      
      * feat: remove NaiveStrategy
      
      * test: update train_prompts tests
      
      * fix: remove prepare_llama_tokenizer_and_embedding
      
      * test: add lora arg
      
      * feat: remove roberta support in train_prompts due to runtime errs
      
      * feat: remove deberta & roberta in rm as not used
      
      * test: remove deberta and roberta tests
      
      * feat: remove deberta and roberta models as not used
      
      * fix: remove calls to roberta
      
      * fix: remove prepare_llama_tokenizer_and_embedding
      
      * chore: update transformers version
      
      * docs: update transformers version
      
      * fix: fix actor inference
      
      * fix: fix ci
      
      * feat: change llama pad token to unk
      
      * revert: revert ddp setup_distributed
      
      * fix: change llama pad token to unk
      
      * revert: undo unnecessary changes
      
      * fix: use pip to install transformers
  7. 29 Jun, 2023 1 commit
    • [chat] remove naive strategy and split colossalai strategy (#4094) · edd75a59
      Wenhao Chen authored
      * feat: remove on_learn_epoch fn as not used
      
      * revert: add _on_learn_epoch fn
      
      * to: remove the use of NaiveStrategy
      
      * test: remove NaiveStrategy tests
      
      * feat: remove NaiveStrategy
      
      * style: modify comments and params
      
      * feat: split ColossalAIStrategy into LowLevelZeroStrategy and GeminiStrategy
      
      * fix: remove naive
      
      * fix: align with modified colossal strategy
      
      * fix: fix ddp _try_init_dist arg
  8. 07 Jun, 2023 1 commit
    • [chat] add distributed PPO trainer (#3740) · b5f05663
      Hongxin Liu authored
      
      
      * Detached ppo (#9)
      
      * run the base
      
      * working on dist ppo
      
      * sync
      
      * detached trainer
      
      * update detached trainer. no maker update function
      
      * facing init problem
      
      * 1 maker 1 trainer detached run. but no model update
      
      * facing cuda problem
      
      * fix save functions
      
      * verified maker update
      
      * nothing
      
      * add ignore
      
      * analyze loss issue
      
      * remove some debug codes
      
      * facing 2m1t stuck issue
      
      * 2m1t verified
      
      * do not use torchrun
      
      * working on 2m2t
      
      * working on 2m2t
      
      * initialize strategy in ray actor env
      
      * facing actor's init order issue
      
      * facing ddp model update issue (need to unwrap ddp)
      
      * unwrap ddp actor
      
      * checking 1m2t stuck problem
      
      * nothing
      
      * set timeout for trainer choosing. It solves the stuck problem!
      
      * delete some debug output
      
      * rename to sync with upstream
      
      * rename to sync with upstream
      
      * coati rename
      
      * nothing
      
      * I am going to detach the replay buffer from the trainer and make it a Ray Actor. Two benefits: 1. support for a TP trainer. 2. asynchronous buffer operations
      
      * experience_maker_holder performs target-revolving _send_experience() instead of length comparison.
      
      * move code to ray subfolder
      
      * working on pipeline inference
      
      * apply comments
      
      * working on pipeline strategy. in progress.
      
      * remove pipeline code. clean this branch
      
      * update remote parameters by state_dict. no test
      
      * nothing
      
      * state_dict sharding transfer
      
      * merge debug branch
      
      * gemini _unwrap_model fix
      
      * simplify code
      
      * simplify code & fix LoRALinear AttributeError
      
      * critic unwrapped state_dict
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * [chat] add performance evaluator and fix bugs (#10)
      
      * [chat] add performance evaluator for ray
      
      * [chat] refactor debug arg
      
      * [chat] support hf config
      
      * [chat] fix generation
      
      * [chat] add 1mmt dummy example
      
      * [chat] fix gemini ckpt
      
      * split experience to send (#11)
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * [chat] refactor trainer and maker (#12)
      
      * [chat] refactor experience maker holder
      
      * [chat] refactor model init
      
      * [chat] refactor trainer args
      
      * [chat] refactor model init
      
      * [chat] refactor trainer
      
      * [chat] refactor experience sending logic and training loop args (#13)
      
      * [chat] refactor experience send logic
      
      * [chat] refactor trainer
      
      * [chat] refactor trainer
      
      * [chat] refactor experience maker
      
      * [chat] refactor pbar
      
      * [chat] refactor example folder (#14)
      
      * [chat] support quant (#15)
      
      * [chat] add quant
      
      * [chat] add quant example
      
      * prompt example (#16)
      
      * prompt example
      
      * prompt load csv data
      
      * remove legacy try
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * [chat] add mmmt dummy example and refactor experience sending (#17)
      
      * [chat] add mmmt dummy example
      
      * [chat] refactor naive strategy
      
      * [chat] fix stuck problem
      
      * [chat] fix naive strategy
      
      * [chat] optimize experience maker sending logic
      
      * [chat] refactor sending assignment
      
      * [chat] refactor performance evaluator (#18)
      
      * Prompt Example & requires_grad state_dict & sharding state_dict (#19)
      
      * prompt example
      
      * prompt load csv data
      
      * remove legacy try
      
      * maker models requires_grad set to False
      
      * working on zero redundancy update
      
      * mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
      
      * remove legacy examples
      
      * remove legacy examples
      
      * remove replay buffer tp state. bad design
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * state_dict sending adapts to new unwrap function (#20)
      
      * prompt example
      
      * prompt load csv data
      
      * remove legacy try
      
      * maker models requires_grad set to False
      
      * working on zero redundancy update
      
      * mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
      
      * remove legacy examples
      
      * remove legacy examples
      
      * remove replay buffer tp state. bad design
      
      * opt benchmark
      
      * better script
      
      * nothing
      
      * [chat] strategy refactor unwrap model
      
      * [chat] strategy refactor save model
      
      * [chat] add docstr
      
      * [chat] refactor trainer save model
      
      * [chat] fix strategy typing
      
      * [chat] refactor trainer save model
      
      * [chat] update readme
      
      * [chat] fix unit test
      
      * working on lora reconstruction
      
      * state_dict sending adapts to new unwrap function
      
      * remove comments
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      Co-authored-by: ver217 <lhx0217@gmail.com>
      
      * [chat-ray] add readme (#21)
      
      * add readme
      
      * transparent graph
      
      * add note background
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * [chat] get images from url (#22)
      
      * Refactor/chat ray (#23)
      
      * [chat] lora add todo
      
      * [chat] remove unused pipeline strategy
      
      * [chat] refactor example structure
      
      * [chat] setup ci for ray
      
      * [chat-ray] Support LoRA trainer. LoRA weights reconstruction. (#24)
      
      * lora support prototype
      
      * lora support
      
      * 1mmt lora & remove useless code
      
      ---------
      Co-authored-by: csric <richcsr256@gmail.com>
      
      * [chat] fix test ci for ray
      
      * [chat] fix test ci requirements for ray
      
      * [chat] fix ray runtime env
      
      * [chat] fix ray runtime env
      
      * [chat] fix example ci docker args
      
      * [chat] add debug info in trainer
      
      * [chat] add nccl debug info
      
      * [chat] skip ray test
      
      * [doc] fix typo
      
      ---------
      Co-authored-by: csric <59389055+CsRic@users.noreply.github.com>
      Co-authored-by: csric <richcsr256@gmail.com>