Unverified commit 5134ad5d authored by github-actions[bot], committed by GitHub

[format] applied code formatting on changed files in pull request 3296 (#3298)


Co-authored-by: github-actions <github-actions@github.com>
parent 682af613
@@ -17,7 +17,7 @@
- [Stage1 - Supervised instructs tuning](#stage1---supervised-instructs-tuning)
- [Stage2 - Training reward model](#stage2---training-reward-model)
- [Stage3 - Training model with reinforcement learning by human feedback](#stage3---training-model-with-reinforcement-learning-by-human-feedback)
- [Inference - After Training](#inference---after-training)
- [Coati7B examples](#coati7b-examples)
- [Generation](#generation)
- [Open QA](#open-qa)
...
@@ -100,7 +100,7 @@ Model performance in [Anthropic's paper](https://arxiv.org/abs/2204.05862):
- --max_len: max sentence length for generation, type=int, default=512
- --test: whether to run in test mode only; if true, only a small dataset is used
## Stage3 - Training model using prompts with RL
Stage3 uses a reinforcement learning algorithm and is the most complex part of the training process, as shown below:
...
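The two generation options in the second hunk (`--max_len` and `--test`) can be illustrated with a minimal sketch using Python's standard `argparse` module. This is not the repository's training script. The parser, the `maybe_shrink` helper and its `test_size` of 8 are hypothetical. Treating `--test` as a `store_true` switch is also an assumption, because the README text does not state its type.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the two options documented in the README hunk.
    parser = argparse.ArgumentParser(description="Sketch of the documented generation options")
    # Max sentence length for generation: type=int, default=512 (as listed in the README).
    parser.add_argument("--max_len", type=int, default=512)
    # Assumption: --test is a boolean switch, so store_true is used for illustration.
    parser.add_argument("--test", action="store_true")
    return parser


def maybe_shrink(dataset: list, test: bool, test_size: int = 8) -> list:
    # Hypothetical helper: in test mode keep only a small slice of the dataset.
    return dataset[:test_size] if test else dataset


if __name__ == "__main__":
    args = build_parser().parse_args(["--max_len", "256", "--test"])
    data = maybe_shrink(list(range(100)), args.test)
    print(args.max_len, args.test, len(data))  # 256 True 8
```

Using `action="store_true"` avoids a common `argparse` pitfall. With `type=bool`, any non-empty string, including `"False"`, parses as `True`.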