### <font color=#FFC125 >Yuan2.0-MoE Model</font>
-----
**ckpt conversion guide**
### <strong>🔘 ckpt conversion</strong>
The model files we provide use 8-way pipeline parallelism (8pp). We also provide an automatic conversion script that runs the whole conversion flow in sequence; usage is as follows:
**<font color=#FFFFF0 >If the 8 pipeline-parallel partitions have been merged in advance, you can run directly: </font>**
```sh
bash examples/convert_hf_moe.sh
```
**<font color=#FFFFF0 >If the pipeline partitions are not merged, convert them as follows: </font>**
First, run the conversion script:
```sh
bash examples/convert_hf_moe.sh
```
Running this script generates a corresponding .bin file for each pipeline stage's .ckpt file; once the conversion is complete, these intermediate files can be deleted.
Then run the following command:
```sh
python tools/concat.py --input-path $input_path --output-path $output_path --pp_rank 8 --num_layers 24
```
Here `--input-path` should be set to the path of the intermediate files produced in the previous step; the command generates a single complete .bin file under the path given by `--output-path`.
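For reference, the concatenation step conceptually amounts to merging the per-stage state dicts into one file. Below is a minimal sketch; the file naming, key layout, and use of `torch.load`/`torch.save` are illustrative assumptions, not the actual implementation of `tools/concat.py`:
```python
import os
import torch

def concat_pipeline_shards(input_path: str, output_path: str, pp_rank: int = 8) -> None:
    """Illustrative sketch: merge per-pipeline-stage .bin shards into one state dict.

    Assumes one file per stage named 'pp_<i>.bin', each holding a plain
    {param_name: tensor} mapping -- the real tools/concat.py may differ.
    """
    merged = {}
    for i in range(pp_rank):
        shard = torch.load(os.path.join(input_path, f"pp_{i}.bin"), map_location="cpu")
        merged.update(shard)  # stages hold disjoint layers, so keys do not collide
    torch.save(merged, os.path.join(output_path, "yuan_moe_full.bin"))
```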
### <strong>🔘 Splitting the bin file</strong>
The conversion above produces a single .bin file, which can be split by running the following command:
```sh
python tools/split_bin.py --input-path $input_path --output-path $output_path
```
### <strong>🔘 HF model inference</strong>
The YuanMoE model can be called to generate text with the following code:
```python
import os
import sys
import torch

# Make the parent directory importable (e.g. for local model code).
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)))

from transformers import AutoModelForCausalLM, LlamaTokenizer

print("Creating tokenizer...")
tokenizer = LlamaTokenizer.from_pretrained('IEITYuan/Yuan2-hf-moe', add_eos_token=False, add_bos_token=False, eos_token='<eod>')
tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>', '<commit_before>', '<commit_msg>', '<commit_after>', '<jupyter_start>', '<jupyter_text>', '<jupyter_code>', '<jupyter_output>', '<empty_output>'], special_tokens=True)

print("Creating model...")
model = AutoModelForCausalLM.from_pretrained('IEITYuan/Yuan2-hf-moe', device_map='auto', torch_dtype=torch.bfloat16, trust_remote_code=True)

inputs = tokenizer("请问目前最先进的机器学习算法有哪些?", return_tensors="pt")["input_ids"].to("cuda:0")
outputs = model.generate(inputs, do_sample=False, max_length=100)
print(tokenizer.decode(outputs[0]))
```
# checkpoint_process
## Introduction
The provided 51B ckpt was trained with 16-way pipeline parallelism and 1-way tensor parallelism. The provided 102B ckpt was trained with 32-way pipeline parallelism and 1-way tensor parallelism.
To efficiently utilize multiple devices in the distributed training process, we provide scripts for splitting/merging checkpoints along the tensor dimension and merging them along the pipeline dimension, which can be found in the **`examples`** directory.
**`examples/split_tp_partitions.sh`**: splits the checkpoint along the tensor dimension.
**`examples/merge_tp_partitions.sh`**: merges the checkpoint along the tensor dimension.
**`examples/merge_pp_partitions.sh`**: merges the checkpoint along the pipeline dimension.
The variables in the code should be set as follows:
|Variable name |Description |
|--------------------------|----------------------------------------|
|`LOAD_CHECKPOINT_PATH`|the path from which the checkpoint to be split/merged is loaded|
|`SAVE_CHECKPOINT_PATH`|the storage path of the split/merged checkpoint|
|`SAVE_SPLITED_CHECKPOINT_PATH`|the intermediate storage path of the converted checkpoint|
|`TOKENIZER_MODEL_PATH`|the path of tokenizer model|
|`--tensor-model-parallel-size`|the original tensor model parallel size|
|`--pipeline-model-parallel-size`|the original pipeline model parallel size|
|`--target-tensor-model-parallel-size`|the target tensor model parallel size|
|`--target-pipeline-model-parallel-size`|the target pipeline model parallel size|
|`--pipeline-model-parallel-blocks`|the number of transformer layers specified by the user for each pipeline stage|
|`--target-pipeline-model-parallel-blocks`|the number of transformer layers specified by the user for each pipeline stage in output model|
|`--process-checkpoint`|sets device=None when processing the checkpoint|
|`--pipeline-generate-layer`|controls which pipeline stages have their parameters converted|
|`--tensor-generate-layer`|controls which tensor ranks have their parameters converted|
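Put together, a split configuration following the table might look like the sketch below. This is a hypothetical excerpt: the paths are placeholders, and the exact invocation inside the shipped script may differ, so consult the script itself before editing.
```bash
# Hypothetical excerpt of a split script -- all paths are placeholders.
LOAD_CHECKPOINT_PATH=/path/to/ckpt_tp1_pp16     # checkpoint to be split
SAVE_CHECKPOINT_PATH=/path/to/ckpt_tp8_pp16     # destination for the split checkpoint
TOKENIZER_MODEL_PATH=/path/to/tokenizer.model

python tools/split_tp_partitions.py \
    --load "$LOAD_CHECKPOINT_PATH" \
    --save "$SAVE_CHECKPOINT_PATH" \
    --tensor-model-parallel-size 1 \
    --target-tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 16 \
    --target-pipeline-model-parallel-size 16 \
    --process-checkpoint
```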
## Usage
Run the following command to split checkpoint along tensor:
```bash
bash examples/split_tp_partitions.sh
```
Run the following command to merge checkpoint along tensor:
```bash
bash examples/merge_tp_partitions.sh
```
Run the following command to merge checkpoint along pipeline:
```bash
bash examples/merge_pp_partitions.sh
```
The script for converting the 51B ckpt from 16-way pipeline and 1-way tensor parallelism to 4-way tensor and 1-way pipeline parallelism is provided:
```bash
bash examples/ckpt_partitions_51B.sh
```
The script for converting the 102B ckpt from 32-way pipeline and 1-way tensor parallelism to 8-way tensor and 1-way pipeline parallelism is provided:
```bash
bash examples/ckpt_partitions_102B.sh
```
There is no fixed order for splitting the tensor and merging the pipeline; it is generally suggested to split the tensor first and then merge the pipeline.
If you want to define the splitting and merging parameters yourself, you can follow the steps below (taking the 51B ckpt as an example):
>**step1 bash examples/split\_tp\_partitions.sh**
--tensor-model-parallel-size 1
--target-tensor-model-parallel-size 8
--pipeline-model-parallel-size 16
--target-pipeline-model-parallel-size 16
--pipeline-model-parallel-blocks 2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,2
--pipeline-generate-layer 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
>**step2 bash examples/merge\_pp\_partitions.sh**
--tensor-model-parallel-size 8
--target-tensor-model-parallel-size 8
--pipeline-model-parallel-size 16
--target-pipeline-model-parallel-size 2
--pipeline-model-parallel-blocks 2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,2
--target-pipeline-model-parallel-blocks 10,32
--tensor-generate-layer 0,1,2,3,4,5,6,7
When running step 2 after step 1, the split ckpt produced by step 1 must be used as LOAD_CHECKPOINT_PATH in the step-2 script. Note that `tensor-model-parallel-size` and `pipeline-model-parallel-size` must match the number of tensor and pipeline parallel ways in the loaded checkpoint.
## Notice
### --pipeline-generate-layer and --tensor-generate-layer
These parameters control which stages/ranks have their parameters converted. To convert all layers, specify all pipeline stages or all tensor ranks (for example, 4-way tensor: 0,1,2,3; 1-way tensor: 0; 1-way pipeline: 0; 8-way pipeline: 0,1,2,3,4,5,6,7). To convert only the layers in pipeline stages 0 and 1, set `--pipeline-generate-layer` to 0,1. To convert only the layers on tensor ranks 3,4,5,6, set `--tensor-generate-layer` to 3,4,5,6.
### --pipeline-model-parallel-blocks and --target-pipeline-model-parallel-blocks
`--pipeline-model-parallel-blocks` specifies the number of transformer layers in each pipeline stage, and its length must equal pipeline-model-parallel-size. `--target-pipeline-model-parallel-blocks` specifies the number of transformer layers in each pipeline stage of the output model, and its length must equal target-pipeline-model-parallel-size.
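As a sanity check, a blocks value must both match the stage count and sum to the model's total transformer layer count. A small helper like the following (hypothetical, not part of the repo) makes the constraint explicit:
```python
def check_blocks(blocks_str: str, pp_size: int, num_layers: int) -> list[int]:
    """Validate a --pipeline-model-parallel-blocks value (hypothetical helper)."""
    blocks = [int(b) for b in blocks_str.split(",")]
    assert len(blocks) == pp_size, "num of blocks must eq pipeline-model-parallel-size"
    assert sum(blocks) == num_layers, "blocks must sum to the total layer count"
    return blocks

# The 51B example above: 16 stages covering 42 transformer layers in total.
check_blocks("2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,2", pp_size=16, num_layers=42)
# The step-2 target: 2 stages covering the same 42 layers.
check_blocks("10,32", pp_size=2, num_layers=42)
```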
### WORLD\_SIZE setting
When processing the 51B/102B ckpt, if *'AssertionError: num of args.pipeline\_model\_parallel\_blocks must eq args.pipeline\_model\_parallel\_size'* occurs, it may be because tensor\_model\_parallel\_size * pipeline\_model\_parallel\_size > world\_size. Modifying **os.environ["WORLD\_SIZE"] in the scripts merge\_pp\_partitions.py/split\_tp\_partitions.py** can solve this problem. It is recommended to set it to **256**, which is large enough to cover most cases.
# data\_process
## Introduction
Since Yuan2.0 runs under the Megatron framework, the text corpus needs to be transformed into token ids and stored in binary files before training. We provide **preprocess\_data\_yuan.py**, a script that efficiently transforms text into token ids and is specifically designed for preprocessing Chinese corpora. The script can be found in the **`tools`** directory.
The main variables in the code should be set as follows:
| Variable name | Description |
| ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--input` | The path where you store the training datasets; the datasets should be stored as .txt files. Note: even if there is only one .txt file to process, this should be the directory containing the .txt file (i.e. the folder), not the path of the .txt file itself. |
| `--data-idx` | Sets the indices of the training datasets. If there is just one dataset to convert, --data-idx should be set to '0'. If there are multiple training datasets, set it to '0-n', where n is the number of training datasets. |
| `--tokenizer_path` | The path to import tokenizer files. |
| `--output_path` | The path where the preprocessed datasets are stored; one .idx file and one .bin file will be created for each dataset. |
## Dataset
Samples in the dataset should be separated by '\n', and within each sample any '\n' should be replaced with '\<n>', so that each line in the dataset is a single sample. The program replaces '\<n>' back with '\n' during preprocessing.
For the datasets used to finetune Yuan2.0, you should put a '\<sep>' between the instruction and the response.
The following is an example of samples in a finetune dataset:
```text
John买了3件衬衫,每件售价为20美元。此外,他还需要支付所有商品的10%税款。他总共支付了多少钱?<sep>John购买的3件衬衫的总价为3 \times 20 = 60美元。<n>所有商品的税款为总价的10%,即60 \times 0.1 = 6美元。<n>因此,John总共支付的钱数为60 + 6 = 66美元。
每年,Dani作为赢得亚马逊季度最佳买家的奖励,会得到4个一对裤子(每对裤子2条)。如果初始时他有50条裤子,计算出5年后他将拥有多少条裤子。<sep>每年Dani会得到4 \times 2 = 8条裤子,因此5年后他将得到8 \times 5 = 40条裤子。<n>那么,5年后他总共拥有的裤子数量为初始时的50条加上5年内得到的40条,即50 + 40 = 90条裤子。<n>因此,5年后他将拥有90条裤子。
```
## Usage
Run the following command to initiate data processing.
```bash
python ./tools/preprocess_data_yuan.py --input '<Specify path>' --data-idx '0-42' --tokenizer_path './tokenizer' --output_path '<Specify path>'
```
If a dataset has already been processed, i.e. its .idx and .bin files already exist in the '--output\_path', the program will skip that dataset.
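Conceptually, the preprocessing of one dataset reduces to the loop below. This is a minimal sketch assuming a SentencePiece-style tokenizer; the real preprocess\_data\_yuan.py additionally handles dataset indexing, parallel workers, and the .idx/.bin binary format used by Megatron:
```python
# Minimal sketch of the per-dataset preprocessing flow -- illustrative only.
from sentencepiece import SentencePieceProcessor

tokenizer = SentencePieceProcessor(model_file="./tokenizer/tokenizer.model")  # placeholder path

token_docs = []
with open("dataset.txt", encoding="utf-8") as f:
    for line in f:                                       # one line == one sample
        sample = line.rstrip("\n").replace("<n>", "\n")  # restore real newlines
        token_docs.append(tokenizer.encode(sample))
# The real script then writes token_docs into a .bin file plus a .idx offset file.
```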
# data\_process
## Introduction
Since Yuan2.0 is trained and finetuned under the Megatron framework, the text corpus needs to be converted to token ids and stored in .bin files before training. The preprocess\_data\_yuan.py script we provide can efficiently convert text into token ids and is specifically designed for preprocessing Chinese corpora. The script can be found in the **`tools`** directory.
The main arguments of the preprocess\_data\_yuan.py script are described as follows:
| Argument name | Description |
| ------------------ | --------------------------------------------------------------------------------------- |
| `--input` | The path where the training datasets are stored; the datasets should be stored as .txt files. Note: even if only one .txt file needs to be processed, this should be the directory containing the data (the folder), not the path of the .txt file itself. |
| `--data-idx` | Sets the indices of the training datasets. If only one dataset needs to be converted, --data-idx should be set to "0". If there are multiple training datasets, set it to "0-n", where n is the number of training datasets. |
| `--tokenizer_path` | The path from which the tokenizer files are imported. |
| `--output_path` | The output path of the preprocessed datasets; one .idx file and one .bin file will be created for each dataset. |
## Dataset
Samples in the dataset should be separated by "\n", and within each sample "\n" should be replaced with "\<n>", so that each line in the dataset is a single sample. The program replaces "\<n>" back with "\n" during preprocessing.
For the datasets used to finetune Yuan2.0, a "\<sep>" should be placed between the instruction and the response.
The following are two samples from a finetune dataset:
```text
John买了3件衬衫,每件售价为20美元。此外,他还需要支付所有商品的10%税款。他总共支付了多少钱?<sep>John购买的3件衬衫的总价为3 \times 20 = 60美元。<n>所有商品的税款为总价的10%,即60 \times 0.1 = 6美元。<n>因此,John总共支付的钱数为60 + 6 = 66美元。
每年,Dani作为赢得亚马逊季度最佳买家的奖励,会得到4个一对裤子(每对裤子2条)。如果初始时他有50条裤子,计算出5年后他将拥有多少条裤子。<sep>每年Dani会得到4 \times 2 = 8条裤子,因此5年后他将得到8 \times 5 = 40条裤子。<n>那么,5年后他总共拥有的裤子数量为初始时的50条加上5年内得到的40条,即50 + 40 = 90条裤子。<n>因此,5年后他将拥有90条裤子。
```
## Usage
Run the following command to start preprocessing the data:
```bash
python ./tools/preprocess_data_yuan.py --input '<Specify path>' --data-idx '0-42' --tokenizer_path './tokenizer' --output_path '<Specify path>'
```
If a dataset has already been processed, i.e. its .idx and .bin files already exist in the "--output\_path", the program will skip that dataset.
# Distributed Optimizer
The motivation for the distributed optimizer is to save memory by distributing the optimizer state evenly across data parallel ranks, versus the current method of replicating the optimizer state across data parallel ranks. As described in https://arxiv.org/abs/1910.02054, this branch specifically implements the following:
- [yes] distribute all 'non-overlapping' optimizer state (i.e., model params already in fp32 are NOT distributed)
- [no] distribute model gradients
- [no] distribute model parameters
Theoretical memory savings vary depending on the combination of the model's param dtype and grad dtype. In the current implementation, the theoretical number of bytes per parameter is (where 'd' is the data parallel size):
| | Non-distributed optim | Distributed optim |
| ------ | ------ | ------ |
| float16 param, float16 grads | 20 | 4 + 16/d |
| float16 param, fp32 grads | 18 | 6 + 12/d |
| fp32 param, fp32 grads | 16 | 8 + 8/d |
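For example, with float16 params, float16 grads, and a data-parallel size of d = 8, the distributed optimizer needs 4 + 16/8 = 6 bytes per parameter instead of 20, roughly a 3.3x reduction; as d grows, the savings for this configuration approach 5x.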
The implementation of the distributed optimizer is centered on using the contiguous grad buffer for communicating grads & params between the model state and the optimizer state. The grad buffer at any given moment either holds:
1. all model grads
2. a 1/d size _copy_ of the main grads (before copying to the optimizer state)
3. a 1/d size _copy_ of the main params (after copying from the optimizer state)
4. all model params
5. zeros (or None), between iterations
The grad buffer is used for performing reduce-scatter and all-gather operations, for passing grads & params between the model state and optimizer state. With this implementation, no dynamic buffers are allocated.
The figures below illustrate the grad buffer's sharding scheme, and the key steps of the distributed optimizer's param update:
## Data flow
![Data flow](images/distrib_optimizer/data_flow.png)
## Sharding scheme
![Sharding scheme](images/distrib_optimizer/sharding_scheme.png)
## Key steps
_(note: using illustrations above, and assuming fp16 grads)_
- Backward pass finishes (grad buffer holds 16 fp16 grad elements)
- Call reduce-scatter on each DP rank
- Each DP rank now has 4 elements within the grad buffer that are fully reduced (remaining 12 elements are garbage)
- Each DP rank copies its relevant 4 fp16 grad elements from the grad buffer into 4 fp32 main grad elements (separate buffer, owned by the optimizer); i.e.
- DP rank 0 copies elements [0:4]
- DP rank 1 copies elements [4:8]
- DP rank 2 copies elements [8:12]
- DP rank 3 copies elements [12:16]
- Optimizer.step()
- Each DP rank copies its 4 fp32 main (/optimizer) param elements into the corresponding 4 fp16 elements in the grad buffer
- Call all-gather on each DP rank
- Grad buffer now contains all 16, fully updated, fp16 model param elements
- Copy updated model params from grad buffer into their respective param tensors
- (At this point, grad buffer is ready to be zero'd for the next iteration)
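The core communication pattern of these key steps can be sketched with plain `torch.distributed` collectives. This is a simplified, self-contained illustration (the stand-in "optimizer" is a single scaling step), not Megatron's actual implementation:
```python
# Simplified sketch of the distributed optimizer's communication pattern.
# Run with: torchrun --nproc_per_node=4 this_file.py
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
rank, world = dist.get_rank(), dist.get_world_size()
device = torch.device("cuda", rank)

buf = torch.randn(16, device=device, dtype=torch.float16)  # grad buffer (16 elements, d=4)
shard = buf.view(world, -1)[rank]                          # this rank's 4-element shard

# 1. Reduce-scatter: each rank ends up owning a fully reduced shard of the buffer.
dist.reduce_scatter_tensor(shard, buf)

# 2. Copy the shard to fp32 main grads and update only the local shard's params.
main = shard.float()
main -= 0.01 * main                                        # stand-in for optimizer.step()

# 3. Copy updated fp32 main params back into the fp16 shard, then all-gather so
#    every rank sees all updated params in the same buffer.
shard.copy_(main.half())
dist.all_gather_into_tensor(buf, shard)
```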
# eval_arc
## Dataset
**`datasets/ARC/ARC_challenge.txt`.** The ARC_challenge test set containing 2,344 multiple-choice questions.
In the text, the content before `[SEP]` is the question, and the content after `[SEP]` is the standard answer to that question.
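A line in the file therefore has the shape below (an illustrative made-up sample, not taken from the dataset):
```text
Which gas do plants absorb from the atmosphere during photosynthesis? A. oxygen B. carbon dioxide C. nitrogen D. hydrogen[SEP]B
```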
## Evaluation
### Introduction
**`examples/eval_arc_2x32B.sh`.** The evaluation results for ARC_challenge could be obtained by running this program.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `CHECKPOINT_PATH` | the path that saves the checkpoint to be evaluated. |
| `TOKENIZER_MODEL_PATH` | the path that saves the tokenizer. |
| `MATH_DATA` | the path that saves the evaluation set. |
| `OUTPUT_PATH` | the path that saves the evaluation results. |
### Usage
Run the following command to evaluate the model's performance on the test set:
```
bash -x examples/eval_arc_2x32B.sh
```
### Result
The evaluation result will be saved in the path of `OUTPUT_PATH`. In the text, the content before `[SEP]` is the question, and the content after `[SEP]` is the answer of our model to that question.
## Accuracy
### Introduction
**`tasks/ARC/score_arc.py`.** The accuracy of the evaluation results for ARC_challenge could be obtained by running this program.
The path variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `origin_file_path` | Path of evaluation set file. |
| `eval_file_path` | Path for saving the evaluation result file. |
| `txt_eval_res_dir` | Path for storing the classified results. Files ending with _true contain correct results, while those ending in _false contain incorrect results. |
### Usage
Run the following command to evaluate the model's performance on the test set:
```
python score_arc.py
```
### Result
"Number of correct answers" and "Number of incorrect answers" respectively represent the number of correct answers and the number of incorrect answers, while "accuracy" indicates the accuracy .
# eval_arc
## Dataset
**`datasets/ARC/ARC_challenge.txt`.** The ARC_challenge test set, containing 2,344 multiple-choice questions.
In the text, the content before "[SEP]" is the original question, and the content after "[SEP]" is the standard answer to that question.
## Evaluation
### Introduction
**`examples/eval_arc_2x32B.sh`.** Running this program produces the model's inference results on the ARC_challenge dataset.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `CHECKPOINT_PATH` | the path of the checkpoint to be evaluated |
| `TOKENIZER_MODEL_PATH` | the path of the tokenizer |
| `MATH_DATA` | the path of the evaluation set |
| `OUTPUT_PATH` | the path where the inference results are saved |
### Usage
Run the following command to obtain the inference results:
```
bash -x examples/eval_arc_2x32B.sh
```
### Result
The evaluation results will be saved in `OUTPUT_PATH`. In the text, the content before "[SEP]" is the original question, and the content after "[SEP]" is the model's answer to that question.
## Accuracy
### Introduction
**`tasks/ARC/score_arc.py`.** Running this program produces the accuracy of the ARC_challenge evaluation results.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `origin_file_path` | the path of the evaluation set file |
| `eval_file_path` | the path where the evaluation result file is saved |
| `txt_eval_res_dir` | the path where the classified results are stored; files ending with "true" contain correct results, and files ending with "false" contain incorrect results |
### Usage
Run the following command to evaluate the model's accuracy on the test set:
```
python score_arc.py
```
### Result
"Number of correct answers" and "Number of incorrect answers" denote the number of correctly and incorrectly answered questions, respectively, and "accuracy" denotes the accuracy.
# eval_gsm8k
## Dataset
**`datasets/GSM8K/gsm8k.txt`.** The original English version of the [gsm8k](https://github.com/openai/grade-school-math) test set containing 1,319 questions.
In the text, the content before `[SEP]` is the question, and the content after `[SEP]` is the standard answer to that question.
## Evaluation
### Introduction
**`examples/eval_gsm8k_2x32B.sh`.** The evaluation results for gsm8k could be obtained by running this program.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `CHECKPOINT_PATH` | the path that saves the checkpoint to be evaluated. |
| `TOKENIZER_MODEL_PATH` | the path that saves the tokenizer. |
| `MATH_DATA` | the path that saves the evaluation set. |
| `OUTPUT_PATH` | the path that saves the evaluation results. |
### Usage
Run the following command to evaluate the model's performance on the test set:
```
bash -x examples/eval_gsm8k_2x32B.sh
```
### Result
The evaluation result will be saved in the path of `OUTPUT_PATH`. In the text, the content before `[SEP]` is the question, and the content after `[SEP]` is the answer of our model to that question.
## Accuracy
### Introduction
**`tasks/GSM8K/score_gsm8k.py`.** The accuracy of evaluation results for gsm8k could be obtained by running this program.
The path variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `origin_file_path` | Path of evaluation set file. |
| `eval_file_path` | Path for saving the evaluation result file. |
| `txt_eval_res_dir` | Path for storing the classified results. Files ending with _true contain correct results, while those ending in _false contain incorrect results. |
### Usage
Run the following command to evaluate the model's performance on the test set:
```
python score_gsm8k.py
```
### Result
"Number of correct answers" and "Number of incorrect answers" respectively represent the number of correct answers and the number of incorrect answers, while "accuracy" indicates the accuracy .
# eval_gsm8k
## Dataset
**`datasets/GSM8K/gsm8k.txt`.** The original [gsm8k](https://github.com/openai/grade-school-math) test set, containing 1,319 math questions.
In the text, the content before "[SEP]" is the original question, and the content after "[SEP]" is the standard answer to that question.
## Evaluation
### Introduction
**`examples/eval_gsm8k_2x32B.sh`.** Running this program produces the model's inference results on the gsm8k dataset.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `CHECKPOINT_PATH` | the path of the checkpoint to be evaluated |
| `TOKENIZER_MODEL_PATH` | the path of the tokenizer |
| `MATH_DATA` | the path of the evaluation set |
| `OUTPUT_PATH` | the path where the inference results are saved |
### Usage
Run the following command to obtain the inference results:
```
bash -x examples/eval_gsm8k_2x32B.sh
```
### Result
The evaluation results will be saved in `OUTPUT_PATH`. In the text, the content before "[SEP]" is the original question, and the content after "[SEP]" is the model's answer to that question.
## Accuracy
### Introduction
**`tasks/GSM8K/score_gsm8k.py`.** Running this program produces the accuracy of the gsm8k evaluation results.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `origin_file_path` | the path of the evaluation set file |
| `eval_file_path` | the path where the evaluation result file is saved |
| `txt_eval_res_dir` | the path where the classified results are stored; files ending with "true" contain correct results, and files ending with "false" contain incorrect results |
### Usage
Run the following command to evaluate the model's accuracy on the test set:
```
python score_gsm8k.py
```
### Result
"Number of correct answers" and "Number of incorrect answers" denote the number of correctly and incorrectly answered questions, respectively, and "accuracy" denotes the accuracy.
# eval\_humaneval
## Dataset
**datasets/HUMANEVAL/HumanEval.jsonl.gz** The original English version of the [HumanEval](https://github.com/openai/human-eval "HumanEval") dataset containing 164 questions.
**datasets/HUMANEVAL/HumanEval-textprompts.jsonl** The Chinese version of the HumanEval dataset, obtained by translation with the aid of the gpt-4 model.
**datasets/HUMANEVAL/HumanEval-instructions.jsonl** The HumanEval dataset in instruction style.
**datasets/HUMANEVAL/HumanEval-instructions-fewshot.jsonl** The HumanEval dataset in instruction style with few-shot prompt.
## Evaluation
### Introduction
**examples/eval\_humaneval.sh.** The evaluation results for HumanEval on the Yuan2.0-M32 model could be obtained by running this program.
Before running the evaluation program, you have to specify the following checkpoint path in the bash script yourself.
| Variable name | Description |
| ----------------- | --------------------------------------------------- |
| `CHECKPOINT_PATH` | the path that saves the checkpoint to be evaluated. |
### Requirement
Make sure the HumanEval program is installed before running the HumanEval evaluation on the Yuan2.0-M32 checkpoint.
```text
$ git clone https://github.com/openai/human-eval
$ pip install -e human-eval
```
After the HumanEval program is installed, open the following script:
```text
/usr/local/lib/python3.10/dist-packages/human_eval-1.0-py3.10.egg/human_eval/execution.py
```
Then make the following change to the "check\_program" variable in the "check\_correctness" function, to ensure there are no duplicate function signatures in the generated code.
```text
check_program = (
#problem["prompt"] +
completion + "\n" +
problem["test"] + "\n" +
f"check({problem['entry_point']})"
)
```
Also, if you are new to HumanEval, you have to uncomment the line "exec(check\_program, exec\_globals)" by deleting the leading "#", as shown below.
```text
# WARNING
# This program exists to execute untrusted model-generated code. Although
# it is highly unlikely that model-generated code will do something overtly
# malicious in response to this test suite, model-generated code may act
# destructively due to a lack of model capability or alignment.
# Users are strongly encouraged to sandbox this evaluation suite so that it
# does not perform destructive actions on their host or network. For more
# information on how OpenAI sandboxes its code, see the accompanying paper.
# Once you have read this disclaimer and taken appropriate precautions,
# uncomment the following line and proceed at your own risk:
exec(check_program, exec_globals)
result.append("passed")
except TimeoutException:
result.append("timed out")
except BaseException as e:
result.append(f"failed: {e}")
```
### Usage
Run the following commands to evaluate the Yuan2.0-M32 model's performance on the HumanEval dataset. Before running the bash script, change directory to the main 'Yuan2.0-M32' directory and specify the checkpoint path in the bash script.
Evaluate the MoE model on the HumanEval dataset:
```bash
cd <Specify Path>/Yuan2.0-M32/
bash examples/eval_humaneval_2x32.sh
```
### Results
The evaluation results will be gathered in samples.jsonl under \$OUTPUT\_PATH. After all tasks have been generated, the "evaluate\_functional\_correctness" function of HumanEval automatically evaluates the results and returns the accuracy.
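If you need to re-score an existing samples.jsonl by hand, the human-eval package installed above also provides a command-line entry point; an invocation along these lines should work (the path is a placeholder):
```bash
evaluate_functional_correctness $OUTPUT_PATH/samples.jsonl
```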
# eval\_humaneval\_cn
## Dataset
**datasets/HUMANEVAL/HumanEval.jsonl.gz** The original English version of the [HumanEval](https://github.com/openai/human-eval "HumanEval") dataset, containing 164 questions.
**datasets/HUMANEVAL/HumanEval-textprompts.jsonl** The Chinese version of the HumanEval dataset, obtained by translation with the aid of gpt-4.
**datasets/HUMANEVAL/HumanEval-instructions.jsonl** The HumanEval dataset processed into instruction-following style.
**datasets/HUMANEVAL/HumanEval-instructions-fewshot.jsonl** The HumanEval dataset processed into instruction-following style, with few-shot examples added to the prompt.
## Evaluation
### Introduction
**examples/eval\_humaneval\_2xM32.sh.** Running this program produces the evaluation results of the Yuan2.0-M32 model on the HumanEval dataset.
Before running the evaluation program, you only need to specify the following checkpoint\_path parameter in the bash script; the other necessary paths are already set:
| Parameter name | Description |
| ----------------- | -------------------- |
| `CHECKPOINT_PATH` | the path where the checkpoint to be evaluated is saved. |
### Requirement
Before running the HumanEval evaluation on the Yuan2.0-M32 checkpoint, make sure the HumanEval program is installed.
```text
$ git clone https://github.com/openai/human-eval
$ pip install -e human-eval
```
After HumanEval is installed, open the following script:
```text
/usr/local/lib/python3.10/dist-packages/human_eval-1.0-py3.10.egg/human_eval/execution.py
```
Then make the following change to the "check\_program" variable in the "check\_correctness" function, to ensure there are no duplicate function signatures in the generated code.
```text
check_program = (
#problem["prompt"] +
completion + "\n" +
problem["test"] + "\n" +
f"check({problem['entry_point']})"
)
```
Also, if you are using HumanEval for the first time, you must uncomment the line "exec(check\_program, exec\_globals)" by deleting the leading "#", as shown below.
```text
# WARNING
# This program exists to execute untrusted model-generated code. Although
# it is highly unlikely that model-generated code will do something overtly
# malicious in response to this test suite, model-generated code may act
# destructively due to a lack of model capability or alignment.
# Users are strongly encouraged to sandbox this evaluation suite so that it
# does not perform destructive actions on their host or network. For more
# information on how OpenAI sandboxes its code, see the accompanying paper.
# Once you have read this disclaimer and taken appropriate precautions,
# uncomment the following line and proceed at your own risk:
exec(check_program, exec_globals)
result.append("passed")
except TimeoutException:
result.append("timed out")
except BaseException as e:
result.append(f"failed: {e}")
```
### Usage
Run the following commands to evaluate the Yuan2.0-M32 model's performance on the HumanEval dataset. Before running the bash script, change directory to the main "Yuan2.0-M32" directory and specify the checkpoint path in the bash script.
Evaluate the Yuan2.0-M32 model on the HumanEval dataset:
```bash
cd <Specify Path>/Yuan2.0-M32/
bash examples/eval_humaneval_2x32.sh
```
### Results
The evaluation results will be gathered in samples.jsonl under \$OUTPUT\_PATH. After all tasks have been generated, the "evaluate\_functional\_correctness" function of HumanEval automatically evaluates the results and returns the accuracy.
# eval_math
## Dataset
**`datasets/MATH/math.txt`.** The math test set containing 458 questions.
In the text, the content before `[SEP]` is the question, and the content after `[SEP]` is the standard answer to that question.
## Evaluation
### Introduction
**`examples/eval_math_2x32B.sh`.** The evaluation results for math could be obtained by running this program.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `CHECKPOINT_PATH` | the path that saves the checkpoint to be evaluated. |
| `TOKENIZER_MODEL_PATH` | the path that saves the tokenizer. |
| `MATH_DATA` | the path that saves the evaluation set. |
| `OUTPUT_PATH` | the path that saves the evaluation results. |
### Usage
Run the following command to evaluate the model's performance on the test set:
```
bash -x examples/eval_math_2x32B.sh
```
### Result
The evaluation result will be saved in the path of `OUTPUT_PATH`. In the text, the content before `[SEP]` is the question, and the content after `[SEP]` is the answer of our model to that question.
## Accuracy
### Introduction
**`tasks/MATH/score_math.py`.** The accuracy of evaluation results for math could be obtained by running this program.
The path variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `origin_file_path` | Path of evaluation set file. |
| `eval_file_path` | Path for saving the evaluation result file. |
| `txt_eval_res_dir` | Path for storing the classified results. Files ending with _true contain correct results, while those ending in _false contain incorrect results. |
### Usage
Run the following command to evaluate the model's performance on the test set:
```
python score_math.py
```
### Result
"Number of correct answers" and "Number of incorrect answers" respectively represent the number of correct answers and the number of incorrect answers, while "accuracy" indicates the accuracy .
# eval_math
## Dataset
**`datasets/MATH/math.txt`.** The math test set, containing 458 questions.
In the text, the content before "[SEP]" is the original question, and the content after "[SEP]" is the standard answer to that question.
## Evaluation
### Introduction
**`examples/eval_math_2x32B.sh`.** Running this program produces the model's inference results on the math dataset.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `CHECKPOINT_PATH` | the path of the checkpoint to be evaluated |
| `TOKENIZER_MODEL_PATH` | the path of the tokenizer |
| `MATH_DATA` | the path of the evaluation set |
| `OUTPUT_PATH` | the path where the inference results are saved |
### Usage
Run the following command to obtain the inference results:
```
bash -x examples/eval_math_2x32B.sh
```
### Result
The evaluation results will be saved in `OUTPUT_PATH`. In the text, the content before "[SEP]" is the original question, and the content after "[SEP]" is the model's answer to that question.
## Accuracy
### Introduction
**`tasks/MATH/score_math.py`.** Running this program produces the accuracy of the math evaluation results.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `origin_file_path` | the path of the evaluation set file |
| `eval_file_path` | the path where the evaluation result file is saved |
| `txt_eval_res_dir` | the path where the classified results are stored; files ending with "true" contain correct results, and files ending with "false" contain incorrect results |
### Usage
Run the following command to evaluate the model's accuracy on the test set:
```
python score_math.py
```
### Result
"Number of correct answers" and "Number of incorrect answers" denote the number of correctly and incorrectly answered questions, respectively, and "accuracy" denotes the accuracy.
# eval_mmlu
## Dataset
**`datasets/MMLU/`.** Test set containing 57 subtasks.
In every text, the content before `[SEP]` is the question, and the content after `[SEP]` is the standard answer to that question.
## Evaluation
### Introduction
**`examples/eval_mmlu_2x32B.sh`.** The evaluation results for mmlu could be obtained by running this program.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `CHECKPOINT_PATH` | the path that saves the checkpoint to be evaluated. |
| `TOKENIZER_MODEL_PATH` | the path that saves the tokenizer. |
| `MATH_DATA` | the path that saves the evaluation set. |
| `OUTPUT_PATH` | the path that saves the evaluation results. |
### Usage
Run the following command to evaluate the model's performance on the test set:
```
bash -x examples/eval_mmlu_2x32B.sh
```
### Result
The evaluation result will be saved in the path of `OUTPUT_PATH`. In the text, the content before `[SEP]` is the question, and the content after `[SEP]` is the answer of our model to that question.
## Accuracy
### Introduction
**`tasks/MMLU/score_mmlu.py`.** The accuracy of evaluation results for mmlu could be obtained by running this program.
The path variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `origin_file_path` | Path of evaluation set file. |
| `eval_file_path` | Path for saving the evaluation result file. |
| `txt_eval_res_dir` | Path for storing the classified results. Files ending with _true contain correct results, while those ending in _false contain incorrect results. The csv and jsonl files store accuracy statistics for each subtask. |
### Usage
Run the following command to evaluate the model's performance on the test set:
```
python score_mmlu.py
```
### Result
"Number of correct answers" and "Number of incorrect answers" respectively represent the total count of correct and incorrect answers for all tasks, while "accuracy" represents the overall accuracy.
# eval_mmlu
## Dataset
**`datasets/MMLU/`.** An English evaluation set containing 57 subtasks.
In every text, the content before "[SEP]" is the original question, and the content after "[SEP]" is the standard answer to that question.
## Evaluation
### Introduction
**`examples/eval_mmlu_2x32B.sh`.** Running this program produces the model's inference results on the mmlu dataset.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `CHECKPOINT_PATH` | the path of the checkpoint to be evaluated |
| `TOKENIZER_MODEL_PATH` | the path of the tokenizer |
| `MATH_DATA` | the path of the evaluation set |
| `OUTPUT_PATH` | the path where the inference results are saved |
### Usage
Run the following command to obtain the inference results:
```
bash -x examples/eval_mmlu_2x32B.sh
```
### Result
The evaluation results will be saved in `OUTPUT_PATH`. In the text, the content before "[SEP]" is the original question, and the content after "[SEP]" is the model's answer to that question.
## Accuracy
### Introduction
**`tasks/MMLU/score_mmlu.py`.** Running this program produces the accuracy of the mmlu evaluation results.
The variables in the code should be set as follows:
| Variable name | Description |
| ------------------- | --------------------------------------------- |
| `origin_file_path` | the path of the evaluation set file |
| `eval_file_path` | the path where the evaluation result file is saved |
| `txt_eval_res_dir` | the path where the classified results are stored; files ending with "true" contain correct results, and files ending with "false" contain incorrect results; the csv and jsonl files store per-subtask accuracy statistics |
### Usage
Run the following command to evaluate the model's accuracy on the test set:
```
python score_mmlu.py
```
### Result
"Number of correct answers" and "Number of incorrect answers" denote the total numbers of correctly and incorrectly answered questions across all tasks, and "accuracy" denotes the overall accuracy.
# Yuan2.0 Inference-Server
## Introduction
This document provides instructions for deploying the inference server of Yuan2.0.
- [CKPT model Inference-Server](#ckpt-model-inference-server)
- [HuggingFace model Inference-Server](#huggingface-model-inference-server)
- [API Testing](#api-testing)
## CKPT model Inference-Server
- First step: modify the script file
`TOKENIZER_MODEL_PATH` indicates the storage path of the tokenizer-related files;
`CHECKPOINT_PATH` indicates the storage path of the model files;
`GPUS_PER_NODE` indicates the number of GPUs used on this node; it should be consistent with the model's tensor parallel size;
`CUDA_VISIBLE_DEVICES` indicates the GPU indices to use, separated by commas; the number of indices should be consistent with `GPUS_PER_NODE`;
`PORT` indicates the port used by the service; each service occupies one port, and users can modify it according to the actual situation.
- Second step: run the corresponding script in the repository to deploy
```bash
#2.1B deployment command
bash examples/run_inference_server_2.1B.sh
#51B deployment command
bash examples/run_inference_server_51B.sh
#102B deployment command
bash examples/run_inference_server_102B.sh
```
## HuggingFace model Inference-Server
- First step: modify the script file examples/run_inference_server_hf.sh
`HF_PATH` indicates the storage path of the HuggingFace model files;
`CUDA_VISIBLE_DEVICES` indicates the GPU indices to use, separated by commas;
`PORT` indicates the port used by the service; each service occupies one port, and users can modify it according to the actual situation.
- Second step: run the script in the repository to deploy
```bash
bash examples/run_inference_server_hf.sh
```
- Note: if running on Windows/CPU, flash attention needs to be turned off manually, and the HuggingFace model file code needs to be modified as follows
```
Modify "use_flash_attention" in config.json to false;
Comment lines 35 and 36 in yuan_hf_model.py;
Modify line 271 in yuan_hf_model.py to inference_hidden_states_memory = torch.empty(bsz, 2, hidden_states.shape[2], dtype=hidden_states.dtype)
```
## API Testing
- Testing with Python
We have written sample code to test the performance of the API calls. Before running it, make sure to modify the 'ip' and 'port' in the code according to your API deployment.
```bash
python tools/start_inference_server_api.py
```
- Testing with Curl
```
#return the Unicode encoding
curl http://127.0.0.1:8000/yuan -X PUT \
--header 'Content-Type: application/json' \
--data '{"ques_list":[{"id":"000","ques":"请帮忙作一首诗,主题是冬至"}], "tokens_to_generate":500, "top_k":5}'
# return the original form
echo -en "$(curl -s http://127.0.0.1:8000/yuan -X PUT --header 'Content-Type: application/json' --data '{"ques_list":[{"id":"000","ques":"作一首词 ,主题是冬至"}], "tokens_to_generate":500, "top_k":5}')"
```
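For a quick Python-side check, the same request can be issued with the `requests` library. This sketch simply mirrors the curl call above; adjust the ip and port to your deployment, and note that the exact response layout depends on the server implementation:
```python
# Sketch of calling the inference API from Python, mirroring the curl example above.
import requests

payload = {
    "ques_list": [{"id": "000", "ques": "请帮忙作一首诗,主题是冬至"}],
    "tokens_to_generate": 500,
    "top_k": 5,
}
resp = requests.put("http://127.0.0.1:8000/yuan", json=payload, timeout=300)
print(resp.json())
```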
# Yuan2.0 Inference API Deployment
- [ckpt model inference API deployment](#ckpt-model-inference-api-deployment)
- [HuggingFace model inference API deployment](#huggingface-model-inference-api-deployment)
- [API deployment testing](#api-deployment-testing)
## ckpt model inference API deployment
- Deployment can be carried out through the following steps:
Step 1: modify the script file examples/run_inference_server_~~x~~B.sh
`TOKENIZER_MODEL_PATH` indicates the storage path of the tokenizer-related files;
`CHECKPOINT_PATH` indicates the storage path of the model files;
`GPUS_PER_NODE` indicates the number of GPUs used on this node; it should be consistent with the model's tensor parallel size;
`CUDA_VISIBLE_DEVICES` indicates the GPU indices to use, separated by commas; the number of indices should be consistent with `GPUS_PER_NODE`;
`PORT` indicates the port used by the service; each service occupies one port, and users can modify it according to the actual situation.
Step 2: run the scripts in the repository to deploy:
```bash
# 2.1B model service startup command
bash examples/run_inference_server_2.1B.sh
# 51B model service startup command
bash examples/run_inference_server_51B.sh
# 102B model service startup command
bash examples/run_inference_server_102B.sh
```
## HuggingFace model inference API deployment
- Deployment can be carried out through the following steps:
Step 1: modify the script file examples/run_inference_server_hf.sh
`HF_PATH` indicates the storage path of the HuggingFace model files;
`CUDA_VISIBLE_DEVICES` indicates the GPU indices to use, separated by commas;
`PORT` indicates the port used by the service; each service occupies one port, and users can modify it according to the actual situation.
Step 2: run the script in the repository to deploy:
```bash
bash examples/run_inference_server_hf.sh
```
- Note in particular: if running on Windows/CPU, flash attention needs to be turned off manually, and the HuggingFace model file code needs to be modified as follows
```
Modify "use_flash_attention" in config.json to false;
Comment out lines 35 and 36 in yuan_hf_model.py;
Change line 271 in yuan_hf_model.py to: inference_hidden_states_memory = torch.empty(bsz, 2, hidden_states.shape[2], dtype=hidden_states.dtype)
```
## API deployment testing
- Testing with Python
We have also written sample code to test the performance of API calls. Before running it, modify the `ip` and `port` in the code according to your API deployment.
```bash
python tools/start_inference_server_api.py
```
- Testing with Curl
```
# the following command returns Unicode-encoded output
curl http://127.0.0.1:8000/yuan -X PUT \
--header 'Content-Type: application/json' \
--data '{"ques_list":[{"id":"000","ques":"请帮忙作一首诗,主题是冬至"}], "tokens_to_generate":500, "top_k":5}'
# the following command returns the original form
echo -en "$(curl -s http://127.0.0.1:8000/yuan -X PUT --header 'Content-Type: application/json' --data '{"ques_list":[{"id":"000","ques":"作一首词 ,主题是冬至"}], "tokens_to_generate":500, "top_k":5}')"
```
# Yuan2.0 Supervised Finetuning
## Introduction
This document provides instructions for supervised finetuning (SFT) of Yuan2.0.
## Usage
An example script to run Yuan-102B SFT is:
```shell
bash examples/pretrain_yuan2.0_102B_sft.sh
```
### Arguments setting
Before running the script, the relevant arguments should be set correctly.
First, make any desired modifications, including setting the environment variables `CHECKPOINT_PATH`, `DATA_PATH`, `TOKENIZER_MODEL_PATH` and `TENSORBOARD_PATH`.
`--train-reset` allows you to begin your training iterations from 0.
`--sft-stage` is highly recommended to be on, since it controls the calculation of the loss mask during SFT.
`--override-opt-param-scheduler` allows you to set your own scheduler.
`--finetune` loads the model for finetuning: it does not load the optimizer or RNG state from the checkpoint and sets the iteration count to 0. Assumed when loading a release checkpoint.
If the dataset path is:
```
/path/dataset.bin
```
`DATA_PATH` can then be set as:
```shell
DATA_PATH='1 /path/dataset'
```
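Here the leading `1` is the dataset's sampling weight, and the path omits the `.bin`/`.idx` extension, following the Megatron data-path convention of weight and prefix pairs. Multiple weighted datasets can be listed in one string; a hypothetical two-dataset example:
```shell
# Hypothetical example: sample 30% from dataset_a and 70% from dataset_b.
DATA_PATH='0.3 /path/dataset_a 0.7 /path/dataset_b'
```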
For dataset preprocessing, please refer to the [documentation]().
Further command-line arguments are described in the source file [`arguments.py`](./megatron/arguments.py) and in the Megatron-LM [README.md](https://github.com/NVIDIA/Megatron-LM/blob/main/README.md).