Commit 4a637dad authored by dcuai

Update README.md (data section)

parent a1db219c
@@ -82,8 +82,8 @@ pip install transformers=4.40.1
```
import json
-jsonl_file_path = '.../data/dataset_new.jsonl'
-json_file_path = '../data/dataset_new.json'
+jsonl_file_path = './data/dataset_new.jsonl'
+json_file_path = './data/dataset_new.json'
data = []
with open(jsonl_file_path, 'r', encoding='utf-8') as file:
    for line in file:
@@ -101,15 +101,19 @@ with open(json_file_path, 'w', encoding='utf-8') as file:
print(data)
```
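For reference, the JSONL-to-JSON conversion that the snippet above performs can be sketched end to end as follows. This is a minimal sketch, not the README's exact script: the diff elides the middle of that script, so the loop body, the blank-line handling, and the `json.dump` arguments are assumptions, and the helper name `jsonl_to_json` is hypothetical.

```python
import json


def jsonl_to_json(jsonl_path, json_path):
    """Read one JSON object per line and write them out as a single JSON array.

    Hypothetical helper: assumes every non-blank line is a complete JSON value.
    """
    records = []
    with open(jsonl_path, "r", encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if not line:  # skip blank lines instead of failing on them
                continue
            records.append(json.loads(line))
    with open(json_path, "w", encoding="utf-8") as dst:
        # ensure_ascii=False keeps non-ASCII (e.g. Chinese) text readable on disk
        json.dump(records, dst, ensure_ascii=False, indent=2)
    return records
```

Calling `jsonl_to_json('./data/dataset_new.jsonl', './data/dataset_new.json')` would use the paths introduced by this commit.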
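-A mini dataset for trial training is included in the project. The training data directory structure is shown below; prepare the full dataset for regular training following the same structure:
+The training data directory structure is shown below; prepare the full dataset for regular training following the same structure: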
```
cp LLaMA-Factory/data/dataset_info.json data/
── data
│   ├── computing_part.json
│   ├── consulting_part.json
│   ├── retrieval_part.json
-│   └── task_part.json
+│   ├── task_part.json
+│   └── dataset_info.json
│——————————
```
A mini dataset for trial training is included in the project; it is the default dataset path used by the scripts: [LLaMA-Factory/data](https://developer.sourcefind.cn/codes/modelzoo/disc-finllm_pytorch/-/tree/main/data)
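Before launching training on a self-prepared dataset, it can help to confirm that the data directory matches the layout shown above. The following is a minimal sketch under stated assumptions: the file names come from the directory tree in this README, while the helper name `missing_dataset_files` and the default `data` directory argument are hypothetical and not part of the project.

```python
from pathlib import Path

# File names taken from the directory tree shown in this README.
EXPECTED_FILES = [
    "computing_part.json",
    "consulting_part.json",
    "retrieval_part.json",
    "task_part.json",
    "dataset_info.json",
]


def missing_dataset_files(data_dir="data"):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_dataset_files("./data")` indicates that all five files are present; it does not validate their contents.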
## Training
......