Update README.md

Commit 4118d6eb authored Jan 08, 2025 by wxj

Showing with 59 additions and 7 deletions

README.md README.md +59 -7

--- a/README.md
+++ b/README.md
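-Latest test image: torch2.4.1-py3.10-dtk25.04-beta-das-alpha
+NVIDIA NeMo is an open-source training framework built on PyTorch and PyTorch Lightning; its source code is fully public on GitHub. NeMo's main goal is to let AI developers quickly build conversational AI models and develop related applications.
-This image ships with transformer_engine1.8
+Pretraining and fine-tuning (SFT, LoRA, etc.) of GPT-style models are currently supported.
-Clone this project with git
+# 1. Docker setup
+Latest usable image: torch2.4.1-py3.10-dtk25.04-beta-das-alpha (image ID ce83b4a462d9; it ships with transformer_engine1.8, so no extra installation is needed)
+Clone this project with git: `git clone http://developer.sourcefind.cn/codes/sugon_wxj/nemo.git`
 Start the container: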
 ```bash
@@ -28,16 +32,64 @@ docker run -it \
 Install dependencies
 ```bash
+cd /workspace/nemo
+# Install the dependencies and NeMo
 cd nemo_dtk25-2.0.0.rc0.beta
 pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple 
 pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple 
+# Install Megatron-LM core
 cd .. && cd Megatron-LM-core_r0.7.0.beta
 pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple 
+```
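+# 2. Download and convert the model weights
+Download the `llama2-7b-hf` model weights from `ModelScope` or `Hugging Face`, then convert them with the checkpoint conversion script that NeMo provides.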
+```bash
+python ./NeMo-2.0.0.rc0.beta/scripts/checkpoint_converters/convert_llama_hf_to_nemo.py \
+    --input_name_or_path=./llama2-7b-hf/ \
+    --output_path=./llama2-7b.nemo
+```
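+# 3. Download and process the dataset
+Download the `databricks-dolly-15k` dataset from `ModelScope` or `Hugging Face`, then process it with the dataset preprocessing script that NeMo provides.
+Dataset preprocessing script: https://github.com/NVIDIA/NeMo-Framework-Launcher/blob/main/launcher_scripts/nemo_launcher/collections/dataprep_scripts/dolly_dataprep/preprocess.py
+This script simply converts each record from the `{'context': ''}` format to the `{'input': '', 'output': ''}` format.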
+```bash
+python ./NeMo-2.0.0.rc0.beta/scripts/dataset_processing/nlp/dolly_dataprep/preprocess.py \
+    --input databricks-dolly-15k/databricks-dolly-15k.jsonl
 ```
+The first line of the output file might look like this:
+```bash
+head -n 1 databricks-dolly-15k/databricks-dolly-15k-output.jsonl
+{"input": "Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney.\n\nWhen did Virgin Australia start operating?", "output": "Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.", "category": "closed_qa"}
+```
+Then split the dataset with the dataset-splitting script (in an 80:15:5 ratio):
+```bash
+python ./NeMo-2.0.0.rc0.beta/scripts/dataset_processing/nlp/dolly_dataprep/dolly_dataspilt.py \
+    --input ./databricks-dolly-15k/
+```
+In the end there are 5 JSON Lines (`.jsonl`) files in total:
+```bash
+# ls /data/nemo_dataset/databricks-dolly-15k
+databricks-dolly-15k.jsonl
+databricks-dolly-15k-output.jsonl
+training.jsonl
+validation.jsonl
+test.jsonl
+```
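+# 4. Run the SFT fine-tuning script
+Set the MODEL, TRAIN_DS, VALID_DS, TEST_DS, and other variables in the K100AI_finetune.sh script to your actual paths.
 Run the fine-tuning script:
 Single node, eight cards: `bash K100AI_finetune.sh >& K100AI_finetune.log`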
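
Note on the data preparation in section 3: as a rough illustration only, the two steps (format conversion and the 80:15:5 split) can be sketched in Python as below. This is not the code of `preprocess.py` or `dolly_dataspilt.py`. The field names `instruction`, `context`, `response`, and `category` are those of the public databricks-dolly-15k records. The function names, the context-before-instruction ordering (chosen to match the sample output line in the README), the fixed-seed shuffle, and the assignment of the 80/15/5 shares to training/validation/test in that order are all assumptions made for this sketch.

```python
import random


def to_input_output(record):
    """Map one Dolly record to the {'input', 'output', 'category'} layout."""
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()
    # Assumption: context (if present) comes first, then a blank line,
    # then the instruction. The real script may combine them differently.
    prompt = f"{context}\n\n{instruction}" if context else instruction
    return {
        "input": prompt,
        "output": record["response"],
        "category": record["category"],
    }


def split_80_15_5(rows, seed=0):
    """Shuffle rows, then cut them into 80% / 15% / 5% parts."""
    rows = list(rows)
    # Assumption: a seeded shuffle, so the split is reproducible.
    random.Random(seed).shuffle(rows)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(rows) * 80 // 100
    n_valid = len(rows) * 15 // 100
    return {
        "training": rows[:n_train],
        "validation": rows[n_train:n_train + n_valid],
        # Whatever is left over (about 5%) becomes the test split.
        "test": rows[n_train + n_valid:],
    }
```

For 100 records, `split_80_15_5` yields 80, 15, and 5 rows respectively; any remainder from integer rounding ends up in the test split.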