Update 星辰语义大模型-TeleChat.md

Deleted README.md

Commit 8bda55ee authored Aug 22, 2024 by lvzhen

Showing with 112 additions and 2 deletions

README.md README.md +0 -2

星辰语义大模型-TeleChat.md 星辰语义大模型-TeleChat.md +112 -0

--- a/README.md
+++ b/README.md
-# TeleChat
--- a/星辰语义大模型-TeleChat.md
+++ b/星辰语义大模型-TeleChat.md
+# 星辰语义大模型-TeleChat (Xingchen Semantic Large Model)
+## Paper
+[TeleChat Technical Report](https://arxiv.org/abs/2401.03804)
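+## Model Architecture
+**TeleChat** uses a standard `Decoder-only` architecture, with the following improvements at the model level:
+- **Positional encoding**: [Rotary Embedding](https://arxiv.org/pdf/2104.09864.pdf) is used for positional encoding. It integrates relative position dependencies into self-attention and extrapolates well to unseen positions. Rotary Embedding also works well with Flash-Attention v2, increasing training speed by about 20%.
+- **Activation function**: [SwiGLU](https://arxiv.org/pdf/2002.05202.pdf) replaces GELU. To reduce computation, `ffn_hidden_size` is set to less than the 4x hidden size used in the original SwiGLU.
+- **Layer normalization**: Pre-Normalization based on [RMSNorm](https://arxiv.org/abs/1910.07467).
+- **Decoupled word embedding and output layers**: in **TeleChat-12B**, the parameters of the word embedding layer and the output lm head layer are kept separate, which helps training stability and convergence.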
+| Model | layer_num | hidden_size | ffn_hidden_size | head_num | tie_word_embeddings |
+| ----- | --------- | ----------- | --------------- | -------- | ------------------- |
+| 1B    | 16        | 2048        | 5460            | 32       | No                  |
+| 7B    | 30        | 4096        | 12288           | 32       | Yes                 |
+| 12B   | 38        | 5120        | 12288           | 32       | No                  |
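+
+As a quick check of the `ffn_hidden_size` choice, the ratios below are worked out from the `hidden_size` and `ffn_hidden_size` columns of the table (1B, 7B, 12B in that order, rounded to two decimals). They are derived here for illustration and are not figures quoted from the report; each falls below the 4x expansion mentioned for the activation function:
+
+$$
+\frac{5460}{2048} \approx 2.67, \qquad \frac{12288}{4096} = 3, \qquad \frac{12288}{5120} = 2.4
+$$
+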
+## Environment Setup
+### Docker
+```
+# Pull the image:
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
+# Create and start the container:
+docker run --shm-size 80g --network=host --name=telechat --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> -it <Your Image ID> bash
+# Install dependencies:
+cd TeleChat
+pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/
+pip install 'ms-swift[llm]' -U -i https://pypi.mirrors.ustc.edu.cn/simple/
+pip install optimum -i https://pypi.mirrors.ustc.edu.cn/simple/
+pip install auto-gptq -i https://pypi.mirrors.ustc.edu.cn/simple/
+```
+## Inference Test
+### TeleChat-7B
+Change to the `Telechat/inference_telechat` directory and run:
+```
+bash infer.sh
+```
+### TeleChat-12B/TeleChat-12B-V2
+In the `config.json` under the model directory, set `flash_attn` to `false`.
+Then change to the `Telechat/inference_telechat` directory and run:
+```
+bash infer.sh
+```
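+The `config.json` change required for TeleChat-12B could be scripted roughly as follows. This is a minimal sketch, not part of the repository: the helper name `disable_flash_attn` is hypothetical, and it assumes GNU `sed` and that the key appears on one line as `"flash_attn": true`.
+```shell
+# Hypothetical helper (not shipped with TeleChat): set "flash_attn" to false
+# in <model_dir>/config.json. Assumes GNU sed and a one-line "flash_attn": true entry.
+disable_flash_attn() {
+    config="$1/config.json"
+    # Keep a backup copy next to the original before editing in place.
+    cp "$config" "$config.bak"
+    # Rewrite only the value of the flash_attn key; other keys are untouched.
+    sed -i -E 's/("flash_attn"[[:space:]]*:[[:space:]]*)true/\1false/' "$config"
+}
+```
+It would be called with the model directory as its only argument, for example `disable_flash_attn /path/to/telechat-12b`, where the path is a placeholder.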
+## Fine-tuning
+```
+cd Telechat/ms-swift/examples/pytorch/llm
+```
+### Single-node, single-GPU LoRA fine-tuning
+```
+bash sft_single_lora.sh
+```
+### Single-node, multi-GPU LoRA fine-tuning
+```
+bash sft_multi_lora.sh
+```
+### Single-node, multi-GPU full-parameter fine-tuning
+```
+bash sft_multi_full.sh
+```
+### Inference after training
+```
+# Run from the same directory as the fine-tuning scripts
+bash infer.sh
+```
+The result is shown below:
+![image-20240821162322664](image-20240821162322664.png)
+## Quantization
+### GPTQ quantization
+```
+HIP_VISIBLE_DEVICES=0 swift export --model_type telechat-7b \
+    --quant_bits 8 --quant_method gptq --model_id_or_path /path/to/telechat-7b \
+    --quant_output_dir ./quant_out
+```
+### Inference after quantization
+```
+HSA_FORCE_FINE_GRAIN_PCIE=1 swift infer --model_type telechat-7b --model_id_or_path quant_out/
+```
+![image-20240822083141593](image-20240822083141593.png)