ModelZoo / Telechat_pytorch, commit 8bda55ee (parent a894bcdd), authored Aug 22, 2024 by lvzhen: Update 星辰语义大模型-TeleChat.md; delete README.md.
# TeleChat (星辰语义大模型)
## Paper
[TeleChat Technical Report](https://arxiv.org/abs/2401.03804)
## Model Architecture
**TeleChat** uses a standard `Decoder-only` architecture with the following changes at the model level:
- **Positional encoding**: [Rotary Embedding](https://arxiv.org/pdf/2104.09864.pdf), which integrates relative position information directly into self-attention and extrapolates well beyond the training length. Rotary Embedding also combines well with FlashAttention v2, speeding up training by roughly 20%.
- **Activation function**: [SwiGLU](https://arxiv.org/pdf/2002.05202.pdf) replaces GELU. To reduce computation, `ffn_hidden_size` is set below the 4x hidden size used in the original SwiGLU formulation.
- **Layer normalization**: Pre-Normalization based on [RMSNorm](https://arxiv.org/abs/1910.07467).
- **Decoupled embedding and output layers**: **TeleChat-12B** unties the parameters of the word-embedding layer and the output (lm head) layer, which helps training stability and convergence.
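The rotary scheme described above rotates pairs of channels by a position-dependent angle, so attention scores depend only on relative distance. A simplified NumPy sketch (half-split pairing; `rotary_embedding` is a hypothetical name, not the repository's implementation):

```python
import numpy as np

def rotary_embedding(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim).

    Channel pairs (i, i + dim/2) are rotated by an angle proportional to
    the token position, so the dot product of a rotated query and key
    depends only on their relative distance.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per channel pair.
    inv_freq = base ** (-np.arange(half) / half)
    # angles[p, i] = position p times frequency i
    angles = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each pair is a pure rotation, norms are preserved and shifting both query and key by the same offset leaves their dot product unchanged, which is the relative-position property the bullet above refers to.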
| Model | layer_num | hidden_size | ffn_hidden_size | head_num | tie_word_embeddings |
| ----- | --------- | ----------- | --------------- | -------- | ------------------- |
| 1B | 16 | 2048 | 5460 | 32 | No |
| 7B | 30 | 4096 | 12288 | 32 | Yes |
| 12B | 38 | 5120 | 12288 | 32 | No |
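The SwiGLU note above is why `ffn_hidden_size` sits below 4x `hidden_size` in the table (e.g. 12288 < 4 × 4096 for 7B): SwiGLU needs three weight matrices where a GELU FFN needs two, so the hidden width is shrunk to keep the parameter count comparable. A minimal NumPy sketch (hypothetical names, not the repository's code):

```python
import numpy as np

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: down-project SiLU(x @ w_gate) * (x @ w_up).

    Three weight matrices instead of the two in a GELU FFN, which is why
    ffn_hidden_size is kept below 4 * hidden_size to hold the parameter
    count roughly constant.
    """
    silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU (Swish) activation
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down
```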
## Environment Setup
### Docker
```
# Pull the image:
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10

# Create and start the container:
docker run --shm-size 80g --network=host --name=telechat --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> -it <Your Image ID> bash

# Install dependencies:
cd TeleChat
pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/
pip install 'ms-swift[llm]' -U -i https://pypi.mirrors.ustc.edu.cn/simple/
pip install optimum -i https://pypi.mirrors.ustc.edu.cn/simple/
pip install auto-gptq -i https://pypi.mirrors.ustc.edu.cn/simple/
```
## Inference Testing
### TeleChat-7B
Enter `Telechat/inference_telechat` and run:
```
bash infer.sh
```
### TeleChat-12B/TeleChat-12B-V2
In the model directory, edit `config.json` and set `flash_attn` to `false`. Then enter `Telechat/inference_telechat` and run:
```
bash infer.sh
```
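The `flash_attn` edit to `config.json` for TeleChat-12B/12B-V2 can also be scripted. A minimal sketch, assuming `config.json` sits directly in the model directory and uses the `flash_attn` key named above (`disable_flash_attn` is a hypothetical helper):

```python
import json
from pathlib import Path

def disable_flash_attn(model_dir):
    """Set "flash_attn": false in <model_dir>/config.json."""
    cfg_path = Path(model_dir) / "config.json"
    cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
    cfg["flash_attn"] = False  # key name taken from the instruction above
    cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False),
                        encoding="utf-8")
    return cfg
```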
## Fine-tuning
```
cd Telechat/ms-swift/examples/pytorch/llm
```
### Single-node single-GPU LoRA fine-tuning
```
bash sft_single_lora.sh
```
### Single-node multi-GPU LoRA fine-tuning
```
bash sft_multi_lora.sh
```
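The LoRA scripts above train small low-rank adapter matrices while the pretrained weights stay frozen. A minimal NumPy sketch of the idea (hypothetical names; ms-swift handles this internally):

```python
import numpy as np

def lora_forward(x, w, a, b, alpha=16.0):
    """Linear layer with a LoRA adapter: x @ (w + (alpha / r) * a @ b).

    w (d_in x d_out) stays frozen; only the low-rank factors a (d_in x r)
    and b (r x d_out) are trained, shrinking the trainable parameters
    from d_in * d_out to r * (d_in + d_out). b starts at zero, so before
    training the adapted layer matches the pretrained one exactly.
    """
    r = a.shape[1]
    return x @ w + (alpha / r) * (x @ a) @ b
```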
### Single-node multi-GPU full-parameter fine-tuning
```
bash sft_multi_full.sh
```
### Inference after fine-tuning
From the same folder as the fine-tuning scripts, run:
```
bash infer.sh
```
*(Screenshot of the inference results.)*
## Quantization
### GPTQ quantization
```
HIP_VISIBLE_DEVICES=0 swift export --model_type telechat-7b \
--quant_bits 8 --quant_method gptq --model_id_or_path /path/to/telechat-7b \
--quant_output_dir ./quant_out
```
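A rough illustration of what `--quant_bits 8` means: each weight is mapped to an 8-bit integer with a per-channel scale. GPTQ additionally corrects rounding error column by column, which this simplified NumPy sketch omits (hypothetical names):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-output-channel 8-bit quantization of a weight matrix."""
    scale = np.abs(w).max(axis=0, keepdims=True) / 127.0  # one scale per column
    scale = np.where(scale == 0, 1.0, scale)              # guard all-zero columns
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover float weights; error is at most half a quantization step."""
    return q.astype(np.float32) * scale
```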
### Inference after quantization
```
HSA_FORCE_FINE_GRAIN_PCIE=1 swift infer --model_type telechat-7b --model_id_or_path quant_out/
```
