Commit 15970ada authored by Rayyyyy

Update README

parent 45a7867b
# Contributors
None
# deepseek-v2
## Paper
[deepseek-v2](https://arxiv.org/abs/2405.04434)
## Model Architecture
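DeepSeek-V2 innovates across the whole model architecture. It introduces MLA (Multi-head Latent Attention), an attention mechanism that performs on par with MHA while substantially reducing computation and inference memory, and DeepSeekMoE, an in-house sparse Mixture-of-Experts structure that cuts computation further. Combined, the two deliver a step change in model performance.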
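As a back-of-the-envelope illustration of the memory saving, one can compare how many values each attention layer caches per token. The formulas and values below are assumptions taken from the DeepSeek-V2 paper, not from this repository: `n_h = 128` heads of dimension `d_h = 128`, a compressed KV dimension `d_c = 512`, and a decoupled RoPE key dimension `d_h^R = 64`. MHA caches one key and one value per head, while MLA caches only the compressed latent vector plus the shared RoPE key.

```math
\begin{aligned}
\text{MHA:}\quad & 2\,n_h d_h = 2 \times 128 \times 128 = 32768 \\
\text{MLA:}\quad & d_c + d_h^R = 512 + 64 = 576 \\
\text{ratio:}\quad & 32768 / 576 \approx 56.9
\end{aligned}
```

The counts are elements per token per layer; multiply by the bytes per element and the number of layers to get the total cache size. Other checkpoints, such as the Lite models listed under Inference, may use different values, so check their `config.json` before reusing these numbers.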
<div align=center>
<img src="./doc/model.png"/>
</div>
## Environment Setup
Adjust the `-v` mount paths, `docker_name`, and `imageID` to match your environment.
### Docker (Method 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
...
```
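The `docker run` line above hard-codes the three values that have to be edited. A minimal sketch, assuming you prefer to keep them in shell variables; `CODE_DIR`, `CONTAINER_NAME` and `IMAGE_ID` are hypothetical names, and their placeholder values are the ones from this README. The flags are exactly those of the command above, and the script only prints the assembled command so it can be reviewed first.

```bash
#!/usr/bin/env bash
# Hypothetical variable names; replace the placeholder values with your own.
CODE_DIR="/path/your_code_data"   # host directory mounted into the container
CONTAINER_NAME="docker_name"      # value passed to --name
IMAGE_ID="imageID"                # image ID (or name:tag) to start

# Same flags as the README command, kept in an array so paths with spaces stay intact.
run_args=(
  -it
  -v "${CODE_DIR}/:${CODE_DIR}/"
  -v /opt/hyhal/:/opt/hyhal/:ro
  --shm-size=80G
  --privileged=true
  --device=/dev/kfd
  --device=/dev/dri/
  --group-add video
  --name "${CONTAINER_NAME}"
  "${IMAGE_ID}"
  bash
)

# Print the assembled command for review; replace "echo docker" with "docker" to run it.
echo docker run "${run_args[@]}"
```

For the Dockerfile method, `IMAGE_ID` can instead be set to the tag built there, `deepseek-v2:latest`.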
### Dockerfile (Method 2)
```bash
cd docker
docker build --no-cache -t deepseek-v2:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /your_code_path/deepseek-v2_pytorch
```

...

None yet.
## Inference
Inference uses **Hugging Face Transformers**. Set the `model_name_or_path` parameter to your local model path.
If the pretrained model has not been downloaded yet, the code downloads the selected model automatically. Currently available models: "deepseek-ai/DeepSeek-V2-Lite" and "deepseek-ai/DeepSeek-V2-Lite-Chat".
```bash
...
python chat_completion.py
```
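The rule above (either a local model directory, or one of the two model IDs that can be downloaded automatically) can be checked before launching inference. A minimal sketch under that assumption; `check_model_name_or_path` is a hypothetical helper, not part of this repository, and it does not change how `chat_completion.py` itself resolves the model.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: accept an existing local directory, or one of
# the two model IDs this README lists as automatically downloadable.
check_model_name_or_path() {
  local name="$1"
  # An existing directory is treated as a local model path.
  if [ -d "$name" ]; then
    echo "local: $name"
    return 0
  fi
  case "$name" in
    deepseek-ai/DeepSeek-V2-Lite|deepseek-ai/DeepSeek-V2-Lite-Chat)
      # Not present locally, but available for automatic download.
      echo "download: $name"
      ;;
    *)
      echo "unsupported model_name_or_path: $name" >&2
      return 1
      ;;
  esac
}
```

For example: `check_model_name_or_path deepseek-ai/DeepSeek-V2-Lite-Chat && python chat_completion.py`.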
## Result
<div align=center>
<img src="./doc/chat_completion.png" width="500" height="300"/>
</div>
### Accuracy
None yet.
...

Finance, broadcast media, education
## Pretrained Weights
[Huggingface-deepseek-ai](https://huggingface.co/deepseek-ai)

The model directory structure is as follows:
```bash
├── model_save_path
...
│   ├── config.json
│   ├── configuration_deepseek.py
│   ├── generation_config.json
│   ├── model-00001-of-000055.safetensors
│   ├── model-00002-of-000055.safetensors
│   ...
│   ├── model-00054-of-000055.safetensors
│   ├── model-00055-of-000055.safetensors
│   ├── model.safetensors.index.json
│   ├── modeling_deepseek.py
│   ├── tokenization_deepseek_fast.py
...
│   ├── config.json
│   ├── configuration_deepseek.py
│   ├── generation_config.json
│   ├── model-00001-of-000004.safetensors
│   ├── model-00002-of-000004.safetensors
│   ├── model-00003-of-000004.safetensors
│   ├── model-00004-of-000004.safetensors
│   ├── model.safetensors.index.json
│   ├── modeling_deepseek.py
│   ├── tokenization_deepseek_fast.py
...
```
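Because the larger checkpoint is split into many shards, an incomplete download is easy to miss. A minimal sketch, assuming the `model-NNNNN-of-TTTTTT.safetensors` naming shown in the tree above; `check_shards` is a hypothetical helper, not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that every shard model-NNNNN-of-TTTTTT.safetensors
# exists in a model directory, taking the total TTTTTT from the first shard's name.
check_shards() {
  local dir="$1"
  local first
  first=("$dir"/model-00001-of-*.safetensors)
  # If the glob matched nothing, the literal pattern is left and is not a file.
  if [ ! -f "${first[0]}" ]; then
    echo "no shards found in $dir" >&2
    return 1
  fi
  # "model-00001-of-000055.safetensors" -> "000055"
  local total
  total=$(basename "${first[0]}" .safetensors)
  total=${total##*-of-}
  local i f missing=0
  # 10# forces base 10 so zero-padded totals such as 000055 are not read as octal.
  for ((i = 1; i <= 10#$total; i++)); do
    f=$(printf '%s/model-%05d-of-%s.safetensors' "$dir" "$i" "$total")
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage: `check_shards "$MODEL_DIR" && echo "all shards present"`, where `MODEL_DIR` is a hypothetical variable holding one of the model directories above. Reading the shard list from `model.safetensors.index.json` instead would avoid relying on file names, at the cost of needing a JSON parser.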
```dockerfile
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
```