update modelzoo std

Commit d344a814 authored Aug 24, 2023 by zhuwenwen

Showing with 61 additions and 21 deletions

README.md +57 -19

model.properties +4 -2

--- a/README.md
+++ b/README.md
@@ -2,24 +2,30 @@
 * @Author: zhuww
 * @email: zhuww@sugon.com
 * @Date: 2023-03-31 17:09:07
- * @LastEditTime: 2023-04-25 14:07:01
+ * @LastEditTime: 2023-08-24 09:07:01
 -->
 # FastFold
-## Model Introduction
-FastFold builds on a protein structure prediction model and optimizes its inference performance.
+## Paper
+- [https://arxiv.org/abs/2203.00854](https://arxiv.org/abs/2203.00854)
+
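 ## Model Architecture
 The model is based on the Transformer architecture and consists of two main modules: Evoformer (48 blocks) and Structure module (8 blocks).
-## Dataset
-We recommend the open-source datasets used by AlphaFold2, including BFD, MGnify, PDB70, Uniclust, and Uniref90; the datasets total about 3 TB.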

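-We provide a script, download_all_data.sh, for downloading the datasets and model files:
-
-    ./scripts/download_all_data.sh <dataset download directory>
+## Algorithm Principle
+FastFold builds features by searching for homologous sequences and templates, and predicts protein structure with a protein structure prediction model whose inference performance it optimizes.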

-## Inference
-### Environment Setup
+## Environment Setup
 The inference Docker image can be pulled from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details):
-* Inference image: docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:fastfold-0.2.1-centos7.6-dtk-22.10-patch4-py38-latest
+```
+docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:fastfold-0.2.1-centos7.6-dtk-22.10-patch4-py38-latest
+docker run -it --name fastfold --shm-size=32G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video image.sourcefind.cn:5000/dcu/admin/base/custom:fastfold-0.2.1-centos7.6-dtk-22.10-patch4-py38-latest /bin/bash
+```
+
+Image version dependencies:
+* DTK driver: dtk22.10
+* Pytorch: 1.10
+* fastfold: 0.2.1
+* python: python3.8

 Activate the image environment:
 `source /opt/dtk-22.10/env.sh`
@@ -28,10 +34,16 @@ FastFold builds on a protein structure prediction model and optimizes its inference performance
 Test directory:
 `/opt/docker/test`

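-### Inference
-We provide PyTorch-based inference scripts for monomers and multimers. Version dependency:
-* PyTorch (DCU build) >= 1.10.0a0
-#### Monomer
+## Dataset
+We recommend the open-source datasets used by AlphaFold2, including BFD, MGnify, PDB70, Uniclust, and Uniref90; the datasets total about 3 TB.
+
+We provide a script, download_all_data.sh, for downloading the datasets and model files:
+
+    ./scripts/download_all_data.sh <dataset download directory>
+
+## Inference
+We provide PyTorch-based inference scripts for monomers and multimers.
+### Monomer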

    python inference.py T1024.fasta data/pdb_mmcif/mmcif_files/ \
    --output_dir ./ \
@@ -52,7 +64,7 @@ FastFold builds on a protein structure prediction model and optimizes its inference performance

 Alternatively, use `./inference.sh`.

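-##### Monomer Inference Parameters
+#### Monomer Inference Parameters
 T1024.fasta is the monomer sequence to run inference on; replace data with the dataset download directory;
 `--output_dir` is the output directory; `--gpus` is the number of GPUs to use; `--use_precomputed_alignments` is the directory of precomputed alignments, from which already searched and aligned sequences are loaded; if it is omitted, the search and alignment are run;
 `--param_path` is the path of the monomer model parameters to load and must match `--model_name`, which defaults to model_1; `--chunk_size` is the number of chunks, set to 4 here and used together with `--inplace` to reduce GPU memory usage;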
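@@ -62,7 +74,7 @@ T1024.fasta is the monomer sequence to run inference on; replace data with the dataset download directory;
 AlphaFold's data preprocessing takes a long time, so we use [ray](https://docs.ray.io/en/latest/workflows/concepts.html) to speed up the data preprocessing workflow.
 To run inference with the ray workflow, add the `--enable_workflow` argument to the command line or to the `./inference.sh` script.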

-#### Multimer
+### Multimer
    python inference.py SUGP1.fasta data/pdb_mmcif/mmcif_files/ \
    --output_dir ./ \
    --gpus 2 \
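@@ -86,10 +98,24 @@ AlphaFold's data preprocessing takes a long time, so we use [ray](ht

 Alternatively, use `./inference_multimer.sh`.

-##### Multimer Inference Parameters
+#### Multimer Inference Parameters
 SUGP1.fasta is the multimer sequence to run inference on; `--param_path` is the path of the multimer model parameters to load and must match `--model_name`; the other parameters are the same as described for monomer inference.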

-## Accuracy Data
+## result
+The `--output_dir` directory is structured as follows:
+```
+alignments/
+    <target_name>/
+        bfd_uniclust_hits.a3m
+        mgnify_hits.sto
+        uniref90_hits.sto
+        ...
+{target_name}_{model_name}_output_dict.pkl
+{target_name}_{model_name}_unrelaxed.pdb
+{target_name}_{model_name}_relaxed.pdb
+```
+
+## Accuracy
 Test data: [casp14](https://www.predictioncenter.org/casp14/targetlist.cgi), [uniprot](https://www.uniprot.org/); accelerators used: 4 first-generation DCU cards (16 GB)

 Accuracy results:
@@ -99,8 +125,20 @@ SUGP1.fasta is the multimer sequence to run inference on; `--param_path` is the path of the multimer model parameters to load
 | fp32 | monomer | T1053  | 580  | 0.937 | 0.782 | 92.284 | 0.984 | 0.929 | 1.105 |
 | fp32 | monomer | Q9NYK1 | 1046 | 0.907 | 0.744 | 86.642 | 0.962 | 0.905 | 5.757 |

+## Application Scenarios
+
+### Algorithm Category
+NLP
+
+### Application Industries
+Healthcare, scientific research
+
+### Algorithm Framework
+pytorch
+
 ## Source Repository and Issue Feedback
 * https://developer.hpccube.com/codes/modelzoo/FastFold
+
 ## References
 * [https://github.com/deepmind/alphafold](https://github.com/deepmind/alphafold)
 * [https://github.com/hpcaitech/FastFold](https://github.com/hpcaitech/FastFold)
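
The `--output_dir` layout that the README diff adds under `## result` can be checked after a run with a short script. This is a minimal sketch and not part of the commit: the helper names `expected_outputs` and `missing_outputs` are hypothetical, and it assumes the caller already knows `target_name` and `model_name` (for example `T1024` and the default `model_1`); how FastFold derives the target name is not stated in the diff.

```python
from pathlib import Path


def expected_outputs(output_dir, target_name, model_name):
    """Return the per-target result files listed in the README's output layout."""
    base = Path(output_dir)
    stem = f"{target_name}_{model_name}"
    return [
        base / f"{stem}_output_dict.pkl",
        base / f"{stem}_unrelaxed.pdb",
        base / f"{stem}_relaxed.pdb",
    ]


def missing_outputs(output_dir, target_name, model_name):
    """List the expected result files that do not exist on disk."""
    return [
        path
        for path in expected_outputs(output_dir, target_name, model_name)
        if not path.is_file()
    ]


if __name__ == "__main__":
    # Hypothetical values: adjust to the actual run.
    for path in missing_outputs("./", "T1024", "model_1"):
        print(f"missing: {path}")
```

An empty result from `missing_outputs` only says the three files exist; it does not validate their contents or the `alignments/` subdirectory.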
--- a/model.properties
+++ b/model.properties
+# Unique model identifier
+modelCode = 101
 # Model name
-modelName=FastFold_Pytorch
+modelName=fastfold_pytorch
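 # Model description
 modelDescription=FastFold builds on a protein structure prediction model and optimizes its inference performance
 # Application scenarios (separate multiple tags with ASCII commas)
 appScenario=inference,NLP,protein structure prediction
 # Framework type (separate multiple tags with ASCII commas)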
-frameType=PyTorch
+frameType=pytorch
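
The `model.properties` file changed above is a flat list of `key=value` lines with `#` comments, and the new `modelCode = 101` line puts spaces around `=` while the other entries do not. A tolerant reader could look roughly like the sketch below. `parse_properties` is a hypothetical helper, not something the repository provides, and it deliberately ignores escapes and line continuations that fuller `.properties` parsers handle.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignore it
        # Strip both sides so "modelCode = 101" and "modelName=x" parse alike.
        props[key.strip()] = value.strip()
    return props


# Sample mirroring the keys touched by this commit (comments shortened).
sample = """\
# unique model identifier
modelCode = 101
# model name
modelName=fastfold_pytorch
# framework type
frameType=pytorch
"""

props = parse_properties(sample)
print(props["modelCode"], props["modelName"], props["frameType"])
# prints: 101 fastfold_pytorch pytorch
```

Values stay strings here; a caller that needs `modelCode` as an integer, or the comma-separated tag fields as lists, would convert them explicitly.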