<!--
 * @Author: zhuww
 * @email: zhuww@sugon.com
 * @Date: 2023-06-14 17:07:00
 * @LastEditTime: 2023-10-21 09:00:00
-->
# RFDesign
## Paper
- [https://www.biorxiv.org/content/10.1101/2021.11.10.468128v2](https://www.biorxiv.org/content/10.1101/2021.11.10.468128v2)

## Model Architecture
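RFDesign is a protein design method implemented with the Rosetta software. Its model architecture consists of:

- a feature extractor, which extracts features from protein sequences and structures;
- a sequence-structure coupling model, which couples sequence and structure information to capture the relationships between them;
- a function evaluator, which assesses protein function;
- an optimizer, which optimizes the protein to improve its stability and function.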

![img](./docs/rfdesign.png)

## Algorithm
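RFDesign is built on Rosetta (an open-source software package widely used for protein structure prediction and protein design). It supports protein molecular design tasks and uses pretrained protein models to predict and optimize protein stability and function.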

![img](./docs/rfdesign_1.png)

## Environment Setup
The Docker image for inference can be pulled from [光源 (SourceFind)](https://www.sourcefind.cn/#/service-details):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:rfdesign-torch2.1.0-dtk24.04-ubuntu20.04-py310
# Replace <Image ID> with the ID of the image pulled above
# <Host Path>: path on the host
# <Container Path>: mount path inside the container
docker run -it --name rfdesign --shm-size=32G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> <Image ID> /bin/bash
```

Image dependency versions:
* DTK driver: dtk24.04.1
* PyTorch: 2.1.0
* TensorFlow: 2.13.1
* JAX: 0.4.23
* dgl: 0.9.1
* Python: 3.10

Activate the image environment:
`source /opt/dtk-24.04.1/env.sh`

Test directory:
`/RFDesign`

## Dataset

## Inference
### Download Weights

    cd /RFDesign/hallucination/weights/rf_Nov05
    wget http://files.ipd.uw.edu/pub/rfdesign/weights/BFF_last.pt

    cd /RFDesign/inpainting/weights/
    wget http://files.ipd.uw.edu/pub/rfdesign/weights/BFF_mix_epoch25.pt

### hallucination
Test command for hallucination:

    cd /RFDesign/hallucination/tests/
    ./run_tests.sh  # results are saved to /RFDesign/hallucination/tests/output by default

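Parameters:

- `--pdb`: the template PDB structure
- `--out`: the prefix of the output path
- `--len`: the length range of the hallucinated protein
- `--contigs`: a comma-separated list of PDB ranges referring to the reference PDB
- `--steps`: a comma-separated list of optimization step counts
- `--num`: the number of designs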
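The `--len` and `--contigs` values are plain strings. As a rough illustration of how such strings can be interpreted, the sketch below parses a length range such as `60-70` and a contig list such as `A25-40,A60-75` into `(chain, start, end)` tuples. The exact syntax RFDesign accepts is not spelled out in this README, so the format used here (chain letter followed by an inclusive residue range) is an assumption, and `parse_len`, `parse_contigs` and `contig_length` are hypothetical helpers, not part of RFDesign.

```python
import re
from typing import List, Tuple

# Assumed contig syntax (hypothetical): chain letter(s) + "start-end", e.g. "A25-40".
CONTIG_RE = re.compile(r"^([A-Za-z]+)(\d+)-(\d+)$")


def parse_len(spec: str) -> Tuple[int, int]:
    """Parse a length range such as "60-70"; a single number means a fixed length."""
    parts = spec.strip().split("-")
    if len(parts) == 1:
        n = int(parts[0])
        return n, n
    if len(parts) != 2:
        raise ValueError(f"invalid length range: {spec!r}")
    lo, hi = int(parts[0]), int(parts[1])
    if lo > hi:
        raise ValueError(f"invalid length range: {spec!r}")
    return lo, hi


def parse_contigs(spec: str) -> List[Tuple[str, int, int]]:
    """Parse a comma-separated contig list into (chain, start, end) tuples."""
    contigs = []
    for item in spec.split(","):
        m = CONTIG_RE.match(item.strip())
        if m is None:
            raise ValueError(f"unrecognized contig: {item!r}")
        chain, start, end = m.group(1), int(m.group(2)), int(m.group(3))
        if start > end:
            raise ValueError(f"start > end in contig: {item!r}")
        contigs.append((chain, start, end))
    return contigs


def contig_length(contigs: List[Tuple[str, int, int]]) -> int:
    """Number of reference residues covered by the contigs (inclusive ranges)."""
    return sum(end - start + 1 for _, start, end in contigs)


if __name__ == "__main__":
    print(parse_len("60-70"))          # (60, 70)
    contigs = parse_contigs("A25-40,A60-75")
    print(contigs)                     # [('A', 25, 40), ('A', 60, 75)]
    print(contig_length(contigs))      # 32
```

Validating the strings up front like this catches typos (for example a reversed range) before a long optimization run starts.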

### inpainting
Test command for inpainting:

    cd /RFDesign/inpainting/tests/
    ./run_tests.sh  # results are saved to /RFDesign/inpainting/tests/out by default

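Parameters:

- `--pdb`: the PDB file of the template protein structure (and sequence)
- `--out`: the prefix of the output path
- `--contigs`: specifies which parts of the protein to keep, remove, and rebuild (inpaint)
- `--num_designs`: the number of designs to generate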

### Preparing inputs, post-processing, and scoring hallucination results
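After hallucination or inpainting has produced its outputs (.fas, .pdb, .npz and .trb files), first generate relaxed models with side chains. This step needs the .pdb and .npz files; when it completes, a FOLDER/trf_relax folder containing the relaxed structures (PDB files) is created.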

    cd /RFDesign/scripts
    ./trfold_relax.sh FOLDER  # FOLDER contains the hallucination or inpainting results
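
Because the relax step needs both the .pdb and the .npz file of every design, it can help to check FOLDER before launching it. The sketch below is a hypothetical helper (not part of RFDesign) that groups files in FOLDER by their shared base name (for example `2KL8_test_0`) and reports designs that have one file but not the other; it assumes both files of a design share the same base name, as in the example outputs of this README.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_SUFFIXES = {".pdb", ".npz"}


def find_incomplete_designs(folder: str) -> Dict[str, List[str]]:
    """Map each design prefix that has only one of .pdb/.npz to its missing suffix.

    Only files directly inside `folder` are checked; subfolders such as
    trf_relax/ are skipped. Designs with neither file are not reported.
    """
    found: Dict[str, set] = {}
    for path in Path(folder).iterdir():
        if path.is_file() and path.suffix in REQUIRED_SUFFIXES:
            # path.stem drops the final suffix: "2KL8_test_0.pdb" -> "2KL8_test_0"
            found.setdefault(path.stem, set()).add(path.suffix)
    return {
        prefix: sorted(REQUIRED_SUFFIXES - suffixes)
        for prefix, suffixes in sorted(found.items())
        if REQUIRED_SUFFIXES - suffixes
    }
```

For example, `find_incomplete_designs("/RFDesign/inpainting/tests/out")` returns an empty dict when every design has both files.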

#### AlphaFold2
Run AlphaFold2 prediction on the hallucinated design models and compute RMSD against the template structure:

    wget https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar
    tar -xvf alphafold_params_2021-07-14.tar
    ./af2_metrics.py FOLDER/trf_relax # set data_dir on line 241 of this script to your AlphaFold2 parameter path (downloaded and extracted by the two commands above)

This step writes the AF2 models to FOLDER/trf_relax/af2/ and the metrics to FOLDER/af2_metrics.csv.
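
A common next step is to shortlist designs from the metrics file. The sketch below uses only the standard `csv` module. The metric columns `af2_lddt` and `contig_rmsd_af2_des` are taken from the accuracy table in this README; the identifier column (`name` by default) and the default thresholds are assumptions to adjust to the actual CSV header, and `shortlist` is a hypothetical helper, not part of RFDesign.

```python
import csv
from typing import List


def shortlist(csv_path: str, min_lddt: float = 70.0, max_contig_rmsd: float = 2.0,
              id_col: str = "name") -> List[str]:
    """Return design IDs passing both thresholds, highest af2_lddt first.

    `id_col` and both thresholds are illustrative assumptions; check them
    against the header and value ranges of your own af2_metrics.csv.
    """
    hits = []
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            lddt = float(row["af2_lddt"])
            rmsd = float(row["contig_rmsd_af2_des"])
            if lddt >= min_lddt and rmsd <= max_contig_rmsd:
                hits.append((lddt, row[id_col]))
    # Highest AF2 confidence first.
    hits.sort(key=lambda t: t[0], reverse=True)
    return [design_id for _, design_id in hits]
```

Applied to the four rows of the accuracy table in this README, the default thresholds would keep `2KL8` and `rsvf-v_5tpn`, in that order.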

#### PyRosetta metrics

    ./pyrosetta_metrics.py FOLDER/trf_relax

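This step computes the RMSD between the hallucinated (RoseTTAFold) design models and the reference structure, as well as the radius of gyration, secondary structure, and topology (e.g. HHH or HEEH).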
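
The radius of gyration measures how compact a design is: $R_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{r}_i - \bar{\mathbf{r}} \rVert^2}$, where $\bar{\mathbf{r}}$ is the centroid of the $N$ atoms. The sketch below computes an unweighted, C-alpha-only $R_g$ from a PDB file using the fixed-column ATOM record layout. The choice of atoms and the lack of mass weighting are assumptions made for illustration; `pyrosetta_metrics.py` may use a different atom set or weighting, so values need not match its CSV exactly.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def ca_coords(pdb_text: str) -> List[Coord]:
    """Extract C-alpha coordinates from ATOM records (fixed-column PDB format).

    Alternate locations and multiple models are not handled.
    """
    coords = []
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16; x, y, z are in columns 31-38, 39-46, 47-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def radius_of_gyration(coords: List[Coord]) -> float:
    """Unweighted Rg: square root of the mean squared distance to the centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates")
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / n)
```

Usage: `radius_of_gyration(ca_coords(open("FOLDER/trf_relax/test1_0.pdb").read()))`.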

#### Aligning models in PyMOL
Create a PyMOL session in which the designs are aligned to the reference structure on the constrained regions:

    ./pymol_align.py -- -o OUTPUT.pse FOLDER/*pdb

This creates a session named OUTPUT.pse in the current folder, containing the original structure from REFERENCE.pdb and all designs from FOLDER/*.pdb aligned to it.
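
Aligning on the constrained regions amounts to an optimal rigid-body superposition of matching residues, which is also what a contig RMSD measures. The sketch below shows the standard Kabsch algorithm with NumPy and returns the RMSD after superposition. It illustrates the technique only: it is not the code in `pymol_align.py` (PyMOL's own alignment commands may additionally perform sequence alignment and outlier rejection), and it assumes the two coordinate arrays are already paired row by row, e.g. the C-alpha atoms of the same contig residues in the design and the reference.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between paired (N, 3) coordinate arrays after optimal rigid superposition.

    Row i of P must correspond to row i of Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering each set on its centroid.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix.
    U, _, Vt = np.linalg.svd(P_c.T @ Q_c)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate the centered P onto Q and measure the residual.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Superposing on a subset (the contig residues) and then reporting RMSD on that subset is what makes the number insensitive to how freely the rest of the design was rebuilt.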

## Results

```
/RFDesign/
    hallucination/
        tests/output/
            trf_relax/
                af2/
                    test1_0.npz
                    test1_0.pdb
                    ...
                af2_metrics.csv
                pyrosetta_metrics.csv
                test1_0.pdb
                ...
            test1_0.fas
            test1_0.pdb
            ...

    inpainting/
        tests/out/
            trf_relax/
                af2/
                    2KL8_test_0.npz
                    2KL8_test_0.pdb
                    ...
                af2_metrics.csv
                pyrosetta_metrics.csv
                2KL8_test_0.pdb
                ...
            2KL8_test_0.npz
            2KL8_test_0.pdb
            ...
```

## Accuracy
Test data: `/opt/RFDesign/hallucination/tests` and `/opt/RFDesign/inpainting/tests/2KL8.pdb`. Accelerator: 1× DCU Z100L-32G.

Accuracy results (RMSD values in Å):
| pdb | af2_lddt | rmsd_af2_des | contig_rmsd_af2_des | contig_rmsd_af2 |
| :------: | :------: | :------: | :------: | :------: |
| C3d_relaxed | 63.826 | 18.537 | 16.375 | 16.435 | 
| pd1 | 48.188 | 5.731 | 3.110 | 3.527 | 
| rsvf-v_5tpn | 75.460 | 2.685 | 1.536 | 3.917 |  
| 2KL8 | 89.197 | 0.813 | 0.824 | 0.865 |

## Application Scenarios

### Algorithm Category
Protein structure prediction

### Key Application Industries
Healthcare, research, education

## Source Repository and Issue Feedback
* [https://developer.hpccube.com/codes/modelzoo/rfdesign_rosetta](https://developer.hpccube.com/codes/modelzoo/rfdesign_rosetta) 

## References
* [https://github.com/RosettaCommons/RFDesign](https://github.com/RosettaCommons/RFDesign)