# AF2

## Paper
- [https://www.nature.com/articles/s41586-021-03819-2](https://www.nature.com/articles/s41586-021-03819-2)

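## Model Architecture
The core of the model is a Transformer-based neural network with two main components: a Sequence to Sequence Model (the Evoformer trunk in the paper) and a Structure Model (the structure module in the paper). The two components are optimized through iterative training to improve prediction accuracy.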

![img](./docs/alphafold2.png)

## Algorithm
AlphaFold2 extracts information from protein sequence and structure data and uses a neural network model to predict the three-dimensional structure of a protein.

![img](./docs/alphafold2_1.png)


## Environment Setup

### Docker (Option 1)

    # With this option you do not need to clone this repository: the image already contains runnable code, but the data files must be mounted

    docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:alphafold2-dtk24.04.1-py310

    docker run --shm-size 100g --network=host --name=alphafold2 --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <local data path>:<container data path> -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

### Docker (Option 2)
    
    docker pull image.sourcefind.cn:5000/dcu/admin/base/jax:0.4.23-ubuntu20.04-dtk24.04.1-py3.10

    docker run --shm-size 50g --network=host --name=alphafold2 --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    # 1. Install general dependencies
    pip install -r requirements_dcu.txt

    pip install dm-haiku==0.0.11 flax==0.7.1 jmp==0.0.2 tabulate==0.8.9 --no-deps jax

    pip install orbax==0.1.6 orbax-checkpoint==0.1.6 optax==0.2.2

    python setup.py install

    # 2. hh-suite and kalign

    git clone https://github.com/soedinglab/hh-suite.git
    mkdir -p hh-suite/build && cd hh-suite/build
    cmake -DCMAKE_INSTALL_PREFIX=. ..
    make -j 4 && make install
    export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"

    wget https://github.com/TimoLassmann/kalign/archive/refs/tags/v3.4.0.zip
    unzip v3.4.0.zip && cd kalign-3.4.0
    mkdir build 
    cd build
    cmake .. 
    make 
    make test 
    make install

    # 3. openmm + pdbfixer

    sudo apt install doxygen

    wget https://github.com/openmm/openmm/archive/refs/tags/8.0.0.zip

    unzip 8.0.0.zip && cd openmm-8.0.0 && mkdir build && cd build

    cmake .. && make && sudo make install && sudo make PythonInstall

    wget https://github.com/openmm/pdbfixer/archive/refs/tags/1.9.zip

    unzip 1.9.zip && cd pdbfixer-1.9 && python setup.py install 

    sudo apt install hmmer -y


## Datasets
We recommend the open datasets used by AlphaFold2, including BFD, MGnify, PDB70, Uniclust, and UniRef90. The full download is about 2.62 TB. The expected directory layout is:
```
$DOWNLOAD_DIR/                             
    bfd/  
        bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex
        bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata 
        bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex                           
        ...
    mgnify/                                
        mgy_clusters_2022_05.fa
    params/                                
        params_model_1.npz
        params_model_2.npz
        params_model_3.npz
        ...
    pdb70/                                
        pdb_filter.dat
        pdb70_hhm.ffindex
        pdb70_hhm.ffdata
        ...
    pdb_mmcif/                            
        mmcif_files/
            100d.cif
            101d.cif
            101m.cif
            ...
        obsolete.dat
    pdb_seqres/                            
        pdb_seqres.txt
    small_bfd/                           
        bfd-first_non_consensus_sequences.fasta
    uniref30/                            
        UniRef30_2021_03_hhm.ffindex
        UniRef30_2021_03_hhm.ffdata
        UniRef30_2021_03_cs219.ffindex
        ...
    uniprot/                               
        uniprot.fasta
    uniref90/                             
        uniref90.fasta
```
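
Before starting a long inference run it can help to confirm that the download directory matches the layout above. The following is a minimal sketch, not part of this repository: the function name `missing_dataset_dirs` is hypothetical, and the directory names are copied from the listing above.

```python
from pathlib import Path

# Top-level directories copied from the layout listed above.
EXPECTED_DIRS = [
    "bfd", "mgnify", "params", "pdb70", "pdb_mmcif",
    "pdb_seqres", "small_bfd", "uniref30", "uniprot", "uniref90",
]


def missing_dataset_dirs(download_dir):
    """Return the expected sub-directories that are absent or empty."""
    root = Path(download_dir)
    missing = []
    for name in EXPECTED_DIRS:
        sub = root / name
        # A directory that exists but holds no entries is treated as missing.
        if not sub.is_dir() or not any(sub.iterdir()):
            missing.append(name)
    return missing
```

Calling `missing_dataset_dirs("<dataset download directory>")` returns an empty list when every directory is present and non-empty. The check only looks at the top level; it does not validate individual database files.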

The script `download_all_data.sh` downloads the datasets and model files used here:

    ./scripts/download_all_data.sh <dataset download directory>

Fast dataset download center: [SCNet AIDatasets](http://113.200.138.88:18080/aidatasets). The datasets used in this project can be downloaded through the fast channel: [alphafold](http://113.200.138.88:18080/aidatasets/project-dependency/alphafold)


## Inference

Note: edit the parameters in the relevant script before running it.

Jax-based inference scripts are provided for both monomers and multimers.
```bash
    # Enter the project directory
    cd alphafold2_jax
```

### Monomer
```bash
    ./run_monomer.sh
```
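Monomer inference parameters:

- `download_dir`: the dataset download directory.
- `monomer.fasta`: the monomer sequence to run inference on.
- `--output_dir`: the output directory.
- `model_names`: the names of the models to run.
- `--model_preset=monomer`: use the monomer model configuration.
- `--run_relax=true`: run the relax step.
- `--use_gpu_relax=true`: run relax on the GPU (faster, but possibly less stable); `--use_gpu_relax=false` runs relax on the CPU (slower, but stable).
- `--use_precomputed_msas=true`: load existing MSAs; without it, the MSA tools are run by default.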

### Multimer
```bash
    ./run_multimer.sh
```
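Multimer inference parameters: `multimer.fasta` is the multimer sequence to run inference on; `--model_preset=multimer` selects the multimer model configuration; `--num_multimer_predictions_per_model` is the number of predictions made per model. All other parameters are the same as for monomer inference.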
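
Picking the wrong preset for an input file is an easy mistake to make. The sketch below counts the records in a FASTA file and suggests a preset. It assumes, as in upstream AlphaFold, that the monomer preset expects exactly one sequence per FASTA file; `parse_fasta` and `suggest_preset` are hypothetical helpers, not functions from this repository.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def suggest_preset(text):
    """One record suggests the monomer preset; several suggest multimer."""
    count = len(parse_fasta(text))
    if count == 0:
        raise ValueError("no FASTA records found")
    return "monomer" if count == 1 else "multimer"


# Made-up two-chain input, for illustration only.
print(suggest_preset(">chain_a\nMKVLLA\n>chain_b\nGGSG\n"))
```

The suggestion is only a hint: a single-chain file can still be run with the multimer preset if that is what you intend.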

## Results
The `--output_dir` directory is structured as follows:
```
<target_name>/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniclust_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto
        ...
```
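
Upstream AlphaFold stores the per-residue pLDDT in the B-factor column of the PDB files it writes. Assuming this port does the same, the sketch below reads that column for the C-alpha atoms of a file such as `ranked_0.pdb` and averages it. The helper names are hypothetical, and the parsing relies only on the fixed-width ATOM record layout of the PDB format.

```python
def ca_bfactors(pdb_text):
    """Collect the B-factor value of every C-alpha ATOM record."""
    values = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): record name 1-6,
        # atom name 13-16, temperature factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def mean_plddt(pdb_text):
    """Average the C-alpha B-factors; NaN if the text has no such atoms."""
    values = ca_bfactors(pdb_text)
    return sum(values) / len(values) if values else float("nan")
```

A typical call reads the file first, for example `mean_plddt(open("ranked_0.pdb").read())`, with the path adjusted to your own `<target_name>` directory.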

[View the protein 3D structure](https://www.pdbus.org/3d-view)

ID: 8U23

Blue is the predicted structure; yellow is the ground-truth structure.

![alt text](image.png)

### Accuracy
Test data: [casp15](https://www.predictioncenter.org/casp15/targetlist.cgi), [uniprot](https://www.uniprot.org/)

Accelerator used: 1 × Z100L-32G

1. plddts / iptm+ptm

For monomers, see `plddts` in `<target_name>/ranking_debug.json`; for multimers, see `iptm+ptm` in `<target_name>/ranking_debug.json`.


2. Other accuracy metrics can be computed with: [https://zhanggroup.org/TM-score/](https://zhanggroup.org/TM-score/)
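
The confidence values in item 1 can also be read programmatically. The sketch below assumes `ranking_debug.json` has the shape written by upstream AlphaFold: a `plddts` or `iptm+ptm` object mapping model names to scores, plus an `order` list sorted best first. The function name, model names, and scores in the example are made up for illustration.

```python
import json


def best_model(ranking):
    """Return (model_name, score, metric) for the top-ranked model.

    `ranking` is the parsed content of ranking_debug.json.
    """
    # Multimer runs report "iptm+ptm"; monomer runs report "plddts".
    metric = "iptm+ptm" if "iptm+ptm" in ranking else "plddts"
    scores = ranking[metric]
    if "order" in ranking:
        name = ranking["order"][0]
    else:
        # Fall back to the highest score when no ordering is stored.
        name = max(scores, key=scores.get)
    return name, scores[name], metric


# Synthetic example; real files are read with json.load(open(path)).
example = json.loads(
    '{"plddts": {"model_a": 80.0, "model_b": 70.0},'
    ' "order": ["model_a", "model_b"]}'
)
print(best_model(example))
```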

Accuracy results:

| Data type | Sequence type | Sequence | Length | GDT-TS | GDT-HA | plddts/iptm+ptm | TM score | MaxSub | RMSD |
| :------: | :------: | :------: |:------: |:------: | :------: | :------: | :------: |:------: |:------: |
| fp32 | Monomer | T1029 | 125 | 0.434 | 0.256 | 93.984 | 0.471 | 0.297 | 7.202 |
| fp32 | Monomer | T1024 | 408 | 0.664 | 0.470 | 87.076 | 0.829 | 0.518 | 3.516 |
| fp32 | Multimer | H1106 | 236 | 0.203 | 0.144 | 0.860 | 0.181 | 0.151 | 20.457 |
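
For readers unfamiliar with the table columns, the sketch below shows how TM-score, GDT-TS, GDT-HA, and RMSD are defined in terms of per-residue C-alpha distances between the predicted and reference structures. It evaluates a single, already superposed pair and does not search for the best superposition, so it is not a replacement for the TM-score program linked above and will generally not reproduce the table values. The function names and the toy distances are hypothetical.

```python
import math


def tm_d0(target_length):
    """Distance scale d0 of the TM-score for a target of the given length."""
    if target_length <= 15:
        return 0.5  # the cube-root formula is undefined here; 0.5 is this sketch's floor
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score of one fixed superposition (no search over alignments)."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def gdt(distances, target_length, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean fraction of residues within each cutoff (in angstroms)."""
    fractions = [
        sum(1 for d in distances if d <= c) / target_length for c in cutoffs
    ]
    return sum(fractions) / len(fractions)


def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical C-alpha distances (angstroms) for a 4-residue toy target.
toy = [0.4, 1.5, 3.0, 9.0]
print(gdt(toy, 4))                                # GDT-TS cutoffs: 1, 2, 4, 8
print(gdt(toy, 4, cutoffs=(0.5, 1.0, 2.0, 4.0)))  # GDT-HA cutoffs: 0.5, 1, 2, 4
print(tm_score(toy, 4), rmsd(toy))
```

The scores are returned on a 0 to 1 scale, matching the GDT-TS, GDT-HA, and TM score columns of the table.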



## Application Scenarios

### Algorithm Category
Protein structure prediction

### Key Application Industries
Healthcare, scientific research, education

## Pretrained Weights
Fast pretrained-weight download center: [SCNet AIModels](http://113.200.138.88:18080/aimodels). The pretrained weights used in this project can be downloaded through the fast channel: [alphafold](http://113.200.138.88:18080/aimodels/findsource-dependency/alphafold-params)

## Source Repository and Issue Reporting
* [https://developer.hpccube.com/codes/modelzoo/alphafold2_jax](https://developer.hpccube.com/codes/modelzoo/alphafold2_jax)

## References
* [https://github.com/deepmind/alphafold](https://github.com/deepmind/alphafold)