# Uni-Fold

## Paper

Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold
https://www.biorxiv.org/content/biorxiv/early/2022/08/06/2022.08.04.502811.full.pdf

## Model Architecture
The core of the model is a Transformer-based neural network with two main components, a Sequence to Sequence Model and a Structure Model. Both components are optimized through iterative training to improve prediction accuracy.

![img](./alphafold2.png)

## Algorithm
The method extracts information from protein sequence and structure data and uses a neural network model to predict the three-dimensional structure of a protein.

![img](./alphafold2_1.png)

## Environment Setup

A Docker image for training is available from [光源 (SourceFind)](https://www.sourcefind.cn/#/service-details):

```
# Pull the Uni-Fold image
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:unifold-latest
# Start a container; replace the mount path, docker_name, and imageID with your own values
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

# Inside the container, enter the project directory
cd /root/Uni-Fold-main
```
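
The `docker run` command above contains three placeholders (the mounted path, `docker_name`, and `imageID`). As a minimal sketch, the helper below fills them in and only prints the resulting command so it can be reviewed before running. The function name `print_docker_run` and the container name `unifold` are illustrative and not part of the repository, and the sketch assumes none of the values contain spaces.

```shell
# Print the docker run command from this README with the placeholders filled in.
# Nothing is executed; review the output, then run it yourself.
print_docker_run() {
    data_dir=$1   # host directory, mounted at the same path inside the container
    name=$2       # container name (replaces docker_name)
    image=$3      # image ID or image reference (replaces imageID)
    printf 'docker run -it -v %s:%s --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name %s %s bash\n' \
        "$data_dir" "$data_dir" "$name" "$image"
}

# Example with the image pulled above; "unifold" is a hypothetical container name.
print_docker_run /path/your_code_data/ unifold image.sourcefind.cn:5000/dcu/admin/base/custom:unifold-latest
```

Passing the full image reference instead of an image ID works as well, since `docker run` accepts either.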
The tools listed in requirement.txt are already installed in the image. To make them available, set the following environment variables:
```
export PATH=/root/software/hmmer/bin${PATH:+:${PATH}}

export PATH=/root/software/hh-suite-master/bin${PATH:+:${PATH}}

export PATH=/root/software/kalign/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=/root/software/hh-suite-master/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```
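
The `${PATH:+:${PATH}}` form in the exports above appends the old value only when it is non-empty, which avoids a stray trailing colon. The sketch below shows that behaviour in isolation and then checks whether the alignment tools can be found. The helper names `prepend_dir` and `missing_tools` are illustrative, and the binary names (`jackhmmer`, `hhblits`, `hhsearch`, `kalign`) are assumptions based on what HMMER, HH-suite, and Kalign normally ship. Adjust them to what your image provides.

```shell
# Print NEW, followed by ":OLD" only when OLD is non-empty
# (the same idea as the ${PATH:+:${PATH}} expansion in the exports above).
prepend_dir() {
    printf '%s%s\n' "$1" "${2:+:${2}}"
}

# Print the name of every listed command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

prepend_dir /root/software/hmmer/bin ""        # empty old value: no trailing colon
prepend_dir /root/software/hmmer/bin /usr/bin  # non-empty old value: joined with ":"

# Assumed binary names; an empty result means all of them were found.
missing_tools jackhmmer hhblits hhsearch kalign
```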

## Dataset
We recommend the open-source datasets used by AlphaFold2, including BFD, MGnify, PDB70, Uniclust, and UniRef90. Together they take up about 2.62 TB. The directory layout is as follows:
```
$DOWNLOAD_DIR/                             
    bfd/  
        bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex
        bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata 
        bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex                           
        ...
    mgnify/                                
        mgy_clusters_2022_05.fa
    params/                                
        params_model_1.npz
        params_model_2.npz
        params_model_3.npz
        ...
    pdb70/                                
        pdb_filter.dat
        pdb70_hhm.ffindex
        pdb70_hhm.ffdata
        ...
    pdb_mmcif/                            
        mmcif_files/
            100d.cif
            101d.cif
            101m.cif
            ...
        obsolete.dat
    pdb_seqres/                            
        pdb_seqres.txt
    small_bfd/                           
        bfd-first_non_consensus_sequences.fasta
    uniref30/                            
        UniRef30_2021_03_hhm.ffindex
        UniRef30_2021_03_hhm.ffdata
        UniRef30_2021_03_cs219.ffindex
        ...
    uniprot/                               
        uniprot.fasta
    uniref90/                             
        uniref90.fasta
```
The script download_all_data.sh downloads the required datasets and model files:
```
bash scripts/download/download_all_data.sh /path/to/database/directory
```
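
Once the download has finished, the top-level layout listed above can be checked quickly. This is a minimal sketch: the function name `missing_db_dirs` is illustrative, and it only tests that the directories exist, not that their contents are complete. Depending on which databases you chose to download, some entries (for example `small_bfd`) may be legitimately absent.

```shell
# Print each expected top-level directory that is missing under the given root.
# The names are taken from the directory layout shown above.
missing_db_dirs() {
    root=$1
    for name in bfd mgnify params pdb70 pdb_mmcif pdb_seqres \
                small_bfd uniref30 uniprot uniref90; do
        [ -d "$root/$name" ] || printf '%s\n' "$name"
    done
}

# Report what is missing under $DOWNLOAD_DIR (falls back to the current directory).
missing_db_dirs "${DOWNLOAD_DIR:-.}"
```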

## Inference

### Installation
#### Install Uni-Core-main (not needed if you use the image)
```
cd Uni-Core-main

export CUDA_HOME=/opt/dtk-22.04.2

python3 setup.py install
```
#### Install Uni-Fold-main (not needed if you use the image)
```
pip install -e .
```

### Single-Card Test
#### Multimer reference script (adjust the path settings to your environment)
```
sh run_multimer.sh
```
#### Monomer reference script (adjust the path settings to your environment)
```
sh run_monomer.sh
```

## Result


## Accuracy

## Application Scenarios

### Algorithm Category
NLP

### Key Application Industries
Healthcare, scientific research, education

## Source Repository and Issue Feedback
* https://developer.hpccube.com/codes/modelzoo/uni-fold

## References
* https://github.com/dptech-corp/Uni-Fold