# Paper
Transfer learning enables predictions in network biology
https://www.nature.com/articles/s41586-023-06139-9



# Model Architecture
![img](./media/image1.png)

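Transfer learning has revolutionized fields such as natural language understanding and computer vision by leveraging deep learning models pretrained on large-scale general datasets, which can then be fine-tuned for a vast array of downstream tasks with limited task-specific data. Here, we developed Geneformer, a context-aware, attention-based deep learning model pretrained on a large-scale corpus of about 30 million single-cell transcriptomes, to enable context-specific predictions in network biology settings where data are limited. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the model's attention weights in a completely self-supervised manner. Fine-tuning on downstream tasks relevant to chromatin and network dynamics with limited task-specific data showed that Geneformer consistently improved predictive accuracy. Applied to disease modeling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer is a pretrained deep learning model that can be fine-tuned for a broad range of downstream applications to accelerate the discovery of key network regulators and candidate therapeutic targets.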



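# Algorithm
Pretrained Geneformer architecture. Each single-cell transcriptome is encoded as a rank value encoding, which then passes through six transformer encoder units with an input size of 2048 (fully representing 93% of the rank value encodings in Genecorpus-30M), 256 embedding dimensions, four attention heads per layer, and a feed-forward size of 512. Geneformer uses full dense self-attention across the input size of 2048. Extractable outputs include contextual gene and cell embeddings, contextual attention weights, and contextual predictions.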
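
To make the numbers in the description above concrete, here is a minimal sketch that collects them in one place and derives two quantities that follow from them by arithmetic. The `GeneformerShape` class and its field names are hypothetical and are not part of the Geneformer code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneformerShape:
    """Hypothetical container for the hyperparameters quoted in this README."""
    num_layers: int = 6
    input_size: int = 2048
    embed_dim: int = 256
    num_heads: int = 4
    ff_dim: int = 512

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the embedding.
        if self.embed_dim % self.num_heads:
            raise ValueError("embed_dim must be divisible by num_heads")
        return self.embed_dim // self.num_heads

    @property
    def attention_scores_per_head(self) -> int:
        # Full dense self-attention scores every position against every other.
        return self.input_size * self.input_size


shape = GeneformerShape()
print(shape.head_dim)                   # 64
print(shape.attention_scores_per_head)  # 4194304
```

The quadratic growth of `attention_scores_per_head` with `input_size` is the reason dense attention over 2048 positions dominates memory use per layer.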
![img](./media/image2.png)
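
The rank value encoding mentioned above can be illustrated with a small sketch. It assumes, following the paper's description rather than anything stated in this README, that each gene's expression in a cell is divided by that gene's median expression across the corpus, and that genes are then ordered from highest to lowest normalized value and truncated to the input size. The function name, the gene IDs, and the `gene_medians` mapping are hypothetical; the real tokenizer also maps genes to integer token IDs, which is omitted here.

```python
def rank_value_encode(expression, gene_medians, max_len=2048):
    """Return gene IDs ordered by median-normalized expression, highest first."""
    scored = []
    for gene, count in expression.items():
        median = gene_medians.get(gene)
        if count <= 0 or not median:
            continue  # skip undetected genes and genes without a usable median
        scored.append((count / median, gene))
    # Sort by normalized value descending; break ties by gene ID for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [gene for _, gene in scored[:max_len]]


# Toy cell with made-up gene IDs and values, for illustration only.
cell = {"GENE_A": 10.0, "GENE_B": 4.0, "GENE_C": 0.0, "GENE_D": 3.0}
medians = {"GENE_A": 20.0, "GENE_B": 2.0, "GENE_C": 1.0, "GENE_D": 1.0}
encoded = rank_value_encode(cell, medians)
print(encoded)  # ['GENE_D', 'GENE_B', 'GENE_A']
```

In this toy example `GENE_A` has the highest raw count but ranks last, because its count is low relative to its corpus-wide median.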


# Environment Setup
Docker (Option 1)

Running with Docker is recommended. Pull the provided Docker image and start a container:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -dit --shm-size 80g --network=host --name=geneformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 /bin/bash
docker exec -it geneformer /bin/bash
```

Install the dependencies that are not included in the Docker image:

```
pip install -r requirements.txt  -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```



Dockerfile (Option 2)


```
docker build -t geneformer:latest .
docker run -dit --shm-size 80g --network=host --name=geneformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro geneformer:latest /bin/bash
docker exec -it geneformer /bin/bash
```

Conda (Option 3)

1. Create a conda virtual environment:

```
conda create -n geneformer python=3.10
conda activate geneformer 
```

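2. The toolkits and deep learning libraries this project needs for DCU cards can all be downloaded and installed from the 光合 developer community (光合开发者社区):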
- [DTK 24.04.1](https://cancon.hpccube.com:65024/directlink/1/DTK-24.04.1/Ubuntu20.04.1/DTK-24.04.1-Ubuntu20.04.1-x86_64.tar.gz)
- [Pytorch 2.1](https://cancon.hpccube.com:65024/directlink/4/pytorch/DAS1.2/torch-2.1.0+das.opt1.dtk24042-cp310-cp310-manylinux_2_28_x86_64.whl)


Tips: the versions of the DTK driver, torch, and the other tools listed above must match each other exactly.

3. Install the remaining dependencies from requirements.txt:
```
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```
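
Because the versions in the Conda steps must line up, it can help to read them straight from the wheel filename before installing. The sketch below splits the PyTorch wheel name from step 2 according to the standard wheel naming convention and compares its Python tag with the running interpreter. The helper names are hypothetical, and the `dtk(\d+)` pattern is only an assumption about how the DTK build is tagged in the local version string; the sketch does not decide which DTK release a given tag corresponds to.

```python
import re
import sys


def parse_wheel_name(filename):
    """Split a wheel filename into the fields of the wheel naming convention."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) not in (5, 6):  # an optional build tag adds a sixth field
        raise ValueError(f"unexpected wheel filename: {filename}")
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    public, _, local = version.partition("+")
    dtk = re.search(r"dtk(\d+)", local)  # assumed tagging scheme
    return {
        "name": name,
        "version": public,
        "local": local,
        "dtk_tag": dtk.group(1) if dtk else None,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_interpreter(python_tag):
    """True when a CPython tag such as 'cp310' matches the running interpreter."""
    return python_tag == f"cp{sys.version_info.major}{sys.version_info.minor}"


info = parse_wheel_name(
    "torch-2.1.0+das.opt1.dtk24042-cp310-cp310-manylinux_2_28_x86_64.whl"
)
print(info["version"], info["dtk_tag"], info["python_tag"])  # 2.1.0 24042 cp310
print(matches_interpreter(info["python_tag"]))  # True only under Python 3.10
```

Run inside the `geneformer` conda environment created in step 1, the last line should print `True`, since that environment uses Python 3.10.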

# Download
## Install git-lfs
```
sudo apt-get update
sudo apt-get install git-lfs
```


## Download the dataset
```
#git clone https://hf-mirror.com/datasets/ctheodoris/Genecorpus-30M 
mkdir -p /path/to/
cd /path/to
git clone  https://hf-mirror.com/datasets/ctheodoris/Genecorpus-30M
```
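
If git-lfs is not active when the repository is cloned, large files are left as small text pointer stubs instead of real data. As a minimal sketch of a post-clone check, the code below scans a directory for files that still begin with the Git LFS pointer header. The helper names are hypothetical, and the demo uses a temporary directory with made-up files rather than the real dataset.

```python
import tempfile
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True when the file is a Git LFS pointer stub rather than real content."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(root):
    """List files under root that still look like un-fetched LFS pointers."""
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )


# Demo on throwaway files: one pointer stub and one file with real bytes.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "stub.bin").write_bytes(LFS_POINTER_PREFIX + b"\n")
    (Path(tmp) / "real.bin").write_bytes(b"\x00\x01 real data")
    demo_names = [Path(p).name for p in find_lfs_pointers(tmp)]
print(demo_names)  # ['stub.bin']
```

Against the clone from the commands above, the call would be `find_lfs_pointers("/path/to/Genecorpus-30M")`; a non-empty result suggests running `git lfs pull` inside the repository.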

## Download the Geneformer model

Download the model and install geneformer:
 
```
cd /path/to
git clone  -b pr146_branch   https://hf-mirror.com/ctheodoris/Geneformer
cd Geneformer
pip install -e . 
```
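
After the editable install, a quick check that the packages resolve can save a failed training run. This is a minimal sketch using only the standard library; it assumes that both the import name and the distribution name of the installed package are `geneformer`, which this README does not state explicitly. The `package_status` helper is hypothetical.

```python
from importlib import metadata, util


def package_status(name):
    """Return (importable, installed_version_or_None) for a top-level package."""
    importable = util.find_spec(name) is not None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


for pkg in ("torch", "geneformer"):
    print(pkg, package_status(pkg))
```

A result of `(False, None)` for either package means the corresponding install step has not taken effect in the active environment.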





# Training

Run cell classification on a single card:
```
cd geneformer/
python  train_cell.py
```
For details, see Geneformer/examples/cell_classification.ipynb


# References
https://hf-mirror.com/ctheodoris/Geneformer