Commit 26c0ac58 authored by wangsen

readme.md

parent c6386098
@@ -8,49 +8,47 @@ https://doi.org/10.1038/s41587-021-01034-y
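Higashi uses a hypergraph neural network to uncover high-order interaction patterns in the constructed hypergraph. It produces embeddings of scHi-C data for downstream analysis. It can also impute single-cell Hi-C contact maps, which allows 3D genome features such as TAD-like domain boundaries and A/B compartment scores to be characterized in detail at single-cell resolution.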
![Alt text](./image/image.png)
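# Algorithm
The key algorithmic design of Higashi is to transform scHi-C data into a hypergraph. This transformation preserves both the single-cell resolution and the 3D genome features of the scHi-C contact maps. Specifically, embedding scHi-C data becomes equivalent to learning node embeddings of the hypergraph, and imputing scHi-C contact maps becomes predicting missing hyperedges in the hypergraph. Higashi uses the Hyper-SAGNN architecture (reference 22 in the paper), a generic hypergraph representation learning framework, with substantial new developments specific to scHi-C analysis.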
![Alt text](./image/image-1.png)
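The transformation described above can be illustrated with a small sketch. It assumes that each hyperedge joins one cell node with two genomic-bin nodes and is weighted by the read count; the function name, the record layout, and the toy values are hypothetical and are not taken from the Higashi code base.

```python
from collections import Counter


def contacts_to_hyperedges(contacts, resolution):
    """Turn (cell_id, chrom, pos1, pos2) contact records into weighted hyperedges.

    Each hyperedge joins one cell node with two genomic-bin nodes; its weight is
    the number of reads supporting that bin pair in that cell.
    """
    edges = Counter()
    for cell_id, chrom, pos1, pos2 in contacts:
        # Bin both positions and order them so (a, b) and (b, a) collapse together.
        bin_a, bin_b = sorted((pos1 // resolution, pos2 // resolution))
        edges[(cell_id, (chrom, bin_a), (chrom, bin_b))] += 1
    return edges


# Toy input: three reads from two cells at 1 Mb resolution (values are made up).
toy_contacts = [
    ("cell_0", "chr1", 1_200_000, 5_700_000),
    ("cell_0", "chr1", 5_100_000, 1_900_000),  # same bin pair, reversed order
    ("cell_1", "chr1", 1_200_000, 5_700_000),
]
hyperedges = contacts_to_hyperedges(toy_contacts, resolution=1_000_000)
for edge, weight in sorted(hyperedges.items()):
    print(edge, weight)
```

In this picture, imputing a contact map amounts to scoring (cell, bin, bin) triplets that are absent from `hyperedges`.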
# Environment setup
Docker (option 1)
Running in Docker is recommended. Pull the provided Docker image:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -dit --shm-size 80g --network=host --name=geneformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 /bin/bash
docker exec -it geneformer /bin/bash
docker run -dit --shm-size 80g --network=host --name=higashi --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 /bin/bash
docker exec -it higashi /bin/bash
```
Install the dependencies that are not included in the Docker image:
```
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
python setup.py install
```
Dockerfile (option 2)
```
docker build -t geneformer:latest .
docker run -dit --shm-size 80g --network=host --name=geneformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro geneformer:latest /bin/bash
docker exec -it geneformer /bin/bash
docker build -t higashi:latest .
docker run -dit --shm-size 80g --network=host --name=higashi --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro higashi:latest /bin/bash
docker exec -it higashi /bin/bash
```
Install the dependencies that are not included in the Docker image:
```
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
python setup.py install
```
Conda (option 3)
@@ -58,8 +56,8 @@ Conda (option 3)
1. Create the conda virtual environment:
```
conda create -n geneformer python=3.10
conda activate geneformer
conda create -n higashi python=3.10
conda activate higashi
```
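2. The toolkits and deep learning libraries that this project needs for DCU cards can all be downloaded and installed from the 光合 (Guanghe) developer community.
@@ -72,152 +70,63 @@ Tips: the versions of the dtk driver, torch, and the other tools listed above must match each other exactly.
3. Install the remaining dependencies according to requirements.txt: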
```
python setup.py install
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```
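Because the tool versions must match exactly, it can help to compare the installed packages against the expected pins after installation. The following is a minimal sketch using only the standard library; the helper name and the two example pins are assumptions for illustration, and the real pins should come from requirements.txt.

```python
from importlib import metadata


def check_versions(expected):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


# Example pins for illustration only; take the real ones from requirements.txt.
expected = {"numpy": "1.23.0", "matplotlib": "3.7.3"}
for name, (want, have) in check_versions(expected).items():
    print(f"{name}: expected {want}, installed {have}")
```

An empty result means every listed package is installed at the expected version.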
# Dataset
```
mkdir -p /work/magroup/ruochiz/Data/scHiC_collection/ramani
mkdir -p /work/magroup/ruochiz/Higashi/Temp/ramani
wget -P /work/magroup/ruochiz/Higashi/ https://mirror.ghproxy.com/https://raw.githubusercontent.com/hanfang/Topsorter/refs/heads/master/data/hg19.chrom.sizes.txt
wget -P /work/magroup/ruochiz/Higashi/ https://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBand.txt.gz
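# Note: wget cannot download a Google Drive folder; this URL returns only the folder's HTML page, so fetch the data through a browser instead.
wget "https://drive.google.com/drive/folders/1S0KOMAj60MxQP6mgPV1OKjn_J-lVpzKM?usp=sharing"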
```
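As a quick sanity check on the downloaded chromosome sizes file, the sketch below parses its `name<TAB>length` lines and reports how many bins each chromosome yields at a given resolution. The helper name and the 1 Mb resolution are assumptions for illustration; the resolution actually used is set in the Higashi config file.

```python
import math


def bins_per_chromosome(lines, resolution):
    """Parse 'name<TAB>length' lines and return {chrom: number_of_bins}."""
    bins = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, length = fields[0], int(fields[1])
        bins[chrom] = math.ceil(length / resolution)
    return bins


# With the downloaded file, pass the open file handle instead of `sample`:
# with open("/work/magroup/ruochiz/Higashi/hg19.chrom.sizes.txt") as fh:
#     print(bins_per_chromosome(fh, resolution=1_000_000))
sample = ["chr1\t249250621", "chr2\t243199373", ""]
print(bins_per_chromosome(sample, resolution=1_000_000))
```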
# Testing
## Demo: run the Higashi model on the test data for hypergraph analysis and contact-map embedding
```
python train.py
```
# Accuracy
bce: 0.5046, mse: 0.7233, acc: 86.692 %, pearson: 0.590, spearman: 0.514, elapse: 27.894 s
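To compare runs, a metrics line in this format can be parsed into numbers. This is a small sketch that assumes the log keeps the `name: value` layout shown above; the helper name is hypothetical.

```python
import re


def parse_metrics(line):
    """Extract 'name: value' pairs from a metrics line into a dict of floats."""
    pairs = re.findall(r"(\w+):\s*(-?\d+(?:\.\d+)?)", line)
    return {name: float(value) for name, value in pairs}


line = "bce: 0.5046, mse: 0.7233, acc: 86.692 %, pearson: 0.590, spearman: 0.514, elapse: 27.894 s"
metrics = parse_metrics(line)
print(metrics["acc"], metrics["pearson"])
```

Units such as `%` and `s` are dropped, so `acc` stays a percentage and `elapse` stays in seconds.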
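# Application scenarios
Biology
# Download data
# Key application domains
Scientific research, single-cell prediction, gene prediction
# Source repository and issue feedback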
```
wget https://github.com/hanfang/Topsorter/blob/master/data/hg19.chrom.sizes.txt
wget https://drive.google.com/drive/folders/1S0KOMAj60MxQP6mgPV1OKjn_J-lVpzKM?usp=sharing
wget https://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBand.txt.gz
```
# Create the environment
# References
```
conda create -n higashi python=3.10
source activate higashi
pip install https://cancon.hpccube.com:65024/directlink/4/pytorch/DAS1.1.1/torch-2.1.0+gitf643949.abi1.dtk2404-cp310-cp310-manylinux_2_31_x86_64.whl
git clone https://github.com/ma-compbio/Higashi/
cd Higashi
python setup.py install
pip install matplotlib==3.7.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
The environment after installation is as follows:
```
asciitree 0.3.3
bokeh 3.5.1
click 8.1.7
contourpy 1.2.1
cooler 0.9.0
cycler 0.12.1
Cython 0.29.24
cytoolz 0.12.3
dill 0.3.8
fbpca 1.0
filelock 3.15.4
fonttools 4.53.1
fsspec 2024.6.1
h5py 3.11.0
higashi 0.1.0a0
importlib_metadata 8.2.0
Jinja2 3.1.4
joblib 1.4.2
kiwisolver 1.4.5
llvmlite 0.43.0
MarkupSafe 2.1.5
matplotlib 3.7.3
mpmath 1.3.0
multiprocess 0.70.16
networkx 3.3
numba 0.60.0
numpy 1.23.0
packaging 24.1
pandas 1.3.4
pillow 10.4.0
pip 24.2
pyfaidx 0.8.1.2
pynndescent 0.5.13
pyparsing 3.1.2
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.2
scikit-learn 1.5.1
scipy 1.7.3
seaborn 0.11.2
setuptools 72.1.0
simplejson 3.19.3
six 1.16.0
sympy 1.13.2
threadpoolctl 3.5.0
toolz 0.12.1
torch 2.1.0+das1.1.git3ac1bdd.abi1.dtk2404
tornado 6.4.1
tqdm 4.66.5
typing_extensions 4.12.2
tzdata 2024.1
umap-learn 0.5.6
wheel 0.43.0
xyzservices 2024.6.0
zipp 3.20.0
```
# Demo: run the Higashi model on the test data for hypergraph analysis and contact-map embedding
```
from higashi.Higashi_wrapper import *
config = "/work/magroup/ruochiz/Higashi/config_dir/config_ramani.JSON"  # Change this to the path of the downloaded files; if the customer specifies a dataset, adjust it to that dataset
higashi_model = Higashi(config)
higashi_model.process_data()
higashi_model.prep_model()
higashi_model.train_for_embeddings()
```
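Since the config path and the paths inside it usually have to be edited by hand, a check before training can catch typos early. The sketch below makes no assumption about the config's key names: it flags any top-level string value that is an absolute path and does not exist. Both helper names are hypothetical.

```python
import json
import os


def missing_paths(config):
    """Return config entries whose string value is an absolute path that does not exist."""
    missing = {}
    for key, value in config.items():
        if isinstance(value, str) and os.path.isabs(value) and not os.path.exists(value):
            missing[key] = value
    return missing


def check_config_file(config_path):
    """Load a JSON config file and report its missing absolute paths."""
    with open(config_path) as fh:
        return missing_paths(json.load(fh))


# Example with an in-memory config; the keys here are placeholders.
example = {"some_dir": os.path.abspath("no-such-dir-xyz123"), "some_number": 5, "some_name": "relative"}
print(missing_paths(example))
```

Calling `check_config_file(config)` on the JSON file before `Higashi(config)` would list the entries that still need fixing.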
# Verify hypergraph analysis and contact-map embedding on single-cell Hi-C data
```
higashi_model.train_for_embeddings()
higashi_model.train_for_imputation_nbr_0()
higashi_model.impute_no_nbr()
higashi_model.train_for_imputation_with_nbr()
higashi_model.impute_with_nbr()
# Visualize embedding results
cell_embeddings = higashi_model.fetch_cell_embeddings()
print (cell_embeddings.shape)
from umap import UMAP
from sklearn.decomposition import PCA
import seaborn as sns
import matplotlib.pyplot as plt
cell_type = higashi_model.label_info['cell type']
fig = plt.figure(figsize=(14, 5))
ax = plt.subplot(1, 2, 1)
vec = PCA(n_components=2).fit_transform(cell_embeddings)
sns.scatterplot(x=vec[:, 0], y=vec[:, 1], hue=cell_type, ax=ax, s=6, linewidth=0)
handles, labels = ax.get_legend_handles_labels()
labels, handles = zip(*sorted(zip(labels, handles), key=lambda t: t[0]))
ax.legend(handles=handles, labels=labels, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., ncol=1)
ax = plt.subplot(1, 2, 2)
vec = UMAP(n_components=2).fit_transform(cell_embeddings)
sns.scatterplot(x=vec[:, 0], y=vec[:, 1], hue=cell_type, ax=ax, s=6, linewidth=0)
handles, labels = ax.get_legend_handles_labels()
labels, handles = zip(*sorted(zip(labels, handles), key=lambda t: t[0]))
ax.legend(handles=handles, labels=labels, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., ncol=1)
plt.tight_layout()
plt.show()
```
# Reference documentation
https://github.com/ma-compbio/Higashi/
```
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
RUN source /opt/dtk-24.04.1/env.sh
RUN cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' >/etc/timezone
@@ -2,3 +2,6 @@ from higashi.Higashi_wrapper import *
config = "/work/magroup/ruochiz/Higashi/config_dir/config_ramani.JSON"
higashi_model = Higashi(config)
higashi_model.process_data()