ModelZoo / Higashi_pytorch
Commit 26c0ac58, authored Oct 13, 2024 by wangsen
readme.md
parent c6386098
Changes: 4
Showing 4 changed files with 64 additions and 142 deletions (+64 -142)
README.md +51 -142
dockerfile +4 -0
requirements.txt +6 -0
train.py +3 -0
README.md @ 26c0ac58
@@ -8,49 +8,47 @@ https://doi.org/10.1038/s41587-021-01034-y
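Higashi uses a hypergraph neural network to uncover high-order interaction patterns in this constructed hypergraph. Higashi can produce embeddings of scHi-C data for downstream analysis. Higashi can also impute single-cell Hi-C contact maps, which enables detailed characterization of 3D genome features, such as TAD-like domain boundaries and A/B compartment scores, at single-cell resolution.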

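# Algorithm
The key algorithmic design of Higashi is to transform scHi-C data into a hypergraph. This transformation preserves the single-cell resolution and the 3D genome features of the scHi-C contact maps. Specifically, embedding scHi-C data becomes equivalent to learning node embeddings of the hypergraph, and imputing scHi-C contact maps becomes predicting missing hyperedges in the hypergraph. Higashi uses the recently developed Hyper-SAGNN architecture (reference 22 in the Higashi paper), a generic hypergraph representation learning framework, with substantial new developments specifically for scHi-C analysis.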
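As a rough illustration of the conversion described above, the sketch below turns a few toy scHi-C contacts into weighted hyperedges of the form (cell, bin_i, bin_j). The record layout, the 1 Mb bin size, and the function name are assumptions made for illustration; this is not Higashi's actual preprocessing code.

```python
from collections import Counter


def contacts_to_hyperedges(contacts, bin_size=1_000_000):
    """Group raw contacts into weighted hyperedges (cell, bin_i, bin_j).

    `contacts` is an iterable of (cell_id, chrom, pos1, pos2) tuples.
    The record layout and the 1 Mb default bin size are illustrative assumptions.
    """
    counts = Counter()
    for cell_id, chrom, pos1, pos2 in contacts:
        # Map each position to its genomic bin and order the pair,
        # so that (i, j) and (j, i) count as the same hyperedge.
        bin_i, bin_j = sorted(((chrom, pos1 // bin_size), (chrom, pos2 // bin_size)))
        counts[(cell_id, bin_i, bin_j)] += 1
    return counts


toy_contacts = [
    ("cell_0", "chr1", 1_200_000, 5_700_000),
    ("cell_0", "chr1", 5_900_000, 1_100_000),  # same bin pair, reversed order
    ("cell_1", "chr1", 1_200_000, 5_700_000),
]
hyperedges = contacts_to_hyperedges(toy_contacts)
for (cell, bin_i, bin_j), weight in sorted(hyperedges.items()):
    print(cell, bin_i, bin_j, weight)
```

Each key joins one cell node and two genomic-bin nodes. In this framing, imputation amounts to predicting hyperedges that were not observed.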

# Environment setup
Docker (Option 1)
Running with Docker is recommended. Pull the provided Docker image:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
-docker run -dit --shm-size 80g --network=host --name=geneformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 /bin/bash
-docker exec -it geneformer /bin/bash
+docker run -dit --shm-size 80g --network=host --name=higashi --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 /bin/bash
+docker exec -it higashi /bin/bash
```
Install the dependencies that are not included in the Docker image:
```
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
python setup.py install
```
Dockerfile (Option 2)
```
-docker build -t geneformer:latest .
-docker run -dit --shm-size 80g --network=host --name=geneformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro geneformer:latest /bin/bash
-docker exec -it geneformer /bin/bash
+docker build -t higashi:latest .
+docker run -dit --shm-size 80g --network=host --name=higashi --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro geneformer:latest /bin/bash
+docker exec -it higashi /bin/bash
```
Install the dependencies that are not included in the Docker image:
```
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
python setup.py install
```
Conda (Option 3)
@@ -58,8 +56,8 @@ Conda (Option 3)
1. Create a conda virtual environment:
```
-conda create -n geneformer python=3.10
-conda activate geneformer
+conda create -n higashi python=3.10
+conda activate higashi
```
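2. The toolkits and deep learning libraries that this project needs for DCU cards can all be downloaded and installed from the 光合 developer community.
@@ -72,152 +70,63 @@ Tips: the versions of the dtk driver, torch, and the other tools above must correspond to each other exactly.
3. Install the remaining dependencies listed in requirements.txt: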
```
python setup.py install
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```
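The tip above says the dtk and torch versions must match exactly. One way to sanity-check this is to read the dtk tag embedded in the torch build string, as in the minimal sketch below. The helper name is hypothetical, and the assumption that the tag `dtk2404` corresponds to the dtk-24.04.1 base image is ours, not something the README states. In a real environment the string would come from `torch.__version__`.

```python
import re


def dtk_tag(version):
    """Return the 'dtkNNNN' tag embedded in a version string, or None if absent."""
    match = re.search(r"dtk\d+", version)
    return match.group(0) if match else None


# Build string of the DCU torch wheel as it appears in this README's package list.
torch_version = "2.1.0+das1.1.git3ac1bdd.abi1.dtk2404"
# Assumption: this tag is the one that goes with the dtk-24.04.1 base image.
expected_tag = "dtk2404"

print("torch build matches dtk:", dtk_tag(torch_version) == expected_tag)
```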
# Dataset
```
mkdir -p /work/magroup/ruochiz/Data/scHiC_collection/ramani
mkdir -p /work/magroup/ruochiz/Higashi/Temp/ramani
wget -P /work/magroup/ruochiz/Higashi/ https://mirror.ghproxy.com/https://raw.githubusercontent.com/hanfang/Topsorter/refs/heads/master/data/hg19.chrom.sizes.txt
wget -P /work/magroup/ruochiz/Higashi/ https://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBand.txt.gz
wget https://drive.google.com/drive/folders/1S0KOMAj60MxQP6mgPV1OKjn_J-lVpzKM?usp=sharing
```
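Before training, it can help to confirm that the directories and reference files from the commands above are in place. The following is a minimal sketch. The relative paths are taken from those commands, while the helper name and the idea of checking them this way are assumptions.

```python
from pathlib import Path

# Paths relative to the root used by the commands above (/work/magroup/ruochiz).
EXPECTED = [
    "Data/scHiC_collection/ramani",
    "Higashi/Temp/ramani",
    "Higashi/hg19.chrom.sizes.txt",
    "Higashi/cytoBand.txt.gz",
]


def missing_paths(root):
    """Return the expected files or directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("/work/magroup/ruochiz"):
        print("missing:", rel)
```

If the data lives elsewhere, pass a different root and adjust the paths in the config JSON to match.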
# Testing
## Build a demo with hypergraph analysis and contact-map embedding from the test data and the Higashi model
```
python train.py
```
# Accuracy
bce: 0.5046, mse: 0.7233, acc: 86.692 %, pearson: 0.590, spearman: 0.514, elapse: 27.894 s
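The log line above reports binary cross-entropy, mean squared error, accuracy, and Pearson and Spearman correlations. The sketch below shows one common way to compute such metrics on toy values using only the standard library. It is not taken from Higashi's training code, and the 0.5 threshold, the toy data, and the simplified tie handling are assumptions.

```python
import math


def bce(labels, probs):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs)) / len(labels)


def accuracy(labels, probs, threshold=0.5):
    """Fraction of predictions that fall on the correct side of the threshold."""
    return sum((p >= threshold) == bool(y) for y, p in zip(labels, probs)) / len(labels)


def mse(targets, preds):
    """Mean squared error."""
    return sum((t - q) ** 2 for t, q in zip(targets, preds)) / len(targets)


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 0 upward; ties are not averaged in this simplified sketch."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


labels, probs = [1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]
targets, preds = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.5, 3.2]
print(f"bce: {bce(labels, probs):.4f}, mse: {mse(targets, preds):.4f}, "
      f"acc: {100 * accuracy(labels, probs):.3f} %, "
      f"pearson: {pearson(targets, preds):.3f}, spearman: {spearman(targets, preds):.3f}")
```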
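# Application scenarios
Biology
# Download data
# Key application industries
Scientific research, single-cell prediction, gene prediction
# Source repository and issue feedback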
```
wget https://github.com/hanfang/Topsorter/blob/master/data/hg19.chrom.sizes.txt
wget https://drive.google.com/drive/folders/1S0KOMAj60MxQP6mgPV1OKjn_J-lVpzKM?usp=sharing
wget https://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBand.txt.gz
```
# Create the environment
# References
```
conda create -n higashi python=3.10
source activate higashi
pip install https://cancon.hpccube.com:65024/directlink/4/pytorch/DAS1.1.1/torch-2.1.0+gitf643949.abi1.dtk2404-cp310-cp310-manylinux_2_31_x86_64.whl
git clone https://github.com/ma-compbio/Higashi/
cd Higashi
python setup.py install
pip install matplotlib==3.7.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
The environment after installation is as follows:
```
asciitree 0.3.3
bokeh 3.5.1
click 8.1.7
contourpy 1.2.1
cooler 0.9.0
cycler 0.12.1
Cython 0.29.24
cytoolz 0.12.3
dill 0.3.8
fbpca 1.0
filelock 3.15.4
fonttools 4.53.1
fsspec 2024.6.1
h5py 3.11.0
higashi 0.1.0a0
importlib_metadata 8.2.0
Jinja2 3.1.4
joblib 1.4.2
kiwisolver 1.4.5
llvmlite 0.43.0
MarkupSafe 2.1.5
matplotlib 3.7.3
mpmath 1.3.0
multiprocess 0.70.16
networkx 3.3
numba 0.60.0
numpy 1.23.0
packaging 24.1
pandas 1.3.4
pillow 10.4.0
pip 24.2
pyfaidx 0.8.1.2
pynndescent 0.5.13
pyparsing 3.1.2
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.2
scikit-learn 1.5.1
scipy 1.7.3
seaborn 0.11.2
setuptools 72.1.0
simplejson 3.19.3
six 1.16.0
sympy 1.13.2
threadpoolctl 3.5.0
toolz 0.12.1
torch 2.1.0+das1.1.git3ac1bdd.abi1.dtk2404
tornado 6.4.1
tqdm 4.66.5
typing_extensions 4.12.2
tzdata 2024.1
umap-learn 0.5.6
wheel 0.43.0
xyzservices 2024.6.0
zipp 3.20.0
```
# Build a demo with hypergraph analysis and contact-map embedding from the test data and the Higashi model
```
from higashi.Higashi_wrapper import *
config = "/work/magroup/ruochiz/Higashi/config_dir/config_ramani.JSON" # Change this to the path of the downloaded files; if the customer specifies a dataset, adjust it to that dataset
higashi_model = Higashi(config)
higashi_model.process_data()
higashi_model.prep_model()
higashi_model.train_for_embeddings()
```
# Verify hypergraph analysis and contact-map embedding on single-cell Hi-C data
```
higashi_model.train_for_embeddings()
higashi_model.train_for_imputation_nbr_0()
higashi_model.impute_no_nbr()
higashi_model.train_for_imputation_with_nbr()
higashi_model.impute_with_nbr()
# Visualize embedding results
cell_embeddings = higashi_model.fetch_cell_embeddings()
print (cell_embeddings.shape)
from umap import UMAP
from sklearn.decomposition import PCA
import seaborn as sns
import matplotlib.pyplot as plt
cell_type = higashi_model.label_info['cell type']
fig = plt.figure(figsize=(14, 5))
ax = plt.subplot(1, 2, 1)
vec = PCA(n_components=2).fit_transform(cell_embeddings)
sns.scatterplot(x=vec[:, 0], y=vec[:, 1], hue=cell_type, ax=ax, s=6, linewidth=0)
handles, labels = ax.get_legend_handles_labels()
labels, handles = zip(*sorted(zip(labels, handles), key=lambda t: t[0]))
ax.legend(handles=handles, labels=labels, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., ncol=1)
ax = plt.subplot(1, 2, 2)
vec = UMAP(n_components=2).fit_transform(cell_embeddings)
sns.scatterplot(x=vec[:, 0], y=vec[:, 1], hue=cell_type, ax=ax, s=6, linewidth=0)
handles, labels = ax.get_legend_handles_labels()
labels, handles = zip(*sorted(zip(labels, handles), key=lambda t: t[0]))
ax.legend(handles=handles, labels=labels, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., ncol=1)
plt.tight_layout()
plt.show()
```
# Reference documentation
https://github.com/ma-compbio/Higashi/
```
dockerfile @ 26c0ac58
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
RUN source /opt/dtk-24.04.1/env.sh
RUN cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' > /etc/timezone
requirements.txt @ 26c0ac58
matplotlib==3.7.3
cytoolz
asciitree
multiprocess
joblib
threadpoolctl
pandas==1.3.4
test.py → train.py @ 26c0ac58
@@ -2,3 +2,6 @@ from higashi.Higashi_wrapper import *
config = "/work/magroup/ruochiz/Higashi/config_dir/config_ramani.JSON"
higashi_model = Higashi(config)
higashi_model.process_data()