# Paper

Higashi: Multiscale and integrative scHi-C analysis
https://doi.org/10.1038/s41587-021-01034-y

# Model architecture


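Higashi uses a hypergraph neural network to uncover high-order interaction patterns in a hypergraph constructed from the scHi-C data. It produces embeddings of scHi-C data for downstream analysis. It also imputes single-cell Hi-C contact maps, which enables detailed characterization of 3D genome features, such as TAD-like domain boundaries and A/B compartment scores, at single-cell resolution.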


![Alt text](./image/image.png)




# Algorithm

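The key algorithmic design of Higashi is to transform scHi-C data into a hypergraph. This transformation preserves the single-cell resolution and the 3D genome features of the scHi-C contact maps. Specifically, embedding scHi-C data becomes equivalent to learning node embeddings of the hypergraph, and imputing scHi-C contact maps becomes predicting the missing hyperedges of the hypergraph. Higashi builds on the Hyper-SAGNN architecture, a generic hypergraph representation learning framework developed by the same authors, with substantial new development specifically for scHi-C analysis.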
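To make the hypergraph idea concrete, here is a minimal sketch in plain Python. It assumes, as described in the Higashi paper, that each hyperedge joins one cell node with two genomic-bin nodes. The function names `contacts_to_hyperedges` and `candidate_missing_hyperedges` and the toy data are hypothetical illustrations, not part of the Higashi API, and the real pipeline works on much larger, normalized inputs.

```python
from collections import defaultdict


def contacts_to_hyperedges(cell_contacts):
    """Turn per-cell contact lists into (cell, bin_i, bin_j) hyperedges with counts.

    cell_contacts: dict mapping a cell id to an iterable of (bin_i, bin_j) pairs.
    """
    hyperedges = defaultdict(int)
    for cell, pairs in cell_contacts.items():
        for bin_i, bin_j in pairs:
            # Contact maps are symmetric, so store each bin pair in sorted order.
            lo, hi = sorted((bin_i, bin_j))
            hyperedges[(cell, lo, hi)] += 1
    return dict(hyperedges)


def candidate_missing_hyperedges(hyperedges, cells, n_bins):
    """List the off-diagonal (cell, bin_i, bin_j) triplets with no observed contact.

    These are the entries an imputation model would be asked to score.
    """
    return [
        (cell, i, j)
        for cell in cells
        for i in range(n_bins)
        for j in range(i + 1, n_bins)
        if (cell, i, j) not in hyperedges
    ]


# Toy data: two cells, three genomic bins (all values are made up).
toy = {
    "cell_A": [(0, 1), (1, 0), (1, 2)],
    "cell_B": [(0, 2)],
}
edges = contacts_to_hyperedges(toy)
missing = candidate_missing_hyperedges(edges, cells=["cell_A", "cell_B"], n_bins=3)
print(edges)
print(missing)
```

In this picture, learning embeddings means learning one vector per cell node and per bin node, and imputation means scoring the triplets returned by `candidate_missing_hyperedges`.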



![Alt text](./image/image-1.png)



# Environment setup
Docker (option 1)

Running in Docker is recommended. Pull the provided image and start a container:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -dit --shm-size 80g --network=host --name=geneformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 /bin/bash
docker exec -it geneformer /bin/bash
```

Install the dependencies that are not included in the Docker image:

```
pip install -r requirements.txt  -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```



Dockerfile (option 2)


```
docker build -t geneformer:latest .
docker run -dit --shm-size 80g --network=host --name=geneformer --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /opt/hyhal/:/opt/hyhal/:ro geneformer:latest /bin/bash
docker exec -it geneformer /bin/bash

```



Conda (option 3)

1. Create a conda virtual environment:

```
conda create -n geneformer python=3.10
conda activate geneformer 
```

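2. The toolkits and deep learning libraries that this project needs for DCU cards can all be downloaded from the Guanghe (光合) developer community.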
- [DTK 24.04.1](https://cancon.hpccube.com:65024/directlink/1/DTK-24.04.1/Ubuntu20.04.1/DTK-24.04.1-Ubuntu20.04.1-x86_64.tar.gz)
- [Pytorch 2.1](https://cancon.hpccube.com:65024/directlink/4/pytorch/DAS1.2/torch-2.1.0+das.opt1.dtk24042-cp310-cp310-manylinux_2_28_x86_64.whl)


Tips: the versions of the DTK driver, torch, and the other tools listed above must match each other exactly.


3. Install the remaining dependencies from requirements.txt:
```
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```
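Because the DTK and torch builds have to match, it can help to check the build tag of the installed torch wheel. The sketch below only parses a version string; `split_torch_build` and `matches_dtk` are hypothetical helper names, and the prefix rule (DTK "24.04" maps to a tag starting with "2404") is a guess at the naming scheme seen in the wheel names here, not a documented convention.

```python
import re


def split_torch_build(version):
    """Split a torch version string into (release, dtk_tag).

    dtk_tag is None when the local version part carries no 'dtk...' token.
    """
    release, _, local = version.partition("+")
    match = re.search(r"dtk(\d+)", local)
    return release, (match.group(1) if match else None)


def matches_dtk(version, dtk_release):
    """Return True if the dtk tag in `version` starts with the digits of `dtk_release`.

    Assumption: DTK "24.04" corresponds to tags such as "2404" or "24042".
    """
    _, tag = split_torch_build(version)
    wanted = dtk_release.replace(".", "")
    return tag is not None and tag.startswith(wanted)


# Version string taken from the package list in this README.
installed = "2.1.0+das1.1.git3ac1bdd.abi1.dtk2404"
print(split_torch_build(installed))
print(matches_dtk(installed, "24.04"))
print(matches_dtk("2.1.0", "24.04"))  # a stock wheel without a dtk tag
```

In a real environment, pass `torch.__version__` instead of the hard-coded string.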









# Download data

```
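# Use the raw URL; the github.com/.../blob/... page returns HTML, not the file itself
wget https://raw.githubusercontent.com/hanfang/Topsorter/master/data/hg19.chrom.sizes.txt
# The Google Drive link is a folder, which wget cannot fetch; download it in a browser instead:
# https://drive.google.com/drive/folders/1S0KOMAj60MxQP6mgPV1OKjn_J-lVpzKM?usp=sharing
wget https://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBand.txt.gz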
```
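As a quick sanity check on the downloaded chromosome sizes file, the sketch below counts how many fixed-size genomic bins each chromosome yields at a given resolution. It assumes the usual two-column layout (chromosome name, length in base pairs) separated by whitespace; `bins_per_chromosome` is a hypothetical helper, not a Higashi function.

```python
def bins_per_chromosome(chrom_sizes_text, resolution):
    """Return {chrom: number_of_bins} for a two-column 'name length' table."""
    bins = {}
    for line in chrom_sizes_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, length = fields[0], int(fields[1])
        # Integer ceiling division: the last bin may be shorter than `resolution`.
        bins[chrom] = (length + resolution - 1) // resolution
    return bins


# First two rows of the hg19 chromosome sizes table.
sample = "chr1\t249250621\nchr2\t243199373\n"
print(bins_per_chromosome(sample, resolution=1_000_000))
```

To run it on the real file, read `hg19.chrom.sizes.txt` into a string and pass the resolution you plan to use in the Higashi config.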

# Create the environment


```
conda create -n higashi python=3.10
conda activate higashi
pip install https://cancon.hpccube.com:65024/directlink/4/pytorch/DAS1.1.1/torch-2.1.0+gitf643949.abi1.dtk2404-cp310-cp310-manylinux_2_31_x86_64.whl
git clone https://github.com/ma-compbio/Higashi/
cd Higashi
python setup.py install
pip install matplotlib==3.7.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
After installation, the environment contains the following packages:

```
asciitree          0.3.3
bokeh              3.5.1
click              8.1.7
contourpy          1.2.1
cooler             0.9.0
cycler             0.12.1
Cython             0.29.24
cytoolz            0.12.3
dill               0.3.8
fbpca              1.0
filelock           3.15.4
fonttools          4.53.1
fsspec             2024.6.1
h5py               3.11.0
higashi            0.1.0a0
importlib_metadata 8.2.0
Jinja2             3.1.4
joblib             1.4.2
kiwisolver         1.4.5
llvmlite           0.43.0
MarkupSafe         2.1.5
matplotlib         3.7.3
mpmath             1.3.0
multiprocess       0.70.16
networkx           3.3
numba              0.60.0
numpy              1.23.0
packaging          24.1
pandas             1.3.4
pillow             10.4.0
pip                24.2
pyfaidx            0.8.1.2
pynndescent        0.5.13
pyparsing          3.1.2
python-dateutil    2.9.0.post0
pytz               2024.1
PyYAML             6.0.2
scikit-learn       1.5.1
scipy              1.7.3
seaborn            0.11.2
setuptools         72.1.0
simplejson         3.19.3
six                1.16.0
sympy              1.13.2
threadpoolctl      3.5.0
toolz              0.12.1
torch              2.1.0+das1.1.git3ac1bdd.abi1.dtk2404
tornado            6.4.1
tqdm               4.66.5
typing_extensions  4.12.2
tzdata             2024.1
umap-learn         0.5.6
wheel              0.43.0
xyzservices        2024.6.0
zipp               3.20.0

```

# Build a demo with hypergraph analysis and contact map embedding from the test data and the Higashi model

```
from higashi.Higashi_wrapper import *
config = "/work/magroup/ruochiz/Higashi/config_dir/config_ramani.JSON"     # Change this to the path of your own config file; if the customer specifies a dataset, adjust the config to that dataset
higashi_model = Higashi(config)
higashi_model.process_data()
higashi_model.prep_model()
higashi_model.train_for_embeddings()

```
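Since the config path above must be changed for each machine, a small pre-flight check can catch missing paths before a long training run starts. This is a minimal sketch: `missing_config_paths` is a hypothetical helper, and the key names used in the demo (`data_dir`, `cytoband_path`, `temp_dir`) are assumptions about the Higashi config layout that should be checked against the config file you actually use.

```python
import json
import os
import tempfile


def missing_config_paths(config_path, path_keys):
    """Return the keys in `path_keys` that are absent or point to nothing on disk."""
    with open(config_path) as handle:
        config = json.load(handle)
    problems = []
    for key in path_keys:
        value = config.get(key)
        if value is None or not os.path.exists(value):
            problems.append(key)
    return problems


# Self-contained demo with a throwaway config; the key names are assumptions.
with tempfile.TemporaryDirectory() as tmp:
    demo_config = os.path.join(tmp, "config_demo.JSON")
    with open(demo_config, "w") as handle:
        json.dump(
            {"data_dir": tmp, "cytoband_path": os.path.join(tmp, "cytoBand.txt")},
            handle,
        )
    result = missing_config_paths(demo_config, ["data_dir", "cytoband_path", "temp_dir"])
    print(result)
```

In the demo, `data_dir` exists, `cytoband_path` points to a file that was never created, and `temp_dir` is not set, so the last two keys are reported.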


# Verify hypergraph analysis and contact map embedding on single-cell Hi-C data

```
higashi_model.train_for_embeddings()
higashi_model.train_for_imputation_nbr_0()
higashi_model.impute_no_nbr()
higashi_model.train_for_imputation_with_nbr()
higashi_model.impute_with_nbr()
# Visualize embedding results
cell_embeddings = higashi_model.fetch_cell_embeddings()
print(cell_embeddings.shape)

from umap import UMAP
from sklearn.decomposition import PCA
import seaborn as sns
import matplotlib.pyplot as plt

cell_type = higashi_model.label_info['cell type']
fig = plt.figure(figsize=(14, 5))
ax = plt.subplot(1, 2, 1)
vec = PCA(n_components=2).fit_transform(cell_embeddings)
sns.scatterplot(x=vec[:, 0], y=vec[:, 1], hue=cell_type, ax=ax, s=6, linewidth=0)
handles, labels = ax.get_legend_handles_labels()
labels, handles = zip(*sorted(zip(labels, handles), key=lambda t: t[0]))
ax.legend(handles=handles, labels=labels, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., ncol=1)
ax = plt.subplot(1, 2, 2)
vec = UMAP(n_components=2).fit_transform(cell_embeddings)
sns.scatterplot(x=vec[:, 0], y=vec[:, 1], hue=cell_type, ax=ax, s=6, linewidth=0)
handles, labels = ax.get_legend_handles_labels()
labels, handles = zip(*sorted(zip(labels, handles), key=lambda t: t[0]))
ax.legend(handles=handles, labels=labels, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., ncol=1)
plt.tight_layout()
plt.show()

```


# References

https://github.com/ma-compbio/Higashi/




