

# Dataset


# Environment Setup
```
git clone https://github.com/HazyResearch/hyena-dna.git && cd hyena-dna
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

Dependencies used (`requirements.txt`):


```
numpy
scipy
pandas
scikit-learn
matplotlib
tqdm
rich
pytorch-lightning==1.9.4
hydra-core
omegaconf
wandb
einops
opt_einsum
cmake # For pykeops support
# pykeops # Only for S4D. If there are installation problems with pykeops==2.x, try pykeops==1.5
transformers==4.26.1 # For some schedulers and tokenizers
#torchvision
timm==0.9.16
prettytable
numerize
git-lfs

# Dataset specific packages
torchtext==0.16.0 # this needs to align with the pytorch version
#torchtext # this needs to align with the pytorch version
datasets # LRA

# genomic specific
pyfaidx
polars
genomic-benchmarks
loguru
liftover
```


# Training


Dataset

```
# Expected layout after the steps below:
# data
# |-- hg38/
#     |-- hg38.ml.fa
#     |-- human-sequences.bed

mkdir -p data/hg38/
curl https://storage.googleapis.com/basenji_barnyard2/hg38.ml.fa.gz > data/hg38/hg38.ml.fa.gz
curl https://storage.googleapis.com/basenji_barnyard2/sequences_human.bed > data/hg38/human-sequences.bed
cd data/hg38/ && gzip -d hg38.ml.fa.gz
```
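
Before training, it may help to confirm that both files are where the layout above expects them. The following is a minimal sketch, not part of hyena-dna: the helper name `check_hg38_data` is hypothetical, and it only checks that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper (not part of hyena-dna): check that the hg38 files
# exist and are non-empty. Takes the data directory as an optional argument.
check_hg38_data() {
    dir="${1:-data/hg38}"
    rc=0
    for f in hg38.ml.fa human-sequences.bed; do
        if [ -s "$dir/$f" ]; then
            echo "ok: $dir/$f"
        else
            echo "missing or empty: $dir/$f"
            rc=1
        fi
    done
    return "$rc"
}
```

Run from the repository root, `check_hg38_data data/hg38` prints one line per file and returns a non-zero status if either file is missing or empty.
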
Pretraining

```
cd ../../  # return to the repository root from data/hg38/
python -m train wandb=null experiment=hg38/hg38_hyena model.d_model=128 model.n_layer=2 dataset.batch_size=256 train.global_batch_size=256 dataset.max_length=1024 optimizer.lr=6e-4 trainer.devices=1
```
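
The pretraining command carries many overrides on one line, which makes them easy to mistype. As a sketch only, the same command can be assembled from shell variables and printed as a dry run. The variable names and the `pretrain_cmd` function are hypothetical. The values mirror the command above, including setting `train.global_batch_size` equal to `dataset.batch_size`.

```shell
# Sketch: build the pretraining command from variables (names are hypothetical).
D_MODEL=128
N_LAYER=2
BATCH_SIZE=256
MAX_LENGTH=1024
LR=6e-4
DEVICES=1

pretrain_cmd() {
    # Print the command instead of running it (dry run).
    echo "python -m train wandb=null experiment=hg38/hg38_hyena" \
        "model.d_model=$D_MODEL model.n_layer=$N_LAYER" \
        "dataset.batch_size=$BATCH_SIZE train.global_batch_size=$BATCH_SIZE" \
        "dataset.max_length=$MAX_LENGTH optimizer.lr=$LR trainer.devices=$DEVICES"
}

pretrain_cmd
```

Once the printed line looks right, it can be run from the repository root with `sh -c "$(pretrain_cmd)"`.
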


# Inference


# References

https://github.com/HazyResearch/hyena-dna.git