readme.md

Commit da9012d8 authored Sep 02, 2024 by wangsen

Showing with 93 additions and 0 deletions

README.md README.md +93 -0

--- a/README.md
+++ b/README.md
+# Dataset
+# Environment Setup
+```
+git clone https://github.com/HazyResearch/hyena-dna.git && cd hyena-dna
+pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+Packages installed for this setup:
+```
+numpy
+scipy
+pandas
+scikit-learn
+matplotlib
+tqdm
+rich
+pytorch-lightning==1.9.4
+hydra-core
+omegaconf
+wandb
+einops
+opt_einsum
+cmake # For pykeops support
+# pykeops # Only for S4D. If there are installation problems with pykeops==2.x, try pykeops==1.5
+transformers==4.26.1 # For some schedulers and tokenizers
+#torchvision
+timm==0.9.16
+prettytable
+numerize
+git-lfs
+# Dataset specific packages
+torchtext==0.16.0 # this needs to align with the pytorch version
+#torchtext # this needs to align with the pytorch version
+datasets # LRA
+# genomic specific
+pyfaidx
+polars
+genomic-benchmarks
+loguru
+liftover
+```
+# Training
+Dataset
+```
+# Expected layout after download and decompression:
+# data
+# |-- hg38/
+#     |-- hg38.ml.fa
+#     |-- human-sequences.bed
+mkdir -p data/hg38/
+curl https://storage.googleapis.com/basenji_barnyard2/hg38.ml.fa.gz > data/hg38/hg38.ml.fa.gz
+curl https://storage.googleapis.com/basenji_barnyard2/sequences_human.bed > data/hg38/human-sequences.bed
+cd data/hg38/ && gzip -d hg38.ml.fa.gz
+```
+Pretraining
+```
+cd ../../
+python -m train wandb=null experiment=hg38/hg38_hyena model.d_model=128 model.n_layer=2 dataset.batch_size=256 train.global_batch_size=256 dataset.max_length=1024 optimizer.lr=6e-4 trainer.devices=1
+```
+# Inference
+# References
+https://github.com/HazyResearch/hyena-dna.git
\ No newline at end of file
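
The README's Training section expects two files under `data/hg38/` before pretraining starts. A minimal sketch of a pre-flight check for that layout follows. The function name `check_hg38_layout` is hypothetical and not part of hyena-dna, and the default `data` directory is an assumption taken from the layout in the README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hyena-dna): verifies that the files named
# in the README's data layout exist and are non-empty.
check_hg38_layout() {
    # Root data directory; defaults to "data" as in the README layout.
    local data_dir="${1:-data}"
    local missing=0
    local f
    for f in hg38/hg38.ml.fa hg38/human-sequences.bed; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "${data_dir}/${f}" ]; then
            echo "missing or empty: ${data_dir}/${f}" >&2
            missing=1
        fi
    done
    # Exit status 0 means the layout is complete, 1 means a file is absent.
    return "${missing}"
}
```

Running `check_hg38_layout data` from the repository root before the `python -m train` command should surface a skipped download or a `.gz` file that was never decompressed, since `hg38.ml.fa` only exists after `gzip -d` has run.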