Commit da9012d8 authored by wangsen
# Dataset
# Environment Setup
```
git clone https://github.com/HazyResearch/hyena-dna.git && cd hyena-dna
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
```
numpy
scipy
pandas
scikit-learn
matplotlib
tqdm
rich
pytorch-lightning==1.9.4
hydra-core
omegaconf
wandb
einops
opt_einsum
cmake # For pykeops support
# pykeops # Only for S4D. If there are installation problems with pykeops==2.x, try pykeops==1.5
transformers==4.26.1 # For some schedulers and tokenizers
#torchvision
timm==0.9.16
prettytable
numerize
git-lfs
# Dataset specific packages
torchtext==0.16.0 # this needs to align with the pytorch version
#torchtext # this needs to align with the pytorch version
datasets # LRA
# genomic specific
pyfaidx
polars
genomic-benchmarks
loguru
liftover
```
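Several entries in the list above are pinned to exact versions, so it can help to confirm that the environment matches after installation. The following is a minimal sketch, not part of the upstream repository: the helper names `parse_pins` and `check_pins` are hypothetical, and only the pins visible in the list above are checked.

```python
from importlib import metadata

# Pinned entries copied from the requirements list above.
REQUIREMENTS = """\
pytorch-lightning==1.9.4
transformers==4.26.1 # For some schedulers and tokenizers
timm==0.9.16
torchtext==0.16.0 # this needs to align with the pytorch version
"""


def parse_pins(text):
    """Return {package: version} for every `name==version` line, ignoring comments."""
    pins = {}
    for line in text.splitlines():
        spec = line.split("#", 1)[0].strip()  # drop trailing or full-line comments
        if "==" in spec:
            name, version = spec.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {package: (wanted, found)} for pins that are missing or mismatched."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            # Strip local build tags such as "+cpu" before comparing.
            found = metadata.version(name).split("+", 1)[0]
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(parse_pins(REQUIREMENTS)).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Running the script prints nothing when every pinned package is installed at the expected version.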
# Training
Dataset: download the hg38 reference FASTA and the sequence intervals (BED) into `data/hg38/`.
```
# Expected layout once the steps below finish:
# data
# |-- hg38/
#     |-- hg38.ml.fa
#     |-- human-sequences.bed
mkdir -p data/hg38/
curl https://storage.googleapis.com/basenji_barnyard2/hg38.ml.fa.gz > data/hg38/hg38.ml.fa.gz
curl https://storage.googleapis.com/basenji_barnyard2/sequences_human.bed > data/hg38/human-sequences.bed
cd data/hg38/ && gzip -d hg38.ml.fa.gz
```
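Before starting a long training run, it may be worth confirming that both files ended up where the layout above expects them. This is a small sketch under that assumption only. The function name `missing_dataset_files` is hypothetical, and the check covers presence and non-zero size, not file contents.

```python
from pathlib import Path

# File names taken from the layout shown above.
EXPECTED_FILES = ("hg38.ml.fa", "human-sequences.bed")


def missing_dataset_files(data_root="data"):
    """Return the expected hg38 files that are absent or empty under data_root/hg38."""
    hg38_dir = Path(data_root) / "hg38"
    missing = []
    for name in EXPECTED_FILES:
        path = hg38_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing or empty files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("hg38 dataset files are in place.")
```

Run it from the repository root so that the default `data` path resolves correctly.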
Pretraining
```
cd ../../
python -m train wandb=null experiment=hg38/hg38_hyena model.d_model=128 model.n_layer=2 dataset.batch_size=256 train.global_batch_size=256 dataset.max_length=1024 optimizer.lr=6e-4 trainer.devices=1
```
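The pretraining command above is a list of `key=value` overrides passed to the `train` module. When sweeping settings such as `model.n_layer` or `optimizer.lr`, it can be convenient to keep the overrides in a dictionary and generate the command line from it. The sketch below does only that. The helper `build_train_command` is hypothetical and not part of the repository, and the values are copied from the command above.

```python
import shlex

# Overrides copied from the pretraining command above.
OVERRIDES = {
    "wandb": "null",
    "experiment": "hg38/hg38_hyena",
    "model.d_model": 128,
    "model.n_layer": 2,
    "dataset.batch_size": 256,
    "train.global_batch_size": 256,
    "dataset.max_length": 1024,
    "optimizer.lr": "6e-4",
    "trainer.devices": 1,
}


def build_train_command(overrides):
    """Return the argument list for `python -m train` with key=value overrides appended."""
    args = ["python", "-m", "train"]
    args += [f"{key}={value}" for key, value in overrides.items()]
    return args


if __name__ == "__main__":
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(build_train_command(OVERRIDES)))
```

To launch training directly instead of printing the command, the same list can be passed to `subprocess.run(..., check=True)` from the repository root.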
# Inference
# References
https://github.com/HazyResearch/hyena-dna.git