Commit 94ab346e authored by Gustaf Ahdritz, committed by GitHub

Merge pull request #1 from aqlaboratory/mpvenkatesh/changes

minor fixes
parents 6298a3e6 04fa3548
......@@ -40,13 +40,13 @@ scripts/install_third_party_dependencies.sh
To activate the environment, run:
```bash
-source scripts/activate_conda_venv.sh
+source scripts/activate_conda_env.sh
```
To deactivate it, run:
```bash
-source scripts/deactivate_conda_venv.sh
+source scripts/deactivate_conda_env.sh
```
To install the HH-suite to `/usr/bin`, run
......@@ -65,8 +65,7 @@ scripts/download_all_data.sh data/
### Inference
-To run inference on a sequence using a set of DeepMind's pretrained parameters,
-run e.g.
+To run inference on a sequence `target.fasta` (e.g., `wget https://www.rcsb.org/fasta/entry/4DSN`) using a set of DeepMind's pretrained parameters, run e.g.
```bash
python3 run_pretrained_openfold.py \
......@@ -78,15 +77,22 @@ python3 run_pretrained_openfold.py \
data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--output_dir ./ \
--bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
---device cuda:1
+--device cuda:1 \
+--jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
+--hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
+--hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
+--kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign
```
-where `data` is the same directory as in the previous step.
+where `data` is the same directory as in the previous step. If `jackhmmer`, `hhblits`, `hhsearch` and `kalign` are available at the default path of `/usr/bin`, their `binary_path` command-line arguments can be dropped.
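The note about dropping the `binary_path` arguments can be checked mechanically before assembling the command. The following is a minimal sketch, not part of OpenFold: `binary_path_args` is a hypothetical helper that emits the four `--*_binary_path` flags only for tools that are missing from the default directory. The flag names and the `/usr/bin` default come from the README text above; everything else is an assumption made for illustration.

```python
import os

# Tool names taken from the README; the flag format mirrors the command above.
TOOLS = ("jackhmmer", "hhblits", "hhsearch", "kalign")


def binary_path_args(env_bin_dir, default_dir="/usr/bin", tools=TOOLS):
    """Hypothetical helper: return extra CLI args for tools absent from default_dir."""
    args = []
    for tool in tools:
        default_path = os.path.join(default_dir, tool)
        # If an executable already sits at the default location, no flag is needed.
        if os.path.isfile(default_path) and os.access(default_path, os.X_OK):
            continue
        # Otherwise point the script at the copy inside the conda environment.
        args += [f"--{tool}_binary_path", os.path.join(env_bin_dir, tool)]
    return args
```

Calling `binary_path_args("lib/conda/envs/openfold_venv/bin")` returns a list that can be appended to the argument list of either command in this README; it is empty when all four tools are already in `/usr/bin`.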
### Training
-To train the model, you will first need to precompute protein alignments. After
-installing OpenFold using `setup.py`, do so with:
+After activating the OpenFold environment with `source scripts/activate_conda_env.sh`, install OpenFold by running
+```bash
+python setup.py install
+```
+To train the model, you will first need to precompute protein alignments. Create `mmcif_dir/` and download `.cif` files from the PDB (e.g., `wget https://files.rcsb.org/download/4DSN.cif`). Then run:
```bash
python3 scripts/precompute_alignments.py mmcif_dir/ alignment_dir/ \
......@@ -96,9 +102,13 @@ python3 scripts/precompute_alignments.py mmcif_dir/ alignment_dir/ \
data/pdb_mmcif/mmcif_files/ \
data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
---cpus 16
+--cpus 16 \
+--jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
+--hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
+--hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
+--kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign
```
+As noted before, you can skip the `binary_path` arguments if these binaries are at `/usr/bin`.
Expect this step to take a very long time, even for small numbers of proteins.
Next, generate a cache of certain datapoints in the mmCIF files:
......
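The training instructions in the README diff above ask the reader to create `mmcif_dir/` and download `.cif` files from the PDB, giving a single `wget` example. For more than a handful of entries, that step could be scripted roughly as follows. This is a sketch under stated assumptions: `cif_url` and `download_cifs` are hypothetical helpers, not OpenFold functions, and the only detail taken from the source is the URL pattern of the `4DSN` example. Upper-casing the ID simply mirrors that example.

```python
import os
import urllib.request

# URL pattern taken from the README example (https://files.rcsb.org/download/4DSN.cif).
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.cif"


def cif_url(pdb_id):
    """Build the download URL for one PDB entry."""
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id.upper())


def download_cifs(pdb_ids, mmcif_dir="mmcif_dir"):
    """Hypothetical helper: fetch each entry into mmcif_dir, skipping existing files."""
    os.makedirs(mmcif_dir, exist_ok=True)
    paths = []
    for pdb_id in pdb_ids:
        dest = os.path.join(mmcif_dir, f"{pdb_id.upper()}.cif")
        # Skip entries that are already on disk so the script can be re-run safely.
        if not os.path.exists(dest):
            urllib.request.urlretrieve(cif_url(pdb_id), dest)
        paths.append(dest)
    return paths
```

For example, `download_cifs(["4DSN"])` would populate `mmcif_dir/` with the same file as the `wget` command in the README.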
......@@ -4,6 +4,7 @@ import logging
from multiprocessing import Pool
import os
import sys
+import json
sys.path.append(".") # an innocent hack to get this to run from the top level
from tqdm import tqdm
......
......@@ -21,6 +21,10 @@ conda update -qy conda \
openmm=7.5.1 \
pdbfixer
+# Comment out if you have these already installed on your system, for example in /usr/bin/
+conda install -c bioconda aria2
+conda install -y -c bioconda hmmer==3.3.2 hhsuite==3.3.0 kalign2==2.04
# Install DeepMind's OpenMM patch
OPENFOLD_DIR=$PWD
pushd lib/conda/envs/$ENV_NAME/lib/python3.7/site-packages/ \
......
......@@ -3,9 +3,14 @@ import logging
import os
import tempfile
-import openfold.features.mmcif_parsing as mmcif_parsing
-from openfold.features.data_pipeline import AlignmentRunner
-from scripts.utils import add_data_args
+import openfold.data.mmcif_parsing as mmcif_parsing
+from openfold.data.data_pipeline import AlignmentRunner
+from utils import add_data_args
+#python3 scripts/precompute_alignments.py mmcif_dir/ alignment_dir/ data/uniref90/uniref90.fasta data/mgnify/mgy_clusters_2018_12.fa data/pdb70/pdb70 data/pdb_mmcif/mmcif_files/ data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt --cpus 16 --jackhmmer_binary_path /home/u00u98too4mkqFBu8M357/openfold/lib/conda/envs/openfold_venv/bin/jackhmmer --hhblits_binary_path /home/u00u98too4mkqFBu8M357/openfold/lib/conda/envs/openfold_venv/bin/hhblits --hhsearch_binary_path /home/u00u98too4mkqFBu8M357/openfold/lib/conda/envs/openfold_venv/bin/hhsearch --kalign_binary_path /home/u00u98too4mkqFBu8M357/openfold/lib/conda/envs/openfold_venv/bin/kalign
logging.basicConfig(level=logging.DEBUG)
def main(args):
......
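The switch from `from scripts.utils import add_data_args` to `from utils import add_data_args` presumably relies on a standard Python behavior: when a script is run as `python3 scripts/precompute_alignments.py`, the script's own directory (`scripts/`) becomes the first entry of `sys.path`, so sibling modules import by bare name. The sketch below reproduces this with throwaway files. The file `main.py` and the stub `add_data_args` are stand-ins invented for the demonstration and are not the real OpenFold modules.

```python
import os
import subprocess
import sys
import tempfile
import textwrap


def run_demo():
    """Create scripts/utils.py and scripts/main.py in a temp dir, then run main.py from the top level."""
    with tempfile.TemporaryDirectory() as top:
        scripts = os.path.join(top, "scripts")
        os.makedirs(scripts)
        # Stand-in for scripts/utils.py.
        with open(os.path.join(scripts, "utils.py"), "w") as f:
            f.write("def add_data_args():\n    return 'ok'\n")
        # Stand-in for the script that imports its sibling by bare name.
        with open(os.path.join(scripts, "main.py"), "w") as f:
            f.write(textwrap.dedent("""\
                import os, sys
                from utils import add_data_args  # resolved via sys.path[0]
                print(os.path.basename(sys.path[0]), add_data_args())
            """))
        # Invoke from the top-level directory, as the README commands do.
        result = subprocess.run(
            [sys.executable, os.path.join("scripts", "main.py")],
            cwd=top, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

The design consequence is that the bare `from utils import ...` form works when the file is executed as a script, but not when it is imported as `scripts.precompute_alignments` from the repository root, where `scripts/` itself is not on `sys.path`.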