Commit 44e57338, authored by jnwei, committed by Jennifer Wei

Add additional inference pages

parent 9c98e574
...@@ -13,7 +13,7 @@ This guide will focus on monomer prediction, the next sections will describe Mul ...@@ -13,7 +13,7 @@ This guide will focus on monomer prediction, the next sections will describe Mul
### Pre-requisites: ### Pre-requisites:
- OpenFold Conda Environment. See [OpenFold Installation](installation.md) for instructions on how to build this environment. - OpenFold Conda Environment. See [OpenFold Installation](Installation.md) for instructions on how to build this environment.
- Downloading sequence databases for performing multiple sequence alignments. We provide a script to download the AlphaFold databases [here](https://github.com/aqlaboratory/openfold/blob/main/scripts/download_alphafold_dbs.sh). - Downloading sequence databases for performing multiple sequence alignments. We provide a script to download the AlphaFold databases [here](https://github.com/aqlaboratory/openfold/blob/main/scripts/download_alphafold_dbs.sh).
......
# Multimer Inference
To run inference on a complex or multiple complexes using a set of DeepMind's pretrained parameters, you will need:
- AlphaFold Multimer v2.3 parameters
- Updated sequence databases, including the UniRef and PDB SeqRes databases.
## Upgrade from a previous OpenFold Installation
If you previously downloaded OpenFold parameters and/or AlphaFold databases, you will need to download updated versions. The following instructions describe how to upgrade an existing OpenFold installation.
### Download AlphaFold-Multimer v2.3 Model Parameters
1. Re-download the AlphaFold parameters to get the latest
AlphaFold-Multimer v2.3 weights:
```bash
bash scripts/download_alphafold_params.sh openfold/resources
```
### Download AlphaFold Databases for Multimer
1. Download the [UniProt](https://www.uniprot.org/uniprotkb/)
and [PDB SeqRes](https://www.rcsb.org/) databases:
```bash
bash scripts/download_uniprot.sh data/
```
The PDB SeqRes and PDB databases must be from the same date to avoid potential
errors during template searching. Remove the existing `data/pdb_mmcif` directory
and download both databases:
```bash
bash scripts/download_pdb_mmcif.sh data/
bash scripts/download_pdb_seqres.sh data/
```
1. Additionally, AlphaFold-Multimer uses upgraded versions of the [MGnify](https://www.ebi.ac.uk/metagenomics)
and [UniRef30](https://uniclust.mmseqs.com/) (previously UniClust30) databases. To download the upgraded databases, run:
```bash
bash scripts/download_uniref30.sh data/
bash scripts/download_mgnify.sh data/
```
```{note}
Multimer inference can also run with the older database versions if desired.
```
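Before moving on to inference, it may be worth confirming that the downloads landed where the inference command expects them. The following is a minimal sketch, not part of OpenFold: `check_multimer_dbs` is a hypothetical helper, and the paths are taken from the example command in this guide, so adjust them if your release uses different file names (for example, a different MGnify version).

```bash
# Hypothetical helper (not shipped with OpenFold): report which of the
# database paths used by the example multimer command are missing.
check_multimer_dbs() {
    local data_dir="${1:-data}" missing=0 path
    for path in \
        "$data_dir/pdb_mmcif/mmcif_files" \
        "$data_dir/pdb_seqres/pdb_seqres.txt" \
        "$data_dir/uniprot/uniprot.fasta" \
        "$data_dir/uniref90/uniref90.fasta" \
        "$data_dir/mgnify/mgy_clusters_2022_05.fa"
    do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    # Non-zero status when at least one path is absent.
    return "$missing"
}
```

Calling `check_multimer_dbs data` prints each missing path to stderr and returns a non-zero status if anything is absent. UniRef30 and BFD are passed as path prefixes rather than single files, so this sketch leaves them out.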
## Inference command
```bash
python3 run_pretrained_openfold.py \
    fasta_dir \
    data/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --mgnify_database_path data/mgnify/mgy_clusters_2022_05.fa \
    --pdb_seqres_database_path data/pdb_seqres/pdb_seqres.txt \
    --uniref30_database_path data/uniref30/UniRef30_2021_03 \
    --uniprot_database_path data/uniprot/uniprot.fasta \
    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
    --hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
    --hmmsearch_binary_path lib/conda/envs/openfold_venv/bin/hmmsearch \
    --hmmbuild_binary_path lib/conda/envs/openfold_venv/bin/hmmbuild \
    --kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
    --config_preset "model_1_multimer_v3" \
    --model_device "cuda:0" \
    --output_dir ./
```
As with monomer inference, if you've already computed alignments for the query, you can use
the `--use_precomputed_alignments` option. Note that template searching in the multimer pipeline
uses HMMSearch with the PDB SeqRes database, replacing HHSearch and PDB70 used in the monomer pipeline.
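As a sketch of the precomputed-alignment variant, the wrapper below drops the database and binary flags, on the assumption that they are only needed when alignments are computed during the run. Keep them if your setup turns out to require them. `run_multimer_precomputed`, `alignments_dir`, and the `DRY_RUN` switch are illustrative names, not part of OpenFold.

```bash
# Hypothetical wrapper: multimer inference that reuses existing alignments.
# Arguments: <fasta_dir> <alignments_dir> (both are placeholders).
run_multimer_precomputed() {
    local fasta_dir="$1" alignments_dir="$2"
    # Rebuild the positional parameters as the command to run.
    set -- python3 run_pretrained_openfold.py \
        "$fasta_dir" \
        data/pdb_mmcif/mmcif_files/ \
        --use_precomputed_alignments "$alignments_dir" \
        --config_preset "model_1_multimer_v3" \
        --model_device "cuda:0" \
        --output_dir ./
    # DRY_RUN=1 prints the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running `DRY_RUN=1 run_multimer_precomputed fasta_dir alignments_dir` lets you inspect the assembled command before committing GPU time to it.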
\ No newline at end of file
# Notes on OpenFold Training and Parameters
For OpenFold model parameters, v. 06_22. For OpenFold model parameters, v. 06_22.
# Training details: ## Training details
Trained using OpenFold on 44 A100s using the training schedule from Table 4 in OpenFold was trained using OpenFold on 44 A100s using the training schedule from Table 4 in
the AlphaFold supplement. AlphaFold was used as the pre-distillation model. the AlphaFold supplement. AlphaFold was used as the pre-distillation model.
Training data is hosted publicly in the "OpenFold Training Data" RODA repository. Training data is hosted publicly in the "OpenFold Training Data" RODA repository.
To improve model diversity, we forked training after the initial training phase To improve model diversity, we forked training after the initial training phase
and finetuned an additional branch without templates. and finetuned an additional branch without templates.
# Parameter files: ## Parameter files
Parameter files fall into the following categories: Parameter files fall into the following categories:
......
### SoloSeq inference
SoloSeq performs MSA-free sequence-to-structure prediction using embeddings from the [ESM-1b model](https://github.com/facebookresearch/esm).
To run inference for a sequence using the SoloSeq single-sequence model, you can either precompute ESM-1b embeddings in bulk, or you can generate them during inference.
To generate ESM-1b embeddings in bulk, use the provided script `scripts/precompute_embeddings.py`. The script takes a directory of FASTA files (one sequence per file) and generates ESM-1b embeddings in the format and directory structure that SoloSeq requires. An example command:
```shell
python scripts/precompute_embeddings.py fasta_dir/ embeddings_output_dir/
```
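If your sequences live in a single multi-sequence FASTA file, they need to be split first, since the script expects one sequence per file. The helper below is a minimal sketch under stated assumptions: `split_fasta` is a hypothetical function, and naming each output file after the first token of its header (with unusual characters replaced by `_`) is an illustrative choice rather than an OpenFold requirement.

```shell
# Hypothetical helper: split a multi-sequence FASTA into one file per record.
# Usage: split_fasta <input.fasta> <output_dir>
split_fasta() {
    local input="$1" out_dir="$2"
    mkdir -p "$out_dir"
    awk -v out_dir="$out_dir" '
        /^>/ {
            if (out) close(out)
            label = substr($1, 2)                 # first token after ">"
            gsub(/[^A-Za-z0-9_.-]/, "_", label)   # keep file names simple
            out = out_dir "/" label ".fasta"
        }
        out { print > out }                       # header and sequence lines
    ' "$input"
}
```

For example, `split_fasta all_sequences.fasta fasta_dir` would populate `fasta_dir/` with one file per record. Records that share the same first header token would overwrite one another, so check for duplicates beforehand.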
You can also place `*.hhr` files (HHSearch outputs) in the same per-label subdirectories inside `embeddings_output_dir`; these describe the structures you want to use as templates. If no such file is present, templates are not used and the structure is predicted from the ESM-1b embeddings alone. To use templates, you must also pass the PDB mmCIF dataset to the command.
Then download the SoloSeq model weights, e.g.:
```shell
bash scripts/download_openfold_soloseq_params.sh openfold/resources
```
Now, you are ready to run inference:
```shell
python run_pretrained_openfold.py \
    fasta_dir \
    data/pdb_mmcif/mmcif_files/ \
    --use_precomputed_alignments embeddings_output_dir \
    --output_dir ./ \
    --model_device "cuda:0" \
    --config_preset "seq_model_esm1b_ptm" \
    --openfold_checkpoint_path openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt
```
For generating the embeddings during inference, skip the `--use_precomputed_alignments` argument. The `*.hhr` files will be generated as well if you pass the paths to the relevant databases and tools, as specified in the command below. If you skip the database and tool arguments, HHSearch will not be used to find templates and only generated ESM-1b embeddings will be used to predict the structure.
```shell
python3 run_pretrained_openfold.py \
    fasta_dir \
    data/pdb_mmcif/mmcif_files/ \
    --output_dir ./ \
    --model_device "cuda:0" \
    --config_preset "seq_model_esm1b_ptm" \
    --openfold_checkpoint_path openfold/resources/openfold_soloseq_params/seq_model_esm1b_ptm.pt \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --pdb70_database_path data/pdb70/pdb70 \
    --jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
    --hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
    --kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign
```
To generate template information, you will need the UniRef90 and PDB70 databases and the JackHMMER and HHSearch binaries.
SoloSeq supports the same flags and optimizations as MSA-based OpenFold. For example, you can skip relaxation using `--skip_relaxation`, save all model outputs using `--save_outputs`, and generate output files in mmCIF format using `--cif_output`.
```{note}
Due to the nature of the ESM-1b embeddings, the sequence length for inference using the SoloSeq model is limited to 1022 residues. Sequences longer than that will be truncated.
```
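To find out ahead of time which inputs would be affected by this limit, a small check along the following lines may help. It is a sketch, not an OpenFold utility: `seq_length`, `report_long_sequences`, and `SOLOSEQ_MAX_LEN` are hypothetical names, and it assumes one sequence per file with a `.fasta` extension.

```shell
# Residue limit stated in the note above.
SOLOSEQ_MAX_LEN=1022

# Count residues in a single-sequence FASTA file (header lines are skipped).
seq_length() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# List every *.fasta file in a directory whose sequence exceeds the limit.
report_long_sequences() {
    local dir="$1" f len
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue        # directory contains no .fasta files
        len=$(seq_length "$f")
        if [ "$len" -gt "$SOLOSEQ_MAX_LEN" ]; then
            echo "$f: $len residues (will be truncated to $SOLOSEQ_MAX_LEN)"
        fi
    done
}
```

Running `report_long_sequences fasta_dir` prints one line per over-length file and nothing when every sequence fits.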
\ No newline at end of file
...@@ -6,7 +6,7 @@ This guide covers how to train an OpenFold model. These instructions focus on tr ...@@ -6,7 +6,7 @@ This guide covers how to train an OpenFold model. These instructions focus on tr
### Pre-requisites: ### Pre-requisites:
This guide requires the following: This guide requires the following:
- [Installation of OpenFold and dependencies](installation.md) (Including jackhmmer and hhblits dependencies) - [Installation of OpenFold and dependencies](Installation.md) (Including jackhmmer and hhblits dependencies)
- A preprocessed dataset: - A preprocessed dataset:
- For this guide, we will use the original OpenFold dataset which is available on RODA. This dataset can be downloaded with the following command: - For this guide, we will use the original OpenFold dataset which is available on RODA. This dataset can be downloaded with the following command:
`./scripts/download_roda_dbs.sh <dst_path>`[Download the dataset used to train the OpenFold model] `./scripts/download_roda_dbs.sh <dst_path>`[Download the dataset used to train the OpenFold model]
......
...@@ -14,6 +14,8 @@ Here, you will find guides for: ...@@ -14,6 +14,8 @@ Here, you will find guides for:
- Learn how to [run inference with OpenFold](Inference.md) - Learn how to [run inference with OpenFold](Inference.md)
- [Train your own OpenFold models](Training_OpenFold.md) - [Train your own OpenFold models](Training_OpenFold.md)
Some portions of the documentation are still being migrated from the original README, which can be found [here](original_readme.md).
# Features # Features
OpenFold carefully reproduces (almost) all of the features of the original open OpenFold carefully reproduces (almost) all of the features of the original open
...@@ -98,7 +100,9 @@ Any work that cites OpenFold should also cite [AlphaFold](https://www.nature.com ...@@ -98,7 +100,9 @@ Any work that cites OpenFold should also cite [AlphaFold](https://www.nature.com
:caption: Guides :caption: Guides
Installation.md Installation.md
Inference.md Inference.md
OpenFold_Training_setup.md Single_Sequence_Inference.md
Multimer_Inference.md
OpenFold_Training_Setup.md
Training_OpenFold.md Training_OpenFold.md
``` ```
......
# Setup Guide # Setting Up OpenFold
In this guide, we will install OpenFold and its dependencies. In this guide, we will install OpenFold and its dependencies.
...@@ -42,16 +42,12 @@ The script is a thin wrapper around Python's `unittest` suite, and recognizes ...@@ -42,16 +42,12 @@ The script is a thin wrapper around Python's `unittest` suite, and recognizes
**AlphaFold Comparison tests:** **AlphaFold Comparison tests:**
Certain tests perform equivalence comparisons with the AlphaFold implementation. Running these tests requires an environment with both AlphaFold 2.0.1 and OpenFold installed, which is not covered in this guide. These tests are skipped by default if no installation of AlphaFold is found. Certain tests perform equivalence comparisons with the AlphaFold implementation. Running these tests requires an environment with both AlphaFold 2.0.1 and OpenFold installed, which is not covered in this guide. These tests are skipped by default if no installation of AlphaFold is found.
## Modifications ## Environment specific modifications
### CUDA 11 environment ### CUDA 12
To use OpenFold in a CUDA 11 environment rather than a CUDA 12 environment: To use OpenFold in a CUDA 12 environment rather than a CUDA 11 environment:
In step 1, replace the github repository link with [OpenFold version v2](https://github.com/aqlaboratory/openfold/tree/v2.0.0) In step 1, use the branch [`pl_upgrades`](https://github.com/aqlaboratory/openfold/tree/pl_upgrades) rather than the main branch, i.e. replace the URL in step 1 with https://github.com/aqlaboratory/openfold/tree/pl_upgrades
Follow the rest of the steps of [Installation Guide](#installation) Follow the rest of the steps of [Installation Guide](#Installation)
```{note}
Replace with link to last stable pre-update version
```
### Install OpenFold parameters without aws ### Install OpenFold parameters without aws
If you don't have access to `aws` on your system, you can use a different download source: If you don't have access to `aws` on your system, you can use a different download source:
...@@ -61,9 +57,7 @@ If you don't have access to `aws` on your system, you can use a different downlo ...@@ -61,9 +57,7 @@ If you don't have access to `aws` on your system, you can use a different downlo
### Docker setup ### Docker setup
```{note} A [`Dockerfile`] is provided to build an OpenFold Docker image. Additional notes for setting up a docker container for OpenFold and running inference can be found [here](original_readme.md#building-and-using-the-docker-container).
Add / check docker installation instructions
```
## Troubleshooting FAQ ## Troubleshooting FAQ
......