"git@developer.sourcefind.cn:OpenDAS/dynamo.git" did not exist on "9be75482abdabb808c93890733d55077742e9934"
Commit 2d4fe4f4 authored by Sachin Kadyan

README: Update details about SoloSeq and inference-time embedding generation.

parent 86b990d6
@@ -233,14 +233,15 @@ at once. The `run_pretrained_openfold.py` script can enable this config option w
`--long_sequence_inference` command line option
#### SoloSeq Inference
To run inference for a sequence using the SoloSeq single-sequence model, you can either precompute ESM-1b embeddings in bulk or generate them during inference.
To generate ESM-1b embeddings in bulk, use the provided script: `scripts/precompute_embeddings.py`. The script takes a directory of FASTA files and generates ESM-1b embeddings in the format and directory structure that SoloSeq requires. For example:
```bash
python scripts/precompute_embeddings.py fasta_dir/ embeddings_output_dir/
```
In the same per-label subdirectories inside `embeddings_output_dir`, you can also place `*.hhr` files (outputs from HHSearch) that describe the structures you want to use as templates. If a subdirectory contains no such file, templates are not used and the structure is predicted from the ESM-1b embeddings alone.
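Before launching inference, it can help to check which labels will actually pick up templates. The following is a minimal sketch, not part of OpenFold: `labels_with_templates` is a hypothetical helper, and it assumes only what is stated above, namely that `embeddings_output_dir` contains one subdirectory per label and that templates are supplied as `*.hhr` files inside it.

```python
from pathlib import Path


def labels_with_templates(embeddings_dir):
    """Map each per-label subdirectory name to whether it holds a *.hhr file.

    Hypothetical helper: the layout (one subdirectory per label) is taken
    from the README text, not from the OpenFold source code.
    """
    root = Path(embeddings_dir)
    return {
        sub.name: any(sub.glob("*.hhr"))  # True if at least one .hhr file exists
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }
```

Calling `labels_with_templates("embeddings_output_dir/")` returns a dictionary such as `{"label_a": True, "label_b": False}`; labels mapped to `False` would be predicted from embeddings only.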
Now you are ready to run inference:
```bash
@@ -254,6 +255,26 @@ python run_pretrained_openfold.py \
    --openfold_checkpoint_path openfold/resources/openfold_params/seq_model_esm1b_ptm.pt
```
To generate the embeddings during inference instead, omit the `--use_precomputed_alignments` argument. The `*.hhr` files are generated as well if you pass the paths to the relevant databases and tools, as in the command below. If you omit the database and tool arguments, HHSearch is not used to find templates and the structure is predicted from the generated ESM-1b embeddings alone.
```bash
python3 run_pretrained_openfold.py \
fasta_dir \
data/pdb_mmcif/mmcif_files/ \
--output_dir ./ \
--model_device "cuda:0" \
--config_preset "seq_model_esm1b_ptm" \
--openfold_checkpoint_path openfold/resources/openfold_params/seq_model_esm1b_ptm.pt \
--uniref90_database_path data/uniref90/uniref90.fasta \
--pdb70_database_path data/pdb70/pdb70 \
--jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
--hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
--kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign
```
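If you drive inference from a script, the choice between running with and without templates comes down to whether the database and tool arguments are passed. The sketch below assembles the argument list for the command above. `build_soloseq_command` and its parameters are hypothetical names introduced for illustration; the flags themselves are only those that appear in the command above.

```python
# Flags from the command above that enable the HHSearch template search.
TEMPLATE_FLAGS = (
    "uniref90_database_path",
    "pdb70_database_path",
    "jackhmmer_binary_path",
    "hhsearch_binary_path",
    "kalign_binary_path",
)


def build_soloseq_command(fasta_dir, mmcif_dir, checkpoint_path,
                          output_dir="./", device="cuda:0",
                          template_args=None):
    """Return the argv list for on-the-fly SoloSeq inference (hypothetical helper).

    template_args maps names from TEMPLATE_FLAGS to paths. Leaving it empty
    omits the database and tool arguments, so no template search is requested.
    """
    cmd = [
        "python3", "run_pretrained_openfold.py", fasta_dir, mmcif_dir,
        "--output_dir", output_dir,
        "--model_device", device,
        "--config_preset", "seq_model_esm1b_ptm",
        "--openfold_checkpoint_path", checkpoint_path,
    ]
    for name, path in (template_args or {}).items():
        if name not in TEMPLATE_FLAGS:
            raise ValueError(f"unexpected template argument: {name}")
        cmd += [f"--{name}", path]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems and dangling line continuations.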
SoloSeq supports the same flags and optimizations as MSA-based OpenFold. For example, you can skip relaxation using `--skip_relaxation`, save all model outputs using `--save_outputs`, and generate output files in mmCIF format using `--cif_output`.
**NOTE:** Due to the nature of the ESM-1b embeddings, the sequence length for inference using the SoloSeq model is limited to 1022 residues. Sequences longer than that will be truncated.
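Because truncation happens silently, you may want to find over-length sequences before running inference. The following is a minimal sketch under stated assumptions: `read_fasta` and `find_truncated` are hypothetical helpers, not OpenFold functions, and the only fact taken from the note above is the 1022-residue limit.

```python
ESM1B_MAX_RESIDUES = 1022  # limit stated in the note above


def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def find_truncated(text, limit=ESM1B_MAX_RESIDUES):
    """Return {header: length} for every sequence longer than the limit."""
    return {
        name: len(seq)
        for name, seq in read_fasta(text).items()
        if len(seq) > limit
    }
```

For each file in `fasta_dir`, `find_truncated(Path(fasta_file).read_text())` lists the sequences that exceed the limit, so you can split or exclude them rather than have them truncated.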
### Training
To train the model, you will first need to precompute protein alignments.
......