Commit 4c8e3764 authored by Sachin Kadyan's avatar Sachin Kadyan

README: Add details about using the bulk embedding generation script

parent 92835fd5
@@ -232,22 +232,22 @@ efficient AlphaFold-Multimer more than double the time. Use the
at once. The `run_pretrained_openfold.py` script can enable this config option with the
`--long_sequence_inference` command line option
#### SoloSeq Inference
To run inference for a sequence using the SoloSeq single-sequence model, use the provided script: `scripts/precompute_embeddings.py`. The script takes a directory of FASTA files and generates ESM-1b embeddings in the same format and directory structure as required by SoloSeq. Following is an example command to use the script:
```bash
python scripts/precompute_embeddings.py fasta_dir/ embeddings_output_dir/
```
In the same per-label subdirectories inside `embeddings_output_dir`, you can also place `*.hhr` files, which can contain the details about the structures that you want to use as templates.
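The layout described above follows the precomputed-alignments convention: one subdirectory per sequence tag, holding that tag's `.pt` embedding and, optionally, a `.hhr` template-hits file. A minimal sketch of that structure (the tag `T1104` and the temporary root directory are illustrative only, not taken from the repository):

```python
import pathlib
import tempfile

# Sketch: build the directory layout that the embedding script produces,
# with an optional .hhr template-hits file placed alongside the embedding.
# "T1104" is a hypothetical sequence tag; real tags come from your FASTA files.
root = pathlib.Path(tempfile.mkdtemp()) / "embeddings_output_dir"
tag = "T1104"
seq_dir = root / tag
seq_dir.mkdir(parents=True)
(seq_dir / f"{tag}.pt").touch()   # ESM-1b embedding for this sequence
(seq_dir / f"{tag}.hhr").touch()  # optional: template hits you provide

# Inference looks up <embeddings_output_dir>/<tag>/<tag>.pt for each sequence.
for sub in sorted(root.iterdir()):
    print(sub.name, sorted(p.name for p in sub.iterdir()))
```

Passing `embeddings_output_dir` to `--use_precomputed_alignments` (as in the command below) is what lets the inference script find these files.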
Now, you are ready to run inference:
```bash
python run_pretrained_openfold.py \
    fasta_dir \
    data/pdb_mmcif/mmcif_files/ \
    --use_precomputed_alignments embeddings_output_dir \
    --output_dir ./ \
    --model_device "cuda:0" \
    --config_preset "seq_model_esm1b_ptm" \
...