Commit 4c8e3764 authored by Sachin Kadyan's avatar Sachin Kadyan

README: Add details about using the bulk embedding generation script

parent 92835fd5
@@ -232,22 +232,22 @@ efficient AlphaFold-Multimer more than double the time. Use the
at once. The `run_pretrained_openfold.py` script can enable this config option with the
`--long_sequence_inference` command line option
#### SoloSeq Inference
To run inference for a sequence using the SoloSeq single-sequence model, use the provided script: `scripts/precompute_embeddings.py`. The script takes a directory of FASTA files and generates ESM-1b embeddings in the same format and directory structure as required by SoloSeq. Following is an example command to use the script:
```bash
python scripts/precompute_embeddings.py fasta_dir/ embeddings_output_dir/
```
In the same per-label subdirectories inside `embeddings_output_dir`, you can also place `*.hhr` files, which can contain the details about the structures that you want to use as templates.
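The layout described above follows the precomputed-alignments convention: one subdirectory per sequence tag, holding that tag's `.pt` embedding and, optionally, a `.hhr` template-hits file. A minimal sketch of that structure (the tag `T1104` and the temporary root directory are illustrative only, not taken from the repository):

```python
import pathlib
import tempfile

# Sketch: build the directory layout that the embedding script produces,
# with an optional .hhr template-hits file placed alongside the embedding.
# "T1104" is a hypothetical sequence tag; real tags come from your FASTA files.
root = pathlib.Path(tempfile.mkdtemp()) / "embeddings_output_dir"
tag = "T1104"
seq_dir = root / tag
seq_dir.mkdir(parents=True)
(seq_dir / f"{tag}.pt").touch()   # ESM-1b embedding for this sequence
(seq_dir / f"{tag}.hhr").touch()  # optional: template hits you provide

# Inference looks up <embeddings_output_dir>/<tag>/<tag>.pt for each sequence.
for sub in sorted(root.iterdir()):
    print(sub.name, sorted(p.name for p in sub.iterdir()))
```

Passing `embeddings_output_dir` to `--use_precomputed_alignments` (as in the command below) is what lets the inference script find these files.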
Now, you are ready to run inference:
```bash
python run_pretrained_openfold.py \
    fasta_dir \
    data/pdb_mmcif/mmcif_files/ \
    --use_precomputed_alignments embeddings_output_dir \
    --output_dir ./ \
    --model_device "cuda:0" \
    --config_preset "seq_model_esm1b_ptm" \
...