Commit be37a41d authored by Augustin Zidek, committed by Copybara-Service

Explain better how to run the AlphaFold-Multimer system.

* Remove a confusing example that folds multiple complexes at once.
* Add examples on how to create the multimer input FASTA files.
* Add a note about the `model_preset` flag to the API changes.

PiperOrigin-RevId: 407774039
Change-Id: I80277c47febc977ba3956d0615944e62b0d5c3fc
parent 91b43223
@@ -196,6 +196,10 @@ change the following:
happens inside the Multimer model.
* The `preset` flag in `run_alphafold.py` and `run_docker.py` was split into
`db_preset` and `model_preset`.
* The models to use are not specified using `model_names` but rather using the
`model_preset` flag. If you want to customize which models are used for each
preset, you will have to modify the `MODEL_PRESETS` dictionary in
`alphafold/model/config.py`.
* Setting the `data_dir` flag is now needed when using `run_docker.py`.
@@ -299,18 +303,124 @@ All steps are the same as when running the monomer system, but you will have to
whether all input sequences in the given fasta file are prokaryotic. If that
is not the case or the origin is unknown, set to `false` for that fasta.
-An example that folds two protein complexes `multimer1` and `multimer2` where
-the first is prokaryotic and the second isn't:
+An example that folds a protein complex `multimer.fasta` that is prokaryotic:
```bash
python3 docker/run_docker.py \
-  --fasta_paths=multimer1.fasta,multimer2.fasta \
-  --is_prokaryote_list=true,false \
+  --fasta_paths=multimer.fasta \
+  --is_prokaryote_list=true \
  --max_template_date=2020-05-14 \
  --model_preset=multimer \
  --data_dir=$DOWNLOAD_DIR
```
### Examples
Below are examples on how to use AlphaFold in different scenarios.
#### Folding a monomer
Say we have a monomer with the sequence `<SEQUENCE>`. The input fasta should be:
```fasta
>sequence_name
<SEQUENCE>
```
Then run the following command:
```bash
python3 docker/run_docker.py \
--fasta_paths=monomer.fasta \
--max_template_date=2021-11-01 \
--model_preset=monomer \
--data_dir=$DOWNLOAD_DIR
```
#### Folding a homomer
Say we have a homomer from a prokaryote with 3 copies of the same sequence
`<SEQUENCE>`. The input fasta should be:
```fasta
>sequence_1
<SEQUENCE>
>sequence_2
<SEQUENCE>
>sequence_3
<SEQUENCE>
```
Then run the following command:
```bash
python3 docker/run_docker.py \
--fasta_paths=homomer.fasta \
--is_prokaryote_list=true \
--max_template_date=2021-11-01 \
--model_preset=multimer \
--data_dir=$DOWNLOAD_DIR
```
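The `fasta_paths` flag description in this commit says that a FASTA file containing multiple sequences is folded as a multimer, so it can help to check how many records a file holds before choosing `--model_preset`. The following is a minimal sketch, not part of AlphaFold; `count_fasta_records` is a hypothetical helper and `MKV` is a made-up placeholder sequence.

```python
def count_fasta_records(fasta_text: str) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(
        1 for line in fasta_text.splitlines() if line.lstrip().startswith(">")
    )


# Placeholder homomer with 3 copies of the same (made-up) sequence.
homomer = ">sequence_1\nMKV\n>sequence_2\nMKV\n>sequence_3\nMKV\n"
print(count_fasta_records(homomer))  # 3 records, one per chain copy
```

A count of 1 suggests a monomer input, while a count above 1 suggests a file intended for the multimer preset.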
#### Folding a heteromer
Say we have a heteromer A2B3 of unknown origin, i.e. with 2 copies of
`<SEQUENCE A>` and 3 copies of `<SEQUENCE B>`. The input fasta should be:
```fasta
>sequence_1
<SEQUENCE A>
>sequence_2
<SEQUENCE A>
>sequence_3
<SEQUENCE B>
>sequence_4
<SEQUENCE B>
>sequence_5
<SEQUENCE B>
```
Then run the following command:
```bash
python3 docker/run_docker.py \
--fasta_paths=heteromer.fasta \
--is_prokaryote_list=false \
--max_template_date=2021-11-01 \
--model_preset=multimer \
--data_dir=$DOWNLOAD_DIR
```
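Writing one record per chain copy by hand gets tedious for larger complexes. As a rough sketch, the input FASTA for a complex such as A2B3 could be generated from a list of (sequence, copy count) pairs. The function name `build_multimer_fasta` and the example sequences are assumptions for illustration; only the `sequence_N` record naming comes from the examples above.

```python
from typing import Iterable, Tuple


def build_multimer_fasta(chains: Iterable[Tuple[str, int]]) -> str:
    """Return FASTA text with one record per chain copy.

    `chains` holds (amino_acid_sequence, copy_count) pairs. Records are
    named sequence_1, sequence_2, ... in the order the chains are given.
    """
    lines = []
    index = 1
    for sequence, copies in chains:
        if copies < 1:
            raise ValueError("copy_count must be at least 1")
        for _ in range(copies):
            lines.append(f">sequence_{index}")
            lines.append(sequence)
            index += 1
    return "\n".join(lines) + "\n"


# A2B3 heteromer with made-up placeholder sequences for chains A and B.
print(build_multimer_fasta([("MKTAYIAK", 2), ("GSHMLEDP", 3)]), end="")
```

The returned text can then be saved, for example as `heteromer.fasta`, and passed to `--fasta_paths`.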
#### Folding multiple monomers one after another
Say we have two monomers, `monomer1.fasta` and `monomer2.fasta`.
We can fold both sequentially by using the following command:
```bash
python3 docker/run_docker.py \
--fasta_paths=monomer1.fasta,monomer2.fasta \
--max_template_date=2021-11-01 \
--model_preset=monomer \
--data_dir=$DOWNLOAD_DIR
```
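The `fasta_paths` flag description requires every FASTA path to have a unique basename, because the basename names each prediction's output directory. A small pre-flight check might look like the sketch below. It assumes that "basename" means the file name without its extension; `duplicate_basenames` is a hypothetical helper, not an AlphaFold function.

```python
import collections
import pathlib
from typing import List, Sequence


def duplicate_basenames(fasta_paths: Sequence[str]) -> List[str]:
    """Return the basenames (file name without extension) used more than once."""
    counts = collections.Counter(pathlib.Path(p).stem for p in fasta_paths)
    return sorted(name for name, n in counts.items() if n > 1)


# Two files in different directories still collide on the basename "monomer1".
print(duplicate_basenames(["a/monomer1.fasta", "b/monomer1.fasta"]))
print(duplicate_basenames(["monomer1.fasta", "monomer2.fasta"]))
```

An empty list means the paths are safe to pass together in a single `--fasta_paths` value.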
#### Folding multiple multimers one after another
Say we have two multimers, `multimer1.fasta` and `multimer2.fasta`. Both are
from a prokaryotic organism.
We can fold both sequentially by using the following command:
```bash
python3 docker/run_docker.py \
--fasta_paths=multimer1.fasta,multimer2.fasta \
--is_prokaryote_list=true,true \
--max_template_date=2021-11-01 \
--model_preset=multimer \
--data_dir=$DOWNLOAD_DIR
```
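When several targets are folded in one run, `is_prokaryote_list` should contain one boolean per FASTA file. The sketch below assembles the command line shown above and rejects a length mismatch before anything is launched. The helper name `build_run_docker_args` is hypothetical, and the length check is an assumption drawn from the flag description rather than a copy of AlphaFold's own validation. Only flags that appear in the examples above are used.

```python
from typing import List, Optional, Sequence


def build_run_docker_args(
    fasta_paths: Sequence[str],
    model_preset: str,
    max_template_date: str,
    data_dir: str,
    is_prokaryote: Optional[Sequence[bool]] = None,
) -> List[str]:
    """Build the argument list for docker/run_docker.py."""
    if not fasta_paths:
        raise ValueError("at least one FASTA path is required")
    args = [
        "python3",
        "docker/run_docker.py",
        "--fasta_paths=" + ",".join(fasta_paths),
    ]
    if is_prokaryote is not None:
        # Assumed rule: exactly one boolean per FASTA file.
        if len(is_prokaryote) != len(fasta_paths):
            raise ValueError("is_prokaryote needs one value per FASTA path")
        values = ",".join("true" if v else "false" for v in is_prokaryote)
        args.append("--is_prokaryote_list=" + values)
    args += [
        "--max_template_date=" + max_template_date,
        "--model_preset=" + model_preset,
        "--data_dir=" + data_dir,
    ]
    return args


print(" ".join(build_run_docker_args(
    ["multimer1.fasta", "multimer2.fasta"],
    model_preset="multimer",
    max_template_date="2021-11-01",
    data_dir="/path/to/download_dir",  # placeholder for $DOWNLOAD_DIR
    is_prokaryote=[True, True],
)))
```

The returned list can be passed directly to `subprocess.run(args, check=True)`, which avoids shell quoting issues with paths that contain spaces.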
### AlphaFold output
The outputs will be saved in a subdirectory of the directory provided via the
......
@@ -32,17 +32,17 @@ flags.DEFINE_string(
     'gpu_devices', 'all',
     'Comma separated list of devices to pass to NVIDIA_VISIBLE_DEVICES.')
 flags.DEFINE_list(
-    'fasta_paths', None,
-    'Paths to FASTA files, each containing one sequence. Paths should be '
+    'fasta_paths', None, 'Paths to FASTA files, each containing a prediction '
+    'target that will be folded one after another. If a FASTA file contains '
+    'multiple sequences, then it will be folded as a multimer. Paths should be '
     'separated by commas. All FASTA paths must have a unique basename as the '
     'basename is used to name the output directories for each prediction.')
-flags.DEFINE_list('is_prokaryote_list', None, 'Optional for multimer system, '
-                  'not used by the single chain system. '
-                  'This list should contain a boolean for each fasta '
-                  'specifying true where the target complex is from a '
-                  'prokaryote, and false where it is not, or where the '
-                  'origin is unknown. These values determine the pairing '
-                  'method for the MSA.')
+flags.DEFINE_list(
+    'is_prokaryote_list', None, 'Optional for multimer system, not used by the '
+    'single chain system. This list should contain a boolean for each fasta '
+    'specifying true where the target complex is from a prokaryote, and false '
+    'where it is not, or where the origin is unknown. These values determine '
+    'the pairing method for the MSA.')
 flags.DEFINE_string(
     'output_dir', '/tmp/alphafold',
     'Path to a directory that will store the results.')
......
@@ -43,18 +43,18 @@ from alphafold.model import data
 logging.set_verbosity(logging.INFO)
-flags.DEFINE_list('fasta_paths', None, 'Paths to FASTA files, each containing '
-                  'a prediction target. Paths should be separated by commas. '
-                  'All FASTA paths must have a unique basename as the '
-                  'basename is used to name the output directories for '
-                  'each prediction.')
-flags.DEFINE_list('is_prokaryote_list', None, 'Optional for multimer system, '
-                  'not used by the single chain system. '
-                  'This list should contain a boolean for each fasta '
-                  'specifying true where the target complex is from a '
-                  'prokaryote, and false where it is not, or where the '
-                  'origin is unknown. These values determine the pairing '
-                  'method for the MSA.')
+flags.DEFINE_list(
+    'fasta_paths', None, 'Paths to FASTA files, each containing a prediction '
+    'target that will be folded one after another. If a FASTA file contains '
+    'multiple sequences, then it will be folded as a multimer. Paths should be '
+    'separated by commas. All FASTA paths must have a unique basename as the '
+    'basename is used to name the output directories for each prediction.')
+flags.DEFINE_list(
+    'is_prokaryote_list', None, 'Optional for multimer system, not used by the '
+    'single chain system. This list should contain a boolean for each fasta '
+    'specifying true where the target complex is from a prokaryote, and false '
+    'where it is not, or where the origin is unknown. These values determine '
+    'the pairing method for the MSA.')
 flags.DEFINE_string('data_dir', None, 'Path to directory of supporting data.')
 flags.DEFINE_string('output_dir', None, 'Path to a directory that will '
......