Commit e3231de1 authored by DeepMind, committed by Copybara-Service

Update of README.md

PiperOrigin-RevId: 501279176
Change-Id: I9cf92212322b29691844973ded9e337e81b3a9fd
parent 8f1ebd58
@@ -3,7 +3,7 @@

# AlphaFold

This package provides an implementation of the inference pipeline of AlphaFold
v2. For simplicity, we refer to this model as AlphaFold throughout the rest of
this document.

We also provide:
@@ -36,12 +36,14 @@ If you have any questions, please contact the AlphaFold team at

![CASP14 predictions](imgs/casp14_predictions.gif)

## Installation and running your first prediction

You will need a machine running Linux; AlphaFold does not support other
operating systems. Full installation requires up to 3 TB of disk space to keep
genetic databases (SSD storage is recommended) and a modern NVIDIA GPU (GPUs
with more memory can predict larger protein structures).
Please follow these steps:

1. Install [Docker](https://www.docker.com/).
    * Install
@@ -49,8 +51,36 @@ The following steps are required in order to run AlphaFold:
      for GPU support.
    * Setup running
      [Docker as a non-root user](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user).
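Before moving on, it can be worth confirming that the non-root Docker setup works. A minimal smoke test (a sketch; `hello-world` is Docker's standard test image):

```shell
# Verify that the current user can talk to the Docker daemon without sudo.
# If this fails with a permission error, re-check the non-root setup above.
docker run --rm hello-world
```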
1. Clone this repository and `cd` into it.
```bash
git clone https://github.com/deepmind/alphafold.git
cd ./alphafold
```
1. Download genetic databases and model parameters:
* Install `aria2c` (on most Linux distributions it is available via the
package manager).
    * Please use the script `scripts/download_all_data.sh` to download
      and set up the full databases. This may take substantial time (download
      size is 556 GB), so we recommend running this script in the background:
```bash
scripts/download_all_data.sh <DOWNLOAD_DIR> > download.log 2> download_all.log &
```
* **Note: The download directory `<DOWNLOAD_DIR>` should *not* be a
subdirectory in the AlphaFold repository directory.** If it is, the Docker
build will be slow as the large databases will be copied into the docker
build context.
* It is possible to run AlphaFold with reduced databases; please refer to
the [complete documentation](#genetic-databases).
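Since the download runs in the background, its progress can be checked like this (a sketch; `download_all.log` and `<DOWNLOAD_DIR>` are the names used in the command above):

```shell
# Show the most recent progress messages from the download script.
tail -n 5 download_all.log

# Check how much of the databases has landed on disk so far.
du -sh "<DOWNLOAD_DIR>"
```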
1. Check that AlphaFold will be able to use a GPU by running:
@@ -63,10 +93,58 @@ The following steps are required in order to run AlphaFold:

   or take a look at the following
   [NVIDIA Docker issue](https://github.com/NVIDIA/nvidia-docker/issues/1447#issuecomment-801479573).

   If you wish to run AlphaFold using Singularity (a common containerization
   platform on HPC systems) we recommend using some of the third-party Singularity
   setups as linked in https://github.com/deepmind/alphafold/issues/10 or
   https://github.com/deepmind/alphafold/issues/24.
1. Build the Docker image:
```bash
docker build -f docker/Dockerfile -t alphafold .
```
If you encounter the following error:
```
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
```
use the workaround described in
https://github.com/deepmind/alphafold/issues/463#issuecomment-1124881779.
1. Install the `run_docker.py` dependencies. Note: You may optionally wish to
create a
[Python Virtual Environment](https://docs.python.org/3/tutorial/venv.html)
to prevent conflicts with your system's Python environment.
```bash
pip3 install -r docker/requirements.txt
```
1. Make sure that the output directory exists (the default is `/tmp/alphafold`)
and that you have sufficient permissions to write into it.
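For example, preparing the default output location could look like this (a minimal sketch; `/tmp/alphafold` is the default mentioned above, and the permission bits are only illustrative):

```shell
# Create the default output directory and confirm it is writable.
mkdir -p /tmp/alphafold
chmod 770 /tmp/alphafold
test -w /tmp/alphafold && echo "output directory is writable"
```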
1. Run `run_docker.py` pointing to a FASTA file containing the protein
sequence(s) for which you wish to predict the structure (`--fasta_paths`
parameter). AlphaFold will search for the available templates before the
date specified by the `--max_template_date` parameter; this could be used to
avoid certain templates during modeling. `--data_dir` is the directory with
downloaded genetic databases and `--output_dir` is the absolute path to the
output directory.
```bash
python3 docker/run_docker.py \
--fasta_paths=your_protein.fasta \
--max_template_date=2022-01-01 \
--data_dir=$DOWNLOAD_DIR \
--output_dir=/home/user/absolute_path_to_the_output_dir
```
1. Once the run is over, the output directory will contain the predicted
structures of the target protein. Please check the documentation below for
additional options and troubleshooting tips.
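As a quick check, you can list the per-target subdirectory that AlphaFold creates under the output directory (a sketch; `your_protein` is the FASTA basename from the example above, and the exact file set depends on the options used):

```shell
# Each target gets its own subdirectory named after the FASTA file;
# ranked_*.pdb files are the predicted structures ordered by model confidence.
ls /home/user/absolute_path_to_the_output_dir/your_protein/
```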
### Genetic databases

@@ -86,7 +164,7 @@ AlphaFold needs multiple genetic (sequence) databases to run:

We provide a script `scripts/download_all_data.sh` that can be used to download
and set up all of these databases:

* Recommended default:

  ```bash
  scripts/download_all_data.sh <DOWNLOAD_DIR>
  ```
@@ -94,14 +172,16 @@ and set up all of these databases:

  will download the full databases.

* With the `reduced_dbs` parameter:

  ```bash
  scripts/download_all_data.sh <DOWNLOAD_DIR> reduced_dbs
  ```
  will download a reduced version of the databases to be used with the
  `reduced_dbs` database preset. Use this together with the corresponding
  `--db_preset=reduced_dbs` flag later during the AlphaFold run
  (see the [AlphaFold parameters](#running-alphafold) section).
:ledger: **Note: The download directory `<DOWNLOAD_DIR>` should *not* be a
subdirectory in the AlphaFold repository directory.** If it is, the Docker build
@@ -111,7 +191,7 @@ We don't provide exactly the database versions used in CASP14 – see the
[note on reproducibility](#note-on-casp14-reproducibility). Some of the
databases are mirrored for speed, see [mirrored databases](#mirrored-databases).

:ledger: **Note: The total download size for the full databases is around 556 GB
and the total size when unzipped is 2.62 TB. Please make sure you have enough
hard drive space, bandwidth and time to download. We recommend using an
SSD for better genetic search performance.**
@@ -230,58 +310,11 @@ To use the deprecated v2.1.0 AlphaFold-Multimer model weights:

**The simplest way to run AlphaFold is using the provided Docker script.** This
was tested on Google Cloud with a machine using the `nvidia-gpu-cloud-image`
with 12 vCPUs, 85 GB of RAM, a 100 GB boot disk, the databases on an additional
3 TB disk, and an A100 GPU. For your first run, please follow the instructions
from the
[Installation and running your first prediction](#installation-and-running-your-first-prediction)
section.

1. By default, AlphaFold will attempt to use all visible GPU devices. To use a
subset, specify a comma-separated list of GPU UUID(s) or index(es) using the
`--gpu_devices` flag. See
[GPU enumeration](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#gpu-enumeration)
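For instance, restricting the run to the first GPU might look like this (a sketch; the remaining flags follow the earlier examples):

```shell
# Hypothetical example: run the prediction on GPU index 0 only.
python3 docker/run_docker.py \
  --fasta_paths=your_protein.fasta \
  --max_template_date=2022-01-01 \
  --gpu_devices=0 \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```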
@@ -325,9 +358,24 @@ with 12 vCPUs, 85 GB of RAM, a 100 GB boot disk, the databases on an additional

```bash
  --max_template_date=2020-05-14 \
  --model_preset=monomer \
  --db_preset=reduced_dbs \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```
1. After generating the predicted model, by default AlphaFold runs a relaxation
step to improve geometrical quality. You can control this via `--run_relax=true`
(default) or `--run_relax=false`.
1. The relaxation step can be run on GPU (faster, but could be less stable) or
CPU (slow, but stable). This can be controlled with `--enable_gpu_relax=true`
(default) or `--enable_gpu_relax=false`.
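Combining the two relaxation controls from the steps above might look like this (a sketch; the remaining flags follow the earlier examples):

```shell
# Hypothetical example: keep relaxation enabled but run it on CPU
# (slower, but more stable).
python3 docker/run_docker.py \
  --fasta_paths=your_protein.fasta \
  --max_template_date=2022-01-01 \
  --run_relax=true \
  --enable_gpu_relax=false \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```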
1. AlphaFold can re-use MSAs (multiple sequence alignments) for the same
sequence via the `--use_precomputed_msas=true` option; this can be useful for
trying different AlphaFold parameters. This option assumes that the
directory structure generated by the first AlphaFold run in the output
directory exists and that the protein sequence is the same.
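Re-using MSAs from an earlier run might look like this (a sketch; it assumes a previous run of the same sequence already wrote its results under the same output path):

```shell
# Hypothetical example: skip the expensive MSA search by re-using the
# alignments produced by a previous run of the same sequence.
python3 docker/run_docker.py \
  --fasta_paths=your_protein.fasta \
  --max_template_date=2022-01-01 \
  --use_precomputed_msas=true \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```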
### Running AlphaFold-Multimer

All steps are the same as when running the monomer system, but you will have to
@@ -342,7 +390,8 @@ python3 docker/run_docker.py \

```bash
  --fasta_paths=multimer.fasta \
  --max_template_date=2020-05-14 \
  --model_preset=multimer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```
By default the multimer system will run 5 seeds per model (25 total predictions)
@@ -402,7 +451,8 @@ python3 docker/run_docker.py \

```bash
  --fasta_paths=monomer.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=monomer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```
#### Folding a homomer

@@ -426,7 +476,8 @@ python3 docker/run_docker.py \

```bash
  --fasta_paths=homomer.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=multimer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```
#### Folding a heteromer

@@ -454,7 +505,8 @@ python3 docker/run_docker.py \

```bash
  --fasta_paths=heteromer.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=multimer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```
#### Folding multiple monomers one after another

@@ -468,7 +520,8 @@ python3 docker/run_docker.py \

```bash
  --fasta_paths=monomer1.fasta,monomer2.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=monomer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```
#### Folding multiple multimers one after another

@@ -482,7 +535,8 @@ python3 docker/run_docker.py \

```bash
  --fasta_paths=multimer1.fasta,multimer2.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=multimer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```
### AlphaFold output

@@ -633,6 +687,7 @@ If you use the code or data in this package, please cite:

In addition, if you use the AlphaFold-Multimer mode, please cite:

```bibtex
@article {AlphaFold-Multimer2021,
  author = {Evans, Richard and O{\textquoteright}Neill, Michael and Pritzel, Alexander and Antropova, Natasha and Senior, Andrew and Green, Tim and {\v{Z}}{\'\i}dek, Augustin and Bates, Russ and Blackwell, Sam and Yim, Jason and Ronneberger, Olaf and Bodenstein, Sebastian and Zielinski, Michal and Bridgland, Alex and Potapenko, Anna and Cowie, Andrew and Tunyasuvunakool, Kathryn and Jain, Rishub and Clancy, Ellen and Kohli, Pushmeet and Jumper, John and Hassabis, Demis},
```
@@ -754,4 +809,5 @@ reference to the following:

(unmodified), by Mitchell AL et al., available free of all copyright
restrictions and made fully and freely available for both non-commercial and
commercial use under
[CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).