Commit bd0c4d48 authored by Gustaf Ahdritz

Update documentation

parent 2d6ef1fb
@@ -10,14 +10,17 @@ source inference code. The sole exception is model ensembling, which fared
poorly in DeepMind's own ablation testing and is being phased out in future
DeepMind experiments. It is omitted here for the sake of reducing clutter. In
cases where the Nature paper differs from the source, we always defer to the
latter.
OpenFold is built to support inference with AlphaFold's original JAX weights.
Try it out with our [Colab notebook](https://colab.research.google.com/github/aqlaboratory/openfold/blob/main/notebooks/OpenFold.ipynb).
Unlike DeepMind's public code, OpenFold is also trainable. It can be trained
with or without [DeepSpeed](https://github.com/microsoft/deepspeed) and with
mixed precision. bfloat16 training is not currently supported, but will be
in the future.
## Installation (Linux)
Python dependencies available through `pip` are provided in `requirements.txt`.
OpenFold also depends on `openmm==7.5.1` and `pdbfixer`, which are only
@@ -45,6 +48,14 @@ source scripts/deactivate_conda_venv.sh
## Usage
To download the genetic databases used by AlphaFold/OpenFold, run:
```bash
scripts/download_all_data.sh data/
```
This script depends on `aria2c`.
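Since the download script relies on `aria2c`, it can be worth confirming that the binary is on your `PATH` before starting a large download. The following is a minimal sketch using only the Python standard library; the helper name `find_missing_tools` is hypothetical and is not part of OpenFold.

```python
import shutil


def find_missing_tools(tools=("aria2c",)):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If `aria2c` is reported as missing, install it with your system's package manager before running the download script.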
To run inference on a sequence using a set of DeepMind's pretrained parameters,
run e.g.
@@ -61,7 +72,7 @@ python3 run_pretrained_openfold.py \
--device cuda:1
```
where `data` is the same directory as in the previous step. Run
```bash
python3 run_pretrained_openfold.py --help
@@ -69,6 +80,50 @@ python3 run_pretrained_openfold.py --help
for a full list of options.
To train the model, you will first need to precompute protein alignments. After
installing OpenFold using `setup.py`, do so with:
```bash
python3 scripts/precompute_alignments.py mmcif_dir/ alignment_dir/ \
data/uniref90/uniref90.fasta \
data/mgnify/mgy_clusters_2018_12.fa \
data/pdb70/pdb70 \
data/pdb_mmcif/mmcif_files/ \
data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--cpus 16
```
Expect this step to take a very long time, even for small numbers of proteins.
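Because this step is long-running, a quick pre-flight check of the database paths may save a failed run. The sketch below is not part of OpenFold: `missing_paths` is a hypothetical helper, and it assumes that some of the arguments above (for example `data/pdb70/pdb70`) are database prefixes shared by several files rather than single files, so a path counts as present if anything on disk starts with it.

```python
import glob
import os

# Paths copied from the command above. Treating some of them as prefixes
# rather than single files is an assumption of this sketch.
DATABASE_PATHS = [
    "data/uniref90/uniref90.fasta",
    "data/mgnify/mgy_clusters_2018_12.fa",
    "data/pdb70/pdb70",
    "data/pdb_mmcif/mmcif_files/",
    "data/uniclust30/uniclust30_2018_08/uniclust30_2018_08",
    "data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt",
]


def missing_paths(paths):
    """Return the paths that neither exist nor prefix any existing file."""
    missing = []
    for path in paths:
        exists = os.path.exists(path)
        # glob.escape guards against special characters in the path itself.
        has_prefixed_files = bool(glob.glob(glob.escape(path) + "*"))
        if not exists and not has_prefixed_files:
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_paths(DATABASE_PATHS):
        print("Not found:", path)
```

Run it from the directory that contains `data/`; no output means every path resolved.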
Next, generate a cache of certain datapoints in the mmCIF files as follows:
```bash
python3 scripts/generate_mmcif_cache.py mmcif_dir/ mmcif_cache.json --no_workers 16
```
This cache is used to minimize the number of mmCIF parses performed during
training-time data preprocessing. Finally, call the training script:
```bash
python3 train_openfold.py mmcif_dir/ alignment_dir/ template_mmcif_dir/ \
2021-10-10 \
--template_release_dates_cache_path mmcif_cache.json \
--precision 16 \
--gpus 8 --replace_sampler_ddp=True \
--accelerator ddp \
--seed 42 \
--deepspeed_config_path deepspeed_config.json
```
where `--template_release_dates_cache_path` is a path to the `.json` file
generated in the previous step. In multi-GPU settings, `--seed` must be
specified. A suitable DeepSpeed configuration file can be
generated with `scripts/build_deepspeed_config.py`. The training script is
written with [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)
and supports the full range of training options that entails, including
multi-node distributed training. For more information, consult PyTorch
Lightning documentation and the `--help` flag of the training script.
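For orientation, a DeepSpeed configuration is a plain JSON file. The sketch below prints a minimal one that enables fp16 mixed precision and ZeRO stage 2. The keys `fp16.enabled` and `zero_optimization.stage` are standard DeepSpeed options, but the specific values and the helper `render_config` are illustrative assumptions and do not reproduce what `scripts/build_deepspeed_config.py` emits.

```python
import json


def render_config(fp16=True, zero_stage=2):
    """Return a minimal DeepSpeed configuration as a JSON string.

    The defaults are illustrative assumptions, not OpenFold's settings.
    """
    config = {
        "fp16": {"enabled": fp16},                   # mixed-precision training
        "zero_optimization": {"stage": zero_stage},  # ZeRO optimizer sharding
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(render_config())
```

Redirecting the output to a file produces something that can be passed to `--deepspeed_config_path`, though the configuration generated by the provided script is the safer choice for real training runs.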
## Testing
To run unit tests, use
@@ -87,7 +142,7 @@ scripts/run_unit_tests.sh -v tests.test_model
Certain tests require that AlphaFold be installed in the same Python
environment. These run components of AlphaFold and OpenFold side by side and
ensure that output activations are adequately similar. For most modules, we
target a maximum difference of `1e-4`.
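To make the tolerance concrete, the comparison amounts to taking the largest absolute element-wise difference between two sets of activations and checking it against the threshold. The following is a simplified sketch on nested Python lists with made-up values; the actual tests operate on tensors, and `max_abs_diff` is a hypothetical helper rather than a function from the test suite.

```python
def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two nested lists."""
    if isinstance(a, (list, tuple)):
        if len(a) != len(b):
            raise ValueError("shape mismatch")
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


TOLERANCE = 1e-4

# Made-up activations standing in for AlphaFold and OpenFold outputs.
reference = [[0.10000, 0.20000], [0.30000, 0.40000]]
candidate = [[0.10003, 0.19998], [0.30000, 0.40009]]

diff = max_abs_diff(reference, candidate)
print("within tolerance:", diff <= TOLERANCE)
```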
## Copyright notice
@@ -96,3 +151,7 @@ the permissive Apache Licence, Version 2.0, DeepMind's pretrained parameters
remain under the more restrictive CC BY-NC 4.0 license, a copy of which is
downloaded to `openfold/resources/params` by the installation script. They are
thereby made unavailable for commercial use.
## Contributing
If you encounter problems using OpenFold, feel free to create an issue!