Commit e41e6517 authored by jnwei

Initial commit for sphinx documentation.

parent f1cd1381
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)
if "%1" == "" goto help
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
project = 'OpenFold'
copyright = '2024, OpenFold Team'
author = 'OpenFold Team'
release = '2.0.0'
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = [
'myst_parser',
]
templates_path = ['_templates']
exclude_patterns = []
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
html_theme = 'furo'
html_static_path = ['_static']
myst_enable_extensions = ["colon_fence"]
# OpenFold
```{figure} ../imgs/of_banner.png
:width: 900px
:align: center
:alt: Comparison of OpenFold and AlphaFold2 predictions to the experimental structure of PDB 7KDX, chain B.
```
A faithful but trainable PyTorch reproduction of DeepMind's
[AlphaFold 2](https://github.com/deepmind/alphafold).
Get started with OpenFold with our [Setup Guide](installation.md)!
# Features
OpenFold carefully reproduces (almost) all of the features of the original open
source monomer (v2.0.1) and multimer (v2.3.2) inference code. The sole exception is
model ensembling, which fared poorly in DeepMind's own ablation testing and is being
phased out in future DeepMind experiments. It is omitted here for the sake of reducing
clutter. In cases where the *Nature* paper differs from the source, we always defer to the
latter.
OpenFold is trainable in full precision, half precision, or `bfloat16` with or without DeepSpeed,
and we've trained it from scratch, matching the performance of the original.
We've publicly released model weights and our training data — some 400,000
MSAs and PDB70 template hit files — under a permissive license. Model weights
are available via scripts in this repository while the MSAs are hosted by the
[Registry of Open Data on AWS (RODA)](https://registry.opendata.aws/openfold).
Try out running inference for yourself with our [Colab notebook](https://colab.research.google.com/github/aqlaboratory/openfold/blob/main/notebooks/OpenFold.ipynb).
OpenFold also supports inference using AlphaFold's official parameters, and
vice versa (see `scripts/convert_of_weights_to_jax.py`).
OpenFold has the following advantages over the reference implementation:
- **Faster inference** on GPU, sometimes by as much as 2x. The greatest speedups are achieved on Ampere or higher architecture GPUs.
- **Inference on extremely long chains**, made possible by our implementation of low-memory attention
([Rabe & Staats 2021](https://arxiv.org/pdf/2112.05682.pdf)). OpenFold can predict the structures of
sequences with more than 4000 residues on a single A100, and even longer ones with CPU offloading.
- **Custom CUDA attention kernels** modified from [FastFold](https://github.com/hpcaitech/FastFold)'s
kernels support in-place attention during inference and training. They use
4x and 5x less GPU memory than equivalent FastFold and stock PyTorch
implementations, respectively.
- **Efficient alignment scripts** using the original AlphaFold HHblits/JackHMMER pipeline or [ColabFold](https://github.com/sokrypton/ColabFold)'s, which uses the faster MMseqs2 instead. We've used them to generate millions of alignments.
- **FlashAttention** support greatly speeds up MSA attention.
- **DeepSpeed DS4Sci_EvoformerAttention kernel** is a memory-efficient attention kernel developed in a collaboration between OpenFold and the DeepSpeed4Science initiative. It speeds up both training and inference and reduces the model's peak device memory requirement by 13x: the model is 15% faster during the initial training and fine-tuning stages, and up to 4x faster during inference.
```{note}
TODO: Eventually replace this with some figures / results?
```
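The low-memory attention mentioned above (Rabe & Staats 2021) avoids materializing the full attention score matrix: keys and values are processed in chunks, and the partial softmax results are merged using a running maximum and a running sum. The pure-Python sketch below illustrates the idea for a single query vector and can be checked against a naive implementation. It illustrates the technique only and is not OpenFold's kernel; the function names and the chunk size are assumptions, and the usual 1/sqrt(d) score scaling is omitted for brevity.

```python
import math
from typing import List

Vector = List[float]


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: softmax over all scores q.k_i at once, then a weighted sum of v_i."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def chunked_attention(q: Vector, keys: List[Vector], values: List[Vector],
                      chunk_size: int = 2) -> Vector:
    """Same result, but only one chunk of scores exists at any time."""
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0      # sum of exp(score - running_max) over keys seen so far
    acc = [0.0] * dim      # sum of exp(score - running_max) * value
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (exp(-inf) == 0.0 on
        # the first chunk, so the empty accumulators stay zero).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [x * scale for x in acc]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [x + w * vd for x, vd in zip(acc, v)]
        running_max = new_max
    return [x / running_sum for x in acc]
```

Only `chunk_size` scores are held at a time, so the memory needed for scores scales with the chunk size rather than with the number of keys, at the cost of a sequential loop over chunks.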
# Copyright Notice
While AlphaFold's and, by extension, OpenFold's source code is licensed under
the permissive Apache License, Version 2.0, DeepMind's pretrained parameters
fall under the CC BY 4.0 license, a copy of which is downloaded to
`openfold/resources/params` by the installation script. Note that the latter
replaces the original, more restrictive CC BY-NC 4.0 license as of January 2022.
## Contributing
If you encounter problems using OpenFold, feel free to create an issue! We also
welcome pull requests from the community.
## Citing this Work
Please cite our paper:
```bibtex
@article {Ahdritz2022.11.20.517210,
author = {Ahdritz, Gustaf and Bouatta, Nazim and Floristean, Christina and Kadyan, Sachin and Xia, Qinghui and Gerecke, William and O{\textquoteright}Donnell, Timothy J and Berenberg, Daniel and Fisk, Ian and Zanichelli, Niccolò and Zhang, Bo and Nowaczynski, Arkadiusz and Wang, Bei and Stepniewska-Dziubinska, Marta M and Zhang, Shang and Ojewole, Adegoke and Guney, Murat Efe and Biderman, Stella and Watkins, Andrew M and Ra, Stephen and Lorenzo, Pablo Ribalta and Nivon, Lucas and Weitzner, Brian and Ban, Yih-En Andrew and Sorger, Peter K and Mostaque, Emad and Zhang, Zhao and Bonneau, Richard and AlQuraishi, Mohammed},
title = {{O}pen{F}old: {R}etraining {A}lpha{F}old2 yields new insights into its learning mechanisms and capacity for generalization},
elocation-id = {2022.11.20.517210},
year = {2022},
doi = {10.1101/2022.11.20.517210},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/10.1101/2022.11.20.517210},
eprint = {https://www.biorxiv.org/content/early/2022/11/22/2022.11.20.517210.full.pdf},
journal = {bioRxiv}
}
```
If you use OpenProteinSet, please also cite:
```bibtex
@misc{ahdritz2023openproteinset,
title={{O}pen{P}rotein{S}et: {T}raining data for structural biology at scale},
author={Gustaf Ahdritz and Nazim Bouatta and Sachin Kadyan and Lukas Jarosch and Daniel Berenberg and Ian Fisk and Andrew M. Watkins and Stephen Ra and Richard Bonneau and Mohammed AlQuraishi},
year={2023},
eprint={2308.05326},
archivePrefix={arXiv},
primaryClass={q-bio.BM}
}
```
Any work that cites OpenFold should also cite [AlphaFold](https://www.nature.com/articles/s41586-021-03819-2) and [AlphaFold-Multimer](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1) if applicable.
```{note}
TODO: Replace with final versions of both papers
```
# Setup Guide
In this guide, we will install OpenFold and its dependencies.
**Prerequisites**
This package currently supports CUDA 12 and PyTorch 2.1. All dependencies are listed in `environment.yml`.
## Instructions
### Installation
1. Clone the repository, e.g. `git clone https://github.com/aqlaboratory/openfold.git`
2. From the `openfold` repo:
   - Create a [Mamba](https://github.com/conda-forge/miniforge/releases/latest/download/) environment, e.g.
     `mamba env create -n openfold_env -f environment.yml`.
     Mamba is recommended because OpenFold's dependencies are quite large and mamba can speed up their installation.
   - Activate the environment, e.g. `conda activate openfold_env`
3. Run the setup script to configure kernels and folding resources:
   > `scripts/install_third_party_dependencies.sh`
4. Prepend the conda environment's `lib` directory to `$LD_LIBRARY_PATH`, e.g.
   `export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH`. You may optionally save this as a conda environment variable, following the [conda docs](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#saving-environment-variables), so that it is set each time the environment is activated.
5. Download parameters. We recommend `openfold/resources` as the destination, since the unit tests look for the weights there.
   - For AlphaFold2 weights, use:
     > `./scripts/download_alphafold_params.sh <dest>`
   - For OpenFold weights, use:
     > `./scripts/download_openfold_params.sh <dest>`
   - For OpenFold SoloSeq weights, use:
     > `./scripts/download_openfold_soloseq_params.sh <dest>`
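The `LD_LIBRARY_PATH` step above puts the conda environment's `lib` directory first, so that libraries shipped with the environment (for example its `libstdc++`) are found before older system copies. The Python sketch below performs the same string manipulation and also drops empty entries, such as the trailing `:` that the shell command leaves when `LD_LIBRARY_PATH` was previously unset, as well as duplicate copies of the conda path. The helper `prepend_conda_lib` is hypothetical and not part of OpenFold.

```python
import posixpath
from typing import Mapping


def prepend_conda_lib(env: Mapping[str, str]) -> str:
    """Return an LD_LIBRARY_PATH value with $CONDA_PREFIX/lib placed first.

    Hypothetical helper mirroring
    `export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH`.
    """
    conda_lib = posixpath.join(env["CONDA_PREFIX"], "lib")
    existing = env.get("LD_LIBRARY_PATH", "")
    # Skip empty entries and any earlier copy of conda_lib to avoid duplicates.
    rest = [p for p in existing.split(":") if p and p != conda_lib]
    return ":".join([conda_lib, *rest])
```

Inside an activated environment, `prepend_conda_lib(os.environ)` returns the value the `export` line would set, minus empty or repeated entries.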
### Checking your build with unit tests:
To test your installation, you can run the OpenFold unit tests. Make sure that the OpenFold and AlphaFold parameters have been downloaded and are located (or symlinked) in the `openfold/resources` directory.
Run with the following script:
> scripts/run_unit_tests.sh
The script is a thin wrapper around Python's `unittest` framework and accepts `unittest` arguments. For example, to run a specific test verbosely:
> scripts/run_unit_tests.sh -v tests.test_model
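The `-v` flag and the dotted test name are standard `unittest` command-line features: `-v` sets the runner's verbosity to 2, and a name such as `tests.test_model` is resolved by the test loader. The self-contained sketch below reproduces that name resolution in-process with the `unittest` API. `ExampleTest`, `demo_tests`, and `run_named` are hypothetical stand-ins, and the sketch is not the contents of `run_unit_tests.sh`.

```python
import types
import unittest


class ExampleTest(unittest.TestCase):
    """Hypothetical stand-in for a test module such as tests.test_model."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_upper(self):
        self.assertEqual("ab".upper(), "AB")


# Put the test case in a throwaway module so that a dotted name can be resolved
# against it, much as `python -m unittest tests.test_model` resolves `tests`.
demo_module = types.ModuleType("demo_tests")
demo_module.ExampleTest = ExampleTest


def run_named(name: str, verbose: bool = True) -> unittest.TestResult:
    """Load tests by dotted name and run them; verbose=True mirrors `-v`."""
    suite = unittest.defaultTestLoader.loadTestsFromName(name, demo_module)
    runner = unittest.TextTestRunner(verbosity=2 if verbose else 1)
    return runner.run(suite)
```

`run_named("ExampleTest")` runs both tests in the class, while `run_named("ExampleTest.test_addition")` runs a single method, which is the same narrowing that a dotted name on the command line provides.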
**AlphaFold comparison tests:**
Certain tests perform equivalence comparisons with the AlphaFold implementation. Running them requires an environment with both AlphaFold 2.0.1 and OpenFold installed, which is not covered in this guide. These tests are skipped by default if no AlphaFold installation is found.
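Skipping tests when an optional dependency is missing is commonly done with `unittest.skipIf`. The sketch below shows that pattern under the assumption that "AlphaFold is installed" is approximated by the `alphafold` package being importable; OpenFold's own test helpers may perform this check differently, and `ComparisonTest` and `run_comparison_suite` are hypothetical names.

```python
import importlib.util
import unittest

# Assumption: an importable `alphafold` package stands in for "AlphaFold is installed".
ALPHAFOLD_AVAILABLE = importlib.util.find_spec("alphafold") is not None


class ComparisonTest(unittest.TestCase):
    @unittest.skipIf(not ALPHAFOLD_AVAILABLE, "AlphaFold not installed")
    def test_matches_alphafold(self):
        import alphafold  # noqa: F401  (only reached when AlphaFold is present)
        # A real comparison test would run both models and compare outputs here.

    def test_always_runs(self):
        self.assertTrue(True)


def run_comparison_suite() -> unittest.TestResult:
    """Run the suite; skipped tests are counted in result.skipped, not as failures."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ComparisonTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)
```

Because a skip is reported separately from failures, a run without AlphaFold still counts as successful, which matches the behavior described above.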
## Modifications
### CUDA 11 environment
To use OpenFold in a CUDA 11 environment rather than a CUDA 12 environment, replace the GitHub repository link in step 1 with [OpenFold version v2](https://github.com/aqlaboratory/openfold/tree/v2.0.0) (e.g. `git clone --branch v2.0.0 https://github.com/aqlaboratory/openfold.git`), then follow the rest of the steps in the [Installation Guide](#installation).
```{note}
Replace with link to last stable pre-update version
```
### Install OpenFold parameters without AWS
If you don't have access to `aws` on your system, you can use a different download source:
- HuggingFace (requires `git-lfs`): `scripts/download_openfold_params_huggingface.sh`
- Google Drive: `scripts/download_openfold_params_gdrive.sh`
### Docker setup
```{note}
Add / check docker installation instructions
```
## Troubleshooting FAQ
- In the unit tests, I see an error such as
```
ImportError: version GLIBCXX_3.4.30 not found
```
> Solution: Make sure that the `LD_LIBRARY_PATH` environment variable includes the conda library path, e.g. `export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH`
- I see a CUDA mismatch error, e.g.
```
The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.
```
> Solution: Ensure that your system's CUDA driver and toolkit are version 12.x. You can check the CUDA driver version with a command such as `nvidia-smi`, and the toolkit version with `nvcc --version`.
- I get an error involving `fatal error: cuda_runtime.h: No such file or directory` and/or `ninja: build stopped: subcommand failed.`
> Solution: Something went wrong while building the custom kernels. Try running `scripts/install_third_party_dependencies.sh` again, or run `python3 setup.py install` from inside the OpenFold folder. Before running either, make sure the conda environment has been prepended to `LD_LIBRARY_PATH` as described above.
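For the CUDA mismatch question above, the two versions in the error message can be compared directly: PyTorch exposes the CUDA version it was built with as `torch.version.cuda` (which is `None` for CPU-only builds), and `nvcc --version` prints the toolkit release on a line such as `Cuda compilation tools, release 12.1, V12.1.105`. The sketch below only parses and compares those strings, so it runs without a GPU. The function names are hypothetical, and treating a major-version difference as the problem is a simplifying assumption based on the 12.x requirement stated above.

```python
import re
from typing import Optional, Tuple


def nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a version string such as "12.1" into (12, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_cuda_major(nvcc_output: str, torch_cuda: Optional[str]) -> bool:
    """True if the nvcc toolkit and PyTorch's build share a CUDA major version."""
    toolkit = nvcc_release(nvcc_output)
    if toolkit is None or torch_cuda is None:  # no nvcc output, or CPU-only PyTorch
        return False
    return toolkit[0] == parse_version(torch_cuda)[0]
```

Inside the OpenFold environment, it could be called as `same_cuda_major(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout, torch.version.cuda)`.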