Initial commit for sphinx documentation.

Commit e41e6517 authored Mar 20, 2024 by jnwei
7 changed files
--- a/docs/Makefile
+++ b/docs/Makefile
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = source
+BUILDDIR      = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
--- a/docs/imgs/of_banner.png
+++ b/docs/imgs/of_banner.png
--- a/docs/make.bat
+++ b/docs/make.bat
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.https://www.sphinx-doc.org/
+	exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
--- a/docs/source/README.md
+++ b/docs/source/README.md
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = 'OpenFold'
+copyright = '2024, OpenFold Team'
+author = 'OpenFold Team'
+release = '2.0.0'
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = [
+	'myst_parser',
+]
+
+templates_path = ['_templates']
+exclude_patterns = []
+
+
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = 'furo'
+html_static_path = ['_static']
+myst_enable_extensions = ["colon_fence"]
+
--- a/docs/source/index.md
+++ b/docs/source/index.md
+# OpenFold
+
+```{figure} ../imgs/of_banner.png
+:width: 900px
+:align: center
+:alt: Comparison of OpenFold and AlphaFold2 predictions to the experimental structure of PDB 7KDX, chain B.
+```
+
+A faithful but trainable PyTorch reproduction of DeepMind's 
+[AlphaFold 2](https://github.com/deepmind/alphafold).
+
+Get started with OpenFold with our [Setup Guide](installation.md)!
+
+# Features
+
+OpenFold carefully reproduces (almost) all of the features of the original open
+source monomer (v2.0.1) and multimer (v2.3.2) inference code. The sole exception is 
+model ensembling, which fared poorly in DeepMind's own ablation testing and is being 
+phased out in future DeepMind experiments. It is omitted here for the sake of reducing 
+clutter. In cases where the *Nature* paper differs from the source, we always defer to the 
+latter.
+
+OpenFold is trainable in full precision, half precision, or `bfloat16` with or without DeepSpeed, 
+and we've trained it from scratch, matching the performance of the original. 
+We've publicly released model weights and our training data &mdash; some 400,000 
+MSAs and PDB70 template hit files &mdash; under a permissive license. Model weights 
+are available via scripts in this repository while the MSAs are hosted by the 
+[Registry of Open Data on AWS (RODA)](https://registry.opendata.aws/openfold). 
+Try out running inference for yourself with our [Colab notebook](https://colab.research.google.com/github/aqlaboratory/openfold/blob/main/notebooks/OpenFold.ipynb).
+
+OpenFold also supports inference using AlphaFold's official parameters, and 
+vice versa (see `scripts/convert_of_weights_to_jax.py`).
+
+OpenFold has the following advantages over the reference implementation:
+
+- **Faster inference** on GPU, sometimes by as much as 2x. The greatest speedups are achieved on Ampere or higher architecture GPUs.
+- **Inference on extremely long chains**, made possible by our implementation of low-memory attention 
+([Rabe & Staats 2021](https://arxiv.org/pdf/2112.05682.pdf)). OpenFold can predict the structures of
+  sequences with more than 4000 residues on a single A100, and even longer ones with CPU offloading.
+- **Custom CUDA attention kernels** modified from [FastFold](https://github.com/hpcaitech/FastFold)'s 
+kernels support in-place attention during inference and training. They use 
+4x and 5x less GPU memory than equivalent FastFold and stock PyTorch 
+implementations, respectively.
+- **Efficient alignment scripts** using the original AlphaFold HHblits/JackHMMER pipeline or [ColabFold](https://github.com/sokrypton/ColabFold)'s, which uses the faster MMseqs2 instead. We've used them to generate millions of alignments.
+- **FlashAttention** support greatly speeds up MSA attention.
+- **DeepSpeed DS4Sci_EvoformerAttention kernel** is a memory-efficient attention kernel developed as part of a collaboration between OpenFold and the DeepSpeed4Science initiative. The kernel provides substantial speedups for training and inference, and significantly reduces the model's peak device memory requirement by 13X. The model is 15% faster during the initial training and finetuning stages, and up to 4x faster during inference.
+
+```{note}
+TODO: Eventually replace this with some figures / results?
+```
+
+# Copyright Notice
+
+While AlphaFold's and, by extension, OpenFold's source code is licensed under
+the permissive Apache License, Version 2.0, DeepMind's pretrained parameters 
+fall under the CC BY 4.0 license, a copy of which is downloaded to 
+`openfold/resources/params` by the installation script. Note that the latter
+replaces the original, more restrictive CC BY-NC 4.0 license as of January 2022.
+
+## Contributing
+
+If you encounter problems using OpenFold, feel free to create an issue! We also
+welcome pull requests from the community.
+
+## Citing this Work
+
+Please cite our paper:
+
+```bibtex
+@article {Ahdritz2022.11.20.517210,
+	author = {Ahdritz, Gustaf and Bouatta, Nazim and Floristean, Christina and Kadyan, Sachin and Xia, Qinghui and Gerecke, William and O{\textquoteright}Donnell, Timothy J and Berenberg, Daniel and Fisk, Ian and Zanichelli, Niccolò and Zhang, Bo and Nowaczynski, Arkadiusz and Wang, Bei and Stepniewska-Dziubinska, Marta M and Zhang, Shang and Ojewole, Adegoke and Guney, Murat Efe and Biderman, Stella and Watkins, Andrew M and Ra, Stephen and Lorenzo, Pablo Ribalta and Nivon, Lucas and Weitzner, Brian and Ban, Yih-En Andrew and Sorger, Peter K and Mostaque, Emad and Zhang, Zhao and Bonneau, Richard and AlQuraishi, Mohammed},
+	title = {{O}pen{F}old: {R}etraining {A}lpha{F}old2 yields new insights into its learning mechanisms and capacity for generalization},
+	elocation-id = {2022.11.20.517210},
+	year = {2022},
+	doi = {10.1101/2022.11.20.517210},
+	publisher = {Cold Spring Harbor Laboratory},
+	URL = {https://www.biorxiv.org/content/10.1101/2022.11.20.517210},
+	eprint = {https://www.biorxiv.org/content/early/2022/11/22/2022.11.20.517210.full.pdf},
+	journal = {bioRxiv}
+}
+```
+If you use OpenProteinSet, please also cite:
+
+```bibtex
+@misc{ahdritz2023openproteinset,
+      title={{O}pen{P}rotein{S}et: {T}raining data for structural biology at scale}, 
+      author={Gustaf Ahdritz and Nazim Bouatta and Sachin Kadyan and Lukas Jarosch and Daniel Berenberg and Ian Fisk and Andrew M. Watkins and Stephen Ra and Richard Bonneau and Mohammed AlQuraishi},
+      year={2023},
+      eprint={2308.05326},
+      archivePrefix={arXiv},
+      primaryClass={q-bio.BM}
+}
+```
+Any work that cites OpenFold should also cite [AlphaFold](https://www.nature.com/articles/s41586-021-03819-2) and [AlphaFold-Multimer](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1) if applicable.
+
+
+
+```{note}
+TODO: Replace with final versions of both papers 
+```
\ No newline at end of file
--- a/docs/source/installation.md
+++ b/docs/source/installation.md
+# Setup Guide
+
+In this guide, we will install OpenFold and its dependencies.
+
+**Pre-requisites**
+
+This package is currently supported for CUDA 12 and PyTorch 2.1. All dependencies are listed in `environment.yml`.
+
+## Instructions
+
+### Installation:
+1. Clone the repository, e.g. `git clone https://github.com/aqlaboratory/openfold.git`
+1. From the `openfold` repo:
+    - Create a [Mamba](https://github.com/conda-forge/miniforge/releases/latest/download/) environment, e.g.
+        `mamba env create -n openfold_env -f environment.yml`
+      Mamba is recommended as the dependencies required by OpenFold are quite large and mamba can speed up the process.
+    - Activate the environment, e.g. `conda activate openfold_env`
+1. Run the setup script to configure kernels and folding resources.
+	> scripts/install_third_party_dependencies.sh
+3. Prepend the conda environment's library directory to `LD_LIBRARY_PATH`, e.g.
+		`export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH`. You may optionally set this as a conda environment variable, as described in the [conda docs](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#saving-environment-variables), so that it is applied each time the environment is activated.
+4. Download parameters. We recommend using `openfold/resources` as the destination, since our unit tests look for the weights there.
+	-  For AlphaFold2 weights, use 
+		> ./scripts/download_alphafold_params.sh <dest>
+	 - For OpenFold weights, use:
+		>  ./scripts/download_openfold_params.sh <dest>
+	 - For OpenFold SoloSeq weights, use: 
+		> ./scripts/download_openfold_soloseq_params.sh <dest>
+
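Step 3 above prepends the conda library directory to `LD_LIBRARY_PATH`. As a minimal sketch of what that step computes, the hypothetical helper below (`prepend_conda_lib` is illustrative and not part of OpenFold) builds the new value from an environment mapping. It assumes a Linux-style, colon-separated path and that `CONDA_PREFIX` is set by the active environment.

```python
import os


def prepend_conda_lib(env):
    """Return LD_LIBRARY_PATH with $CONDA_PREFIX/lib placed first.

    `env` is any mapping shaped like os.environ. Hypothetical helper,
    not part of OpenFold.
    """
    conda_lib = env["CONDA_PREFIX"].rstrip("/") + "/lib"
    current = env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries and any existing copy so the path is not duplicated.
    entries = [p for p in current.split(":") if p and p != conda_lib]
    return ":".join([conda_lib] + entries)


if __name__ == "__main__":
    if "CONDA_PREFIX" in os.environ:
        # Print a line that can be pasted into the current shell.
        print("export LD_LIBRARY_PATH=" + prepend_conda_lib(os.environ))
    else:
        print("No conda environment is active (CONDA_PREFIX is unset).")
```

Run inside the activated environment, this prints an `export` line for the shell. A Python process cannot change its parent shell's environment, which is why the sketch prints the command rather than setting the variable itself.
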
+### Checking your build with unit tests: 
+
+To test your installation, you can run the OpenFold unit tests. Make sure that the OpenFold and AlphaFold parameters have been downloaded, and that they are located (or symlinked) in the directory `openfold/resources`.
+
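Before running the tests, it can help to confirm that the parameter folders are where the tests expect them. The sketch below is an illustrative check, not part of OpenFold: `missing_or_empty` is a hypothetical helper, and the subdirectory names passed to it are assumptions about what the download scripts create under `<dest>`.

```python
from pathlib import Path


def missing_or_empty(resources_dir, subdirs):
    """Return the names in `subdirs` that are absent or empty under resources_dir."""
    root = Path(resources_dir)
    problems = []
    for name in subdirs:
        path = root / name
        # Path.is_dir() follows symlinks, so symlinked parameter folders pass.
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    # The folder names here are assumptions; adjust them to your download layout.
    print(missing_or_empty("openfold/resources", ["params", "openfold_params"]))
```

An empty list means every listed folder exists and contains at least one entry. The check does not verify that the files inside are complete.
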
+Run with the following script:
+> scripts/run_unit_tests.sh
+
+The script is a thin wrapper around Python's `unittest` suite, and recognizes `unittest` arguments. E.g., to run a specific test verbosely:
+
+> scripts/run_unit_tests.sh -v tests.test_model
+
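A dotted name such as `tests.test_model` is the standard way to select tests in `unittest`. The following is a rough Python equivalent of the verbose invocation above, using only the standard library loader and runner. The `run_named_tests` helper is hypothetical, and the wrapper script itself may do more than this.

```python
import unittest


def run_named_tests(name, verbosity=2):
    """Load tests by dotted name and run them, returning the TestResult."""
    # loadTestsFromName accepts a dotted path to a module, class, or method.
    suite = unittest.defaultTestLoader.loadTestsFromName(name)
    # verbosity=2 corresponds to the -v flag: one line of output per test.
    runner = unittest.TextTestRunner(verbosity=verbosity)
    return runner.run(suite)
```

Called from the repository root as `run_named_tests("tests.test_model")`, this should select the same module as the command above. The returned object's `wasSuccessful()` method reports whether every test passed.
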
+**AlphaFold comparison tests:**
+Certain tests perform equivalence comparisons with the AlphaFold implementation. Running these tests requires an environment with both AlphaFold 2.0.1 and OpenFold installed, which is not covered in this guide. These tests are skipped by default if no installation of AlphaFold is found. 
+
+## Modifications
+
+### CUDA 11 environment
+To use OpenFold in a CUDA 11 environment rather than a CUDA 12 environment:
+
+- In step 1, replace the GitHub repository link with [OpenFold version v2](https://github.com/aqlaboratory/openfold/tree/v2.0.0).
+- Follow the remaining steps of the [Installation Guide](#installation).
+
+```{note}
+Replace with link to last stable pre-update version
+```
+
+### Install OpenFold parameters without aws
+If you don't have access to `aws` on your system, you can use a different download source:
+
+- HuggingFace (requires `git-lfs`): `scripts/download_openfold_params_huggingface.sh`
+- Google Drive: `scripts/download_openfold_params_gdrive.sh`
+
+### Docker setup
+
+```{note}
+Add / check docker installation instructions 
+```
+
+## Troubleshooting FAQ
+
+- In the unit tests, I see an error such as  
+	```
+	ImportError: version GLIBCXX_3.4.30 not found
+	```
+
+	> Solution: Make sure that the `LD_LIBRARY_PATH` environment variable has been set to include the conda path, e.g. `export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH`
+
+- I see a CUDA mismatch error, e.g.
+```
+The detected CUDA version (11.8) mismatches the version that was used to compile
+PyTorch (12.1). Please make sure to use the same CUDA versions.
+```
+ 
+ > 	Solution: Ensure that your system's CUDA driver and toolkit are both version 12.x. You can check the CUDA driver version with a command such as `nvidia-smi`.
+
+- I get an error involving `fatal error: cuda_runtime.h: No such file or directory` and/or `ninja: build stopped: subcommand failed.`
+
+> Solution: Something went wrong while setting up some of the custom kernels. Try running `install_third_party_dependencies.sh` again, or try `python3 setup.py install` from inside the OpenFold folder. Make sure to prepend the conda environment to `LD_LIBRARY_PATH` as described above before running either command.
\ No newline at end of file
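
For the CUDA mismatch entry in the FAQ above, the fix amounts to making the detected toolkit version agree with the one PyTorch was built against. The sketch below illustrates that comparison on plain version strings. `same_cuda_major` is a hypothetical helper, and treating the major version as the deciding factor is an assumption based on the "12.x" wording of the solution, not a statement of how PyTorch performs its own check.

```python
def cuda_major(version):
    """Return the major component of a CUDA version string such as '12.1'."""
    return int(version.strip().split(".")[0])


def same_cuda_major(detected, compiled):
    """True when both version strings share a major version."""
    return cuda_major(detected) == cuda_major(compiled)


if __name__ == "__main__":
    # Values taken from the example error message in the FAQ.
    print(same_cuda_major("11.8", "12.1"))  # prints False
```

With PyTorch installed, `torch.version.cuda` reports the CUDA version PyTorch was compiled with (it is `None` for CPU-only builds) and can be passed as `compiled`.
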