Unverified Commit 930a58ad authored by oahzxl's avatar oahzxl Committed by GitHub

Update readme and docker for v0.2.0 (#100)



* update readme and setup, not finished

* update readme and dockerfile
Co-authored-by: lc_pro <gyang_lu@foxmail.com>
parent 8a599895
@@ -18,7 +18,8 @@ FastFold provides a **high-performance implementation of Evoformer** with the fo
3. Ease of use
   * Huge performance gains with a few lines of change
   * You don't need to care about how the parallel part is implemented
4. Faster data processing, about 3x faster on monomers and about 3Nx faster on multimers with N sequences.
5. Greatly reduced GPU memory usage, able to run inference on sequences containing more than **10000** residues.

## Installation

@@ -42,9 +43,24 @@ conda activate fastfold
python setup.py install
```
#### Advanced
To leverage the full power of FastFold, we recommend building [Triton](https://github.com/openai/triton) from source.
**[NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) 11.4 or above is needed.**
```bash
git clone https://github.com/openai/triton.git ~/triton
cd ~/triton/python
pip install -e .
```
### Using PyPI
You can install FastFold with pre-built CUDA extensions.
Note that only stable versions are available.
```shell
pip install fastfold -f https://release.colossalai.org/fastfold
```
@@ -147,7 +163,9 @@ python inference.py target.fasta data/pdb_mmcif/mmcif_files/ \
AlphaFold's embedding representations take up a lot of memory as the sequence length increases. To reduce memory usage,
add the parameters `--chunk_size [N]` and `--inplace` to the command line or the shell script `./inference.sh`.
The smaller you set N, the less memory is used, but speed is affected. We can run inference on
a sequence of length 10000 in bf16 with 61 GB of memory on an NVIDIA A100 (80 GB). For fp32, the maximum length is 8000.
> You need to set `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:15000` to run inference on such an extremely long sequence.
```shell
python inference.py target.fasta data/pdb_mmcif/mmcif_files/ \
    --output_dir ./ \
...
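As a concrete example, the memory-saving flags described above can be combined in one invocation. This is a sketch: the chunk size `4` is illustrative, and the fasta/mmCIF paths are the placeholders from the command shown in the README.

```shell
# Cap the allocator's split size so an extremely long sequence fits (see note above)
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:15000

# Example inference command with chunked computation and in-place updates;
# a smaller --chunk_size uses less memory at some cost in speed.
CMD="python inference.py target.fasta data/pdb_mmcif/mmcif_files/ \
    --output_dir ./ --chunk_size 4 --inplace"
echo "$CMD"
```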
FROM hpcaitech/pytorch-cuda:1.12.0-11.3.0
RUN conda install openmm=7.7.0 pdbfixer -c conda-forge -y \
    && conda install hmmer==3.3.2 hhsuite=3.3.0 kalign2=2.04 -c bioconda -y
@@ -6,7 +6,12 @@ RUN conda install openmm=7.7.0 pdbfixer -c conda-forge -y \
RUN pip install biopython==1.79 dm-tree==0.1.6 ml-collections==0.1.0 \
    scipy==1.7.1 ray pyarrow pandas einops
RUN pip install colossalai==0.1.10+torch1.12cu11.3 -f https://release.colossalai.org
RUN git clone https://github.com/openai/triton.git ~/triton \
    && cd ~/triton/python \
    && pip install -e .
RUN git clone https://github.com/hpcaitech/FastFold.git \
    && cd ./FastFold \
    && python setup.py install
@@ -129,7 +129,7 @@ else:
setup(
    name='fastfold',
    version='0.2.0',
    packages=find_packages(exclude=(
        'assets',
        'benchmark',
@@ -140,5 +140,5 @@ setup(
    ext_modules=ext_modules,
    package_data={'fastfold': ['model/fastnn/kernel/cuda_native/csrc/*']},
    cmdclass={'build_ext': BuildExtension} if ext_modules else {},
    install_requires=['einops', 'colossalai'],
)