use precomputed msas and features.pkl

Commit 2590be89 authored Nov 23, 2023 by zhuwenwen

Showing with 17 additions and 17 deletions

README.md +8 -11

run_alphafold.py +9 -4

run_monomer.sh +0 -1

run_multimer.sh +0 -1

--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 * @Author: zhuww
 * @email: zhuww@sugon.com
 * @Date: 2023-04-06 18:04:07
- * @LastEditTime: 2023-11-15 17:30:01
+ * @LastEditTime: 2023-11-23 16:01:01
 -->
 # AF2
 ## Paper
@@ -96,7 +96,7 @@ $DOWNLOAD_DIR/
 ```bash
    ./run_monomer.sh
 ```
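-Monomer inference parameters: `download_dir` is the dataset download directory; `monomer.fasta` is the monomer sequence to run inference on; `--output_dir` is the output directory; `model_names` is the name of the model used for inference; `--model_preset=monomer` selects the monomer model configuration; `--run_relax=true` enables the relax step; `--use_gpu_relax=true` runs relax on the GPU (faster, but possibly less stable), while `--use_gpu_relax=false` runs relax on the CPU (slower, but stable); adding `use_precomputed_msas=true` loads sequences that have already been searched and aligned, otherwise the search and alignment run by default.
+Monomer inference parameters: `download_dir` is the dataset download directory; `monomer.fasta` is the monomer sequence to run inference on; `--output_dir` is the output directory; `model_names` is the name of the model used for inference; `--model_preset=monomer` selects the monomer model configuration; `--run_relax=true` enables the relax step; `--use_gpu_relax=true` runs relax on the GPU (faster, but possibly less stable), while `--use_gpu_relax=false` runs relax on the CPU (slower, but stable).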

 ### Multimer
 ```bash
@@ -129,17 +129,14 @@ $DOWNLOAD_DIR/
 Test data: [casp14](https://www.predictioncenter.org/casp14/targetlist.cgi), [uniprot](https://www.uniprot.org/).
 Accelerator used: 1 Z100L-32G card

-1. lddt
-   See `plddts` in <target_name>/ranking_debug.json
-
-2. Other accuracy metrics: [https://zhanggroup.org/TM-score/](https://zhanggroup.org/TM-score/)
+plddts: see `plddts` in <target_name>/ranking_debug.json

 Accuracy data:
-| Data type | Sequence type | Sequence label | Sequence length | GDT-TS | GDT-HA | LDDT | TM score | MaxSub | RMSD |
-| :------: | :------: | :------: | :------: |:------: |:------: | :------: | :------: | :------: |:------: |
-| fp32 | Monomer | T1026 | 172 | 0.849 | 0.658 | 75.050 | 0.901 | 0.851 | 1.6 |
-| fp32 | Monomer | T1053 | 580 | 0.941 | 0.789 | 92.316 | 0.985 | 0.935 | 1.1 |
-| fp32 | Monomer | T1091 | 863 | 0.492 | 0.332 | 85.083 | 0.740 | 0.388 | 6.7 |
+| Data type | Sequence type | Sequence label | Sequence length | LDDT |
+| :------: | :------: | :------: | :------: |:------: |
+| fp32 | Monomer | T1026 | 172 | 75.050 |
+| fp32 | Monomer | T1053 | 580 | 92.316 |
+| fp32 | Monomer | T1091 | 863 | 85.083 |

 ## Application scenarios
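
Note on the README hunk above: it points readers to the `plddts` entry in `<target_name>/ranking_debug.json`. A minimal sketch of reading those values, assuming the file is a JSON object whose `plddts` key maps model names to scores. The function names `read_plddts` and `best_model` are hypothetical and not part of this repository.

```python
import json


def read_plddts(ranking_debug_path):
  """Returns the `plddts` mapping (model name -> score) from ranking_debug.json.

  Assumption: the file contains a top-level `plddts` key, as the README states.
  A KeyError is raised if the key is missing.
  """
  with open(ranking_debug_path) as f:
    ranking = json.load(f)
  return ranking['plddts']


def best_model(plddts):
  """Returns the model name with the highest score in the mapping."""
  return max(plddts, key=plddts.get)
```

For example, `read_plddts('T1026/ranking_debug.json')` would return the per-model scores for that target, provided the run wrote its output to a directory of that name.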


--- a/run_alphafold.py
+++ b/run_alphafold.py
@@ -194,13 +194,18 @@ def predict_structure(

  # Get features.
  t_0 = time.time()
+  features_output_path = os.path.join(output_dir, 'features.pkl')
+  if os.path.exists(features_output_path):
+    feature_dict = pickle.load(open(features_output_path, 'rb'))
+  
+  else:
    feature_dict = data_pipeline.process(
        input_fasta_path=fasta_path,
        msa_output_dir=msa_output_dir)
  timings['features'] = time.time() - t_0

  # Write out features as a pickled dictionary.
-  features_output_path = os.path.join(output_dir, 'features.pkl')
+  # features_output_path = os.path.join(output_dir, 'features.pkl')
  with open(features_output_path, 'wb') as f:
    pickle.dump(feature_dict, f, protocol=4)
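
Note on the hunk above: it reuses an existing `features.pkl` instead of re-running the data pipeline, but it opens the file without closing it and rewrites the pickle even when it was just loaded. A minimal sketch of the same caching pattern that avoids both, under the assumption that the feature computation can be passed in as a callable. `load_or_compute_features` and `compute_fn` are hypothetical names for illustration; in `run_alphafold.py` the computation is the `data_pipeline.process(...)` call.

```python
import os
import pickle


def load_or_compute_features(output_dir, compute_fn):
  """Returns (feature_dict, from_cache).

  Hypothetical helper, not part of run_alphafold.py. Loads features.pkl from
  output_dir if it exists; otherwise calls compute_fn() and caches the result.
  """
  features_output_path = os.path.join(output_dir, 'features.pkl')
  if os.path.exists(features_output_path):
    # Context manager closes the file handle even if unpickling fails.
    with open(features_output_path, 'rb') as f:
      return pickle.load(f), True

  feature_dict = compute_fn()
  # Only write the pickle when the features were freshly computed.
  with open(features_output_path, 'wb') as f:
    pickle.dump(feature_dict, f, protocol=4)
  return feature_dict, False
```

Because unpickling can execute arbitrary code, a cached `features.pkl` should only be loaded from an output directory you trust. A stale cache is also reused silently, so delete the file when the input FASTA or the databases change.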


--- a/run_monomer.sh
+++ b/run_monomer.sh
@@ -2,7 +2,6 @@
 python3 run_alphafold.py \
 --fasta_paths=monomer.fasta \
 --output_dir=./ \
- --use_precomputed_msas=false \
 --data_dir=$download_dir  \
 --model_names="model_1" \
 --uniref90_database_path=$download_dir/uniref90/uniref90.fasta \

--- a/run_multimer.sh
+++ b/run_multimer.sh
@@ -3,7 +3,6 @@ python3 run_alphafold.py \
 --fasta_paths=multimer.fasta \
 --output_dir=./ \
 --num_multimer_predictions_per_model=1 \
- --use_precomputed_msas=false \
 --data_dir=$download_dir  \
 --model_names="model_1_multimer_v3" \
 --uniref90_database_path=$download_dir/uniref90/uniref90.fasta \