Commit 2864b7ca authored by Gustaf Ahdritz, committed by GitHub

Correct description of mmCIF cache

parent 30f92c45
@@ -178,7 +178,7 @@ where `input.fasta` is a FASTA file containing one or more query sequences. To
 generate an input FASTA from a directory of mmCIF and/or ProteinNet .core
 files, we provide `scripts/data_dir_to_fasta.py`.
 
-Next, generate a cache of certain datapoints in the mmCIF files:
+Next, generate a cache of certain datapoints in the template mmCIF files:
 
 ```bash
 python3 scripts/generate_mmcif_cache.py \
@@ -187,9 +187,10 @@ python3 scripts/generate_mmcif_cache.py \
     --no_workers 16
 ```
 
-This cache is used to minimize the number of mmCIF parses performed during
-training-time data preprocessing. Next, generate a separate chain-level cache
-with data used for training-time data filtering:
+This cache is used to pre-filter templates.
+
+Next, generate a separate chain-level cache with data used for training-time
+data filtering:
 
 ```bash
 python3 scripts/generate_chain_data_cache.py \
......
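
To illustrate what "pre-filter templates" could mean in practice, here is a minimal sketch, not the repository's actual code. It assumes the cache is a mapping from template ID to a record with a `release_date` field in `YYYY-MM-DD` form. That schema, the function name `prefilter_templates`, and the toy entries are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed date format; the real cache may differ


def prefilter_templates(cache, max_release_date):
    """Return the sorted IDs of entries released on or before the cutoff.

    `cache` is assumed to map template ID -> {"release_date": "YYYY-MM-DD"}.
    Only the cached metadata is consulted, so no mmCIF file is parsed here.
    """
    cutoff = datetime.strptime(max_release_date, DATE_FORMAT)
    kept = []
    for template_id, entry in cache.items():
        released = datetime.strptime(entry["release_date"], DATE_FORMAT)
        if released <= cutoff:
            kept.append(template_id)
    return sorted(kept)


# Toy cache with made-up entries (hypothetical data, not real templates).
toy_cache = {
    "1abc": {"release_date": "2019-05-01"},
    "2xyz": {"release_date": "2021-03-15"},
}

print(prefilter_templates(toy_cache, "2020-01-01"))
```

Under these assumptions, candidates can be discarded by looking at cached metadata alone, which is one plausible reading of the new README wording.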