Commit 0b5c9492 authored by Lukas Jarosch

Give script more descriptive name

parent e2479cb5
"""
-The RODA database is non-redundant, meaning that it only stores one explicit
-representative alignment directory for all PDB chains in a 100% sequence
-identity cluster. In order to add explicit alignments for all PDB chains, this
-script will add the missing chain directories and symlink them to their
-representative alignment directories.
+The OpenProteinSet alignment database is non-redundant, meaning that it only
+stores one explicit representative alignment directory for all PDB chains in a
+100% sequence identity cluster. In order to add explicit alignments for all PDB
+chains, this script will add the missing chain directories and symlink them to
+their representative alignment directories. This is required in order to train
+OpenFold on the full PDB, not just one representative chain per cluster.
"""
from argparse import ArgumentParser
@@ -52,6 +53,9 @@ def main(alignment_dir: Path, duplicate_chains_file: Path):
     with open(duplicate_chains_file, "r") as fp:
         duplicate_chains = [list(line.strip().split()) for line in fp]
+    # convert to absolute path for symlink creation
+    alignment_dir = alignment_dir.resolve()
     create_duplicate_dirs(duplicate_chains, alignment_dir)
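
The body of `create_duplicate_dirs` is not part of this hunk, so the following is only a minimal sketch of the behavior the docstring describes, not the script's actual implementation. The helper name `link_duplicate_chains`, the chain IDs, and the file name `msa.a3m` are hypothetical, and the sketch assumes that the representative of a cluster is the first chain that already has a directory. It also shows why the path is resolved first: a relative symlink target is interpreted relative to the directory containing the link, so an absolute target keeps the links valid regardless of the working directory.

```python
import tempfile
from pathlib import Path


def link_duplicate_chains(clusters, alignment_dir):
    """Hypothetical stand-in for create_duplicate_dirs (an assumption, not the real code).

    clusters: list of lists of chain IDs, one list per 100% identity cluster.
    Returns the number of symlinks created.
    """
    # Absolute base path, so every symlink target is absolute as well.
    alignment_dir = Path(alignment_dir).resolve()
    created = 0
    for chains in clusters:
        # Assumption: the representative is the first chain with an existing directory.
        representative = next(
            (c for c in chains if (alignment_dir / c).is_dir()), None
        )
        if representative is None:
            continue  # no alignment available for this cluster
        for chain in chains:
            link = alignment_dir / chain
            if chain == representative or link.exists() or link.is_symlink():
                continue  # never overwrite an existing directory or link
            link.symlink_to(alignment_dir / representative, target_is_directory=True)
            created += 1
    return created


def demo():
    # Build a throwaway alignment directory with one representative chain.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "1abc_A").mkdir()
        (root / "1abc_A" / "msa.a3m").write_text(">query\nMKV\n")
        n = link_duplicate_chains([["1abc_A", "1abc_B", "2xyz_A"]], root)
        # Reading through a symlinked chain reaches the representative's file.
        linked = (root / "2xyz_A" / "msa.a3m").read_text()
        return n, (root / "1abc_B").is_symlink(), linked
```

Calling `demo()` creates two links for the three-chain cluster. Creating symlinks may require extra privileges on Windows, so the sketch assumes a POSIX filesystem.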