Commit 0b5c9492 authored by Lukas Jarosch

Give script more descriptive name

parent e2479cb5
 """
-The RODA database is non-redundant, meaning that it only stores one explicit
-representative alignment directory for all PDB chains in a 100% sequence
-identity cluster. In order to add explicit alignments for all PDB chains, this
-script will add the missing chain directories and symlink them to their
-representative alignment directories.
+The OpenProteinSet alignment database is non-redundant, meaning that it only
+stores one explicit representative alignment directory for all PDB chains in a
+100% sequence identity cluster. In order to add explicit alignments for all PDB
+chains, this script will add the missing chain directories and symlink them to
+their representative alignment directories. This is required in order to train
+OpenFold on the full PDB, not just one representative chain per cluster.
 """
 from argparse import ArgumentParser
...
@@ -52,6 +53,9 @@ def main(alignment_dir: Path, duplicate_chains_file: Path):
     with open(duplicate_chains_file, "r") as fp:
         duplicate_chains = [list(line.strip().split()) for line in fp]
+    # convert to absolute path for symlink creation
+    alignment_dir = alignment_dir.resolve()
     create_duplicate_dirs(duplicate_chains, alignment_dir)
...
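The added `alignment_dir.resolve()` call presumably matters because a relative symlink target is interpreted relative to the directory containing the link, not the current working directory. The following is a minimal sketch of that behaviour, not the script's actual code: `link_chain`, `demo`, and the chain names `1abc_A`, `1abc_B`, `1abc_C` are hypothetical and exist only for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple


def link_chain(alignment_dir: Path, representative: str, duplicate: str,
               resolve: bool) -> Path:
    """Symlink alignment_dir/duplicate to alignment_dir/representative.

    Hypothetical helper; the real create_duplicate_dirs is not shown
    in the diff.
    """
    if resolve:
        # Same idea as the commit: make the symlink target absolute.
        alignment_dir = alignment_dir.resolve()
    link = alignment_dir / duplicate
    link.symlink_to(alignment_dir / representative, target_is_directory=True)
    return link


def demo() -> Tuple[bool, bool]:
    """Return (relative-target link works?, absolute-target link works?)."""
    old_cwd = Path.cwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            rel_dir = Path("alignments")  # relative, as a CLI user might pass it
            (rel_dir / "1abc_A").mkdir(parents=True)
            # Target "alignments/1abc_A" is looked up from inside "alignments/",
            # so it points at the non-existent "alignments/alignments/1abc_A".
            broken = link_chain(rel_dir, "1abc_A", "1abc_B", resolve=False)
            fixed = link_chain(rel_dir, "1abc_A", "1abc_C", resolve=True)
            # Path.exists() follows symlinks, so a dangling link reports False.
            return broken.exists(), fixed.exists()
        finally:
            os.chdir(old_cwd)
```

Calling `demo()` on a POSIX system shows that only the link created from the resolved path points at an existing directory. Creating symlinks on Windows may require additional privileges.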