Commit 8881b210 authored by Gustaf Ahdritz

Add RODA flattening script

parent 7e3cd77f
@@ -326,7 +326,8 @@ script.
If you're using your own MSAs or MSAs from the RODA repository, make sure that
the `alignment_dir` contains one directory per chain and that each of these
-contains alignments (.sto, .a3m, and .hhr) corresponding to that chain.
+contains alignments (.sto, .a3m, and .hhr) corresponding to that chain. You
+can use `scripts/flatten_roda.sh` to reformat RODA downloads in this way.
Note that, despite its variable name, `mmcif_dir` can also contain PDB files
or even ProteinNet .core files. To emulate the AlphaFold training procedure,
......
#!/usr/bin/env bash
#
# Flattens a downloaded RODA database into the format expected by OpenFold
# Args:
#     roda_dir:
#         The path to the database you want to flatten. E.g. "roda/pdb"
#         or "roda/uniclust30". Note that, to save space, this script
#         will empty this directory.
#     output_dir:
#         The directory in which to construct the reformatted data
if [[ $# != 2 ]]; then
    echo "usage: ./flatten_roda.sh <roda_dir> <output_dir>"
    exit 1
fi
RODA_DIR=$1
OUTPUT_DIR=$2
DATA_DIR="${OUTPUT_DIR}/data"
ALIGNMENT_DIR="${OUTPUT_DIR}/alignments"
mkdir -p "${DATA_DIR}"
mkdir -p "${ALIGNMENT_DIR}"
for chain_dir in $(ls "${RODA_DIR}"); do
    CHAIN_DIR_PATH="${RODA_DIR}/${chain_dir}"
    for subdir in $(ls "${CHAIN_DIR_PATH}"); do
        if [[ $subdir = "pdb" ]] || [[ $subdir = "cif" ]]; then
            # Structure files go to <output_dir>/data/<chain>
            CHAIN_DATA_DIR="${DATA_DIR}/${chain_dir}"
            mkdir -p "${CHAIN_DATA_DIR}"
            mv "${CHAIN_DIR_PATH}/${subdir}"/* "${CHAIN_DATA_DIR}"
        else
            # Everything else is treated as alignments for this chain
            CHAIN_ALIGNMENT_DIR="${ALIGNMENT_DIR}/${chain_dir}"
            mkdir -p "${CHAIN_ALIGNMENT_DIR}"
            mv "${CHAIN_DIR_PATH}/${subdir}"/* "${CHAIN_ALIGNMENT_DIR}"
        fi
    done
done
# Remove the data directory if no structure files were moved into it
NO_DATA_FILES=$(find "${DATA_DIR}" -type f | wc -l)
if [[ $NO_DATA_FILES -eq 0 ]]; then
    rm -rf "${DATA_DIR}"
fi
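
The README hunk above asks for an `alignment_dir` with one directory per chain, each holding that chain's alignments (.sto, .a3m, and .hhr). One way to sanity-check a flattened output directory is sketched below. The function name `check_alignment_dir` is hypothetical and is not part of this commit. It assumes a chain directory counts as populated if it holds at least one file with one of those three extensions.

```shell
# Hypothetical helper (not part of the commit): lists chain directories
# under an alignment directory that hold no .sto, .a3m, or .hhr files.
# Returns 0 if every chain directory has at least one such file.
check_alignment_dir() {
    local alignment_dir=$1
    local missing=0
    local chain_dir
    for chain_dir in "${alignment_dir}"/*/; do
        # Skip the unexpanded pattern when there are no chain directories
        [ -d "${chain_dir}" ] || continue
        if ! find "${chain_dir}" -maxdepth 1 -type f \
                \( -name '*.sto' -o -name '*.a3m' -o -name '*.hhr' \) \
                | grep -q .; then
            echo "no alignments found in: ${chain_dir}"
            missing=$((missing + 1))
        fi
    done
    [ "${missing}" -eq 0 ]
}
```

After running `./flatten_roda.sh <roda_dir> <output_dir>`, calling `check_alignment_dir "<output_dir>/alignments"` prints any chain directory that ended up without alignment files and returns a nonzero status if one exists.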