Commit 5fc80134 authored by Dingquan Yu, committed by GitHub

Update README.md

parent a1ef4c8f
# Permutation code README
## Overview: before running training script
NB: before running the test code, please download the procrustes package first
from https://github.com/theochem/procrustes
To test the permutation code:
```bash
python -m unittest tests/test_permutation.py
```
Make sure that the output of running ```scripts/generate_mmcif_cache.py``` is ready and available in ```tests/test_data```.
I have uploaded the JSON file to ownCloud [here](https://oc.embl.de/index.php/s/wVUwc1IHiJUt9sP).
To test the multimer training code:
```bash
python3 train_openfold.py /g/alphafold/AlphaFold_DBs/pdb_mmcif/mmcif_files/ \
tests/test_data/alignments/ \
/g/alphafold/AlphaFold_DBs/pdb_mmcif/mmcif_files/ \
/scratch/gyu/train_openfold_output \
2500-01-01 \
--train_mmcif_data_cache_path=tests/test_data/train_mmcifs_cache.json \
--template_release_dates_cache_path=tests/test_data/mmcif_cache.json \
--config_preset=model_1_multimer_v3 --seed=42 --gpus=1
```
Unlike monomer training, chain_cache_data is not required, but train_mmcifs_cache is. In this case, I selected the 9 mmCIF files that are already in the existing test_data folder as a training set. ```./tests/test_data/train_mmcifs_cache.json``` in the command above records the information for these 9 structures and is needed to run the training code.

The files that have been changed are:
[```openfold/utils/loss.py```](https://github.com/dingquanyu/openfold/blob/permutation/openfold/utils/loss.py), in which the forward function of the
original ```AlphaFoldLoss``` class is modified;
a child class called ```AlphaFoldMultimerLoss``` is created that not only inherits all the loss calculations but also
has the multi-chain permutation code;
some loss calculations had to be modified, e.g. in the ```fape``` loss and ```tm``` loss calculations an extra validation was added to check whether the input tensor is in tensor_7 or tensor_4*4 format, for example: https://github.com/dingquanyu/openfold/blob/02b008dc4b8c2e9e680826444c605297eeb9ffb4/openfold/utils/loss.py#L190-L193
[```openfold/config.py```](https://github.com/dingquanyu/openfold/blob/permutation/openfold/config.py) has seen a couple of modifications as well. Some names were wrong and the previous script forgot to update config.loss with multimer_model_config_update.

These files are newly added:
[```tests/test_permutation.py```](https://github.com/dingquanyu/openfold/blob/permutation/tests/test_permutation.py): A unittest script
that tests the permutation functions.
[```tests/test_data/label_1.pkl```](https://github.com/dingquanyu/openfold/blob/permutation/tests/test_data/label_1.pkl)
and [```tests/test_data/label_2.pkl```](https://github.com/dingquanyu/openfold/blob/permutation/tests/test_data/label_2.pkl) are 2 fake ground truth structures.
```label_1.pkl``` has 9 residues and ```label_2.pkl``` has 13 residues.

## Issues
Testing the code on CPU works fine, but running it on a GPU causes ```RuntimeError: CUDA error: device-side assert triggered``` at unexpected steps.
For example, this error was raised while calculating the best rotation matrix that aligns the selected anchors during the multi-chain permutation steps, so I have to use
```torch.masked_select``` and ```torch.index_select``` in https://github.com/dingquanyu/openfold/blob/a1ef4c8fa99da5cff9501051de71be440ca3cedf/openfold/utils/loss.py#L2043 and https://github.com/dingquanyu/openfold/blob/a1ef4c8fa99da5cff9501051de71be440ca3cedf/openfold/utils/loss.py#L2060 instead of simply slicing the matrix like ```matrix[index]```.
Later on, the same ```CUDA error: device-side assert triggered``` error was raised while adding dimensions to ```atom_pred_positions``` in https://github.com/dingquanyu/openfold/blob/a1ef4c8fa99da5cff9501051de71be440ca3cedf/openfold/utils/loss.py#L989
When I dumped the matrices to a pickle and loaded them individually onto a GPU outside the programme, the indexing steps worked without the CUDA error.
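The indexing workaround mentioned under Issues can be sketched roughly as follows, assuming plain PyTorch tensors. The helper names `select_rows` and `select_masked` are hypothetical and are not taken from ```openfold/utils/loss.py```. The bounds check is an added assumption: an out-of-range index is one common cause of device-side asserts, though not necessarily the cause here.

```python
import torch


def select_rows(matrix: torch.Tensor, index: torch.Tensor) -> torch.Tensor:
    """Pick rows of `matrix` by integer `index` (hypothetical helper)."""
    index = index.to(device=matrix.device, dtype=torch.long)
    n_rows = matrix.shape[0]
    # Assumption: validate the indices up front so that a bad index fails
    # with a readable Python error instead of a device-side assert.
    if index.numel() > 0 and (index.min() < 0 or index.max() >= n_rows):
        raise IndexError(f"index out of range for a matrix with {n_rows} rows")
    # Equivalent to matrix[index] for a 1-D, non-negative integer index.
    return torch.index_select(matrix, 0, index)


def select_masked(values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Return the entries of `values` where `mask` is True, as a 1-D tensor."""
    mask = mask.to(device=values.device, dtype=torch.bool)
    return torch.masked_select(values, mask)


# Small CPU example; the same calls accept CUDA tensors.
matrix = torch.arange(12.0).reshape(4, 3)
index = torch.tensor([0, 2])
rows = select_rows(matrix, index)
kept = select_masked(matrix, matrix > 8)
```

Running the failing step with the environment variable `CUDA_LAUNCH_BLOCKING=1` set may also help, because CUDA errors are reported asynchronously and the traceback can otherwise point at a later, unrelated line.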
### Notes
29/06/23: NaN values in the lddt scores are filled with the matrix mean for now, because the test data are randomly generated and somehow produce NaN in the lddt score.
**Delete** this step before merging to the Multimer branch.
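A minimal sketch of the temporary NaN handling described in the note above. It assumes the lddt scores are held in a floating-point PyTorch tensor and that the "matrix mean" is taken over the non-NaN entries. The function name `fill_nan_with_mean` is hypothetical and does not come from the repository.

```python
import torch


def fill_nan_with_mean(scores: torch.Tensor) -> torch.Tensor:
    """Replace NaN entries with the mean of the non-NaN entries.

    Hypothetical helper. If every entry is NaN, the mean is NaN as well
    and the input comes back unchanged.
    """
    mean = torch.nanmean(scores)  # mean that ignores NaN entries
    return torch.where(torch.isnan(scores), mean, scores)


scores = torch.tensor([[1.0, float("nan")], [3.0, float("nan")]])
filled = fill_nan_with_mean(scores)
```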