# FAQ

Frequently asked questions and commonly encountered issues in OpenFold setup and training.

## Setup

- In the unit tests, I see an error such as  
	```
	ImportError: version GLIBCXX_3.4.30 not found
	```

	> Solution: Make sure that the `$LD_LIBRARY_PATH` environment variable includes the conda library path, e.g. `export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH`.

- I see a CUDA mismatch error, e.g.
```
The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.
```
 
> Solution: Ensure that your system's CUDA driver and toolkit match the CUDA version of your intended OpenFold installation (CUDA 11 by default). You can check the CUDA version supported by your driver with `nvidia-smi`, and the installed toolkit version with `nvcc --version`.

- I get an error involving `fatal error: cuda_runtime.h: No such file or directory` and/or `ninja: build stopped: subcommand failed.`

> Solution: Something went wrong while setting up some of the custom kernels. Try running `install_third_party_dependencies.sh` again, or run `python3 setup.py install` from inside the OpenFold folder. Before running either, make sure `$LD_LIBRARY_PATH` includes the conda library path as described above.
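
The environment checks in this section can be scripted. The following is a minimal sketch, not part of OpenFold: the helper names `conda_lib_on_library_path` and `parse_nvcc_release` are hypothetical, and the parser assumes `nvcc --version` prints a line containing `release <major>.<minor>`.

```python
import os
import re


def conda_lib_on_library_path(env=None):
    """Return True if $CONDA_PREFIX/lib is listed in $LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return False  # no active conda environment detected
    conda_lib = os.path.normpath(os.path.join(prefix, "lib"))
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    # Normalize each entry so a trailing slash does not cause a false negative.
    return any(os.path.normpath(e) == conda_lib for e in entries if e)


def parse_nvcc_release(nvcc_output):
    """Extract the toolkit version (e.g. '11.8') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None
```

The result of `parse_nvcc_release` can be compared with `torch.version.cuda`, which holds the CUDA version PyTorch was built with.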

## Training

- My model training is hanging on the data loading step:
	> Solution: While each system is different, here are a few general suggestions:
	> - Check your `$KMP_AFFINITY` environment setting and see if it is suitable for your system.
	> - Adjust the number of data workers used to prepare data with the `--num_workers` setting. Increasing the number can speed up dataset processing, but too many workers can cause an out-of-memory (OOM) error.

- When I reload my pretrained model weights or checkpoints, I get `RuntimeError: Error(s) in loading state_dict for OpenFoldWrapper: Unexpected key(s) in state_dict:`
	> Solution: This suggests that your checkpoint / model weights are in OpenFold v1 format with outdated model layer names. Convert your weights/checkpoints following [this guide](convert_of_v1_weights.md).
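
To see which keys trigger the `state_dict` error before converting, the checkpoint's keys can be compared with the keys the model expects. The following is a minimal sketch, assuming both are available as plain collections of key names (for example from `model.state_dict().keys()` and from the loaded checkpoint's state dict). The helper name `diff_state_dict_keys` is hypothetical and not part of OpenFold.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (unexpected, missing) key lists, sorted for stable output.

    unexpected: keys in the checkpoint that the model does not define.
    missing: keys the model defines that the checkpoint lacks.
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    missing = sorted(model_keys - checkpoint_keys)
    return unexpected, missing
```

If the unexpected and missing keys pair up and differ only in their layer names, that is consistent with a checkpoint saved in the older format.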