<!-- Please read the FAQ and Common Issues sections in the README before opening an Issue.
Please provide a short and clear description of the bug. What happens? What should happen? If applicable, please include the full error message. -->
### Environment
Please provide some information about your environment.
System NVCC: nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Aug_15_21:14:11_PDT_2021
Cuda compilation tools, release 11.4, V11.4.120
Build cuda_11.4.r11.4/compiler.30300941_0
System Arch List: None
System OMP_NUM_THREADS: 1
System CUDA_HOME is None: True
System CPU Count: 8
Python Version: 3.8.11 (default, Aug 3 2021, 15:09:35)
[GCC 7.5.0]
----- nnDetection Information -----
det_num_threads 6
det_data is set True
det_models is set True
```
Things to look out for:
Make sure that the versions of PyTorch CUDA and NVCC CUDA match (a minor version mismatch, as in this case, will work without error but could potentially introduce bugs).
`OMP_NUM_THREADS` should always be set to 1, and `det_num_threads` should always be lower than or equal to `System CPU Count`.
2. Error persists even after fixing the environment
Make sure to delete the `build` folder before rerunning the installation since it won't recompile the code otherwise.
3. Error: No kernel image is available for execution
You are probably executing the build on a machine with a GPU architecture that was not present/set during the build.
Please check [this list](https://developer.nvidia.com/cuda-gpus) to find the correct SM architecture and set `TORCH_CUDA_ARCH_LIST` appropriately (see the Dockerfile for an example).
As before, make sure to delete the `build` folder when rerunning the installation process.
4. Please open an Issue and provide your environment as obtained by `nndet_env`.
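
To find a value for `TORCH_CUDA_ARCH_LIST` without looking it up by hand, the compute capability of the visible GPUs can be queried through PyTorch. The following is a minimal sketch, not part of nnDetection: the helper names `arch_list_entry` and `detect_arch_list` are hypothetical, and it assumes PyTorch is already installed with CUDA support on the machine that will run the code.

```python
def arch_list_entry(major: int, minor: int) -> str:
    """Format a CUDA compute capability as one TORCH_CUDA_ARCH_LIST entry."""
    return f"{major}.{minor}"


def detect_arch_list() -> str:
    """Collect the compute capabilities of all visible GPUs, joined by ';'.

    Hypothetical helper: returns an empty string if no GPU is visible.
    """
    import torch  # imported lazily so arch_list_entry works without PyTorch

    capabilities = {
        torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    }
    return ";".join(
        arch_list_entry(major, minor) for major, minor in sorted(capabilities)
    )
```

Calling `print(detect_arch_list())` on the target machine gives a string that can be exported as `TORCH_CUDA_ARCH_LIST` before deleting the `build` folder and rerunning the installation.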
</details>
</details>
<details close>
<summary>Error: Undefined CUDA symbols when importing `nndet._C`</summary>
<br>

Please double check the CUDA version of your PC, PyTorch, torchvision and nnDetection build!
Follow the installation instructions at the beginning!
</details>

<details close>
<summary>Training doesn't start and is stuck</summary>
<br>

1. Please run `nndet_env` and make sure `OMP_NUM_THREADS` is set to 1. No other values are supported here. To increase the number of workers used for IO and augmentation, adjust `det_num_threads`.
2. Try running the training without multiprocessing as a sanity check: `nndet_train XXX -o augment_cfg.multiprocessing=False`. Don't use this for the full training; it is only one step of the debugging process.
3. Please open an Issue, provide your environment as obtained by `nndet_env`, and report whether the training without multiprocessing started correctly.
</details>
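
The thread settings mentioned above can also be checked programmatically before starting a training run. This is a minimal sketch under assumptions: `check_thread_settings` is a hypothetical helper, not an nnDetection function, and it assumes the worker count is stored in the environment variable `det_num_threads`, as shown in the `nndet_env` output.

```python
import os
from typing import List, Mapping, Optional


def check_thread_settings(env: Mapping[str, str], cpu_count: Optional[int]) -> List[str]:
    """Return human-readable problems found in the thread-related settings."""
    problems = []
    # OMP_NUM_THREADS must be exactly 1; unset or any other value is reported.
    if env.get("OMP_NUM_THREADS") != "1":
        problems.append("OMP_NUM_THREADS must be set to 1")
    # det_num_threads (assumed variable name) must not exceed the CPU count.
    workers = env.get("det_num_threads")
    if workers is not None and cpu_count is not None and int(workers) > cpu_count:
        problems.append(
            f"det_num_threads ({workers}) exceeds the CPU count ({cpu_count})"
        )
    return problems
```

Calling `check_thread_settings(os.environ, os.cpu_count())` returns an empty list when both settings are within the documented limits.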
<details close>
<details close>
<summary>Error: No kernel image is available for execution</summary>
<br>

You are probably executing the build on a machine with a GPU architecture that was not present/set during the build.
Please check [this list](https://developer.nvidia.com/cuda-gpus) to find the correct SM architecture and set `TORCH_CUDA_ARCH_LIST` appropriately (see the Dockerfile for an example).
</details>

<details close>
<summary>GPU requirements</summary>
<br>

nnDetection v0.1 was developed for GPUs with at least 11GB of VRAM (e.g. RTX2080TI, TITAN RTX).
All of our experiments were conducted with an RTX2080TI.
While the memory can be adjusted by manipulating the correct setting, we recommend using the default values for now.
Future releases will refactor the planning stage to improve the VRAM estimation and add support for different memory budgets.
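
To see whether a GPU roughly meets the VRAM recommendation above, the total device memory can be read through PyTorch. This is only a sketch under assumptions: the helper names are hypothetical, nnDetection does not perform this check itself, and "11GB" is interpreted here as 11 * 10^9 bytes, since cards marketed as 11GB usually report slightly less than 11 GiB.

```python
# Assumption: "11GB" is read as decimal gigabytes, not GiB.
REQUIRED_VRAM_BYTES = 11 * 10**9


def meets_vram_requirement(
    total_memory_bytes: int, required_bytes: int = REQUIRED_VRAM_BYTES
) -> bool:
    """Return True if the reported device memory reaches the assumed threshold."""
    return total_memory_bytes >= required_bytes


def visible_gpu_report():
    """List (device name, meets requirement) for every GPU visible to PyTorch."""
    import torch  # imported lazily so the check above works without PyTorch

    return [
        (
            torch.cuda.get_device_name(index),
            meets_vram_requirement(
                torch.cuda.get_device_properties(index).total_memory
            ),
        )
        for index in range(torch.cuda.device_count())
    ]
```

Running `visible_gpu_report()` on the training machine shows at a glance which devices fall below the assumed threshold.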