Jeff Daily authored
* first step, everything compiles
* fix rebuilds; skip cuda version check for rocm
* use macro for __shfl_up_sync __shfl_down_sync
* add BFloat16 support for ROCm and CUDA
* add USE_ROCM definition to setup.py
* flake8 fixes
9a651d91