* disable cache hint for CUDA < 11.4
* fix lint
* fix lint
* fix cuda-11.3 build
* build turbomind
* change namespace fastertransformer to turbomind
* change logger name
* temp
* fix lint
* csrc->src
* remove clang-format
* skip .rst
* skip doc
* clang-format version
* mat_B
* add ft code
* gitignore
* fix lint
* revert fmha