* support fmha
* update sm by cudaarch
* update ldscript path
* clang-format
* clang-format

---------

* add ft code
* gitignore
* fix lint
* revert fmha