Shangyan Zhou authored
* Fix the case where `hidden_size % 128 != 0`
* Add an `align_down()` function
* Use the full warp to wait for the TMA store
* Support arbitrary hidden sizes in the fp8 cast
* Lint
abba6add