FP16 data in-register transpose (#41)
* start fixing 16bit data packing
* adding StaticTensor
* adding StaticTensor
* adding StaticTensor
* add missing constexpr
* adding static tensor
* adding static tensor
* adding transpose
* add inline asm for transpose 2x2 of half_t
* add general transpose_vectors(), but have unnecessary register initialization using v_mov
* fix unnecessary register initialization in transpose_vector by using more pass-by-reference
* add hardcoded logic for NHWC wrw
* improve asm for v_pack
* make ThreadwiseTensorSliceTransfer_v3r2 support any tensor
* tweak
* reorganize file
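For readers unfamiliar with the "transpose 2x2 of half_t" step, here is a minimal host-side sketch of the idea. It is not the code from this commit, which uses inline asm on the GPU. It assumes two 16-bit values are packed into each 32-bit register with element 0 in the low half. The names `Half2x2` and `transpose_2x2_packed16` are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical container for two 32-bit registers, each holding two packed
// 16-bit lanes (low 16 bits = element 0, high 16 bits = element 1).
struct Half2x2
{
    std::uint32_t row0;
    std::uint32_t row1;
};

// Transpose a 2x2 tile of 16-bit elements held in two packed registers.
// Input:  x0 = {a0, a1}, x1 = {b0, b1}
// Output: row0 = {a0, b0}, row1 = {a1, b1}
// Only masks and shifts are used; the 16-bit payloads are never interpreted,
// so the same routine applies to any 16-bit type.
constexpr Half2x2 transpose_2x2_packed16(std::uint32_t x0, std::uint32_t x1)
{
    return Half2x2{
        (x0 & 0x0000FFFFu) | (x1 << 16),      // low halves of x0 and x1
        (x0 >> 16) | (x1 & 0xFFFF0000u)       // high halves of x0 and x1
    };
}

// Compile-time check with recognisable bit patterns.
static_assert(transpose_2x2_packed16(0x22221111u, 0x44443333u).row0 == 0x33331111u,
              "row0 should hold the two low halves");
static_assert(transpose_2x2_packed16(0x22221111u, 0x44443333u).row1 == 0x44442222u,
              "row1 should hold the two high halves");
```

Applying the function twice returns the original pair, which is a quick sanity check for any packed transpose of this shape.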
script/profile_conv.sh (mode 0 → 100755)
script/profile_gemm.sh (mode 0 → 100755)