FP16 data in-register transpose (#41)

* start fixing 16-bit data packing
* adding StaticTensor
* adding StaticTensor
* adding StaticTensor
* add missing constexpr
* adding static tensor
* adding static tensor
* adding transpose
* add inline asm for 2x2 transpose of half_t
* add general transpose_vectors(), but it still has unnecessary register initialization using v_mov
* fix unnecessary register initialization in transpose_vector by using more pass-by-reference
* add hardcoded logic for NHWC wrw
* improve asm for v_pack
* make ThreadwiseTensorSliceTransfer_v3r2 support any tensor
* tweak
* reorganize file
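The core idea behind the 2x2 half_t transpose is that two 16-bit values share one 32-bit register. Transposing two such registers therefore means re-pairing their low and high halves, which is the job a pack instruction such as v_pack performs. The following is a minimal, portable sketch under stated assumptions, not the commit's implementation: it works on raw FP16 bit patterns in `std::uint32_t` instead of `half_t`, uses plain shifts and masks in place of inline asm, and the names `half2_bits`, `pack`, `transpose_2x2`, `tile` and `transpose_tile` are hypothetical. `transpose_tile` shows how a general N x N transpose can be built from 2x2 register transposes, loosely in the spirit of `transpose_vectors()`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a packed "half2" register: lane 0 in the low 16 bits,
// lane 1 in the high 16 bits. Values are raw FP16 bit patterns.
using half2_bits = std::uint32_t;

// Pack two 16-bit lanes into one 32-bit word (software stand-in for a
// v_pack-style instruction).
constexpr half2_bits pack(std::uint16_t lo, std::uint16_t hi) {
    return static_cast<half2_bits>(lo) | (static_cast<half2_bits>(hi) << 16);
}

constexpr std::uint16_t lo16(half2_bits v) { return static_cast<std::uint16_t>(v & 0xFFFFu); }
constexpr std::uint16_t hi16(half2_bits v) { return static_cast<std::uint16_t>(v >> 16); }

// In-register 2x2 transpose, updating both registers by reference:
// rows x = [x0, x1], y = [y0, y1] become x = [x0, y0], y = [x1, y1].
constexpr void transpose_2x2(half2_bits& x, half2_bits& y) {
    const half2_bits nx = pack(lo16(x), lo16(y));
    const half2_bits ny = pack(hi16(x), hi16(y));
    x = nx;
    y = ny;
}

// An N x N tile of 16-bit values stored as N rows of N/2 packed words;
// word j of row r holds columns 2j and 2j+1.
template <std::size_t N>
using tile = std::array<std::array<half2_bits, N / 2>, N>;

// General transpose built from 2x2 register transposes: each 2x2 block
// (rows 2bi..2bi+1, word bj) is transposed in place and written to the
// mirrored block position (rows 2bj..2bj+1, word bi).
template <std::size_t N>
tile<N> transpose_tile(const tile<N>& in) {
    static_assert(N % 2 == 0, "N must be even");
    tile<N> out{};
    for (std::size_t bi = 0; bi < N / 2; ++bi) {
        for (std::size_t bj = 0; bj < N / 2; ++bj) {
            half2_bits x = in[2 * bi][bj];
            half2_bits y = in[2 * bi + 1][bj];
            transpose_2x2(x, y);
            out[2 * bj][bi] = x;
            out[2 * bj + 1][bi] = y;
        }
    }
    return out;
}
```

For example, after `half2_bits x = pack(0x1111, 0x2222), y = pack(0x3333, 0x4444); transpose_2x2(x, y);`, `x` holds `0x1111` and `0x3333`, and `y` holds `0x2222` and `0x4444`. Taking the registers by reference, rather than returning fresh values, mirrors the motivation in the commit log: it gives the compiler no reason to materialize extra copies.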