refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32 and int8 to profiler (#120)
- Changed long_index_t to index_t when computing memory offsets
- Uncommented the other ops in the profiler
- Added a test for batched_gemm