refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32 and int8 to profiler
changed long_index_t to index_t when computing memory offset; uncommented other ops in profiler; added test for batched_gemm