    Add lane reduction (#1180) · 4c72cc95
    Paul Fultz II authored
    With reductions such as {2048, 2, 1456} on axis 1, lane reduction is about 22x faster than our new block_reduce, and over 100x faster than our original reduce_sum:
    
    # lane
    gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
    # block
    gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
    # original
    gpu::reduce_sum[axes={1}]: 6.73456ms
    There is some basic logic to pick between lane and block reduce automatically.
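
The commit message does not spell out the selection logic, so the following is only a minimal sketch of what such a choice could look like, not the code from this commit. The names `reduce_algo`, `pick_reduce_algo`, and `lane_reduce_sum_axis1`, as well as the threshold of 64, are assumptions made for illustration. The sketch also models a lane reduction on the CPU: each output element is summed serially by a single "thread", which fits the benchmark above, where the lane kernel's global size of 2981888 equals 2048 × 1456, the number of output elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not the heuristic from the commit.
enum class reduce_algo { lane, block };

// Assumed rule: short reductions use one thread per output element (lane);
// long reductions let a whole workgroup cooperate on each output (block).
// The default threshold of 64 is an illustrative guess, not a measured value.
reduce_algo pick_reduce_algo(std::size_t reduce_elements, std::size_t threshold = 64)
{
    return reduce_elements < threshold ? reduce_algo::lane : reduce_algo::block;
}

// CPU model of a lane reduction over axis 1 of a row-major {d0, d1, d2} tensor.
// Each (i, k) pair of the two outer loops plays the role of one GPU thread,
// which walks the d1 reduction elements serially.
std::vector<float> lane_reduce_sum_axis1(const std::vector<float>& x,
                                         std::size_t d0,
                                         std::size_t d1,
                                         std::size_t d2)
{
    std::vector<float> out(d0 * d2, 0.0f);
    for(std::size_t i = 0; i < d0; i++)
        for(std::size_t k = 0; k < d2; k++)
            for(std::size_t j = 0; j < d1; j++)
                out[i * d2 + k] += x[(i * d1 + j) * d2 + k];
    return out;
}
```

Under this assumed rule, the {2048, 2, 1456} case reduces only 2 elements per output, so `pick_reduce_algo(2)` selects the lane variant, while a reduction over thousands of elements would fall through to the block variant.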
reduce.cpp 5.66 KB