Introduce nvte_memset to provide a fill kernel that is faster than cudaMemsetAsync for small sizes (#1716)

* nvte_memset fills single float value

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

* Support larger sizes than a single value and add tests

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

---------

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>