Use 4B vector loads/stores in cast-transpose kernel for small matrices (#101)
Signed-off-by: Tim Moon <tmoon@nvidia.com>