[macOS GPU Support] Tune dispatching of persistent threads for Apple silicon GPUs (#3978)
* Use 768 instead of 384 threads in generic kernels.
* Use 1536 instead of 1024 threads in force kernels.