* Add getAvailableComputeUnitCount() interface
* Use the number of available compute units to set the kernel grid size