Optimize CPU nonbonded forces: improve load balancing between threads and use linear splines instead of cubic splines